<p><strong>Let&#8217;s Build A Simple Interpreter. Part 19: Nested Procedure Calls</strong></p> <p><em>Ruslan Spivak, Ruslan&#8217;s Blog (https://ruslanspivak.com/), 2020-03-19</em></p> <blockquote> <p><em><span class="dquo">&#8220;</span>What I cannot create, I do not understand.&#8221; &#8212; Richard&nbsp;Feynman</em></p> </blockquote> <p>As I promised you last time, today we&#8217;re going to expand on the material covered in the previous article and talk about executing nested procedure calls. Just like last time, we will limit our focus today to procedures that can access their parameters and local variables only. We will cover accessing non-local variables in the next&nbsp;article.</p> <p>Here is the sample program for&nbsp;today:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span>

<span class="k">procedure</span> <span class="nf">Alpha</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="n">b</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span>
<span class="k">var</span> <span class="n">x</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span>

   <span class="k">procedure</span> <span class="nf">Beta</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="n">b</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span>
   <span class="k">var</span> <span class="n">x</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span>
   <span class="k">begin</span>
      <span class="n">x</span> <span class="o">:=</span> <span class="n">a</span> <span class="o">*</span> <span class="mi">10</span> <span class="o">+</span> <span class="n">b</span> <span class="o">*</span> <span class="mi">2</span><span class="o">;</span>
   <span class="k">end</span><span class="o">;</span>

<span class="k">begin</span>
   <span class="n">x</span> <span class="o">:=</span> <span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="p">)</span> <span class="o">*</span> <span class="mi">2</span><span class="o">;</span>

   <span class="n">Beta</span><span class="p">(</span><span class="mi">5</span><span class="o">,</span> <span class="mi">10</span><span class="p">)</span><span class="o">;</span>  <span class="cm">{ procedure call }</span>
<span class="k">end</span><span class="o">;</span>

<span class="k">begin</span> <span class="cm">{ Main }</span>
   <span class="n">Alpha</span><span class="p">(</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">5</span><span class="o">,</span> <span class="mi">7</span><span class="p">)</span><span class="o">;</span>  <span class="cm">{ procedure call }</span>
<span class="k">end</span><span class="o">.</span>  <span class="cm">{ Main }</span>
</pre></div> <p>The nesting relationships diagram for the program looks like&nbsp;this:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part19/lsbasi_part19_nestingrel.png" width="240"></p> <p>Some things to note about the above&nbsp;program:</p> <ul> <li> <p>it has two procedure declarations, <em>Alpha</em> and <em>Beta</em></p> </li> <li> <p><em>Alpha</em> is declared inside the main program (the <em>global</em>&nbsp;scope)</p> </li> <li> <p>the <em>Beta</em> procedure is declared inside the <em>Alpha</em>&nbsp;procedure</p> </li> <li> <p>both <em>Alpha</em> and <em>Beta</em> have the same names for their formal parameters: integers <em>a</em> and <em>b</em></p> </li> <li> <p>and both <em>Alpha</em> and <em>Beta</em> declare a local variable with the same name, <em>x</em></p> </li> 
<li> <p>the program has nested calls: the <em>Beta</em> procedure is called from the <em>Alpha</em> procedure, which, in turn, is called from the main&nbsp;program</p> </li> </ul> <p><br> Now, let&#8217;s do an experiment. Download the <a href="https://github.com/rspivak/lsbasi/blob/master/part19/part19.pas">part19.pas</a> file from <a href="https://github.com/rspivak/lsbasi/blob/master/part19">GitHub</a> and run the <a href="https://github.com/rspivak/lsbasi/blob/master/part18/spi.py">interpreter from the previous article</a> with that file as its input to see what happens when the interpreter executes nested procedure calls (the main program calling <em>Alpha</em> calling <em>Beta</em>):</p> <div class="highlight"><pre><span></span>$ python spi.py part19.pas --stack

ENTER: PROGRAM Main
CALL STACK
<span class="m">1</span>: PROGRAM Main

...

ENTER: PROCEDURE Beta
CALL STACK
<span class="m">2</span>: PROCEDURE Beta
   a : <span class="m">5</span>
   b : <span class="m">10</span>
<span class="m">2</span>: PROCEDURE Alpha
   a : <span class="m">8</span>
   b : <span class="m">7</span>
   x : <span class="m">30</span>
<span class="m">1</span>: PROGRAM Main

LEAVE: PROCEDURE Beta
CALL STACK
<span class="m">2</span>: PROCEDURE Beta
   a : <span class="m">5</span>
   b : <span class="m">10</span>
   x : <span class="m">70</span>
<span class="m">2</span>: PROCEDURE Alpha
   a : <span class="m">8</span>
   b : <span class="m">7</span>
   x : <span class="m">30</span>
<span class="m">1</span>: PROGRAM Main

...

LEAVE: PROGRAM Main
CALL STACK
<span class="m">1</span>: PROGRAM Main
</pre></div> <p><br> It just works! There are no errors. And if you study the contents of the ARs (activation records), you can see that the values stored in the activation records for the <em>Alpha</em> and <em>Beta</em> procedure calls are correct. So, what&#8217;s the catch then? There is one small issue. 
If you take a look at the output where it says &#8216;<span class="caps">ENTER</span>: <span class="caps">PROCEDURE</span> Beta&#8217;, you can see that the nesting level for the <em>Beta</em> and <em>Alpha</em> procedure calls is the same: 2 (two). The nesting level for <em>Alpha</em> should be 2 and the nesting level for <em>Beta</em> should be 3. That&#8217;s the issue that we need to fix. Right now the <em>nesting_level</em> value in the <em>visit_ProcedureCall</em> method is hardcoded to be 2&nbsp;(two):</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_ProcedureCall</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
    <span class="n">proc_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_name</span>

    <span class="n">ar</span> <span class="o">=</span> <span class="n">ActivationRecord</span><span class="p">(</span>
        <span class="n">name</span><span class="o">=</span><span class="n">proc_name</span><span class="p">,</span>
        <span class="nb">type</span><span class="o">=</span><span class="n">ARType</span><span class="o">.</span><span class="n">PROCEDURE</span><span class="p">,</span>
        <span class="n">nesting_level</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="n">proc_symbol</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_symbol</span>
    <span class="o">...</span>
</pre></div> <p><br> Let&#8217;s get rid of the hardcoded value. How do we determine the nesting level for a procedure call? 
In the method above we have a procedure symbol, and it is stored in a scoped symbol table whose scope level is exactly what we can use as the value of the <em>nesting_level</em> parameter (see <a href="https://ruslanspivak.com/lsbasi-part14/">Part 14</a> for more details about scopes and scope&nbsp;levels).</p> <p>How do we get to the scoped symbol table&#8217;s scope level through the procedure&nbsp;symbol?</p> <p>Let&#8217;s look at the following parts of the <em>ScopedSymbolTable</em>&nbsp;class:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ScopedSymbolTable</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">scope_name</span><span class="p">,</span> <span class="n">scope_level</span><span class="p">,</span> <span class="n">enclosing_scope</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="o">...</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">scope_level</span> <span class="o">=</span> <span class="n">scope_level</span>
        <span class="o">...</span>

    <span class="k">def</span> <span class="nf">insert</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">symbol</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">f</span><span class="s1">&#39;Insert: {symbol.name}&#39;</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="p">[</span><span class="n">symbol</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">symbol</span>
</pre></div> <p>Looking at the code above, we can see that we could assign the scope level of a scoped symbol table to a 
symbol when we store the symbol in the scoped symbol table (scope) inside the <em>insert</em> method. This way we will have access to the procedure symbol&#8217;s scope level in the <em>visit_ProcedureCall</em> method during the interpretation phase. And that&#8217;s exactly what we&nbsp;need.</p> <p>Let&#8217;s make the necessary&nbsp;changes:</p> <ul> <li> <p>First, let&#8217;s add a <em>scope_level</em> member to the <em>Symbol</em> class and give it a default value of&nbsp;zero:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Symbol</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="o">...</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">scope_level</span> <span class="o">=</span> <span class="mi">0</span>
</pre></div> </li> <li> <p>Next, let&#8217;s assign the corresponding scope level to a symbol when storing the symbol in a scoped symbol&nbsp;table:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ScopedSymbolTable</span><span class="p">:</span>
    <span class="o">...</span>
    <span class="k">def</span> <span class="nf">insert</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">symbol</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">f</span><span class="s1">&#39;Insert: {symbol.name}&#39;</span><span class="p">)</span>

        <span class="n">symbol</span><span class="o">.</span><span class="n">scope_level</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">scope_level</span>

        <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="p">[</span><span class="n">symbol</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">symbol</span>
</pre></div> </li> </ul> <p><br> Now, when creating an <span class="caps">AR</span> for a procedure call in the <em>visit_ProcedureCall</em> method, we have access to the scope level of the procedure symbol. All that&#8217;s left to do is use the scope level of the procedure symbol as the value of the <em>nesting_level</em>&nbsp;parameter:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Interpreter</span><span class="p">(</span><span class="n">NodeVisitor</span><span class="p">):</span>
    <span class="o">...</span>
    <span class="k">def</span> <span class="nf">visit_ProcedureCall</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
        <span class="n">proc_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_name</span>
        <span class="n">proc_symbol</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_symbol</span>

        <span class="n">ar</span> <span class="o">=</span> <span class="n">ActivationRecord</span><span class="p">(</span>
            <span class="n">name</span><span class="o">=</span><span class="n">proc_name</span><span class="p">,</span>
            <span class="nb">type</span><span class="o">=</span><span class="n">ARType</span><span class="o">.</span><span class="n">PROCEDURE</span><span class="p">,</span>
            <span class="n">nesting_level</span><span class="o">=</span><span class="n">proc_symbol</span><span class="o">.</span><span class="n">scope_level</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span>
        <span class="p">)</span>
</pre></div> <p>That&#8217;s great, no more 
hardcoded nesting levels. One thing worth mentioning is why we put <em>proc_symbol.scope_level + 1</em> as the value of the <em>nesting_level</em> parameter and not just <em>proc_symbol.scope_level</em>. In <a href="https://ruslanspivak.com/lsbasi-part17/">Part 17</a>, I mentioned that the nesting level of an <span class="caps">AR</span> corresponds to the scope level of the respective procedure or function declaration plus one. Let&#8217;s see&nbsp;why.</p> <p>In our sample program for today, the <em>Alpha</em> procedure symbol (the symbol that contains information about the <em>Alpha</em> procedure declaration) is stored in the <em>global</em> scope at level 1 (one). So 1 is the value of the <em>Alpha</em> procedure symbol&#8217;s <em>scope_level</em>. But as we know from <a href="https://ruslanspivak.com/lsbasi-part14/">Part 14</a>, the scope level of the procedure declaration <em>Alpha</em> is one less than the level of the variables declared inside the procedure <em>Alpha</em>. So, to get the scope level of the scope where the <em>Alpha</em> procedure&#8217;s parameters and local variables are stored, we need to increment the procedure symbol&#8217;s scope level by 1. That&#8217;s the reason we use <em>proc_symbol.scope_level + 1</em> as the value of the <em>nesting_level</em> parameter when creating an <span class="caps">AR</span> for a procedure call and not simply <em>proc_symbol.scope_level</em>.</p> <p>Let&#8217;s see the changes we&#8217;ve made so far in action. 
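</p> <p>Before running the full interpreter, the relationship between scope levels and nesting levels can be sketched in isolation. The snippet below is only a minimal, stand-alone illustration and not the actual <em>spi.py</em> code: its <em>Symbol</em> and <em>ScopedSymbolTable</em> classes are stripped-down stand-ins that keep just the fields discussed above, and the <em>ar_nesting_level</em> helper is a hypothetical name introduced for this example.</p>

```python
# Stand-alone sketch, NOT the real spi.py: simplified stand-ins for the
# classes discussed in the article, keeping only what the example needs.

class Symbol:
    def __init__(self, name):
        self.name = name
        self.scope_level = 0  # overwritten by ScopedSymbolTable.insert


class ScopedSymbolTable:
    def __init__(self, scope_name, scope_level):
        self.scope_name = scope_name
        self.scope_level = scope_level
        self._symbols = {}

    def insert(self, symbol):
        # Record the level of the scope in which the symbol is declared.
        symbol.scope_level = self.scope_level
        self._symbols[symbol.name] = symbol


def ar_nesting_level(proc_symbol):
    # Hypothetical helper: the AR for a call sits one level deeper than
    # the scope that holds the procedure's declaration.
    return proc_symbol.scope_level + 1


global_scope = ScopedSymbolTable('global', scope_level=1)
alpha_scope = ScopedSymbolTable('Alpha', scope_level=2)

alpha = Symbol('Alpha')
beta = Symbol('Beta')
global_scope.insert(alpha)  # Alpha is declared in the global scope
alpha_scope.insert(beta)    # Beta is declared inside Alpha

for sym in (alpha, beta):
    print(sym.name, sym.scope_level, ar_nesting_level(sym))
```

<p>Under these assumptions the sketch prints <em>Alpha 1 2</em> and <em>Beta 2 3</em>: the declarations live at scope levels 1 and 2, so the activation records for the calls get nesting levels 2 and 3.</p> <p>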
Download <a href="https://github.com/rspivak/lsbasi/blob/master/part19">the updated interpreter</a> and test it again with the <a href="https://github.com/rspivak/lsbasi/blob/master/part19">part19.pas</a> file as its&nbsp;input:</p> <div class="highlight"><pre><span></span>$ python spi.py part19.pas --stack

ENTER: PROGRAM Main
CALL STACK
<span class="m">1</span>: PROGRAM Main

ENTER: PROCEDURE Alpha
CALL STACK
<span class="m">2</span>: PROCEDURE Alpha
   a : <span class="m">8</span>
   b : <span class="m">7</span>
<span class="m">1</span>: PROGRAM Main

ENTER: PROCEDURE Beta
CALL STACK
<span class="m">3</span>: PROCEDURE Beta
   a : <span class="m">5</span>
   b : <span class="m">10</span>
<span class="m">2</span>: PROCEDURE Alpha
   a : <span class="m">8</span>
   b : <span class="m">7</span>
   x : <span class="m">30</span>
<span class="m">1</span>: PROGRAM Main

LEAVE: PROCEDURE Beta
CALL STACK
<span class="m">3</span>: PROCEDURE Beta
   a : <span class="m">5</span>
   b : <span class="m">10</span>
   x : <span class="m">70</span>
<span class="m">2</span>: PROCEDURE Alpha
   a : <span class="m">8</span>
   b : <span class="m">7</span>
   x : <span class="m">30</span>
<span class="m">1</span>: PROGRAM Main

LEAVE: PROCEDURE Alpha
CALL STACK
<span class="m">2</span>: PROCEDURE Alpha
   a : <span class="m">8</span>
   b : <span class="m">7</span>
   x : <span class="m">30</span>
<span class="m">1</span>: PROGRAM Main

LEAVE: PROGRAM Main
CALL STACK
<span class="m">1</span>: PROGRAM Main
</pre></div> <p>As you can see from the output above, the nesting levels of the activation records (<span class="caps">AR</span>) now have the correct&nbsp;values:</p> <ul> <li> <p>The <em>Main</em> program <span class="caps">AR</span>: nesting level&nbsp;1</p> </li> <li> <p>The <em>Alpha</em> procedure <span class="caps">AR</span>: nesting level&nbsp;2</p> </li> <li> <p>The <em>Beta</em> procedure <span class="caps">AR</span>: nesting level&nbsp;3</p> </li> </ul> <p><br> Let&#8217;s take a look at how the scope tree (scoped symbol 
tables) and the call stack look visually during the execution of the program. Here is how the call stack looks right after the message &#8220;<span class="caps">LEAVE</span>: <span class="caps">PROCEDURE</span> Beta&#8221; and before the <span class="caps">AR</span> for the <em>Beta</em> procedure call is popped off the&nbsp;stack:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part19/lsbasi_part19_callstack.png" width="640"></p> <p>And if we flip the call stack (so that the top of the stack is at the &#8220;bottom&#8221;), you can see how the call stack with activation records relates to the scope tree with scopes (scoped symbol tables). In fact, we can say that activation records are run-time equivalents of scopes. Scopes are created during semantic analysis of a source program (the source program is read, parsed, and analyzed at this stage, but not executed) and the call stack with activation records is created at run-time when the interpreter executes the source&nbsp;program:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part19/lsbasi_part19_scopes_callstack.png" width="640"></p> <p>As you&#8217;ve seen in this article, we haven&#8217;t made a lot of changes to support the execution of nested procedure calls. The only real change was to make sure the nesting level in ARs was correct. The rest of the codebase stayed the same. The main reason our code continues to work pretty much unchanged with nested procedure calls is that the <em>Alpha</em> and <em>Beta</em> procedures in the sample program access the values of local variables only (including their own parameters). And because those values are stored in the <span class="caps">AR</span> at the top of the stack, we can continue to use the methods <em>visit_Assign</em> and <em>visit_Var</em> without any change when executing the body of the procedures. 
Here is the source code of the methods&nbsp;again:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Assign</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
    <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="o">.</span><span class="n">value</span>
    <span class="n">var_value</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span>

    <span class="n">ar</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="o">.</span><span class="n">peek</span><span class="p">()</span>
    <span class="n">ar</span><span class="p">[</span><span class="n">var_name</span><span class="p">]</span> <span class="o">=</span> <span class="n">var_value</span>

<span class="k">def</span> <span class="nf">visit_Var</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
    <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">value</span>

    <span class="n">ar</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="o">.</span><span class="n">peek</span><span class="p">()</span>
    <span class="n">var_value</span> <span class="o">=</span> <span class="n">ar</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">var_name</span><span class="p">)</span>

    <span class="k">return</span> <span class="n">var_value</span>
</pre></div> <p><br> Okay, today we&#8217;ve been able to 
successfully execute nested procedure calls with our interpreter with very few changes. And now we&#8217;re one step closer to properly executing recursive procedure&nbsp;calls.</p> <p>That&#8217;s it for today. In the next article, we&#8217;ll talk about how procedures can access non-local variables during&nbsp;run-time.</p> <p><br/> <strong>Stay safe, stay healthy, and take care of each other! See you next&nbsp;time.</strong></p> 
<p><strong>EOF is not a character</strong></p> <p><em>Ruslan Spivak, 2020-03-01</em></p> <p><strong>Update Mar 14, 2020</strong>: I&#8217;m working on an update to the article based on all the feedback I&#8217;ve received so far. Stay tuned! <br/> <br/></p> <p>I was reading <em>Computer Systems: A Programmer&#8217;s Perspective</em> the other day and in the chapter on Unix I/O the authors mention that <strong><em>there is no explicit &#8220;<span class="caps">EOF</span> character&#8221; at the end of a file</em></strong>.</p> <p><img alt="" src="https://ruslanspivak.com/eofnotchar/eofnotchar_notachar.png" width="640"></p> <p>If you&#8217;ve spent some time reading and/or playing with Unix I/O and have written some C programs that read text files and run on Unix/Linux, that statement is probably obvious. But let&#8217;s take a closer look at the following two points related to the statement in the&nbsp;book:</p> <ol> <li><span class="caps">EOF</span> is not a&nbsp;character</li> <li><span class="caps">EOF</span> is not a character you find at the end of a&nbsp;file</li> </ol> <p><br/> 1. Why would anyone say or think that <span class="caps">EOF</span> is a character? 
I think it may be because in some C programs you can find code that explicitly checks for <span class="caps">EOF</span> using the <em>getchar()</em> and <em>getc()</em>&nbsp;routines:</p> <div class="highlight"><pre><span></span><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp"></span>
<span class="p">...</span>
<span class="k">while</span> <span class="p">((</span><span class="n">c</span> <span class="o">=</span> <span class="n">getchar</span><span class="p">())</span> <span class="o">!=</span> <span class="n">EOF</span><span class="p">)</span>
    <span class="n">putchar</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>

<span class="n">OR</span>

<span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">c</span><span class="p">;</span>
<span class="p">...</span>
<span class="k">while</span> <span class="p">((</span><span class="n">c</span> <span class="o">=</span> <span class="n">getc</span><span class="p">(</span><span class="n">fp</span><span class="p">))</span> <span class="o">!=</span> <span class="n">EOF</span><span class="p">)</span>
    <span class="n">putc</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">stdout</span><span class="p">);</span>
</pre></div> <p>And if you check the <em>man</em> page for <em>getchar()</em> or <em>getc()</em>, you&#8217;ll read that both routines get the next character from the input stream. So that could be what leads to confusion about the nature of <span class="caps">EOF</span>, but that&#8217;s just me speculating. Let&#8217;s get back to the point that <span class="caps">EOF</span> is not a&nbsp;character.</p> <p>What is a character anyway? A <em>character</em> is the smallest component of a text. &#8216;A&#8217;, &#8216;a&#8217;, &#8216;B&#8217;, &#8216;b&#8217; are all different characters. 
A character has a numeric value that is called a <a href="https://docs.python.org/3/howto/unicode.html"><em>code point</em></a> in the Unicode standard. For example, the English character &#8216;A&#8217; has a numeric value of 65 in decimal. You can check this quickly in a Python&nbsp;shell:</p> <div class="highlight"><pre><span></span>$ python
&gt;&gt;&gt; ord(&#39;A&#39;)
65
&gt;&gt;&gt; chr(65)
&#39;A&#39;
</pre></div> <p><br/> Or you could look it up in the <span class="caps">ASCII</span> table on your Unix/Linux&nbsp;box:</p> <div class="highlight"><pre><span></span>$ man ascii
</pre></div> <p><img alt="" src="https://ruslanspivak.com/eofnotchar/eofnotchar_asciitable.png" width="640"></p> <p><br/></p> <p>Let&#8217;s check the value of <span class="caps">EOF</span> by writing a little C program. In <span class="caps">ANSI</span> C, <span class="caps">EOF</span> is defined in <em>&lt;stdio.h&gt;</em> as part of the standard library. Its value is usually -1. Save the following code in a file named <em>printeof.c</em>, compile it, and run&nbsp;it:</p> <div class="highlight"><pre><span></span><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp"></span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
  <span class="n">printf</span><span class="p">(</span><span class="s">&quot;EOF value on my system: %d</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span> <span class="n">EOF</span><span class="p">);</span>
  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div> <p><br/></p> <div class="highlight"><pre><span></span>$ gcc -o printeof printeof.c

$ ./printeof
EOF value on my system: -1
</pre></div> <p>Okay, so on my 
system the value is -1 (I tested it on both Mac <span class="caps">OS</span> and Ubuntu Linux). Is there a character with a numerical value of -1? Again, you could check the available numeric values in the <span class="caps">ASCII</span> table or check the official Unicode page to find the legitimate range of numeric values for representing characters. But let&#8217;s fire up a Python shell and use the built-in <em>chr()</em> function to return a character for&nbsp;-1:</p> <div class="highlight"><pre><span></span>$ python
&gt;&gt;&gt; chr<span class="o">(</span>-1<span class="o">)</span>
Traceback <span class="o">(</span>most recent call last<span class="o">)</span>:
  File <span class="s2">&quot;&lt;stdin&gt;&quot;</span>, line <span class="m">1</span>, in &lt;module&gt;
ValueError: chr<span class="o">()</span> arg not in range<span class="o">(</span>0x110000<span class="o">)</span>
</pre></div> <p>As expected, there is no character with a numeric value of -1. Okay, so <span class="caps">EOF</span> (as seen in C programs) is not a&nbsp;character.</p> <p>On to the second&nbsp;point.</p> <p><br/> 2. Is <span class="caps">EOF</span> a character that you can find at the end of a file? I think at this point you already know the answer, but let&#8217;s double-check our&nbsp;assumption.</p> <p>Let&#8217;s take a simple text file <a href="https://github.com/rspivak/2x25/blob/master/eofnotchar/helloworld.txt">helloworld.txt</a> and get a hexdump of the contents of the file. We can use <em>xxd</em> for&nbsp;that:</p> <div class="highlight"><pre><span></span>$ cat helloworld.txt
Hello world!

$ xxd helloworld.txt
<span class="m">00000000</span>: <span class="m">4865</span> 6c6c 6f20 776f 726c <span class="m">6421</span> 0a  Hello world!.
</pre></div> <p>As you can see, the last character at the end of the file is the hex <em>0a</em>. You can find in the <span class="caps">ASCII</span> table that <em>0a</em> represents <em>nl</em>, the newline character. 
Or you can check it in a Python&nbsp;shell:</p> <div class="highlight"><pre><span></span>$ python
&gt;&gt;&gt; chr<span class="o">(</span>0x0a<span class="o">)</span>
<span class="s1">&#39;\n&#39;</span>
</pre></div> <p><br/> Okay. If <span class="caps">EOF</span> is not a character and it&#8217;s not a character that you find at the end of a file, what is it&nbsp;then?</p> <p><strong><em><span class="caps">EOF</span> (end-of-file)</em></strong> is a condition provided by the kernel that can be detected by an&nbsp;application.</p> <p>Let&#8217;s see how we can detect the <span class="caps">EOF</span> condition in various programming languages when reading a text file using high-level I/O routines provided by the languages. For this purpose, we&#8217;ll write a very simple <a href="https://en.wikipedia.org/wiki/Cat_(Unix)"><em>cat</em></a> version called <em>mcat</em> that reads an <span class="caps">ASCII</span>-encoded text file byte by byte (character by character) and explicitly checks for <span class="caps">EOF</span>. Let&#8217;s write our <em>cat</em> version in the following programming&nbsp;languages:</p> <ul> <li><span class="caps">ANSI</span>&nbsp;C</li> <li>Python</li> <li>Go</li> <li>JavaScript&nbsp;(node.js)</li> </ul> <p>You can find source code for all of the examples in this article on <a href="https://github.com/rspivak/2x25/tree/master/eofnotchar">GitHub</a>. 
Okay, let&#8217;s get started with the venerable C programming&nbsp;language.</p> <ol> <li> <p><span class="caps">ANSI</span> C (a modified <em>cat</em> version from <em>The C Programming Language</em>&nbsp;book)</p> <div class="highlight"><pre><span></span><span class="cm">/* mcat.c */</span>
<span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp"></span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
  <span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">;</span>
  <span class="kt">int</span> <span class="n">c</span><span class="p">;</span>

  <span class="k">if</span> <span class="p">((</span><span class="n">fp</span> <span class="o">=</span> <span class="n">fopen</span><span class="p">(</span><span class="o">*++</span><span class="n">argv</span><span class="p">,</span> <span class="s">&quot;r&quot;</span><span class="p">))</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">&quot;mcat: can&#39;t open %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span> <span class="o">*</span><span class="n">argv</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="k">while</span> <span class="p">((</span><span class="n">c</span> <span class="o">=</span> <span class="n">getc</span><span class="p">(</span><span class="n">fp</span><span class="p">))</span> <span class="o">!=</span> <span class="n">EOF</span><span class="p">)</span>
    <span class="n">putc</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">stdout</span><span class="p">);</span>

  <span class="n">fclose</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span>

  <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div> <p>Compile</p> <div class="highlight"><pre><span></span>$ gcc -o mcat mcat.c
</pre></div> <p>Run</p> <div class="highlight"><pre><span></span>$ ./mcat helloworld.txt
Hello world!
</pre></div> <p><br/> Quick explanation of the code&nbsp;above:</p> <ul> <li>The program opens a file passed as a command line&nbsp;argument.</li> <li>The <em>while</em> loop copies data from the file to the standard output one byte at a time until it reaches the end of the&nbsp;file.</li> <li>On reaching <span class="caps">EOF</span>, the program closes the file and&nbsp;terminates.</li> </ul> </li> <li> <p>Python&nbsp;3</p> <p>Python doesn&#8217;t have a mechanism to explicitly check for <span class="caps">EOF</span> like in <span class="caps">ANSI</span> C, but if you read a text file one character at a time, you can determine the <em>end-of-file</em> condition by checking if the character read is&nbsp;empty:</p> <div class="highlight"><pre><span></span><span class="c1"># mcat.py</span>
<span class="kn">import</span> <span class="nn">sys</span>

<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="k">as</span> <span class="n">fin</span><span class="p">:</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="n">c</span> <span class="o">=</span> <span class="n">fin</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># read max 1 char</span>
        <span class="k">if</span> <span 
class="n">c</span> <span class="o">==</span> <span class="s1">&#39;&#39;</span><span class="p">:</span> <span class="c1"># EOF</span> <span class="k">break</span> <span class="k">print</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">&#39;&#39;</span><span class="p">)</span> </pre></div> <p><br/></p> <div class="highlight"><pre><span></span>$ python mcat.py helloworld.txt Hello world! </pre></div> <p>Python 3.8+ (a shorter version of the above using <a href="https://docs.python.org/3/whatsnew/3.8.html#assignment-expressions">the walrus operator</a>):</p> <div class="highlight"><pre><span></span><span class="c1"># mcat38.py</span> <span class="kn">import</span> <span class="nn">sys</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="k">as</span> <span class="n">fin</span><span class="p">:</span> <span class="k">while</span> <span class="p">(</span><span class="n">c</span> <span class="p">:</span><span class="o">=</span> <span class="n">fin</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="mi">1</span><span class="p">))</span> <span class="o">!=</span> <span class="s1">&#39;&#39;</span><span class="p">:</span> <span class="c1"># read max 1 char at a time until EOF</span> <span class="k">print</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">&#39;&#39;</span><span class="p">)</span> </pre></div> <p><br/></p> <div class="highlight"><pre><span></span>$ python3.8 mcat38.py helloworld.txt Hello world! 
</pre></div> </li> <li> <p>Go</p> <p>In Go we can explicitly check if the error returned by <a href="https://tour.golang.org/methods/21">Read()</a> is <span class="caps">EOF</span>.</p> <div class="highlight"><pre><span></span><span class="o">//</span> <span class="n">mcat</span><span class="o">.</span><span class="n">go</span> <span class="n">package</span> <span class="n">main</span> <span class="kn">import</span> <span class="p">(</span> <span class="s2">&quot;fmt&quot;</span> <span class="s2">&quot;os&quot;</span> <span class="s2">&quot;io&quot;</span> <span class="p">)</span> <span class="n">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span> <span class="nb">file</span><span class="p">,</span> <span class="n">err</span> <span class="p">:</span><span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">Open</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="n">nil</span> <span class="p">{</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Fprintf</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">Stderr</span><span class="p">,</span> <span class="s2">&quot;mcat: %v</span><span class="se">\n</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">err</span><span class="p">)</span> <span class="n">os</span><span class="o">.</span><span class="n">Exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="p">}</span> <span class="nb">buffer</span> <span class="p">:</span><span class="o">=</span> <span class="n">make</span><span class="p">([]</span><span class="n">byte</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span 
class="o">//</span> <span class="mi">1</span><span class="o">-</span><span class="n">byte</span> <span class="nb">buffer</span> <span class="k">for</span> <span class="p">{</span> <span class="n">bytesread</span><span class="p">,</span> <span class="n">err</span> <span class="p">:</span><span class="o">=</span> <span class="nb">file</span><span class="o">.</span><span class="n">Read</span><span class="p">(</span><span class="nb">buffer</span><span class="p">)</span> <span class="k">if</span> <span class="n">err</span> <span class="o">==</span> <span class="n">io</span><span class="o">.</span><span class="n">EOF</span> <span class="p">{</span> <span class="k">break</span> <span class="p">}</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Print</span><span class="p">(</span><span class="n">string</span><span class="p">(</span><span class="nb">buffer</span><span class="p">[:</span><span class="n">bytesread</span><span class="p">]))</span> <span class="p">}</span> <span class="nb">file</span><span class="o">.</span><span class="n">Close</span><span class="p">()</span> <span class="p">}</span> </pre></div> <p><br/></p> <div class="highlight"><pre><span></span>$ go run mcat.go helloworld.txt Hello world! 
</pre></div> </li> <li> <p>JavaScript&nbsp;(node.js)</p> <p>There is no explicit check for <span class="caps">EOF</span>, but the <a href="https://nodejs.org/api/stream.html#stream_event_end"><em>end</em> event</a> on a stream is fired when the end of a file is reached and a <em>read</em> operation tries to read more&nbsp;data.</p> <div class="highlight"><pre><span></span><span class="cm">/* mcat.js */</span> <span class="kr">const</span> <span class="nx">fs</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">&#39;fs&#39;</span><span class="p">);</span> <span class="kr">const</span> <span class="nx">process</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">&#39;process&#39;</span><span class="p">);</span> <span class="kr">const</span> <span class="nx">fileName</span> <span class="o">=</span> <span class="nx">process</span><span class="p">.</span><span class="nx">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span> <span class="kd">var</span> <span class="nx">readable</span> <span class="o">=</span> <span class="nx">fs</span><span class="p">.</span><span class="nx">createReadStream</span><span class="p">(</span><span class="nx">fileName</span><span class="p">,</span> <span class="p">{</span> <span class="nx">encoding</span><span class="o">:</span> <span class="s1">&#39;utf8&#39;</span><span class="p">,</span> <span class="nx">fd</span><span class="o">:</span> <span class="kc">null</span><span class="p">,</span> <span class="p">});</span> <span class="nx">readable</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">&#39;readable&#39;</span><span class="p">,</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span> <span class="kd">var</span> <span class="nx">chunk</span><span class="p">;</span> <span class="k">while</span> <span class="p">((</span><span 
class="nx">chunk</span> <span class="o">=</span> <span class="nx">readable</span><span class="p">.</span><span class="nx">read</span><span class="p">(</span><span class="mi">1</span><span class="p">))</span> <span class="o">!==</span> <span class="kc">null</span><span class="p">)</span> <span class="p">{</span> <span class="nx">process</span><span class="p">.</span><span class="nx">stdout</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="nx">chunk</span><span class="p">);</span> <span class="cm">/* chunk is one byte */</span> <span class="p">}</span> <span class="p">});</span> <span class="nx">readable</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">&#39;end&#39;</span><span class="p">,</span> <span class="p">()</span> <span class="p">=&gt;</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">&#39;\nEOF: There will be no more data.&#39;</span><span class="p">);</span> <span class="p">});</span> </pre></div> <p><br/></p> <div class="highlight"><pre><span></span>$ node mcat.js helloworld.txt Hello world! EOF: There will be no more data. </pre></div> </li> </ol> <p><br/> How do the high-level I/O routines in the examples above determine the <em>end-of-file</em> condition? On Linux systems the routines either directly or indirectly use the <a href="https://en.wikipedia.org/wiki/Read_(system_call)">read()</a> system call provided by the kernel. The <em>getc()</em> function (or macro) in C, for example, uses the <em>read()</em> system call and returns <span class="caps">EOF</span> if <em>read()</em> indicated the <em>end-of-file</em> condition. 
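</p> <p>As a quick illustration of that return value, here is a minimal sketch in Python. It assumes a throwaway temporary file whose name and contents are made up purely for the demonstration. Once the file position reaches the file size, <em>os.read()</em> hands back an empty <em>bytes</em> object, which is how Python surfaces a <em>read()</em> result of 0:</p>

```python
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Hi\n")
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
chunks = []
while True:
    chunk = os.read(fd, 1)   # ask the kernel for at most 1 byte
    chunks.append(chunk)
    if not chunk:            # b'' means read() returned 0: end-of-file
        break
os.close(fd)
os.remove(path)

# The file's three bytes come back one at a time, followed by an empty
# result. No extra "EOF byte" ever shows up in the data.
print(chunks)
```

<p>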
The <a href="https://en.wikipedia.org/wiki/Read_(system_call)">read()</a> system call returns 0 to indicate the <span class="caps">EOF</span>&nbsp;condition.</p> <p><img alt="" src="https://ruslanspivak.com/eofnotchar/eofnotchar_stdsysio.png" width="400"></p> <p>Let&#8217;s write a <em>cat</em> version called <em>syscat</em> using Unix system calls only, both for fun and potentially some profit. Let&#8217;s do that in C&nbsp;first:</p> <div class="highlight"><pre><span></span><span class="cm">/* syscat.c */</span>
<span class="cp">#include</span> <span class="cpf">&lt;sys/types.h&gt;</span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf">&lt;sys/stat.h&gt;</span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf">&lt;fcntl.h&gt;</span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="cp"></span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">fd</span><span class="p">;</span>
    <span class="kt">char</span> <span class="n">c</span><span class="p">;</span>

    <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">O_RDONLY</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>

    <span class="cm">/* read() returns 0 at EOF and -1 on error; stop on either */</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">c</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span>
        <span class="n">write</span><span class="p">(</span><span class="n">STDOUT_FILENO</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">c</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>

    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div> <p><br/></p> <div class="highlight"><pre><span></span>$ gcc -o syscat syscat.c
$ ./syscat helloworld.txt
Hello world!
</pre></div> <p>In the code above, the loop relies on the fact that the <em>read()</em> function returns 0 to indicate <span class="caps">EOF</span>. Testing for a positive return value also ends the loop if <em>read()</em> fails and returns -1, instead of spinning forever.</p> <p>And the same in Python&nbsp;3:</p> <div class="highlight"><pre><span></span><span class="c1"># syscat.py</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">os</span>

<span class="n">fd</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">os</span><span class="o">.</span><span class="n">O_RDONLY</span><span class="p">)</span>

<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
    <span class="n">c</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">c</span><span class="p">:</span>  <span class="c1"># EOF</span>
        <span class="k">break</span>
    <span class="n">os</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">fileno</span><span class="p">(),</span> <span class="n">c</span><span class="p">)</span>
</pre></div> <p><br/></p> <div class="highlight"><pre><span></span>$ python syscat.py helloworld.txt
Hello world!
</pre></div> <p>And in Python 3.8+ using <a href="https://docs.python.org/3/whatsnew/3.8.html#assignment-expressions">the walrus operator</a>:</p> <div class="highlight"><pre><span></span><span class="c1"># syscat38.py</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">os</span>

<span class="n">fd</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">os</span><span class="o">.</span><span class="n">O_RDONLY</span><span class="p">)</span>

<span class="k">while</span> <span class="n">c</span> <span class="p">:</span><span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="mi">1</span><span class="p">):</span>
    <span class="n">os</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">fileno</span><span class="p">(),</span> <span class="n">c</span><span class="p">)</span>
</pre></div> <p><br/></p> <div class="highlight"><pre><span></span>$ python3.8 syscat38.py helloworld.txt
Hello world!
</pre></div> <p><br/> Let&#8217;s recap the main points about <span class="caps">EOF</span>&nbsp;again:</p> <ul> <li><span class="caps">EOF</span> is not a&nbsp;character</li> <li><span class="caps">EOF</span> is not a character that you find at the end of a&nbsp;file</li> <li><span class="caps">EOF</span> is a condition provided by the kernel that can be detected by an application <s>when a <em>read</em> operation reaches the end of a file</s></li> </ul> <p><strong>Update Mar 3, 2020</strong>: Let&#8217;s recap the main points about <span class="caps">EOF</span> with added details for more&nbsp;clarity:</p> <ul> <li><span class="caps">EOF</span> in <span class="caps">ANSI</span> C is not a character. It&#8217;s a macro defined in <em>&lt;stdio.h&gt;</em> that expands to a negative <em>int</em> constant, usually&nbsp;-1</li> <li><span class="caps">EOF</span> is not a character in the <span class="caps">ASCII</span> or Unicode character&nbsp;set</li> <li><span class="caps">EOF</span> is not a character that you find at the end of a file on Unix/Linux&nbsp;systems</li> <li>There is no explicit &#8220;<span class="caps">EOF</span> character&#8221; at the end of a file on Unix/Linux&nbsp;systems</li> <li><span class="caps">EOF</span> (end-of-file) is a condition provided by the kernel that can be detected by an application <s>when a <em>read</em> operation reaches the end of a file</s> (if <em>k</em> is the current file position and <em>m</em> is the size of a file, performing a <em>read()</em> when <em>k &gt;= m</em> triggers the&nbsp;condition)</li> </ul> <p><strong>Update Mar 14, 2020</strong>: I&#8217;m working on an update to the article based on all the feedback I&#8217;ve received so far. 
Stay&nbsp;tuned!</p> <p><br/> Happy learning and have a great&nbsp;day!</p> <p><br/> <em>Resources used in preparation for this article (some links are affiliate&nbsp;links):</em></p> <ol> <li><a target="_blank" href="https://www.amazon.com/gp/product/013409266X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=013409266X&linkCode=as2&tag=russblo0b-20&linkId=ec2bfa5062cddb0c6f86266ba481c625">Computer Systems: A Programmer&#8217;s Perspective (3rd Edition)</a></li> <li><a target="_blank" href="https://www.amazon.com/gp/product/0131103628/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0131103628&linkCode=as2&tag=russblo0b-20&linkId=97a792c45446683f7235710c2f8c899d">C Programming Language, 2nd Edition</a></li> <li><a target="_blank" href="https://www.amazon.com/gp/product/013937681X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=013937681X&linkCode=as2&tag=russblo0b-20&linkId=b8b462e767809ac396966bbb3e79af76">The Unix Programming Environment (Prentice-Hall Software Series)</a></li> <li><a target="_blank" href="https://www.amazon.com/gp/product/0321637739/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321637739&linkCode=as2&tag=russblo0b-20&linkId=f9fc233797afcaf2c103f7aac24d717d">Advanced Programming in the <span class="caps">UNIX</span> Environment, 3rd Edition</a></li>
<li><a target="_blank" href="https://www.amazon.com/gp/product/0134190440/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0134190440&linkCode=as2&tag=russblo0b-20&linkId=3e0104678e6eb68f11fb29e4cda46bd1">Go Programming Language, The (Addison-Wesley Professional Computing Series)</a></li> <li><a href="https://docs.python.org/3/howto/unicode.html">Unicode <span class="caps">HOWTO</span></a></li> <li><a href="https://nodejs.org/api/stream.html">Node.js Stream&nbsp;module</a></li> <li><a href="https://golang.org/pkg/io/">Go io&nbsp;package</a></li> <li><a href="https://en.wikipedia.org/wiki/Cat_(Unix)">cat&nbsp;(Unix)</a></li> <li><a href="https://en.wikipedia.org/wiki/End-of-file">End-of-file</a></li> <li><a href="https://en.wikipedia.org/wiki/End-of-Transmission_character">End-of-Transmission&nbsp;character</a></li> </ol> <p></p>Let’s Build A Simple Interpreter. Part 18: Executing Procedure Calls2020-02-20T08:00:00-05:002020-02-20T08:00:00-05:00Ruslan Spivaktag:ruslanspivak.com,2020-02-20:/lsbasi-part18/<p>Do the best you can until you know better. Then when you know better, do better. ― Maya&nbsp;Angelou</p><blockquote> <p><em><span class="dquo">&#8220;</span>Do the best you can until you know better. Then when you know better, do better.&#8221; ― Maya&nbsp;Angelou</em></p> </blockquote> <p>It&#8217;s a huge milestone for us today! Because today we will extend our interpreter to execute procedure calls. If that&#8217;s not exciting, I don&#8217;t know what is.&nbsp;:)</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part18/lsbasi_part18_milestones.png" width="640"></p> <p>Are you ready? 
Let&#8217;s get to&nbsp;it!</p> <p><br/> Here is the sample program we&#8217;ll focus on in this&nbsp;article:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Alpha</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="n">b</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="n">x</span> <span class="o">:=</span> <span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="p">)</span> <span class="o">*</span> <span class="mi">2</span><span class="o">;</span> <span class="k">end</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ Main }</span> <span class="n">Alpha</span><span class="p">(</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">5</span><span class="o">,</span> <span class="mi">7</span><span class="p">)</span><span class="o">;</span> <span class="cm">{ procedure call }</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{ Main }</span> </pre></div> <p>It has one procedure declaration and one procedure call<em>.</em> We will limit our focus today to procedures that can access their parameters and local variables only. 
We will cover nested procedure calls and accessing non-local variables in the next two&nbsp;articles.</p> <p><br/> Let&#8217;s describe an algorithm that our interpreter needs to implement to be able to execute the <em>Alpha(3 + 5, 7)</em> procedure call in the program&nbsp;above.</p> <p>Here is the algorithm for executing a procedure call, step by&nbsp;step:</p> <ol> <li> <p>Create an activation&nbsp;record</p> </li> <li> <p>Save procedure arguments (actual parameters) in the activation&nbsp;record</p> </li> <li> <p>Push the activation record onto the call&nbsp;stack</p> </li> <li> <p>Execute the body of the&nbsp;procedure</p> </li> <li> <p>Pop the activation record off the&nbsp;stack</p> </li> </ol> <p>Procedure calls in our interpreter are handled by the <em>visit_ProcedureCall</em> method. The method is currently&nbsp;empty:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Interpreter</span><span class="p">(</span><span class="n">NodeVisitor</span><span class="p">):</span> <span class="o">...</span> <span class="k">def</span> <span class="nf">visit_ProcedureCall</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">pass</span> </pre></div> <p>Let&#8217;s go over each step in the algorithm and write code for the <em>visit_ProcedureCall</em> method to execute procedure&nbsp;calls.</p> <p>Let&#8217;s get&nbsp;started!</p> <p><br/> <em>Step 1. Create an activation&nbsp;record</em></p> <p>If you remember from the <a href="https://ruslanspivak.com/lsbasi-part17">previous article</a>, an <em>activation record (<span class="caps">AR</span>)</em> is a dictionary-like object for maintaining information about the currently executing invocation of a procedure or function, and also the program itself. 
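</p> <p>To make &#8220;dictionary-like&#8221; concrete, here is a minimal sketch of such an object. It is a reconstruction for illustration and not necessarily the exact class from the previous article: the <em>members</em> attribute, the variable <em>y</em>, and the plain list standing in for the call stack are all assumptions made for this example.</p>

```python
from enum import Enum


class ARType(Enum):
    PROGRAM = 'PROGRAM'


class ActivationRecord:
    """Dictionary-like record for one invocation (a sketch, not the series' exact code)."""

    def __init__(self, name, type, nesting_level):
        self.name = name
        self.type = type
        self.nesting_level = nesting_level
        self.members = {}  # assumed attribute name for the stored values

    def __setitem__(self, key, value):
        self.members[key] = value

    def __getitem__(self, key):
        return self.members[key]


call_stack = []                      # a plain list stands in for the call stack
ar = ActivationRecord(name='Main', type=ARType.PROGRAM, nesting_level=1)
ar['y'] = 7                          # hypothetical variable, for illustration
call_stack.append(ar)                # push the record when execution starts
print(call_stack[-1]['y'])           # look up a value in the top record
call_stack.pop()                     # pop the record when execution finishes
```

<p>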
The activation record for a procedure, for example, contains the current values of its formal parameters and the current values of its local variables. So, to store the procedure&#8217;s arguments and local variables, we need to create an <span class="caps">AR</span> first. Recall that the <em>ActivationRecord</em> constructor takes three parameters: <em>name</em>, <em>type</em>, and <em>nesting_level</em>. Here&#8217;s what we need to pass to the constructor when creating an <span class="caps">AR</span> for a procedure&nbsp;call:</p> <ul> <li> <p>We need to pass the procedure&#8217;s name as the <em>name</em> parameter to the&nbsp;constructor</p> </li> <li> <p>We also need to specify <span class="caps">PROCEDURE</span> as the <em>type</em> of the <span class="caps">AR</span></p> </li> <li> <p>And we need to pass 2 as the <em>nesting_level</em> for the procedure call, because the program&#8217;s nesting level is set to 1 (you can see that in the <em>visit_Program</em> method of the interpreter) and <em>Alpha</em> is declared directly inside the main&nbsp;program</p> </li> </ul> <p>Before we extend the <em>visit_ProcedureCall</em> method to create an activation record for a procedure call, we need to add the <span class="caps">PROCEDURE</span> type to the <em>ARType</em> enumeration. 
Let&#8217;s do this&nbsp;first:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ARType</span><span class="p">(</span><span class="n">Enum</span><span class="p">):</span> <span class="n">PROGRAM</span> <span class="o">=</span> <span class="s1">&#39;PROGRAM&#39;</span> <span class="n">PROCEDURE</span> <span class="o">=</span> <span class="s1">&#39;PROCEDURE&#39;</span> </pre></div> <p>Now, let&#8217;s update the <em>visit_ProcedureCall</em> method to create an activation record with the appropriate arguments that we described earlier in the&nbsp;text:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_ProcedureCall</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">proc_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_name</span> <span class="n">ar</span> <span class="o">=</span> <span class="n">ActivationRecord</span><span class="p">(</span> <span class="n">name</span><span class="o">=</span><span class="n">proc_name</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="n">ARType</span><span class="o">.</span><span class="n">PROCEDURE</span><span class="p">,</span> <span class="n">nesting_level</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="p">)</span> </pre></div> <p>Writing code to create an activation record was easy once we figured out what to pass to the <em>ActivationRecord</em> constructor as&nbsp;arguments.</p> <p><br/> <em>Step 2. Save procedure arguments in the activation&nbsp;record</em></p> <blockquote> <p><span class="caps">ASIDE</span>: <em>Formal parameters</em> are parameters that show up in the declaration of a procedure. 
<em>Actual parameters</em> (also known as <em>arguments</em>) are different variables and expressions passed to the procedure in a particular procedure&nbsp;call.</p> </blockquote> <p>Here is a list of steps that describes the high-level actions the interpreter needs to take to save procedure arguments in the activation&nbsp;record:</p> <ol type="a"> <li>Get a list of the procedure&#8217;s formal&nbsp;parameters</li> <li>Get a list of the procedure&#8217;s actual parameters&nbsp;(arguments)</li> <li>For each formal parameter, get the corresponding actual parameter and save the pair in the procedure&#8217;s activation record by using the formal parameter&#8217;s name as a key and the actual parameter (argument), after having evaluated it, as the&nbsp;value</li> </ol> <p>If we have the following procedure declaration and procedure&nbsp;call:</p> <div class="highlight"><pre><span></span><span class="k">procedure</span> <span class="nf">Alpha</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="n">b</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="n">Alpha</span><span class="p">(</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">5</span><span class="o">,</span> <span class="mi">7</span><span class="p">)</span><span class="o">;</span> </pre></div> <p>Then after the above three steps have been executed, the procedure&#8217;s <span class="caps">AR</span> contents should look like&nbsp;this:</p> <div class="highlight"><pre><span></span><span class="mi">2</span><span class="o">:</span> <span class="n">PROCEDURE</span> <span class="n">Alpha</span> <span class="n">a</span> <span class="o">:</span> <span class="mi">8</span> <span class="n">b</span> <span class="o">:</span> <span class="mi">7</span> </pre></div> <p>Here is the code that implements the steps&nbsp;above:</p> <div 
class="highlight"><pre><span></span><span class="n">proc_symbol</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_symbol</span> <span class="n">formal_params</span> <span class="o">=</span> <span class="n">proc_symbol</span><span class="o">.</span><span class="n">formal_params</span> <span class="n">actual_params</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">actual_params</span> <span class="k">for</span> <span class="n">param_symbol</span><span class="p">,</span> <span class="n">argument_node</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">formal_params</span><span class="p">,</span> <span class="n">actual_params</span><span class="p">):</span> <span class="n">ar</span><span class="p">[</span><span class="n">param_symbol</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">argument_node</span><span class="p">)</span> </pre></div> <p>Let&#8217;s take a closer look at the steps and the&nbsp;code.</p> <p><br/> a) First, we need to get a list of the procedure&#8217;s formal parameters. Where can we get them from? They are available in the respective procedure symbol created during the semantic analysis phase. 
To jog your memory, here is the definition of the <em>ProcedureSymbol</em>&nbsp;class:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Symbol</span><span class="p">:</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span> <span class="bp">self</span><span class="o">.</span><span class="n">type</span> <span class="o">=</span> <span class="nb">type</span> <span class="k">class</span> <span class="nc">ProcedureSymbol</span><span class="p">(</span><span class="n">Symbol</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">formal_params</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="c1"># a list of VarSymbol objects</span> <span class="bp">self</span><span class="o">.</span><span class="n">formal_params</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">if</span> <span class="n">formal_params</span> <span class="ow">is</span> <span class="bp">None</span> <span class="k">else</span> <span class="n">formal_params</span> </pre></div> <p>And here&#8217;s the contents of the <em>global</em> scope (program level), which shows a string representation of the <em>Alpha</em> procedure symbol with its formal&nbsp;parameters:</p> <div 
class="highlight"><pre><span></span><span class="gh">SCOPE (SCOPED SYMBOL TABLE)</span> <span class="gh">===========================</span> Scope name : global Scope level : 1 Enclosing scope: None <span class="gh">Scope (Scoped symbol table) contents</span> <span class="gh">------------------------------------</span> INTEGER: &lt;BuiltinTypeSymbol(name=&#39;INTEGER&#39;)&gt; REAL: &lt;BuiltinTypeSymbol(name=&#39;REAL&#39;)&gt; Alpha: &lt;ProcedureSymbol(name=Alpha, parameters=[<span class="nt">&lt;VarSymbol(name=&#39;a&#39;, type=&#39;INTEGER&#39;)&gt;</span>, &lt;VarSymbol(name=&#39;b&#39;, type=&#39;INTEGER&#39;)&gt;])&gt; </pre></div> <p>Okay, we now know where to get the formal parameters from. How do we get to the procedure symbol from the <em>ProcedureCall</em> <span class="caps">AST</span> <em>node</em> variable? Let&#8217;s take a look at the <em>visit_ProcedureCall</em> method code that we&#8217;ve written so&nbsp;far:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_ProcedureCall</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">proc_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_name</span> <span class="n">ar</span> <span class="o">=</span> <span class="n">ActivationRecord</span><span class="p">(</span> <span class="n">name</span><span class="o">=</span><span class="n">proc_name</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="n">ARType</span><span class="o">.</span><span class="n">PROCEDURE</span><span class="p">,</span> <span class="n">nesting_level</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="p">)</span> </pre></div> <p>We can get access to the procedure symbol by adding the following statement to the code&nbsp;above:</p> <div 
class="highlight"><pre><span></span><span class="n">proc_symbol</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_symbol</span> </pre></div> <p>But if you look at the definition of the <em>ProcedureCall</em> class from the <a href="https://ruslanspivak.com/lsbasi-part17">previous article</a>, you can see that the class doesn&#8217;t have <em>proc_symbol</em> as a&nbsp;member:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ProcedureCall</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">proc_name</span><span class="p">,</span> <span class="n">actual_params</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">proc_name</span> <span class="o">=</span> <span class="n">proc_name</span> <span class="bp">self</span><span class="o">.</span><span class="n">actual_params</span> <span class="o">=</span> <span class="n">actual_params</span> <span class="c1"># a list of AST nodes</span> <span class="bp">self</span><span class="o">.</span><span class="n">token</span> <span class="o">=</span> <span class="n">token</span> </pre></div> <p>Let&#8217;s fix that and extend the <em>ProcedureCall</em> class to have the <em>proc_symbol</em>&nbsp;field:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ProcedureCall</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">proc_name</span><span class="p">,</span> <span class="n">actual_params</span><span class="p">,</span> <span 
class="n">token</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">proc_name</span> <span class="o">=</span> <span class="n">proc_name</span> <span class="bp">self</span><span class="o">.</span><span class="n">actual_params</span> <span class="o">=</span> <span class="n">actual_params</span> <span class="c1"># a list of AST nodes</span> <span class="bp">self</span><span class="o">.</span><span class="n">token</span> <span class="o">=</span> <span class="n">token</span> <span class="c1"># a reference to procedure declaration symbol</span> <span class="bp">self</span><span class="o">.</span><span class="n">proc_symbol</span> <span class="o">=</span> <span class="bp">None</span> </pre></div> <p>That was easy. Now, where should we set the <em>proc_symbol</em> so that it has the right value (a reference to the respective procedure symbol) for the interpretation phase? As I&#8217;ve mentioned earlier, the procedure symbol gets created during the semantic analysis phase. 
We can store it in the <em>ProcedureCall</em> <span class="caps">AST</span> node during the node traversal done by the semantic analyzer&#8217;s <em>visit_ProcedureCall</em>&nbsp;method.</p> <p>Here is the original&nbsp;method:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">SemanticAnalyzer</span><span class="p">(</span><span class="n">NodeVisitor</span><span class="p">):</span> <span class="o">...</span> <span class="k">def</span> <span class="nf">visit_ProcedureCall</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">for</span> <span class="n">param_node</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">actual_params</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">param_node</span><span class="p">)</span> </pre></div> <p>Because we have access to the current scope when traversing the <span class="caps">AST</span> tree in the semantic analyzer, we can look up the procedure symbol by a procedure name and then store the procedure symbol in the <em>proc_symbol</em> variable of the <em>ProcedureCall</em> <span class="caps">AST</span> node. 
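</p> <p>As a rough illustration of what such a lookup involves, here is a minimal sketch of a scoped symbol table whose <em>lookup</em> method searches the current scope first and then walks outward through the enclosing scopes. The class below is a simplified stand-in written for this explanation, not the interpreter&#8217;s actual implementation, and it stores a placeholder string where the real code would store a <em>ProcedureSymbol</em> object:</p>

```python
class ScopedSymbolTable:
    """Simplified stand-in for a scoped symbol table (an assumption for illustration)."""

    def __init__(self, scope_name, enclosing_scope=None):
        self.scope_name = scope_name
        self.enclosing_scope = enclosing_scope
        self._symbols = {}

    def insert(self, name, symbol):
        self._symbols[name] = symbol

    def lookup(self, name):
        # Search the current scope first, then walk outward.
        symbol = self._symbols.get(name)
        if symbol is not None:
            return symbol
        if self.enclosing_scope is not None:
            return self.enclosing_scope.lookup(name)
        return None


global_scope = ScopedSymbolTable('global')
# Placeholder string instead of a real ProcedureSymbol object.
global_scope.insert('Alpha', 'ProcedureSymbol(Alpha)')
alpha_scope = ScopedSymbolTable('Alpha', enclosing_scope=global_scope)

# 'Alpha' is not declared in its own scope, so the lookup falls through to global.
resolved = alpha_scope.lookup('Alpha')
```

<p>In this sketch <em>resolved</em> ends up holding the entry that was inserted into the global scope, which is the kind of resolution the semantic analyzer needs for <em>node.proc_name</em>.</p> <p>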
Let&#8217;s do&nbsp;this:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">SemanticAnalyzer</span><span class="p">(</span><span class="n">NodeVisitor</span><span class="p">):</span> <span class="o">...</span> <span class="k">def</span> <span class="nf">visit_ProcedureCall</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">for</span> <span class="n">param_node</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">actual_params</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">param_node</span><span class="p">)</span> <span class="n">proc_symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">proc_name</span><span class="p">)</span> <span class="c1"># accessed by the interpreter when executing procedure call</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_symbol</span> <span class="o">=</span> <span class="n">proc_symbol</span> </pre></div> <p>In the code above, we simply resolve a procedure name to its procedure symbol, which is stored in one of the scoped symbol tables (in our case in the <em>global</em> scope, to be exact), and then assign the procedure symbol to the <em>proc_symbol</em> field of the <em>ProcedureCall</em> <span class="caps">AST</span>&nbsp;node.</p> <p>For our sample program, after the semantic analysis phase and the actions described above, the <span class="caps">AST</span> tree will have a link to the <em>Alpha</em> procedure symbol in the global&nbsp;scope:</p> <p><img alt="" 
src="https://ruslanspivak.com/lsbasi-part18/lsbasi_part18_astsymbollink.png" width="640"></p> <p>As you can see in the picture above, this setup allows us to get the procedure&#8217;s formal parameters from the interpreter&#8217;s <em>visit_ProcedureCall</em> method - when evaluating a <em>ProcedureCall</em> node - by simply accessing the <em>formal_params</em> field of the <em>proc_symbol</em> variable stored in the <em>ProcedureCall</em> <span class="caps">AST</span>&nbsp;node:</p> <div class="highlight"><pre><span></span><span class="n">proc_symbol</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_symbol</span> <span class="n">proc_symbol</span><span class="o">.</span><span class="n">formal_params</span> <span class="c1"># aka parameters</span> </pre></div> <p><br/> b) After we get the list of formal parameters, we need to get a list of the procedure&#8217;s actual parameters (arguments). Getting the list of arguments is easy because they are readily available from the <em>ProcedureCall</em> <span class="caps">AST</span> <em>node</em>&nbsp;itself:</p> <div class="highlight"><pre><span></span><span class="n">node</span><span class="o">.</span><span class="n">actual_params</span> <span class="c1"># aka arguments</span> </pre></div> <p><br/> c) And the last step. 
For each formal parameter, we need to get the corresponding actual parameter and save the pair in the procedure&#8217;s activation record by using the formal parameter&#8217;s name as the key and the actual parameter (argument), after having evaluated it, as the&nbsp;value.</p> <p>Let&#8217;s take a look at the code that builds the key-value pairs using the Python <a href="https://docs.python.org/3/library/functions.html#zip">zip()</a>&nbsp;function:</p> <div class="highlight"><pre><span></span><span class="n">proc_symbol</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_symbol</span> <span class="n">formal_params</span> <span class="o">=</span> <span class="n">proc_symbol</span><span class="o">.</span><span class="n">formal_params</span> <span class="n">actual_params</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">actual_params</span> <span class="k">for</span> <span class="n">param_symbol</span><span class="p">,</span> <span class="n">argument_node</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">formal_params</span><span class="p">,</span> <span class="n">actual_params</span><span class="p">):</span> <span class="n">ar</span><span class="p">[</span><span class="n">param_symbol</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">argument_node</span><span class="p">)</span> </pre></div> <p>Once you know how the Python <a href="https://docs.python.org/3/library/functions.html#zip">zip()</a> function works, the <em>for</em> loop above should be easy to understand. 
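</p> <p>One property of <em>zip()</em> matters for that loop: it stops as soon as the shorter input runs out, so a call with too few or too many arguments would not raise an error at this point. The sketch below uses a hypothetical <em>bind_arguments</em> helper, which is not part of the interpreter in this article, to show the pairing together with an explicit length check. It works on plain names and already-evaluated values rather than on symbols and <span class="caps">AST</span> nodes:</p>

```python
def bind_arguments(formal_names, argument_values):
    """Pair parameter names with already-evaluated argument values.

    Hypothetical helper for illustration only; the article's interpreter
    does the pairing inline in visit_ProcedureCall.
    """
    if len(formal_names) != len(argument_values):
        raise TypeError(
            f'expected {len(formal_names)} arguments, got {len(argument_values)}'
        )
    return dict(zip(formal_names, argument_values))


# Alpha(3 + 5, 7): the arguments are evaluated before they are stored.
members = bind_arguments(['a', 'b'], [3 + 5, 7])

# Without the length check, zip() silently drops the unmatched name 'b'.
truncated = dict(zip(['a', 'b'], [8]))
```

<p>Here <em>members</em> maps <em>a</em> to 8 and <em>b</em> to 7, matching the activation record contents shown earlier, while <em>truncated</em> contains only the entry for <em>a</em>.</p> <p>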
Here&#8217;s a Python shell demonstration of the <a href="https://docs.python.org/3/library/functions.html#zip">zip()</a> function in&nbsp;action:</p> <div class="highlight"><pre><span></span>&gt;&gt;&gt; formal_params = [&#39;a&#39;, &#39;b&#39;, &#39;c&#39;] &gt;&gt;&gt; actual_params = [1, 2, 3] &gt;&gt;&gt; &gt;&gt;&gt; zipped = zip(formal_params, actual_params) &gt;&gt;&gt; &gt;&gt;&gt; list(zipped) [(&#39;a&#39;, 1), (&#39;b&#39;, 2), (&#39;c&#39;, 3)] </pre></div> <p>The statement to store the key-value pairs in the activation record is very&nbsp;straightforward:</p> <div class="highlight"><pre><span></span><span class="n">ar</span><span class="p">[</span><span class="n">param_symbol</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">argument_node</span><span class="p">)</span> </pre></div> <p>The key is the name of a formal parameter, and the value is the evaluated value of the argument passed to the procedure&nbsp;call.</p> <p>Here is the interpreter&#8217;s <em>visit_ProcedureCall</em> method with all the modifications we&#8217;ve done so&nbsp;far:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Interpreter</span><span class="p">(</span><span class="n">NodeVisitor</span><span class="p">):</span> <span class="o">...</span> <span class="k">def</span> <span class="nf">visit_ProcedureCall</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">proc_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_name</span> <span class="n">ar</span> <span class="o">=</span> <span class="n">ActivationRecord</span><span class="p">(</span> <span class="n">name</span><span class="o">=</span><span 
class="n">proc_name</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="n">ARType</span><span class="o">.</span><span class="n">PROCEDURE</span><span class="p">,</span> <span class="n">nesting_level</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="p">)</span> <span class="n">proc_symbol</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_symbol</span> <span class="n">formal_params</span> <span class="o">=</span> <span class="n">proc_symbol</span><span class="o">.</span><span class="n">formal_params</span> <span class="n">actual_params</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">actual_params</span> <span class="k">for</span> <span class="n">param_symbol</span><span class="p">,</span> <span class="n">argument_node</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">formal_params</span><span class="p">,</span> <span class="n">actual_params</span><span class="p">):</span> <span class="n">ar</span><span class="p">[</span><span class="n">param_symbol</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">argument_node</span><span class="p">)</span> </pre></div> <p><br/> <em>Step 3. Push the activation record onto the call&nbsp;stack</em></p> <p>After we&#8217;ve created the <span class="caps">AR</span> and put all the procedure&#8217;s parameters into the <span class="caps">AR</span>, we need to push the <span class="caps">AR</span> onto the stack. It&#8217;s super easy to do. 
We need to add just one line of&nbsp;code:</p> <div class="highlight"><pre><span></span><span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="o">.</span><span class="n">push</span><span class="p">(</span><span class="n">ar</span><span class="p">)</span> </pre></div> <p>Remember: an <span class="caps">AR</span> of a currently executing procedure is always at the top of the stack. This way the currently executing procedure has easy access to its parameters and local variables. Here is the updated <em>visit_ProcedureCall</em>&nbsp;method:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_ProcedureCall</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">proc_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_name</span> <span class="n">ar</span> <span class="o">=</span> <span class="n">ActivationRecord</span><span class="p">(</span> <span class="n">name</span><span class="o">=</span><span class="n">proc_name</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="n">ARType</span><span class="o">.</span><span class="n">PROCEDURE</span><span class="p">,</span> <span class="n">nesting_level</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="p">)</span> <span class="n">proc_symbol</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_symbol</span> <span class="n">formal_params</span> <span class="o">=</span> <span class="n">proc_symbol</span><span class="o">.</span><span class="n">formal_params</span> <span class="n">actual_params</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">actual_params</span> <span class="k">for</span> <span 
class="n">param_symbol</span><span class="p">,</span> <span class="n">argument_node</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">formal_params</span><span class="p">,</span> <span class="n">actual_params</span><span class="p">):</span> <span class="n">ar</span><span class="p">[</span><span class="n">param_symbol</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">argument_node</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="o">.</span><span class="n">push</span><span class="p">(</span><span class="n">ar</span><span class="p">)</span> </pre></div> <p><br/> <em>Step 4. Execute the body of the&nbsp;procedure</em></p> <p>Now that everything has been set up, let&#8217;s execute the body of the procedure. The only problem is that neither the <em>ProcedureCall</em> <span class="caps">AST</span> <em>node</em> nor the procedure symbol <em>proc_symbol</em> knows anything about the body of the respective procedure&nbsp;declaration.</p> <p>How do we get access to the body of the procedure declaration during execution of a procedure call? In other words, when traversing the <span class="caps">AST</span> tree and visiting the <em>ProcedureCall</em> <span class="caps">AST</span> node during the interpretation phase, we need to get access to the <em>block_node</em> variable of the corresponding <em>ProcedureDecl</em> node. The <em>block_node</em> variable holds a reference to an <span class="caps">AST</span> sub-tree that represents the body of the procedure. How can we access that variable from the <em>visit_ProcedureCall</em> method of the <em>Interpreter</em> class? 
Let&#8217;s think about&nbsp;it.</p> <p>We already have access to the procedure symbol that contains information about the procedure declaration, like the procedure&#8217;s formal parameters, so let&#8217;s find a way to store a reference to the <em>block_node</em> in the procedure symbol itself. The right spot to do that is the semantic analyzer&#8217;s <em>visit_ProcedureDecl</em> method. In this method we have access to both the procedure symbol and the procedure&#8217;s body, the <em>block_node</em> field of the <em>ProcedureDecl</em> <span class="caps">AST</span> node that points to the procedure body&#8217;s <span class="caps">AST</span>&nbsp;sub-tree.</p> <p>We have a procedure symbol, and we have a <em>block_node</em>. Let&#8217;s store a pointer to the <em>block_node</em> in the <em>block_ast</em> field of the <em>proc_symbol</em>:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">SemanticAnalyzer</span><span class="p">(</span><span class="n">NodeVisitor</span><span class="p">):</span> <span class="k">def</span> <span class="nf">visit_ProcedureDecl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">proc_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_name</span> <span class="n">proc_symbol</span> <span class="o">=</span> <span class="n">ProcedureSymbol</span><span class="p">(</span><span class="n">proc_name</span><span class="p">)</span> <span class="o">...</span> <span class="bp">self</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">f</span><span class="s1">&#39;LEAVE scope: {proc_name}&#39;</span><span class="p">)</span> <span class="c1"># accessed by the interpreter when executing procedure call</span> <span class="n">proc_symbol</span><span class="o">.</span><span class="n">block_ast</span> <span 
class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">block_node</span> </pre></div> <p>And to make it explicit, let&#8217;s also extend the <em>ProcedureSymbol</em> class and add the <em>block_ast</em> field to&nbsp;it:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ProcedureSymbol</span><span class="p">(</span><span class="n">Symbol</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">formal_params</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span> <span class="o">...</span> <span class="c1"># a reference to procedure&#39;s body (AST sub-tree)</span> <span class="bp">self</span><span class="o">.</span><span class="n">block_ast</span> <span class="o">=</span> <span class="bp">None</span> </pre></div> <p>In the picture below you can see the extended <em>ProcedureSymbol</em> instance that stores a reference to the corresponding procedure&#8217;s body (a <em>Block</em> node in the <span class="caps">AST</span>):</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part18/lsbasi_part18_symbolastlink.png" width="640"></p> <p>With all the above, executing the body of the procedure in the procedure call becomes as simple as visiting the procedure declaration&#8217;s <em>Block</em> <span class="caps">AST</span> node accessible through the <em>block_ast</em> field of the procedure&#8217;s <em>proc_symbol</em>:</p> <div class="highlight"><pre><span></span><span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">proc_symbol</span><span class="o">.</span><span class="n">block_ast</span><span class="p">)</span> </pre></div> <p><br/> Here is the fully updated <em>visit_ProcedureCall</em> method of the 
<em>Interpreter</em>&nbsp;class:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_ProcedureCall</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">proc_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_name</span> <span class="n">ar</span> <span class="o">=</span> <span class="n">ActivationRecord</span><span class="p">(</span> <span class="n">name</span><span class="o">=</span><span class="n">proc_name</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="n">ARType</span><span class="o">.</span><span class="n">PROCEDURE</span><span class="p">,</span> <span class="n">nesting_level</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="p">)</span> <span class="n">proc_symbol</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_symbol</span> <span class="n">formal_params</span> <span class="o">=</span> <span class="n">proc_symbol</span><span class="o">.</span><span class="n">formal_params</span> <span class="n">actual_params</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">actual_params</span> <span class="k">for</span> <span class="n">param_symbol</span><span class="p">,</span> <span class="n">argument_node</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">formal_params</span><span class="p">,</span> <span class="n">actual_params</span><span class="p">):</span> <span class="n">ar</span><span class="p">[</span><span class="n">param_symbol</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span 
class="p">(</span><span class="n">argument_node</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="o">.</span><span class="n">push</span><span class="p">(</span><span class="n">ar</span><span class="p">)</span> <span class="c1"># evaluate procedure body</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">proc_symbol</span><span class="o">.</span><span class="n">block_ast</span><span class="p">)</span> </pre></div> <p>If you remember from the <a href="https://ruslanspivak.com/lsbasi-part17">previous article</a>, the <em>visit_Assign</em> and <em>visit_Var</em> methods use the <span class="caps">AR</span> at the top of the call stack to access and store&nbsp;variables:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Assign</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="o">.</span><span class="n">value</span> <span class="n">var_value</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="n">ar</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="o">.</span><span class="n">peek</span><span class="p">()</span> <span class="n">ar</span><span class="p">[</span><span class="n">var_name</span><span class="p">]</span> <span class="o">=</span> <span class="n">var_value</span> <span class="k">def</span> <span class="nf">visit_Var</span><span class="p">(</span><span 
class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">value</span> <span class="n">ar</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="o">.</span><span class="n">peek</span><span class="p">()</span> <span class="n">var_value</span> <span class="o">=</span> <span class="n">ar</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">var_name</span><span class="p">)</span> <span class="k">return</span> <span class="n">var_value</span> </pre></div> <p>These methods stay unchanged. When interpreting the body of a procedure, these methods will store and access values from the <span class="caps">AR</span> of the currently executing procedure, which will be at the top of the stack. We&#8217;ll see shortly how it all fits and works&nbsp;together.</p> <p><br/> <em>Step 5. Pop the activation record off the&nbsp;stack</em></p> <p>After we&#8217;re done evaluating the body of the procedure, we no longer need the procedure&#8217;s <span class="caps">AR</span>, so we pop it off the call stack right before leaving the <em>visit_ProcedureCall</em> method. 
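</p> <p>To make the push, peek, and pop sequence concrete, here is a small self-contained sketch. The <em>ActivationRecord</em> and <em>CallStack</em> classes below are stripped-down stand-ins written for this illustration (the record type field and the string formatting are left out), so treat them as assumptions rather than the interpreter&#8217;s real classes:</p>

```python
class ActivationRecord:
    """Stripped-down stand-in: a named dictionary of members."""

    def __init__(self, name, nesting_level):
        self.name = name
        self.nesting_level = nesting_level
        self.members = {}

    def __setitem__(self, key, value):
        self.members[key] = value

    def get(self, key):
        return self.members.get(key)


class CallStack:
    """Stripped-down stand-in: a list used as a stack."""

    def __init__(self):
        self._records = []

    def push(self, ar):
        self._records.append(ar)

    def pop(self):
        return self._records.pop()

    def peek(self):
        return self._records[-1]


call_stack = CallStack()
call_stack.push(ActivationRecord('Main', nesting_level=1))

# Alpha(3 + 5, 7): build the AR, store the evaluated arguments, push it.
ar = ActivationRecord('Alpha', nesting_level=2)
ar['a'] = 3 + 5
ar['b'] = 7
call_stack.push(ar)

# While the body runs, Alpha's AR is on top: x := (a + b) * 2
ar_during_call = call_stack.peek()
ar_during_call['x'] = (ar_during_call.get('a') + ar_during_call.get('b')) * 2

# Leaving the procedure: pop its AR, which exposes Main's AR again.
call_stack.pop()
ar_after_call = call_stack.peek()
```

<p>In this sketch the record on top of the stack is the one for <em>Alpha</em> while its body runs, and the one for <em>Main</em> once the pop has happened.</p> <p>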
Remember, the top of the call stack contains the <span class="caps">AR</span> of the currently executing procedure, function, or program, so once we&#8217;re done evaluating one of those routines, we need to pop its <span class="caps">AR</span> off the call stack using the call stack&#8217;s <em>pop()</em>&nbsp;method:</p> <div class="highlight"><pre><span></span><span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span> </pre></div> <p>Let&#8217;s put it all together and also add some logging to the <em>visit_ProcedureCall</em> method to log the contents of the <em>call stack</em> right after pushing the procedure&#8217;s <span class="caps">AR</span> onto the <em>call stack</em> and right before popping it off the&nbsp;stack:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_ProcedureCall</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">proc_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_name</span> <span class="n">ar</span> <span class="o">=</span> <span class="n">ActivationRecord</span><span class="p">(</span> <span class="n">name</span><span class="o">=</span><span class="n">proc_name</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="n">ARType</span><span class="o">.</span><span class="n">PROCEDURE</span><span class="p">,</span> <span class="n">nesting_level</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="p">)</span> <span class="n">proc_symbol</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_symbol</span> <span class="n">formal_params</span> <span class="o">=</span> <span 
class="n">proc_symbol</span><span class="o">.</span><span class="n">formal_params</span> <span class="n">actual_params</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">actual_params</span> <span class="k">for</span> <span class="n">param_symbol</span><span class="p">,</span> <span class="n">argument_node</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">formal_params</span><span class="p">,</span> <span class="n">actual_params</span><span class="p">):</span> <span class="n">ar</span><span class="p">[</span><span class="n">param_symbol</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">argument_node</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="o">.</span><span class="n">push</span><span class="p">(</span><span class="n">ar</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">f</span><span class="s1">&#39;ENTER: PROCEDURE {proc_name}&#39;</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="p">))</span> <span class="c1"># evaluate procedure body</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">proc_symbol</span><span class="o">.</span><span class="n">block_ast</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span 
class="n">f</span><span class="s1">&#39;LEAVE: PROCEDURE {proc_name}&#39;</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="p">))</span> <span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span> </pre></div> <p><br/> Let&#8217;s take our modified interpreter for a ride and see how it executes procedure calls. Download the following sample program from <a href="https://github.com/rspivak/lsbasi/tree/master/part18">GitHub</a> or save it as <a href="https://github.com/rspivak/lsbasi/blob/master/part18/part18.pas">part18.pas</a>:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Alpha</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="n">b</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="n">x</span> <span class="o">:=</span> <span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="p">)</span> <span class="o">*</span> <span class="mi">2</span><span class="o">;</span> <span class="k">end</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ Main }</span> <span class="n">Alpha</span><span class="p">(</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">5</span><span class="o">,</span> 
<span class="mi">7</span><span class="p">)</span><span class="o">;</span> <span class="cm">{ procedure call }</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{ Main }</span> </pre></div> <p>Download the interpreter file <a href="https://github.com/rspivak/lsbasi/blob/master/part18/spi.py">spi.py</a> from <a href="https://github.com/rspivak/lsbasi/tree/master/part18/">GitHub</a> and run it on the command line with the following&nbsp;arguments:</p> <div class="highlight"><pre><span></span>$ python spi.py part18.pas --stack ENTER: PROGRAM Main CALL STACK <span class="m">1</span>: PROGRAM Main ENTER: PROCEDURE Alpha CALL STACK <span class="m">2</span>: PROCEDURE Alpha a : <span class="m">8</span> b : <span class="m">7</span> <span class="m">1</span>: PROGRAM Main LEAVE: PROCEDURE Alpha CALL STACK <span class="m">2</span>: PROCEDURE Alpha a : <span class="m">8</span> b : <span class="m">7</span> x : <span class="m">30</span> <span class="m">1</span>: PROGRAM Main LEAVE: PROGRAM Main CALL STACK <span class="m">1</span>: PROGRAM Main </pre></div> <p>So far, so good. Let&#8217;s take a closer look at the output and inspect the contents of the call stack during program and procedure&nbsp;execution.</p> <p>1. The interpreter first&nbsp;prints</p> <div class="highlight"><pre><span></span><span class="n">ENTER</span><span class="o">:</span> <span class="n">PROGRAM</span> <span class="n">Main</span> <span class="n">CALL</span> <span class="n">STACK</span> <span class="mi">1</span><span class="o">:</span> <span class="n">PROGRAM</span> <span class="n">Main</span> </pre></div> <p>when visiting the <em>Program</em> <span class="caps">AST</span> node before executing the body of the program. At this point the <em>call stack</em> has one <em>activation record.</em> This activation record is at the top of the call stack and it&#8217;s used for storing global variables. 
Because we don&#8217;t have any global variables in our sample program, there is nothing in the activation&nbsp;record.</p> <p>2. Next, the interpreter&nbsp;prints</p> <div class="highlight"><pre><span></span><span class="n">ENTER</span><span class="o">:</span> <span class="n">PROCEDURE</span> <span class="n">Alpha</span>
<span class="n">CALL</span> <span class="n">STACK</span>
<span class="mi">2</span><span class="o">:</span> <span class="n">PROCEDURE</span> <span class="n">Alpha</span>
   <span class="n">a</span>                   <span class="o">:</span> <span class="mi">8</span>
   <span class="n">b</span>                   <span class="o">:</span> <span class="mi">7</span>
<span class="mi">1</span><span class="o">:</span> <span class="n">PROGRAM</span> <span class="n">Main</span>
</pre></div> <p>when it visits the <em>ProcedureCall</em> <span class="caps">AST</span> node for the <em>Alpha(3 + 5, 7)</em> procedure call. At this point the body of the <em>Alpha</em> procedure hasn&#8217;t been evaluated yet and the <em>call stack</em> has two activation records: one for the <em>Main</em> program at the bottom of the stack (nesting level 1) and one for the <em>Alpha</em> procedure call, at the top of the stack (nesting level 2). The <span class="caps">AR</span> at the top of the stack holds the values of the procedure arguments <em>a</em> and <em>b</em> only; there is no value for the local variable <em>x</em> in the <span class="caps">AR</span> because the body of the procedure hasn&#8217;t been evaluated&nbsp;yet.</p> <p>3. Up next, the interpreter&nbsp;prints</p> <div class="highlight"><pre><span></span><span class="n">LEAVE</span><span class="o">:</span> <span class="n">PROCEDURE</span> <span class="n">Alpha</span>
<span class="n">CALL</span> <span class="n">STACK</span>
<span class="mi">2</span><span class="o">:</span> <span class="n">PROCEDURE</span> <span class="n">Alpha</span>
   <span class="n">a</span>                   <span class="o">:</span> <span class="mi">8</span>
   <span class="n">b</span>                   <span class="o">:</span> <span class="mi">7</span>
   <span class="n">x</span>                   <span class="o">:</span> <span class="mi">30</span>
<span class="mi">1</span><span class="o">:</span> <span class="n">PROGRAM</span> <span class="n">Main</span>
</pre></div> <p>when it&#8217;s about to leave the <em>ProcedureCall</em> <span class="caps">AST</span> node for the <em>Alpha(3 + 5, 7)</em> procedure call but before popping off the <span class="caps">AR</span> for the <em>Alpha</em>&nbsp;procedure.</p> <p>From the output above, you can see that in addition to the procedure arguments, the <span class="caps">AR</span> for the currently executing procedure <em>Alpha</em> now also contains the value of the local variable <em>x</em>, the result of executing the <em>x := (a + b ) * 2;</em> statement in the body of the procedure. At this point the call stack visually looks like&nbsp;this:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part18/lsbasi_part18_callstack.png" width="260"></p> <p>4. And finally the interpreter&nbsp;prints</p> <div class="highlight"><pre><span></span><span class="n">LEAVE</span><span class="o">:</span> <span class="n">PROGRAM</span> <span class="n">Main</span>
<span class="n">CALL</span> <span class="n">STACK</span>
<span class="mi">1</span><span class="o">:</span> <span class="n">PROGRAM</span> <span class="n">Main</span>
</pre></div> <p>when it leaves the <em>Program</em> <span class="caps">AST</span> node but before it pops off the <span class="caps">AR</span> for the main program. As you can see, the activation record for the main program is the only <span class="caps">AR</span> left on the stack because the <span class="caps">AR</span> for the <em>Alpha</em> procedure call got popped off the stack earlier, right before the interpreter finished executing the <em>Alpha</em> procedure&nbsp;call.</p> <p><br/> That&#8217;s it. Our interpreter successfully executed a procedure call. If you&#8217;ve made it this far,&nbsp;congratulations!</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part18/lsbasi_part18_congrats.png" width="200"></p> <p>It is a huge milestone for us. Now you know how to execute procedure calls. And if you&#8217;ve been waiting for this article for a long time, thank you for your&nbsp;patience.</p> <p>That&#8217;s all for today. In the next article, we&#8217;ll expand on the current material and talk about executing nested procedure calls. 
So stay tuned and see you next&nbsp;time!</p> <p>Let’s Build A Simple Interpreter. Part 17: Call Stack and Activation Records (Ruslan Spivak, 2019-08-28)</p> <p>You may have to fight a battle more than once to win it. 
- Margaret&nbsp;Thatcher</p><blockquote> <p><em><span class="dquo">&#8220;</span>You may have to fight a battle more than once to win it.” &#8212; Margaret&nbsp;Thatcher</em></p> </blockquote> <p>In 1968 during the Mexico City Summer Olympics, a marathon runner named John Stephen Akhwari found himself thousands of miles away from his home country of Tanzania, in East Africa. While running the marathon at the high altitude of Mexico City he got hit by other athletes jockeying for position and fell to the ground, badly wounding his knee and causing a dislocation. After receiving medical attention, instead of pulling out of the competition after such a bad injury, he stood up and continued the&nbsp;race.</p> <p>Mamo Wolde of Ethiopia, at 2:20:26 into the race, crossed the finish line in first place. More than an hour later at 3:25:27, after the sun had set, Akhwari, hobbling, with a bloody leg and his bandages dangling and flapping in the wind, crossed the finish line, in last&nbsp;place.</p> <p>When a small crowd saw Akhwari crossing the line, they cheered him in disbelief, and the few remaining reporters rushed onto the track to ask him why he continued to run the race with his injuries. His response went down in history: “My country did not send me 5,000 miles to start the race. They sent me 5,000 miles to finish the&nbsp;race.”</p> <p>This story has since inspired many athletes and non-athletes alike. You might be thinking at this point, “That’s great, it’s an inspiring story, but what does it have to do with me?” The main message for you and me is this: “Keep going!” This has been a long series spun over a long period of time and at times it may feel daunting to go along with it, but we’re approaching an important milestone in the series, so we need to keep&nbsp;going.</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part17/lsbasi_part17_keepgoing.png" width="480"></p> <p>Okay, let’s get to it! 
<br/> <br/></p> <p>We have a couple of goals for&nbsp;today:</p> <ol> <li> <p>Implement a new memory system that can support programs, procedure calls, and function&nbsp;calls.</p> </li> <li> <p>Replace the interpreter’s current memory system, represented by the <em>GLOBAL_MEMORY</em> dictionary, with the new memory system. <br/> <br/></p> </li> </ol> <p>Let’s start by answering the following&nbsp;questions:</p> <ol> <li> <p>What is a memory&nbsp;system?</p> </li> <li> <p>Why do we need a new memory&nbsp;system?</p> </li> <li> <p>What does the new memory system look&nbsp;like?</p> </li> <li> <p>Why would we want to replace the <em>GLOBAL_MEMORY</em>&nbsp;dictionary?</p> </li> </ol> <p><br/></p> <p>1. <em>What is a memory&nbsp;system?</em></p> <p>To put it simply, it is a system for storing and accessing data in memory. At the hardware level, it is the physical memory (<span class="caps">RAM</span>) where values are stored at particular physical addresses. At the interpreter level, because our interpreter stores values according to their variable names and not physical addresses, we represent memory with a dictionary that maps names to values. Here is a simple demonstration where we store the value of 7 by the variable name <em>y</em>, and then immediately access the value associated with the name <em>y</em>:</p> <div class="highlight"><pre><span></span>&gt;&gt;&gt; GLOBAL_MEMORY = {}
&gt;&gt;&gt;
&gt;&gt;&gt; GLOBAL_MEMORY[&#39;y&#39;] = 7   # store value by name
&gt;&gt;&gt;
&gt;&gt;&gt; GLOBAL_MEMORY[&#39;y&#39;]       # access value by name
7
&gt;&gt;&gt;
</pre></div> <p><br/> We’ve been using this dictionary approach to represent global memory for a while now. We’ve been storing and accessing variables at the <span class="caps">PROGRAM</span> level (the global level) using the <em>GLOBAL_MEMORY</em> dictionary. 
Here are the parts of the interpreter that create the “memory”, assign values to variables in memory, and access values by their&nbsp;names:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Interpreter</span><span class="p">(</span><span class="n">NodeVisitor</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tree</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">tree</span> <span class="o">=</span> <span class="n">tree</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">GLOBAL_MEMORY</span> <span class="o">=</span> <span class="p">{}</span>

    <span class="k">def</span> <span class="nf">visit_Assign</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
        <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="o">.</span><span class="n">value</span>
        <span class="n">var_value</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">GLOBAL_MEMORY</span><span class="p">[</span><span class="n">var_name</span><span class="p">]</span> <span class="o">=</span> <span class="n">var_value</span>

    <span class="k">def</span> <span class="nf">visit_Var</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
        <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">value</span>
        <span class="n">var_value</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">GLOBAL_MEMORY</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">var_name</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">var_value</span>
</pre></div> <p>Now that we’ve described how we currently represent memory in our interpreter, let’s answer the next question. <br/> <br/></p> <p>2. <em>Why do we need a new memory system for our&nbsp;interpreter?</em></p> <p>It turns out that having just one dictionary to represent global memory is not enough to support procedure and function calls, including recursive&nbsp;calls.</p> <p>To support nested calls, and a special case of nested calls, recursive calls, we need multiple dictionaries to store information about each procedure and function invocation. And we need those dictionaries organized in a particular way. That’s the reason we need a new memory system. Having this memory system in place is a stepping-stone for executing procedure calls, which we will implement in future articles. <br/> <br/></p> <p>3. <em>What does the new memory system look&nbsp;like?</em></p> <p>At its core, the new memory system is a stack data structure that holds dictionary-like objects as its elements. This stack is called the “<strong><em>call stack</em></strong>” because it’s used to track which procedure or function call is currently being executed. The <em>call stack</em> is also known as the run-time stack, execution stack, program stack, or just “the stack”. The dictionary-like objects that the <em>call stack</em> holds are called <strong><em>activation records</em></strong>. 
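</p> <p>To make the motivation concrete, here is a minimal sketch that is not part of <em>spi.py</em>: the function names <em>countdown_shared</em> and <em>countdown_stacked</em> are made up for this illustration. It runs the same recursive routine twice, once with a single shared dictionary and once with a stack of per-call dictionaries, assuming a plain Python list is a good enough stand-in for the call stack:</p>

```python
# Hypothetical illustration, not taken from spi.py: a recursive countdown
# evaluated with one shared dictionary and with a stack of dictionaries.

def countdown_shared(n, memory, seen):
    memory['n'] = n                  # every call writes to the same slot
    if n > 0:
        countdown_shared(n - 1, memory, seen)
    seen.append(memory['n'])         # the caller's own value of n is gone


def countdown_stacked(n, stack, seen):
    stack.append({'n': n})           # push a fresh record for this call
    if n > 0:
        countdown_stacked(n - 1, stack, seen)
    seen.append(stack[-1]['n'])      # the top record belongs to this call
    stack.pop()                      # pop the record before returning


shared_seen = []
countdown_shared(2, {}, shared_seen)

stacked_seen = []
call_stack = []
countdown_stacked(2, call_stack, stacked_seen)

print(shared_seen)    # [0, 0, 0]
print(stacked_seen)   # [0, 1, 2]
print(call_stack)     # []
```

<p>With the shared dictionary, the innermost call overwrites <em>n</em> for every caller. With the stack, each invocation reads its own value after the nested call returns, and the stack is empty again at the end. Each per-call dictionary in the stacked version plays the role of an activation record.</p> <p>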
You may know them by another name: “stack frames”, or just&nbsp;“frames”.</p> <p>Let’s go into more detail about the <em>call stack</em> and <em>activation records</em>.</p> <p>What is a <strong><em>stack</em></strong>? A <strong><em>stack</em></strong> is a data structure that is based on a “<em>last-in-first-out</em>” policy (<em><span class="caps">LIFO</span></em>), which means that the most recent item added to the stack is the first one that comes out. It’s like a collection of plates where you put (&#8220;push&#8221;) a plate on the top of the plate stack and, if you need to take a plate, you take one off the top of the plate stack (you &#8220;pop&#8221; the&nbsp;plate):</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part17/lsbasi_part17_stackofplates.png" width="180"></p> <p>Our stack implementation will have the following&nbsp;methods:</p> <p>- <em>push</em> (to push an item onto the&nbsp;stack)</p> <p>- <em>pop</em> (to pop an item off the&nbsp;stack)</p> <p>- <em>peek</em> (to return an item at the top of the stack without removing&nbsp;it)</p> <p><br/> And by our convention our stack will be growing&nbsp;upwards:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part17/lsbasi_part17_stackgrowth.png" width="360"></p> <p>How would we implement a stack in code? 
A very basic implementation could look like&nbsp;this:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Stack</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">items</span> <span class="o">=</span> <span class="p">[]</span>

    <span class="k">def</span> <span class="nf">push</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">item</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">items</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">item</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">pop</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">items</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">peek</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">items</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
</pre></div> <p>That’s pretty much how our call stack implementation will look as well. 
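</p> <p>Here is a short usage sketch of the basic <em>Stack</em> class that shows the last-in-first-out behavior. The class is repeated so that the snippet runs on its own, and the plate names are made up for the demonstration:</p>

```python
# The Stack class is repeated from the listing above so that this
# snippet is self-contained; the plate names are illustrative only.
class Stack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        return self.items.pop()

    def peek(self):
        return self.items[-1]


plates = Stack()
plates.push('bottom plate')
plates.push('middle plate')
plates.push('top plate')

top = plates.peek()         # look at the top item without removing it
first_out = plates.pop()    # the last item pushed is the first one out
second_out = plates.pop()

print(top)                  # top plate
print(first_out)            # top plate
print(second_out)           # middle plate
print(plates.items)         # ['bottom plate']
```

<p>Note that in this basic version, calling <em>pop()</em> or <em>peek()</em> on an empty stack raises an <em>IndexError</em>, because both methods delegate to the underlying Python list.</p> <p>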
We’ll change some variable names to reflect the fact that the <em>call stack</em> will store <em>activation records</em> and add a <em>__str__()</em> method to print the contents of the&nbsp;stack:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">CallStack</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_records</span> <span class="o">=</span> <span class="p">[]</span>

    <span class="k">def</span> <span class="nf">push</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">ar</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_records</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">ar</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">pop</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_records</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">peek</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_records</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">s</span> <span class="o">=</span> <span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">ar</span><span class="p">)</span> <span class="k">for</span> <span class="n">ar</span> <span class="ow">in</span> <span class="nb">reversed</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_records</span><span class="p">))</span>
        <span class="n">s</span> <span class="o">=</span> <span class="n">f</span><span class="s1">&#39;CALL STACK</span><span class="se">\n</span><span class="s1">{s}</span><span class="se">\n</span><span class="s1">&#39;</span>
        <span class="k">return</span> <span class="n">s</span>

    <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="fm">__str__</span><span class="p">()</span>
</pre></div> <p>The <em>__str__()</em> method generates a string representation of the contents of the <em>call stack</em> by iterating over <em>activation records</em> in reverse order and concatenating a string representation of each record to produce the final result. The records are listed in reverse order so that the printed output shows our stack growing&nbsp;up.</p> <p>Now, what is an <strong><em>activation record</em></strong>? For our purposes, an <em>activation record</em> is a dictionary-like object for maintaining information about the currently executing invocation of a procedure or function, and also the program itself. 
The activation record for a procedure invocation, for example, will contain the current values of its formal parameters and its local&nbsp;variables.</p> <p>Let’s take a look at how we will represent <em>activation records</em> in&nbsp;code:</p> <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">enum</span> <span class="kn">import</span> <span class="n">Enum</span>


<span class="k">class</span> <span class="nc">ARType</span><span class="p">(</span><span class="n">Enum</span><span class="p">):</span>
    <span class="n">PROGRAM</span> <span class="o">=</span> <span class="s1">&#39;PROGRAM&#39;</span>


<span class="k">class</span> <span class="nc">ActivationRecord</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="nb">type</span><span class="p">,</span> <span class="n">nesting_level</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">type</span> <span class="o">=</span> <span class="nb">type</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">nesting_level</span> <span class="o">=</span> <span class="n">nesting_level</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">members</span> <span class="o">=</span> <span class="p">{}</span>

    <span class="k">def</span> <span class="fm">__setitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">members</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>

    <span class="k">def</span> <span class="fm">__getitem__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">key</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">members</span><span class="p">[</span><span class="n">key</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">get</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">key</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">members</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">lines</span> <span class="o">=</span> <span class="p">[</span>
            <span class="s1">&#39;{level}: {type} {name}&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
                <span class="n">level</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">nesting_level</span><span class="p">,</span>
                <span class="nb">type</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">type</span><span class="o">.</span><span class="n">value</span><span class="p">,</span>
                <span class="n">name</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">,</span>
            <span class="p">)</span>
        <span class="p">]</span>
        <span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">val</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">members</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
            <span class="n">lines</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">f</span><span class="s1">&#39;   {name:&lt;20}: {val}&#39;</span><span class="p">)</span>

        <span class="n">s</span> <span class="o">=</span> <span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">lines</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">s</span>

    <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="fm">__str__</span><span class="p">()</span>
</pre></div> <p>There are a few things worth&nbsp;mentioning:</p> <p>a. The <em>ActivationRecord</em> class constructor takes three&nbsp;parameters:</p> <ul> <li> <p>the <em>name</em> of the activation record (<span class="caps">AR</span> for short); we’ll use a program name as well as a procedure/function name as the name for the corresponding <span class="caps">AR</span></p> </li> <li> <p>the <em>type</em> of the activation record (for example, <span class="caps">PROGRAM</span>); these are defined in a separate enumeration class called <em>ARType</em> (activation record&nbsp;type)</p> </li> <li> <p>the <em>nesting_level</em> of the activation record; the nesting level of an <span class="caps">AR</span> corresponds to the scope level of the respective procedure or function declaration plus one; the nesting level will always be set to 1 for programs, which you’ll see&nbsp;shortly</p> </li> </ul> <p>b. The <em>members</em> dictionary represents memory that will be used for keeping information about a particular invocation of a routine. We’ll cover this in more detail in the next&nbsp;article.</p> <p>c. 
The <em>ActivationRecord</em> class implements special <em>__setitem__()</em> and <em>__getitem__()</em> methods to give activation record objects a dictionary-like interface for storing key-value pairs and for accessing values by keys: <em>ar[&#39;x&#39;] = 7</em> and <em>ar[&#39;x&#39;]</em>.</p> <p>d. The <em>get()</em> method is another way to get a value by key, but instead of raising a <em>KeyError</em> for a missing key, it returns <em>None</em> if the key doesn’t exist in the <em>members</em> dictionary&nbsp;yet.</p> <p>e. The <em>__str__()</em> method returns a string representation of the contents of an activation&nbsp;record.</p> <p>Let’s see the call stack and activation records in action using a Python&nbsp;shell:</p> <div class="highlight"><pre><span></span><span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">spi</span> <span class="kn">import</span> <span class="n">CallStack</span><span class="p">,</span> <span class="n">ActivationRecord</span><span class="p">,</span> <span class="n">ARType</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">stack</span> <span class="o">=</span> <span class="n">CallStack</span><span class="p">()</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">stack</span> <span class="n">CALL</span> <span class="n">STACK</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">ar</span> <span class="o">=</span> <span class="n">ActivationRecord</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;Main&#39;</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="n">ARType</span><span class="o">.</span><span class="n">PROGRAM</span><span class="p">,</span> <span class="n">nesting_level</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">ar</span> <span class="mi">1</span><span class="p">:</span> <span 
class="n">PROGRAM</span> <span class="n">Main</span> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">ar</span><span class="p">[</span><span class="s1">&#39;y&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="mi">7</span> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">ar</span> <span class="mi">1</span><span class="p">:</span> <span class="n">PROGRAM</span> <span class="n">Main</span> <span class="n">y</span> <span class="p">:</span> <span class="mi">7</span> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">stack</span> <span class="n">CALL</span> <span class="n">STACK</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">stack</span><span class="o">.</span><span class="n">push</span><span class="p">(</span><span class="n">ar</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">stack</span> <span class="n">CALL</span> <span class="n">STACK</span> <span class="mi">1</span><span class="p">:</span> <span class="n">PROGRAM</span> <span class="n">Main</span> <span class="n">y</span> <span class="p">:</span> <span class="mi">7</span> <span class="o">&gt;&gt;&gt;</span> </pre></div> <p><br/> In the picture below, you can see the description of the contents of the activation record from the interactive session&nbsp;above:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part17/lsbasi_part17_arcontents.png" width="640"></p> <p><em><span class="caps">AR</span>:Main1</em> denotes an activation record for the program named <em>Main</em> at nesting level <em>1</em>.</p> <p>Now that we’ve covered the new memory system, let’s answer the following&nbsp;question.</p> <p><br/> 4. 
<em>Why would we want to replace the GLOBAL_MEMORY dictionary with the call stack</em>?</p> <p>The reason is to simplify our implementation and to have unified access to global variables defined at the <span class="caps">PROGRAM</span> level as well as to procedure and function parameters and their local&nbsp;variables.</p> <p>In the next article we’ll see how it all fits together, but for now let’s get to the <em>Interpreter</em> class changes where we put the <em>call stack</em> and <em>activation records</em> described earlier to good&nbsp;use.</p> <p><br/> <br/> <em>Here are all the interpreter changes we’re going to make today</em>:</p> <p>1. Replace the <em>GLOBAL_MEMORY</em> dictionary with the <em>call&nbsp;stack</em></p> <p>2. Update the <em>visit_Program</em> method to use the <em>call stack</em> to push and pop an <em>activation record</em> that will hold the values of global&nbsp;variables</p> <p>3. Update the <em>visit_Assign</em> method to store a key-value pair in the activation record at the top of the call&nbsp;stack</p> <p>4. Update the <em>visit_Var</em> method to access a value by its name from the activation record at the top of the call&nbsp;stack</p> <p>5. Add a <em>log</em> method and update the <em>visit_Program</em> method to use it to print the contents of the <em>call stack</em> when interpreting a&nbsp;program</p> <p>Let’s get started, shall&nbsp;we?</p> <p>1. First things first, let’s replace the <em>GLOBAL_MEMORY</em> dictionary with our <em>call stack</em> implementation. 
All we need to do is change the <em>Interpreter</em> constructor from&nbsp;this:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Interpreter</span><span class="p">(</span><span class="n">NodeVisitor</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tree</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">tree</span> <span class="o">=</span> <span class="n">tree</span> <span class="bp">self</span><span class="o">.</span><span class="n">GLOBAL_MEMORY</span> <span class="o">=</span> <span class="p">{}</span> </pre></div> <p>to&nbsp;this:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Interpreter</span><span class="p">(</span><span class="n">NodeVisitor</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tree</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">tree</span> <span class="o">=</span> <span class="n">tree</span> <span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span> <span class="o">=</span> <span class="n">CallStack</span><span class="p">()</span> </pre></div> <p>2. 
Now, let’s update the <em>visit_Program</em>&nbsp;method:</p> <p>Old&nbsp;code:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Program</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">block</span><span class="p">)</span> </pre></div> <p>New&nbsp;code:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Program</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">program_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="n">ar</span> <span class="o">=</span> <span class="n">ActivationRecord</span><span class="p">(</span> <span class="n">name</span><span class="o">=</span><span class="n">program_name</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="n">ARType</span><span class="o">.</span><span class="n">PROGRAM</span><span class="p">,</span> <span class="n">nesting_level</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="o">.</span><span class="n">push</span><span class="p">(</span><span class="n">ar</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">block</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="o">.</span><span 
class="n">pop</span><span class="p">()</span> </pre></div> <p>Let’s unpack what’s going on in the updated method&nbsp;above:</p> <ul> <li> <p>First, we create an activation record, giving it the name of the program, the <span class="caps">PROGRAM</span> type, and the nesting level&nbsp;1</p> </li> <li> <p>Then we push the activation record onto the call stack; we do this before anything else so that the rest of the interpreter can use the call stack with the single activation record at the top of the stack to store and access global&nbsp;variables</p> </li> <li> <p>Then we evaluate the body of the program as usual. Again, as our interpreter evaluates the body of the program, it uses the activation record at the top of the call stack to store and access global&nbsp;variables</p> </li> <li> <p>Next, right before exiting the <em>visit_Program</em> method, we pop the activation record off the call stack; we don’t need it anymore because at this point the execution of the program by the interpreter is over and we can safely discard the activation record that is no longer&nbsp;used</p> </li> </ul> <p>3. 
Up next, let’s update the <em>visit_Assign</em> method to store a key-value pair in the activation record at the top of the call&nbsp;stack:</p> <p>Old&nbsp;code:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Assign</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="o">.</span><span class="n">value</span> <span class="n">var_value</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">GLOBAL_MEMORY</span><span class="p">[</span><span class="n">var_name</span><span class="p">]</span> <span class="o">=</span> <span class="n">var_value</span> </pre></div> <p>New&nbsp;code:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Assign</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="o">.</span><span class="n">value</span> <span class="n">var_value</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="n">ar</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="o">.</span><span class="n">peek</span><span 
class="p">()</span> <span class="n">ar</span><span class="p">[</span><span class="n">var_name</span><span class="p">]</span> <span class="o">=</span> <span class="n">var_value</span> </pre></div> <p>In the code above we use the <em>peek()</em> method to get the activation record at the top of the stack (the one that was pushed onto the stack by the <em>visit_Program</em> method) and then use the record to store the value <em>var_value</em> using <em>var_name</em> as a&nbsp;key.</p> <p>4. Next, let’s update the <em>visit_Var</em> method to access a value by its name from the activation record at the top of the call&nbsp;stack:</p> <p>Old&nbsp;code:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Var</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_value</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">GLOBAL_MEMORY</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">var_name</span><span class="p">)</span> <span class="k">return</span> <span class="n">var_value</span> </pre></div> <p>New&nbsp;code:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Var</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">value</span> <span class="n">ar</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="o">.</span><span class="n">peek</span><span class="p">()</span> <span 
class="n">var_value</span> <span class="o">=</span> <span class="n">ar</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">var_name</span><span class="p">)</span> <span class="k">return</span> <span class="n">var_value</span> </pre></div> <p>Again as you can see, we use the <em>peek()</em> method to get the top (and only) activation record - the one that was pushed onto the stack by the <em>visit_Program</em> method to hold all the global variables and their values - and then get a value associated with the <em>var_name</em>&nbsp;key.</p> <p>5. And the last change in the <em>Interpreter</em> class that we’re going to make is to add a <em>log</em> method and use the <em>log</em> method to print the contents of the call stack when the interpreter evaluates a&nbsp;program:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">log</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">msg</span><span class="p">):</span> <span class="k">if</span> <span class="n">_SHOULD_LOG_STACK</span><span class="p">:</span> <span class="k">print</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_Program</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">program_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">name</span> <span class="bp">self</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">f</span><span class="s1">&#39;ENTER: PROGRAM {program_name}&#39;</span><span class="p">)</span> <span class="n">ar</span> <span class="o">=</span> <span class="n">ActivationRecord</span><span class="p">(</span> <span class="n">name</span><span class="o">=</span><span 
class="n">program_name</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="n">ARType</span><span class="o">.</span><span class="n">PROGRAM</span><span class="p">,</span> <span class="n">nesting_level</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="o">.</span><span class="n">push</span><span class="p">(</span><span class="n">ar</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="p">))</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">block</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">f</span><span class="s1">&#39;LEAVE: PROGRAM {program_name}&#39;</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="p">))</span> <span class="bp">self</span><span class="o">.</span><span class="n">call_stack</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span> </pre></div> <p>The messages are logged only if the global variable <em>_SHOULD_LOG_STACK</em> is set to <em>True</em>. The variable’s value is controlled by the <em>--stack</em> command line option. 
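</p> <p>To see the flag in isolation, here is a minimal sketch of the gating idea. It is not the code from <em>spi.py</em>: <em>set_stack_logging</em> and <em>run_program</em> are hypothetical stand-ins (for what <em>main</em> and <em>visit_Program</em> do) that exist only for this illustration.</p>

```python
# Minimal sketch, not the actual spi.py: a module-level flag gates every log call.
_SHOULD_LOG_STACK = False  # stand-in for the flag that main() sets from --stack


def set_stack_logging(enabled):
    """Hypothetical helper: flip the module-level flag (main() does this via 'global')."""
    global _SHOULD_LOG_STACK
    _SHOULD_LOG_STACK = enabled


def log(msg):
    """Print msg only when call stack logging is switched on."""
    if _SHOULD_LOG_STACK:
        print(msg)


def run_program(program_name):
    """Hypothetical stand-in for visit_Program: logs on entry and on exit."""
    log(f'ENTER: PROGRAM {program_name}')
    log(f'LEAVE: PROGRAM {program_name}')


run_program('Main')        # flag is off: nothing is printed
set_stack_logging(True)
run_program('Main')        # flag is on: the ENTER and LEAVE lines are printed
```

<p>Keeping the check inside <em>log</em> means the callers never have to test the flag themselves.</p> <p>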
First, let’s update the main function and add the <em>--stack</em> command line option to turn the logging of the call stack contents on and&nbsp;off:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">main</span><span class="p">():</span> <span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="o">.</span><span class="n">ArgumentParser</span><span class="p">(</span> <span class="n">description</span><span class="o">=</span><span class="s1">&#39;SPI - Simple Pascal Interpreter&#39;</span> <span class="p">)</span> <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">&#39;inputfile&#39;</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s1">&#39;Pascal source file&#39;</span><span class="p">)</span> <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span> <span class="s1">&#39;--scope&#39;</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s1">&#39;Print scope information&#39;</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="s1">&#39;store_true&#39;</span><span class="p">,</span> <span class="p">)</span> <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span> <span class="s1">&#39;--stack&#39;</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s1">&#39;Print call stack&#39;</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="s1">&#39;store_true&#39;</span><span class="p">,</span> <span class="p">)</span> <span class="n">args</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse_args</span><span class="p">()</span> <span 
class="k">global</span> <span class="n">_SHOULD_LOG_SCOPE</span><span class="p">,</span> <span class="n">_SHOULD_LOG_STACK</span> <span class="n">_SHOULD_LOG_SCOPE</span><span class="p">,</span> <span class="n">_SHOULD_LOG_STACK</span> <span class="o">=</span> <span class="n">args</span><span class="o">.</span><span class="n">scope</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">stack</span> </pre></div> <p><br/> Now, let’s take our updated interpreter for a test drive. Download the interpreter from <a href="https://github.com/rspivak/lsbasi/tree/master/part17">GitHub</a> and run it with the <em>-h</em> command line option to see available command line&nbsp;options:</p> <div class="highlight"><pre><span></span>$ python spi.py -h usage: spi.py <span class="o">[</span>-h<span class="o">]</span> <span class="o">[</span>--scope<span class="o">]</span> <span class="o">[</span>--stack<span class="o">]</span> inputfile SPI - Simple Pascal Interpreter positional arguments: inputfile Pascal <span class="nb">source</span> file optional arguments: -h, --help show this <span class="nb">help</span> message and <span class="nb">exit</span> --scope Print scope information --stack Print call stack </pre></div> <p>Download the following sample program from <a href="https://github.com/rspivak/lsbasi/tree/master/part17">GitHub</a> or save it to file&nbsp;part17.pas</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ Main }</span> <span class="n">y</span> <span class="o">:=</span> <span class="mi">7</span><span class="o">;</span> <span class="n">x</span> <span class="o">:=</span> <span class="p">(</span><span class="n">y</span> <span 
class="o">+</span> <span class="mi">3</span><span class="p">)</span> <span class="o">*</span> <span class="mi">3</span><span class="o">;</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{ Main }</span> </pre></div> <p>Run the interpreter with the <em>part17.pas</em> file as its input file and the <em>--stack</em> command line option to see the contents of the call stack as the interpreter executes the source&nbsp;program:</p> <div class="highlight"><pre><span></span>$ python spi.py part17.pas --stack ENTER: PROGRAM Main CALL STACK <span class="m">1</span>: PROGRAM Main LEAVE: PROGRAM Main CALL STACK <span class="m">1</span>: PROGRAM Main y : <span class="m">7</span> x : <span class="m">30</span> </pre></div> <p><br/> Mission accomplished! We have implemented a new memory system that can support programs, procedure calls, and function calls. And we’ve replaced the interpreter’s current memory system, represented by the <em>GLOBAL_MEMORY</em> dictionary, with the new system based on the call stack and activation&nbsp;records.</p> <p><br/> That’s all for today. In the next article we’ll extend the interpreter to execute procedure calls using the call stack and activation records. This will be a huge milestone for us. 
So stay tuned and see you next&nbsp;time!</p> <p>Let’s Build A Simple Interpreter. Part 16: Recognizing Procedure Calls (Ruslan Spivak, 2019-07-23)</p> <blockquote> <p><em><span class="dquo">&#8220;</span>Learning is like rowing upstream: not to advance is to drop back.&#8221; — Chinese&nbsp;proverb</em></p> </blockquote> <p>Today we’re going to extend our interpreter to recognize procedure calls. I hope by now you’ve flexed your coding muscles and are ready to tackle this step. This is a necessary step for us before we can learn how to execute procedure calls, which will be a topic that we will cover in great detail in future&nbsp;articles.</p> <p>The goal for today is to make sure that when our interpreter reads a program with a procedure call, the parser constructs an Abstract Syntax Tree (<span class="caps">AST</span>) with a new tree node for the procedure call, and the semantic analyzer and the interpreter don’t throw any errors when walking the <span class="caps">AST</span>.</p> <p>Let’s take a look at a sample program that contains a procedure call <em>Alpha(3 + 5, 7)</em>:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part16/lsbasi_part16_img01.png" width="640"></p> <p>Making our interpreter recognize programs like the one above will be our focus for&nbsp;today.</p> <p>As with any new feature, we need to update various components of the interpreter to support this feature. Let’s dive into each of those components one by&nbsp;one.</p> <p><br/> First, we need to update the parser. 
Here is a list of all the parser changes that we need to make to be able to parse procedure calls and build the right <span class="caps">AST</span>:</p> <ol> <li>We need to add a new <span class="caps">AST</span> node to represent a procedure&nbsp;call</li> <li>We need to add a new grammar rule for procedure call statements; then we need to implement the rule in&nbsp;code</li> <li>We need to extend the <em>statement</em> grammar rule to include the rule for procedure call statements and update the <em>statement</em> method to reflect the changes in the&nbsp;grammar</li> </ol> <p>1. Let’s start by creating a separate class to represent a procedure call <span class="caps">AST</span> node. Let’s call the class <em>ProcedureCall</em>:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ProcedureCall</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">proc_name</span><span class="p">,</span> <span class="n">actual_params</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">proc_name</span> <span class="o">=</span> <span class="n">proc_name</span> <span class="bp">self</span><span class="o">.</span><span class="n">actual_params</span> <span class="o">=</span> <span class="n">actual_params</span> <span class="c1"># a list of AST nodes</span> <span class="bp">self</span><span class="o">.</span><span class="n">token</span> <span class="o">=</span> <span class="n">token</span> </pre></div> <p>The <em>ProcedureCall</em> class constructor takes three parameters: a procedure name, a list of actual parameters (a.k.a arguments), and a token. Nothing really special here, just enough information for us to capture a particular procedure&nbsp;call.</p> <p>2. 
The next step that we need to take is to extend our grammar and add a grammar rule for procedure calls. Let’s call the rule <em>proccall_statement</em>:</p> <div class="highlight"><pre><span></span><span class="n">proccall_statement</span> <span class="o">:</span> <span class="n">ID</span> <span class="n">LPAREN</span> <span class="o">(</span><span class="n">expr</span> <span class="o">(</span><span class="n">COMMA</span> <span class="n">expr</span><span class="o">)*)?</span> <span class="n">RPAREN</span> </pre></div> <p>Here is a corresponding syntax diagram for the&nbsp;rule:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part16/lsbasi_part16_img02.png" width="640"></p> <p>From the diagram above you can see that a procedure call is an <span class="caps">ID</span> token followed by a left parenthesis, followed by zero or more expressions separated by commas, followed by a right parenthesis. Here are some procedure call examples that fit the&nbsp;rule:</p> <div class="highlight"><pre><span></span><span class="n">Alpha</span><span class="p">()</span><span class="o">;</span>
<span class="n">Alpha</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="o">;</span>
<span class="n">Alpha</span><span class="p">(</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">5</span><span class="o">,</span> <span class="mi">7</span><span class="p">)</span><span class="o">;</span>
</pre></div> <p>Next, let’s implement the rule in our parser by adding a <em>proccall_statement</em>&nbsp;method:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">proccall_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;proccall_statement : ID LPAREN (expr (COMMA expr)*)? RPAREN&quot;&quot;&quot;</span>
    <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span>

    <span class="n">proc_name</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">value</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">TokenType</span><span class="o">.</span><span class="n">ID</span><span class="p">)</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">TokenType</span><span class="o">.</span><span class="n">LPAREN</span><span class="p">)</span>
    <span class="n">actual_params</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">!=</span> <span class="n">TokenType</span><span class="o">.</span><span class="n">RPAREN</span><span class="p">:</span>
        <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">expr</span><span class="p">()</span>
        <span class="n">actual_params</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>

    <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">TokenType</span><span class="o">.</span><span class="n">COMMA</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">TokenType</span><span class="o">.</span><span class="n">COMMA</span><span class="p">)</span>
        <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">expr</span><span class="p">()</span>
        <span class="n">actual_params</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>

    <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">TokenType</span><span class="o">.</span><span class="n">RPAREN</span><span class="p">)</span>

    <span class="n">node</span> <span class="o">=</span> <span class="n">ProcedureCall</span><span class="p">(</span>
        <span class="n">proc_name</span><span class="o">=</span><span class="n">proc_name</span><span class="p">,</span>
        <span class="n">actual_params</span><span class="o">=</span><span class="n">actual_params</span><span class="p">,</span>
        <span class="n">token</span><span class="o">=</span><span class="n">token</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="k">return</span> <span class="n">node</span>
</pre></div> <p>The implementation is pretty straightforward and follows the grammar rule: the method parses a procedure call and returns a new <em>ProcedureCall</em> <span class="caps">AST</span>&nbsp;node.</p> <p>3. 
And the last changes to the parser that we need to make are: extend the <em>statement</em> grammar rule by adding the <em>proccall_statement</em> rule and update the <em>statement</em> method to call the <em>proccall_statement</em>&nbsp;method.</p> <p>Here is the updated <em>statement</em> grammar rule, which includes the <em>proccall_statement</em>&nbsp;rule:</p> <div class="highlight"><pre><span></span><span class="n">statement</span> <span class="o">:</span> <span class="n">compound_statement</span> <span class="o">|</span> <span class="n">proccall_statement</span> <span class="o">|</span> <span class="n">assignment_statement</span> <span class="o">|</span> <span class="n">empty</span> </pre></div> <p>Now, we have a tricky situation on hand where we have two grammar rules - <em>proccall_statement</em> and <em>assignment_statement</em> - that start with the same token, the <span class="caps">ID</span> token. Here are their complete grammar rules put together for&nbsp;comparison:</p> <div class="highlight"><pre><span></span><span class="n">proccall_statement</span> <span class="o">:</span> <span class="n">ID</span> <span class="n">LPAREN</span> <span class="o">(</span><span class="n">expr</span> <span class="o">(</span><span class="n">COMMA</span> <span class="n">expr</span><span class="o">)*)?</span> <span class="n">RPAREN</span> <span class="n">assignment_statement</span> <span class="o">:</span> <span class="n">variable</span> <span class="n">ASSIGN</span> <span class="n">expr</span> <span class="n">variable</span><span class="o">:</span> <span class="n">ID</span> </pre></div> <p>How do you distinguish between a procedure call and an assignment in a case like that? They are both statements and they both start with an <span class="caps">ID</span> token. 
In the fragment of code below, the <span class="caps">ID</span> token’s value (lexeme) for both statements is <em>foo</em>:</p> <div class="highlight"><pre><span></span><span class="n">foo</span><span class="p">()</span><span class="o">;</span> <span class="cm">{ procedure call }</span>
<span class="n">foo</span> <span class="o">:=</span> <span class="mi">5</span><span class="o">;</span> <span class="cm">{ assignment }</span>
</pre></div> <p>The parser should recognize <em>foo();</em> above as a procedure call and <em>foo := 5;</em> as an assignment. But what can we do to help the parser distinguish between procedure calls and assignments? According to our new <em>proccall_statement</em> grammar rule, procedure calls start with an <span class="caps">ID</span> token followed by a left parenthesis. And that’s what we are going to rely on in the parser to distinguish between procedure calls and assignments to variables - the presence of a left parenthesis after the <span class="caps">ID</span>&nbsp;token:</p> <div class="highlight"><pre><span></span><span class="k">if</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">TokenType</span><span class="o">.</span><span class="n">ID</span> <span class="ow">and</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;(&#39;</span>
<span class="p">):</span>
    <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">proccall_statement</span><span class="p">()</span>
<span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">TokenType</span><span class="o">.</span><span class="n">ID</span><span class="p">:</span>
    <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">assignment_statement</span><span class="p">()</span>
</pre></div> <p>As you can see in the code above, first we check if the current token is an <span class="caps">ID</span> token and then we check if it’s followed by a left parenthesis. If it is, we parse a procedure call, otherwise we parse an assignment&nbsp;statement. Note that the check inspects the lexer’s current character, that is, the character right after the identifier, so it assumes there is no whitespace between the procedure name and the left&nbsp;parenthesis.</p> <p>Here is the full updated version of the <em>statement</em>&nbsp;method:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;</span>
<span class="sd">    statement : compound_statement</span>
<span class="sd">              | proccall_statement</span>
<span class="sd">              | assignment_statement</span>
<span class="sd">              | empty</span>
<span class="sd">    &quot;&quot;&quot;</span>
    <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">TokenType</span><span class="o">.</span><span class="n">BEGIN</span><span class="p">:</span>
        <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">compound_statement</span><span class="p">()</span>
    <span class="k">elif</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">TokenType</span><span class="o">.</span><span class="n">ID</span> <span class="ow">and</span>
          <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;(&#39;</span>
    <span class="p">):</span>
        <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">proccall_statement</span><span class="p">()</span>
    <span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">TokenType</span><span class="o">.</span><span class="n">ID</span><span class="p">:</span>
        <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">assignment_statement</span><span class="p">()</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">empty</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">node</span>
</pre></div> <p><br/> So far so good. The parser can now parse procedure calls. One thing to keep in mind though is that Pascal procedures don’t return values, so we can’t use procedure calls in expressions. For example, the following statement will not work if <em>Alpha</em> is a&nbsp;procedure:</p> <div class="highlight"><pre><span></span><span class="n">x</span> <span class="o">:=</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">Alpha</span><span class="p">(</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">5</span><span class="o">,</span> <span class="mi">7</span><span class="p">)</span><span class="o">;</span> </pre></div> <p>That’s why we added <em>proccall_statement</em> to the <em>statement</em> method only and nowhere else. 
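</p> <p>The dispatch that the <em>statement</em> method performs can also be tried out in isolation. The sketch below is a minimal, self-contained illustration and not code from the interpreter: the <em>tokenize</em> and <em>classify_statement</em> helpers and the <em>KEYWORDS</em> set are hypothetical names, and the tokenizer only knows the handful of lexemes needed here. Unlike the parser check, which compares the lexer’s current character with a left parenthesis, it looks at the next token, so it assumes that whitespace before the parenthesis should be&nbsp;allowed:</p>

```python
import re

# Hypothetical helpers for illustration only; this is not the Lexer or Parser from spi.py.
KEYWORDS = {"PROGRAM", "VAR", "PROCEDURE", "BEGIN", "END", "DIV", "INTEGER", "REAL"}
TOKEN_RE = re.compile(r"\s*(:=|[A-Za-z_][A-Za-z0-9_]*|\d+|[()+\-*/;,.:])")


def tokenize(text):
    """Split a statement into a list of lexemes, skipping whitespace."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos != len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"Unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def classify_statement(text):
    """Mirror the branches of the statement rule using one token of lookahead."""
    tokens = tokenize(text)
    if not tokens:
        return "empty"
    first = tokens[0]
    if first.upper() == "BEGIN":
        return "compound"
    starts_like_name = first[0].isalpha() or first[0] == "_"
    is_identifier = starts_like_name and first.upper() not in KEYWORDS
    # An identifier followed by "(" is a procedure call, otherwise an assignment.
    if is_identifier and tokens[1:2] == ["("]:
        return "proccall"
    if is_identifier:
        return "assignment"
    return "empty"


print(classify_statement("foo();"))
print(classify_statement("foo := 5;"))
print(classify_statement("Alpha (3 + 5, 7);"))
```

<p>With this sketch, <em>foo();</em> and <em>Alpha (3 + 5, 7);</em> are classified as procedure calls and <em>foo := 5;</em> as an assignment. The decision is still made only at the start of a statement, so a call that appears inside an expression is not covered by&nbsp;it.</p> <p>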
Not to worry, later in the series we’ll learn about Pascal functions that can return values and can therefore be used in expressions and&nbsp;assignments.</p> <p>These are all the changes for our parser. Next up are the semantic analyzer&nbsp;changes.</p> <p><br/> The only change we need to make in our semantic analyzer to support procedure calls is to add a <em>visit_ProcedureCall</em>&nbsp;method:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_ProcedureCall</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">param_node</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">actual_params</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">param_node</span><span class="p">)</span>
</pre></div> <p>All the method does is iterate over a list of actual parameters passed to a procedure call and visit each parameter node in turn. It’s important not to forget to visit each parameter node because each parameter node is an <span class="caps">AST</span> sub-tree in&nbsp;itself.</p> <p>That was easy, wasn’t it? Okay, now moving on to interpreter&nbsp;changes.</p> <p><br/> The interpreter changes, compared to the changes to the semantic analyzer, are even simpler - we only need to add an empty <em>visit_ProcedureCall</em> method to the <em>Interpreter</em>&nbsp;class:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_ProcedureCall</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
    <span class="k">pass</span>
</pre></div> <p>With all the above changes in place, we now have an interpreter that can recognize procedure calls. And by that I mean the interpreter can parse procedure calls and create an <span class="caps">AST</span> with <em>ProcedureCall</em> nodes corresponding to those procedure calls. Here is the sample Pascal program that we want to test our interpreter&nbsp;on:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span>

<span class="k">procedure</span> <span class="nf">Alpha</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="n">b</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span>
<span class="k">var</span> <span class="n">x</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span>
<span class="k">begin</span>
   <span class="n">x</span> <span class="o">:=</span> <span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="p">)</span> <span class="o">*</span> <span class="mi">2</span><span class="o">;</span>
<span class="k">end</span><span class="o">;</span>

<span class="k">begin</span> <span class="cm">{ Main }</span>
   <span class="n">Alpha</span><span class="p">(</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">5</span><span class="o">,</span> <span class="mi">7</span><span class="p">)</span><span class="o">;</span> <span class="cm">{ procedure call }</span>
<span class="k">end</span><span class="o">.</span> <span class="cm">{ Main }</span>
</pre></div> <p>Download the above program from <a href="https://github.com/rspivak/lsbasi/tree/master/part16">GitHub</a> or save the code to the file&nbsp;part16.pas.</p> <p>See for yourself that running our <a href="https://github.com/rspivak/lsbasi/tree/master/part16">updated interpreter</a> with part16.pas as its input file does not generate any&nbsp;errors:</p> <div class="highlight"><pre><span></span>$ python spi.py part16.pas
$
</pre></div> <p>So far so good, but no output is not that exciting. :) Let’s get a bit visual and generate an <span class="caps">AST</span> for the above program and then visualize the <span class="caps">AST</span> using an updated version of the <a href="https://github.com/rspivak/lsbasi/tree/master/part16/genastdot.py">genastdot.py</a>&nbsp;utility:</p> <div class="highlight"><pre><span></span>$ python genastdot.py part16.pas &gt; ast.dot <span class="o">&amp;&amp;</span> dot -Tpng -o ast.png ast.dot </pre></div> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part16/lsbasi_part16_img03.png" width="640"></p> <p>That’s better. In the picture above you can see our new <em>ProcedureCall</em> <span class="caps">AST</span> node, labeled <em>ProcCall:Alpha</em>, for the <em>Alpha(3 + 5, 7)</em> procedure call. The two children of the <em>ProcCall:Alpha</em> node are the subtrees for the arguments <em>3 + 5</em> and <em>7</em> passed to the <em>Alpha(3 + 5, 7)</em> procedure&nbsp;call.</p> <p>Okay, we have accomplished our goal for today: when encountering a procedure call, the parser constructs an <span class="caps">AST</span> with a <em>ProcedureCall</em> node for the procedure call, and the semantic analyzer and the interpreter don’t throw any errors when walking the <span class="caps">AST</span>.</p> <p><br/> Now, it’s time for an&nbsp;exercise.</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part16/lsbasi_part16_img04.png" width="180"></p> <p>Exercise: Add a check to the semantic analyzer that verifies that the number of arguments (actual parameters) passed to a procedure call equals the number of formal parameters defined in the corresponding procedure declaration. 
Let’s take the <em>Alpha</em> procedure declaration we used earlier in the article as an&nbsp;example:</p> <div class="highlight"><pre><span></span><span class="k">procedure</span> <span class="nf">Alpha</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="n">b</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="n">x</span> <span class="o">:=</span> <span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="p">)</span> <span class="o">*</span> <span class="mi">2</span><span class="o">;</span> <span class="k">end</span><span class="o">;</span> </pre></div> <p>The number of formal parameters in the procedure declaration above is two (integers <em>a</em> and <em>b</em>). 
Your check should throw an error if you try to call the procedure with a number of arguments other than&nbsp;two:</p> <div class="highlight"><pre><span></span><span class="n">Alpha</span><span class="p">()</span><span class="o">;</span> <span class="cm">{ 0 arguments -&gt; ERROR }</span>
<span class="n">Alpha</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="o">;</span> <span class="cm">{ 1 argument -&gt; ERROR }</span>
<span class="n">Alpha</span><span class="p">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">3</span><span class="p">)</span><span class="o">;</span> <span class="cm">{ 3 arguments -&gt; ERROR }</span>
</pre></div> <p>You can find a solution to the exercise in the file <em>solutions.txt</em> on <a href="https://github.com/rspivak/lsbasi/tree/master/part16">GitHub</a>, but try to work out your own solution first before peeking into the&nbsp;file.</p> <p><br/> That’s all for today. In the next article we’ll begin to learn how to interpret procedure calls. We will cover topics like the call stack and activation records. 
It is going to be a wild ride :) So stay tuned and see you next&nbsp;time!</p> Let’s Build A Simple Interpreter. Part 15.2019-06-21T05:45:00-04:002019-06-21T05:45:00-04:00Ruslan Spivaktag:ruslanspivak.com,2019-06-21:/lsbasi-part15/<blockquote> <p><em><span class="dquo">&#8220;</span>I am a slow walker, but I never walk back.&#8221; — Abraham&nbsp;Lincoln</em></p> </blockquote> <p>And we’re back to our regularly scheduled programming!&nbsp;:)</p> <p>Before moving on to topics of recognizing and interpreting procedure calls, let’s make some changes to improve our error reporting a bit. Up until now, if there was a problem getting a new token from text, parsing source code, or doing semantic analysis, a stack trace would be thrown right into your face with a very generic message. We can do better than&nbsp;that.</p> <p>To provide better error messages pinpointing where in the code an issue happened, we need to add some features to our interpreter. Let’s do that and make some other changes along the way. This will make the interpreter more user friendly and give us an opportunity to flex our muscles after a “short” break in the series. It will also give us a chance to prepare for new features that we will be adding in future&nbsp;articles.</p> <p>Goals for&nbsp;today:</p> <ul> <li>Improve error reporting in the lexer, parser, and semantic analyzer. Instead of stack traces with very generic messages like <em>&#8220;Invalid syntax&#8221;</em>, we would like to see something more useful like <em>&#8220;SyntaxError: Unexpected token -&gt; Token(TokenType.<span class="caps">SEMI</span>, &#8216;;&#8217;,&nbsp;position=23:13)&#8221;</em></li> <li>Add a &#8220;--scope&#8221; command line option to turn scope output&nbsp;on/off</li> <li>Switch to Python 3. 
From here on out, all code will be tested on Python 3.7+&nbsp;only</li> </ul> <p>Let’s get cracking and start flexing our coding muscles by changing our lexer&nbsp;first.</p> <p><br/> Here is a list of the changes we are going to make in our lexer&nbsp;today:</p> <ol> <li>We will add error codes and custom exceptions: <em>LexerError</em>, <em>ParserError</em>, and <em>SemanticError</em></li> <li>We will add new members to the <em>Lexer</em> class to help to track tokens’ positions: <em>lineno</em> and <em>column</em></li> <li>We will modify the <em>advance</em> method to update the lexer’s <em>lineno</em> and <em>column</em>&nbsp;variables</li> <li>We will update the <em>error</em> method to raise a <em>LexerError</em> exception with information about the current line and&nbsp;column</li> <li>We will define token types in the <em>TokenType</em> enumeration class (Support for enumerations was added in Python&nbsp;3.4)</li> <li>We will add code to automatically create reserved keywords from the <em>TokenType</em> enumeration&nbsp;members</li> <li>We will add new members to the <em>Token</em> class: <em>lineno</em> and <em>column</em> to keep track of the token’s line number and column number, correspondingly, in the&nbsp;text</li> <li>We will refactor the <em>get_next_token</em> method code to make it shorter and have a generic code that handles single-character&nbsp;tokens</li> </ol> <p><br/> 1. Let’s define some error codes first. These codes will be used by our parser and semantic analyzer. 
Let’s also define the following error classes: <em>LexerError</em>, <em>ParserError</em>, and <em>SemanticError</em> for lexical, syntactic, and, correspondingly, semantic&nbsp;errors:</p> <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">enum</span> <span class="kn">import</span> <span class="n">Enum</span> <span class="k">class</span> <span class="nc">ErrorCode</span><span class="p">(</span><span class="n">Enum</span><span class="p">):</span> <span class="n">UNEXPECTED_TOKEN</span> <span class="o">=</span> <span class="s1">&#39;Unexpected token&#39;</span> <span class="n">ID_NOT_FOUND</span> <span class="o">=</span> <span class="s1">&#39;Identifier not found&#39;</span> <span class="n">DUPLICATE_ID</span> <span class="o">=</span> <span class="s1">&#39;Duplicate id found&#39;</span> <span class="k">class</span> <span class="nc">Error</span><span class="p">(</span><span class="ne">Exception</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">error_code</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">token</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">message</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">error_code</span> <span class="o">=</span> <span class="n">error_code</span> <span class="bp">self</span><span class="o">.</span><span class="n">token</span> <span class="o">=</span> <span class="n">token</span> <span class="c1"># add exception class name before the message</span> <span class="bp">self</span><span class="o">.</span><span class="n">message</span> <span class="o">=</span> <span class="n">f</span><span class="s1">&#39;{self.__class__.__name__}: {message}&#39;</span> <span 
class="k">class</span> <span class="nc">LexerError</span><span class="p">(</span><span class="n">Error</span><span class="p">):</span> <span class="k">pass</span> <span class="k">class</span> <span class="nc">ParserError</span><span class="p">(</span><span class="n">Error</span><span class="p">):</span> <span class="k">pass</span> <span class="k">class</span> <span class="nc">SemanticError</span><span class="p">(</span><span class="n">Error</span><span class="p">):</span> <span class="k">pass</span> </pre></div> <p><br/> <em>ErrorCode</em> is an enumeration class, where each member has a name and a&nbsp;value:</p> <div class="highlight"><pre><span></span><span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">enum</span> <span class="kn">import</span> <span class="n">Enum</span> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> <span class="k">class</span> <span class="nc">ErrorCode</span><span class="p">(</span><span class="n">Enum</span><span class="p">):</span> <span class="o">...</span> <span class="n">UNEXPECTED_TOKEN</span> <span class="o">=</span> <span class="s1">&#39;Unexpected token&#39;</span> <span class="o">...</span> <span class="n">ID_NOT_FOUND</span> <span class="o">=</span> <span class="s1">&#39;Identifier not found&#39;</span> <span class="o">...</span> <span class="n">DUPLICATE_ID</span> <span class="o">=</span> <span class="s1">&#39;Duplicate id found&#39;</span> <span class="o">...</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">ErrorCode</span> <span class="o">&lt;</span><span class="n">enum</span> <span class="s1">&#39;ErrorCode&#39;</span><span class="o">&gt;</span> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">ErrorCode</span><span class="o">.</span><span class="n">ID_NOT_FOUND</span> <span class="o">&lt;</span><span class="n">ErrorCode</span><span class="o">.</span><span class="n">ID_NOT_FOUND</span><span class="p">:</span> <span 
class="s1">&#39;Identifier not found&#39;</span><span class="o">&gt;</span> </pre></div> <p><br/> The <em>Error</em> base class constructor takes three&nbsp;arguments:</p> <ul> <li> <p><em>error_code</em>: an <em>ErrorCode</em> member such as&nbsp;ErrorCode.ID_NOT_FOUND</p> </li> <li> <p><em>token</em>: an instance of the <em>Token</em>&nbsp;class</p> </li> <li> <p><em>message</em>: a message with more detailed information about the&nbsp;problem</p> </li> </ul> <p>As I&#8217;ve mentioned before, <em>LexerError</em> is used to indicate an error encountered in the lexer, <em>ParserError</em> is for syntax related errors during the parsing phase, and <em>SemanticError</em> is for semantic&nbsp;errors.</p> <p><br/> 2. To provide better error messages, we want to display the position in the source text where the problem happened. To be able to do that, we need to start tracking the current line number and column in our lexer as we generate tokens. Let’s add <em>lineno</em> and <em>column</em> fields to the <em>Lexer</em>&nbsp;class:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Lexer</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span>
        <span class="o">...</span>
        <span class="c1"># self.pos is an index into self.text</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">pos</span><span class="p">]</span>
        <span class="c1"># token line number and column number</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lineno</span> <span class="o">=</span> <span class="mi">1</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">column</span> <span class="o">=</span> <span class="mi">1</span>
</pre></div> <p><br/> 3. The next change that we need to make is to update the <em>advance</em> method so that it increments <em>lineno</em> and resets <em>column</em> when encountering a new line, and increases the <em>column</em> value on each advance of the <em>self.pos</em>&nbsp;pointer:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">advance</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;Advance the `pos` pointer and set the `current_char` variable.&quot;&quot;&quot;</span>
    <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lineno</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">column</span> <span class="o">=</span> <span class="mi">0</span>

    <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">&gt;</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">None</span>  <span class="c1"># Indicates end of input</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">pos</span><span class="p">]</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">column</span> <span class="o">+=</span> <span class="mi">1</span>
</pre></div> <p>With those changes in place, every time we create a token we will pass the current <em>lineno</em> and <em>column</em> from the lexer to the newly created&nbsp;token.</p> <p><br/> 4. Let’s update the <em>error</em> method to throw a <em>LexerError</em> exception with a more detailed error message telling us the current character that the lexer choked on and its location in the&nbsp;text:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="n">s</span> <span class="o">=</span> <span class="s2">&quot;Lexer error on &#39;{lexeme}&#39; line: {lineno} column: {column}&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
        <span class="n">lexeme</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="p">,</span>
        <span class="n">lineno</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">lineno</span><span class="p">,</span>
        <span class="n">column</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">column</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="k">raise</span> <span class="n">LexerError</span><span class="p">(</span><span class="n">message</span><span 
class="o">=</span><span class="n">s</span><span class="p">)</span> </pre></div> <p><br/> 5. Instead of having token types defined as module level variables, we are going to move them into a dedicated enumeration class called <em>TokenType</em>. This will help us simplify certain operations and make some parts of our code a bit&nbsp;shorter.</p> <p>Old&nbsp;style:</p> <div class="highlight"><pre><span></span><span class="c1"># Token types</span> <span class="n">PLUS</span> <span class="o">=</span> <span class="s1">&#39;PLUS&#39;</span> <span class="n">MINUS</span> <span class="o">=</span> <span class="s1">&#39;MINUS&#39;</span> <span class="n">MUL</span> <span class="o">=</span> <span class="s1">&#39;MUL&#39;</span> <span class="o">...</span> </pre></div> <p>New&nbsp;style:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">TokenType</span><span class="p">(</span><span class="n">Enum</span><span class="p">):</span> <span class="c1"># single-character token types</span> <span class="n">PLUS</span> <span class="o">=</span> <span class="s1">&#39;+&#39;</span> <span class="n">MINUS</span> <span class="o">=</span> <span class="s1">&#39;-&#39;</span> <span class="n">MUL</span> <span class="o">=</span> <span class="s1">&#39;*&#39;</span> <span class="n">FLOAT_DIV</span> <span class="o">=</span> <span class="s1">&#39;/&#39;</span> <span class="n">LPAREN</span> <span class="o">=</span> <span class="s1">&#39;(&#39;</span> <span class="n">RPAREN</span> <span class="o">=</span> <span class="s1">&#39;)&#39;</span> <span class="n">SEMI</span> <span class="o">=</span> <span class="s1">&#39;;&#39;</span> <span class="n">DOT</span> <span class="o">=</span> <span class="s1">&#39;.&#39;</span> <span class="n">COLON</span> <span class="o">=</span> <span class="s1">&#39;:&#39;</span> <span class="n">COMMA</span> <span class="o">=</span> <span class="s1">&#39;,&#39;</span> <span class="c1"># block of reserved words</span> <span 
class="n">PROGRAM</span> <span class="o">=</span> <span class="s1">&#39;PROGRAM&#39;</span> <span class="c1"># marks the beginning of the block</span> <span class="n">INTEGER</span> <span class="o">=</span> <span class="s1">&#39;INTEGER&#39;</span> <span class="n">REAL</span> <span class="o">=</span> <span class="s1">&#39;REAL&#39;</span> <span class="n">INTEGER_DIV</span> <span class="o">=</span> <span class="s1">&#39;DIV&#39;</span> <span class="n">VAR</span> <span class="o">=</span> <span class="s1">&#39;VAR&#39;</span> <span class="n">PROCEDURE</span> <span class="o">=</span> <span class="s1">&#39;PROCEDURE&#39;</span> <span class="n">BEGIN</span> <span class="o">=</span> <span class="s1">&#39;BEGIN&#39;</span> <span class="n">END</span> <span class="o">=</span> <span class="s1">&#39;END&#39;</span> <span class="c1"># marks the end of the block</span> <span class="c1"># misc</span> <span class="n">ID</span> <span class="o">=</span> <span class="s1">&#39;ID&#39;</span> <span class="n">INTEGER_CONST</span> <span class="o">=</span> <span class="s1">&#39;INTEGER_CONST&#39;</span> <span class="n">REAL_CONST</span> <span class="o">=</span> <span class="s1">&#39;REAL_CONST&#39;</span> <span class="n">ASSIGN</span> <span class="o">=</span> <span class="s1">&#39;:=&#39;</span> <span class="n">EOF</span> <span class="o">=</span> <span class="s1">&#39;EOF&#39;</span> </pre></div> <p><br/> 6. We used to manually add items to the <em>RESERVED_KEYWORDS</em> dictionary whenever we had to add a new token type that was also a reserved keyword. 
If we wanted to add a new <span class="caps">STRING</span> token type, we would have&nbsp;to</p> <ul> <li>(a) create a new module-level variable <span class="caps">STRING</span> = &#8216;<span class="caps">STRING</span>&#8217;</li> <li>(b) manually add it to the <em>RESERVED_KEYWORDS</em>&nbsp;dictionary</li> </ul> <p>Now that we have the <em>TokenType</em> enumeration class, we can remove the manual step <strong>(b)</strong> above and keep token types in one place only. This is the &#8220;<a href="https://www.codesimplicity.com/post/two-is-too-many/">two is too many</a>&#8221; rule in action: going forward, the only change you need to make to add a new keyword token type is to put the keyword between <span class="caps">PROGRAM</span> and <span class="caps">END</span> in the <em>TokenType</em> enumeration class, and the <em>_build_reserved_keywords</em> function will take care of the&nbsp;rest:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">_build_reserved_keywords</span><span class="p">():</span> <span class="sd">&quot;&quot;&quot;Build a dictionary of reserved keywords.</span> <span class="sd"> The function relies on the fact that in the TokenType</span> <span class="sd"> enumeration the beginning of the block of reserved keywords is</span> <span class="sd"> marked with PROGRAM and the end of the block is marked with</span> <span class="sd"> the END keyword.</span> <span class="sd"> Result:</span> <span class="sd"> {&#39;PROGRAM&#39;: &lt;TokenType.PROGRAM: &#39;PROGRAM&#39;&gt;,</span> <span class="sd"> &#39;INTEGER&#39;: &lt;TokenType.INTEGER: &#39;INTEGER&#39;&gt;,</span> <span class="sd"> &#39;REAL&#39;: &lt;TokenType.REAL: &#39;REAL&#39;&gt;,</span> <span class="sd"> &#39;DIV&#39;: &lt;TokenType.INTEGER_DIV: &#39;DIV&#39;&gt;,</span> <span class="sd"> &#39;VAR&#39;: &lt;TokenType.VAR: &#39;VAR&#39;&gt;,</span> <span class="sd"> &#39;PROCEDURE&#39;: &lt;TokenType.PROCEDURE: &#39;PROCEDURE&#39;&gt;,</span> <span
class="sd"> &#39;BEGIN&#39;: &lt;TokenType.BEGIN: &#39;BEGIN&#39;&gt;,</span> <span class="sd"> &#39;END&#39;: &lt;TokenType.END: &#39;END&#39;&gt;}</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="c1"># enumerations support iteration, in definition order</span> <span class="n">tt_list</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">TokenType</span><span class="p">)</span> <span class="n">start_index</span> <span class="o">=</span> <span class="n">tt_list</span><span class="o">.</span><span class="n">index</span><span class="p">(</span><span class="n">TokenType</span><span class="o">.</span><span class="n">PROGRAM</span><span class="p">)</span> <span class="n">end_index</span> <span class="o">=</span> <span class="n">tt_list</span><span class="o">.</span><span class="n">index</span><span class="p">(</span><span class="n">TokenType</span><span class="o">.</span><span class="n">END</span><span class="p">)</span> <span class="n">reserved_keywords</span> <span class="o">=</span> <span class="p">{</span> <span class="n">token_type</span><span class="o">.</span><span class="n">value</span><span class="p">:</span> <span class="n">token_type</span> <span class="k">for</span> <span class="n">token_type</span> <span class="ow">in</span> <span class="n">tt_list</span><span class="p">[</span><span class="n">start_index</span><span class="p">:</span><span class="n">end_index</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span> <span class="p">}</span> <span class="k">return</span> <span class="n">reserved_keywords</span> <span class="n">RESERVED_KEYWORDS</span> <span class="o">=</span> <span class="n">_build_reserved_keywords</span><span class="p">()</span> </pre></div> <p><br/> As you can see from the function&#8217;s documentation string, the function relies on the fact that a block of reserved keywords in the <em>TokenType</em> enum is marked by <span class="caps">PROGRAM</span> 
and <span class="caps">END</span>&nbsp;keywords.</p> <p>The function first turns <em>TokenType</em> into a list (the definition order is preserved), and then it gets the starting index of the block (marked by the <span class="caps">PROGRAM</span> keyword) and the end index of the block (marked by the <span class="caps">END</span> keyword). Next, it uses dictionary comprehension to build a dictionary where the keys are string values of the enum members and the values are the <em>TokenType</em> members&nbsp;themselves.</p> <div class="highlight"><pre><span></span><span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">spi</span> <span class="kn">import</span> <span class="n">_build_reserved_keywords</span> <span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">pprint</span> <span class="kn">import</span> <span class="n">pprint</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">pprint</span><span class="p">(</span><span class="n">_build_reserved_keywords</span><span class="p">())</span> <span class="c1"># &#39;pprint&#39; sorts the keys</span> <span class="p">{</span><span class="s1">&#39;BEGIN&#39;</span><span class="p">:</span> <span class="o">&lt;</span><span class="n">TokenType</span><span class="o">.</span><span class="n">BEGIN</span><span class="p">:</span> <span class="s1">&#39;BEGIN&#39;</span><span class="o">&gt;</span><span class="p">,</span> <span class="s1">&#39;DIV&#39;</span><span class="p">:</span> <span class="o">&lt;</span><span class="n">TokenType</span><span class="o">.</span><span class="n">INTEGER_DIV</span><span class="p">:</span> <span class="s1">&#39;DIV&#39;</span><span class="o">&gt;</span><span class="p">,</span> <span class="s1">&#39;END&#39;</span><span class="p">:</span> <span class="o">&lt;</span><span class="n">TokenType</span><span class="o">.</span><span class="n">END</span><span class="p">:</span> <span class="s1">&#39;END&#39;</span><span class="o">&gt;</span><span 
class="p">,</span> <span class="s1">&#39;INTEGER&#39;</span><span class="p">:</span> <span class="o">&lt;</span><span class="n">TokenType</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">:</span> <span class="s1">&#39;INTEGER&#39;</span><span class="o">&gt;</span><span class="p">,</span> <span class="s1">&#39;PROCEDURE&#39;</span><span class="p">:</span> <span class="o">&lt;</span><span class="n">TokenType</span><span class="o">.</span><span class="n">PROCEDURE</span><span class="p">:</span> <span class="s1">&#39;PROCEDURE&#39;</span><span class="o">&gt;</span><span class="p">,</span> <span class="s1">&#39;PROGRAM&#39;</span><span class="p">:</span> <span class="o">&lt;</span><span class="n">TokenType</span><span class="o">.</span><span class="n">PROGRAM</span><span class="p">:</span> <span class="s1">&#39;PROGRAM&#39;</span><span class="o">&gt;</span><span class="p">,</span> <span class="s1">&#39;REAL&#39;</span><span class="p">:</span> <span class="o">&lt;</span><span class="n">TokenType</span><span class="o">.</span><span class="n">REAL</span><span class="p">:</span> <span class="s1">&#39;REAL&#39;</span><span class="o">&gt;</span><span class="p">,</span> <span class="s1">&#39;VAR&#39;</span><span class="p">:</span> <span class="o">&lt;</span><span class="n">TokenType</span><span class="o">.</span><span class="n">VAR</span><span class="p">:</span> <span class="s1">&#39;VAR&#39;</span><span class="o">&gt;</span><span class="p">}</span> </pre></div> <p><br/> 7. 
The next change is to add new members to the <em>Token</em> class, namely <em>lineno</em> and <em>column</em>, to keep track of a token&#8217;s line number and column number in the source&nbsp;text:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Token</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">type</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">lineno</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">type</span> <span class="o">=</span> <span class="nb">type</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lineno</span> <span class="o">=</span> <span class="n">lineno</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">column</span> <span class="o">=</span> <span class="n">column</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">&quot;&quot;&quot;String representation of the class instance.</span>

<span class="sd">        Example:</span>
<span class="sd">            &gt;&gt;&gt; Token(TokenType.INTEGER, 7, lineno=5, column=10)</span>
<span class="sd">            Token(TokenType.INTEGER, 7, position=5:10)</span>
<span class="sd">        &quot;&quot;&quot;</span>
        <span class="k">return</span> <span class="s1">&#39;Token({type}, {value}, position={lineno}:{column})&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
            <span class="nb">type</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">type</span><span class="p">,</span>
            <span class="n">value</span><span class="o">=</span><span class="nb">repr</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">),</span>
            <span class="n">lineno</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">lineno</span><span class="p">,</span>
            <span class="n">column</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">column</span><span class="p">,</span>
        <span class="p">)</span>

    <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="fm">__str__</span><span class="p">()</span>
</pre></div> <p><br/> 8. Now, onto <em>get_next_token</em> method changes.
Thanks to enums, we can reduce the amount of code that deals with single-character tokens by writing generic code that generates them and doesn&#8217;t need to change when we add a new single-character token&nbsp;type.</p> <p>Instead of a lot of code blocks like&nbsp;these:</p> <div class="highlight"><pre><span></span><span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;;&#39;</span><span class="p">:</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">SEMI</span><span class="p">,</span> <span class="s1">&#39;;&#39;</span><span class="p">)</span>

<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;:&#39;</span><span class="p">:</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">COLON</span><span class="p">,</span> <span class="s1">&#39;:&#39;</span><span class="p">)</span>

<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;,&#39;</span><span class="p">:</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">COMMA</span><span class="p">,</span> <span class="s1">&#39;,&#39;</span><span class="p">)</span>
<span class="o">...</span>
</pre></div> <p>We can now use this generic code to take care of all current and future single-character&nbsp;tokens.</p>
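<p>Before the lexer code itself, here is a minimal standalone sketch of the enum feature it relies on: calling an <em>Enum</em> class with a value returns the member that has that value, and raises <em>ValueError</em> when there is no such member. The <em>Punct</em> enum and the <em>classify</em> helper are hypothetical names used only for illustration; they are not part of the interpreter.</p>
<div class="highlight"><pre><span></span>from enum import Enum


class Punct(Enum):
    # Hypothetical two-member enum standing in for TokenType
    SEMI = ';'
    COMMA = ','


def classify(char):
    """Return the Punct member whose value is char, or None if there is none."""
    try:
        # Lookup by value, e.g. Punct(';') gives Punct.SEMI
        return Punct(char)
    except ValueError:
        # No member has a value equal to char
        return None


assert classify(';') is Punct.SEMI   # ';' is the value of Punct.SEMI
assert classify('@') is None         # no member has the value '@'
</pre></div>
<p>The lexer applies the same lookup to <em>self.current_char</em>, except that it reports an error instead of returning <em>None</em>:</p>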
<div class="highlight"><pre><span></span><span class="c1"># single-character token</span> <span class="k">try</span><span class="p">:</span> <span class="c1"># get enum member by value, e.g.</span> <span class="c1"># TokenType(&#39;;&#39;) --&gt; TokenType.SEMI</span> <span class="n">token_type</span> <span class="o">=</span> <span class="n">TokenType</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="p">)</span> <span class="k">except</span> <span class="ne">ValueError</span><span class="p">:</span> <span class="c1"># no enum member with value equal to self.current_char</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span> <span class="k">else</span><span class="p">:</span> <span class="c1"># create a token with a single-character lexeme as its value</span> <span class="n">token</span> <span class="o">=</span> <span class="n">Token</span><span class="p">(</span> <span class="nb">type</span><span class="o">=</span><span class="n">token_type</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="n">token_type</span><span class="o">.</span><span class="n">value</span><span class="p">,</span> <span class="c1"># e.g. 
&#39;;&#39;, &#39;.&#39;, etc</span> <span class="n">lineno</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">lineno</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">column</span><span class="p">,</span> <span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span class="n">token</span> </pre></div> <p>Arguably it&#8217;s less readable than a bunch of <em>if</em> blocks, but it&#8217;s pretty straightforward once you understand what&#8217;s going on here. Python enums allow us to access enum members by values and that&#8217;s what we use in the code above. It works like&nbsp;this:</p> <ul> <li>First we try to get a <em>TokenType</em> member by the value of <em>self.current_char</em></li> <li>If the operation throws a <em>ValueError</em> exception, that means we don&#8217;t support that token&nbsp;type</li> <li>Otherwise we create a correct token with the corresponding token type and&nbsp;value.</li> </ul> <p>This block of code will handle all current and new single character tokens. All we need to do to support a new token type is to add the new token type to the <em>TokenType</em> definition and that&#8217;s it. 
The code above will stay&nbsp;unchanged.</p> <p>The way I see it, it&#8217;s a win-win situation with this generic code: we learned a bit more about Python enums, specifically how to access enumeration members by values; we wrote some generic code to handle all single character tokens, and, as a side effect, we reduced the amount of repetitive code to handle those single character&nbsp;tokens.</p> <p>The next stop is parser&nbsp;changes.</p> <p><br/> Here is a list of changes we&#8217;ll make in our parser&nbsp;today:</p> <ol> <li>We will update the parser&#8217;s <em>error</em> method to throw a <em>ParserError</em> exception with an error code and current&nbsp;token</li> <li>We will update the <em>eat</em> method to call the modified <em>error</em>&nbsp;method</li> <li>We will refactor the <em>declarations</em> method and move the code that parses a procedure declaration into a separate&nbsp;method.</li> </ol> <p>1. Let&#8217;s update the parser&#8217;s <em>error</em> method to throw a <em>ParserError</em> exception with some useful&nbsp;information</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">error_code</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span> <span class="k">raise</span> <span class="n">ParserError</span><span class="p">(</span> <span class="n">error_code</span><span class="o">=</span><span class="n">error_code</span><span class="p">,</span> <span class="n">token</span><span class="o">=</span><span class="n">token</span><span class="p">,</span> <span class="n">message</span><span class="o">=</span><span class="n">f</span><span class="s1">&#39;{error_code.value} -&gt; {token}&#39;</span><span class="p">,</span> <span class="p">)</span> </pre></div> <p><br/> 2. 
And now let&#8217;s modify the <em>eat</em> method to call the updated <em>error</em>&nbsp;method</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">eat</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token_type</span><span class="p">):</span> <span class="c1"># compare the current token type with the passed token</span> <span class="c1"># type and if they match then &quot;eat&quot; the current token</span> <span class="c1"># and assign the next token to the self.current_token,</span> <span class="c1"># otherwise raise an exception.</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">token_type</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="k">else</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">(</span> <span class="n">error_code</span><span class="o">=</span><span class="n">ErrorCode</span><span class="o">.</span><span class="n">UNEXPECTED_TOKEN</span><span class="p">,</span> <span class="n">token</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="p">,</span> <span class="p">)</span> </pre></div> <p><br/> 3. 
Next, let&#8217;s update the <em>declarations</em> method&#8217;s documentation string and move the code that parses a procedure declaration into a separate method, <em>procedure_declaration</em>:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">declarations</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;</span>
<span class="sd">    declarations : (VAR (variable_declaration SEMI)+)? procedure_declaration*</span>
<span class="sd">    &quot;&quot;&quot;</span>
    <span class="n">declarations</span> <span class="o">=</span> <span class="p">[]</span>

    <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">TokenType</span><span class="o">.</span><span class="n">VAR</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">TokenType</span><span class="o">.</span><span class="n">VAR</span><span class="p">)</span>
        <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">TokenType</span><span class="o">.</span><span class="n">ID</span><span class="p">:</span>
            <span class="n">var_decl</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">variable_declaration</span><span class="p">()</span>
            <span class="n">declarations</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">var_decl</span><span class="p">)</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">TokenType</span><span class="o">.</span><span class="n">SEMI</span><span class="p">)</span>

    <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">TokenType</span><span class="o">.</span><span class="n">PROCEDURE</span><span class="p">:</span>
        <span class="n">proc_decl</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">procedure_declaration</span><span class="p">()</span>
        <span class="n">declarations</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">proc_decl</span><span class="p">)</span>

    <span class="k">return</span> <span class="n">declarations</span>

<span class="k">def</span> <span class="nf">procedure_declaration</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;procedure_declaration :</span>
<span class="sd">         PROCEDURE ID (LPAREN formal_parameter_list RPAREN)? SEMI block SEMI</span>
<span class="sd">    &quot;&quot;&quot;</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">TokenType</span><span class="o">.</span><span class="n">PROCEDURE</span><span class="p">)</span>
    <span class="n">proc_name</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">value</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">TokenType</span><span class="o">.</span><span class="n">ID</span><span class="p">)</span>
    <span class="n">params</span> <span class="o">=</span> <span class="p">[]</span>

    <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">TokenType</span><span class="o">.</span><span class="n">LPAREN</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">TokenType</span><span class="o">.</span><span class="n">LPAREN</span><span class="p">)</span>
        <span class="n">params</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">formal_parameter_list</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">TokenType</span><span class="o">.</span><span class="n">RPAREN</span><span class="p">)</span>

    <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">TokenType</span><span class="o">.</span><span class="n">SEMI</span><span class="p">)</span>
    <span class="n">block_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">block</span><span class="p">()</span>
    <span class="n">proc_decl</span> <span class="o">=</span> <span class="n">ProcedureDecl</span><span class="p">(</span><span class="n">proc_name</span><span class="p">,</span> <span class="n">params</span><span class="p">,</span> <span class="n">block_node</span><span class="p">)</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">TokenType</span><span class="o">.</span><span class="n">SEMI</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">proc_decl</span>
</pre></div> <p>These are all the changes in the parser. Now we&#8217;ll move on to the semantic&nbsp;analyzer.</p> <p><br/> And finally, here is a list of changes we&#8217;ll make in our semantic&nbsp;analyzer:</p> <ol> <li>We will add a new <em>error</em> method to the <em>SemanticAnalyzer</em> class to throw a <em>SemanticError</em> exception with some additional&nbsp;information</li> <li>We will update <em>visit_VarDecl</em> to signal an error by calling the <em>error</em> method with a relevant error code and&nbsp;token</li> <li>We will also update <em>visit_Var</em> to signal an error by calling the <em>error</em> method with a relevant error code and&nbsp;token</li> <li>We will add a <em>log</em> method to both the <em>ScopedSymbolTable</em> and <em>SemanticAnalyzer</em>, and replace all <em>print</em> statements with calls to <em>self.log</em> in the corresponding&nbsp;classes</li> <li>We will add a command line option &#8220;--scope&#8221; to turn scope logging on and off (it will be off by default) to control how &#8220;noisy&#8221; we want our interpreter to&nbsp;be</li> <li>We will add empty <em>visit_Num</em> and <em>visit_UnaryOp</em>&nbsp;methods</li> </ol> <p><br/> 1. First things first.
Let&#8217;s add the <em>error</em> method to throw a <em>SemanticError</em> exception with a corresponding error code, token and&nbsp;message:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">error_code</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span> <span class="k">raise</span> <span class="n">SemanticError</span><span class="p">(</span> <span class="n">error_code</span><span class="o">=</span><span class="n">error_code</span><span class="p">,</span> <span class="n">token</span><span class="o">=</span><span class="n">token</span><span class="p">,</span> <span class="n">message</span><span class="o">=</span><span class="n">f</span><span class="s1">&#39;{error_code.value} -&gt; {token}&#39;</span><span class="p">,</span> <span class="p">)</span> </pre></div> <p><br/> 2. Next, let&#8217;s update <em>visit_VarDecl</em> to signal an error by calling the <em>error</em> method with a relevant error code and&nbsp;token</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_VarDecl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">type_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">type_node</span><span class="o">.</span><span class="n">value</span> <span class="n">type_symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">type_name</span><span class="p">)</span> <span class="c1"># We have all the information we need to create a variable symbol.</span> <span class="c1"># Create the symbol and insert it into the symbol table.</span> 
<span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">var_node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="n">VarSymbol</span><span class="p">(</span><span class="n">var_name</span><span class="p">,</span> <span class="n">type_symbol</span><span class="p">)</span> <span class="c1"># Signal an error if the table already has a symbol</span> <span class="c1"># with the same name</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">var_name</span><span class="p">,</span> <span class="n">current_scope_only</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">(</span> <span class="n">error_code</span><span class="o">=</span><span class="n">ErrorCode</span><span class="o">.</span><span class="n">DUPLICATE_ID</span><span class="p">,</span> <span class="n">token</span><span class="o">=</span><span class="n">node</span><span class="o">.</span><span class="n">var_node</span><span class="o">.</span><span class="n">token</span><span class="p">,</span> <span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">var_symbol</span><span class="p">)</span> </pre></div> <p><br/> 3. 
We also need to update the <em>visit_Var</em> method to signal an error by calling the <em>error</em> method with a relevant error code and&nbsp;token</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Var</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">var_name</span><span class="p">)</span> <span class="k">if</span> <span class="n">var_symbol</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="n">error_code</span><span class="o">=</span><span class="n">ErrorCode</span><span class="o">.</span><span class="n">ID_NOT_FOUND</span><span class="p">,</span> <span class="n">token</span><span class="o">=</span><span class="n">node</span><span class="o">.</span><span class="n">token</span><span class="p">)</span> </pre></div> <p>Now semantic errors will be reported as&nbsp;follows:</p> <div class="highlight"><pre><span></span>SemanticError: Duplicate id found -&gt; Token(TokenType.ID, &#39;a&#39;, position=21:4) </pre></div> <p>Or</p> <div class="highlight"><pre><span></span>SemanticError: Identifier not found -&gt; Token(TokenType.ID, &#39;b&#39;, position=22:9) </pre></div> <p><br/> 4. 
Let&#8217;s add the <em>log</em> method to both the <em>ScopedSymbolTable</em> and <em>SemanticAnalyzer</em>, and replace all <em>print</em> statements with calls to <em>self.log</em>:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">log</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">msg</span><span class="p">):</span> <span class="k">if</span> <span class="n">_SHOULD_LOG_SCOPE</span><span class="p">:</span> <span class="k">print</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span> </pre></div> <p>As you can see, the message will be printed only if the global variable _SHOULD_LOG_SCOPE is set to true. The <em>--scope</em> command line option that we will add in the next step will control the value of the _SHOULD_LOG_SCOPE&nbsp;variable.</p> <p><br/> 5. Now, let&#8217;s update the <em>main</em> function and add a command line option &#8220;--scope&#8221; to turn scope logging on and off (it&#8217;s off by&nbsp;default)</p> <div class="highlight"><pre><span></span><span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="o">.</span><span class="n">ArgumentParser</span><span class="p">(</span> <span class="n">description</span><span class="o">=</span><span class="s1">&#39;SPI - Simple Pascal Interpreter&#39;</span> <span class="p">)</span> <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">&#39;inputfile&#39;</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s1">&#39;Pascal source file&#39;</span><span class="p">)</span> <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span> <span class="s1">&#39;--scope&#39;</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s1">&#39;Print scope 
information&#39;</span><span class="p">,</span> <span class="n">action</span><span class="o">=</span><span class="s1">&#39;store_true&#39;</span><span class="p">,</span> <span class="p">)</span> <span class="n">args</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse_args</span><span class="p">()</span> <span class="k">global</span> <span class="n">_SHOULD_LOG_SCOPE</span> <span class="n">_SHOULD_LOG_SCOPE</span> <span class="o">=</span> <span class="n">args</span><span class="o">.</span><span class="n">scope</span> </pre></div> <p>Here is an example with the switch&nbsp;on:</p> <div class="highlight"><pre><span></span>$ python spi.py idnotfound.pas --scope ENTER scope: global Insert: INTEGER Insert: REAL Lookup: INTEGER. (Scope name: global) Lookup: a. (Scope name: global) Insert: a Lookup: b. (Scope name: global) SemanticError: Identifier not found -&gt; Token(TokenType.ID, &#39;b&#39;, position=6:9) </pre></div> <p>And with scope logging off&nbsp;(default):</p> <div class="highlight"><pre><span></span>$ python spi.py idnotfound.pas SemanticError: Identifier not found -&gt; Token(TokenType.ID, &#39;b&#39;, position=6:9) </pre></div> <p><br/> 6. 
Add empty <em>visit_Num</em> and <em>visit_UnaryOp</em>&nbsp;methods</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Num</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">pass</span> <span class="k">def</span> <span class="nf">visit_UnaryOp</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">pass</span> </pre></div> <p>These are all the changes to our semantic analyzer for&nbsp;now.</p> <p>See <a href="https://github.com/rspivak/lsbasi/tree/master/part15/">GitHub</a> for Pascal files containing different errors; run your updated interpreter on them and see what error messages it&nbsp;generates.</p> <p><br/> That is all for today. You can find the full source code for the interpreter from today&#8217;s article on <a href="https://github.com/rspivak/lsbasi/tree/master/part15/">GitHub</a>. In the next article we&#8217;ll talk about how to recognize (i.e. how to parse) procedure calls. Stay tuned and see you next&nbsp;time!</p> Let’s Build A Simple Interpreter. 
Part 14: Nested Scopes and a Source-to-Source Compiler.2017-05-08T05:45:00-04:002017-05-08T05:45:00-04:00Ruslan Spivaktag:ruslanspivak.com,2017-05-08:/lsbasi-part14/<blockquote> <p><em>Only dead fish go with the&nbsp;flow.</em></p> </blockquote> <p>As I promised in <a href="/lsbasi-part13">the last article</a>, today we&#8217;re finally going to do a deep dive into the topic of&nbsp;scopes.</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img01.png"></p> <p>This is what we&#8217;re going to learn&nbsp;today:</p> <ul> <li>We&#8217;re going to learn about <em>scopes</em>, why they are useful, and how to implement them in code with symbol&nbsp;tables.</li> <li>We&#8217;re going to learn about <em>nested scopes</em> and how <em>chained scoped symbol tables</em> are used to implement nested&nbsp;scopes.</li> <li>We&#8217;re going to learn how to parse procedure declarations with formal parameters and how to represent a procedure symbol in&nbsp;code.</li> <li>We&#8217;re going to learn how to extend our <em>semantic analyzer</em> to do semantic checks in the presence of nested&nbsp;scopes.</li> <li>We&#8217;re going to learn more about <em>name resolution</em> and how the semantic analyzer resolves names to their declarations when a program has nested&nbsp;scopes.</li> <li>We&#8217;re going to learn how to build a <em>scope tree</em>.</li> <li>We&#8217;re also going to learn how to write our very own <em><strong>source-to-source compiler</strong></em> today! 
We will see later in the article how relevant it is to our discussion of&nbsp;scopes.</li> </ul> <p>Let&#8217;s get started! Or should I say, let&#8217;s dive&nbsp;in!</p> <p><br/></p> <blockquote> <div class="toc"><span class="toctitle">Table of Contents</span><ul> <li><a href="#scopes-and-scoped-symbol-tables">Scopes and scoped symbol&nbsp;tables</a></li> <li><a href="#procedure-declarations-with-formal-parameters">Procedure declarations with formal&nbsp;parameters</a></li> <li><a href="#procedure-symbols">Procedure&nbsp;symbols</a></li> <li><a href="#nested-scopes">Nested&nbsp;scopes</a></li> <li><a href="#scope-tree-chaining-scoped-symbol-tables">Scope tree: Chaining scoped symbol&nbsp;tables</a></li> <li><a href="#nested-scopes-and-name-resolution">Nested scopes and name&nbsp;resolution</a></li> <li><a href="#source-to-source-compiler">Source-to-source&nbsp;compiler</a></li> <li><a href="#summary">Summary</a></li> <li><a href="#exercises">Exercises</a></li> </ul> </div> </blockquote> <p><br/></p> <h3 id="scopes-and-scoped-symbol-tables">Scopes and scoped symbol&nbsp;tables</h3> <p>What is a <em>scope</em>? A <em><strong>scope</strong></em> is a textual region of a program where a name can be used. 
Let&#8217;s take a look at the following sample program, for&nbsp;example:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span><span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="n">x</span> <span class="o">:=</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">;</span> <span class="k">end</span><span class="o">.</span> </pre></div> <p><br/> In Pascal, the <em><span class="caps">PROGRAM</span></em> keyword (case insensitive, by the way) introduces a new scope which is commonly called a <em>global scope</em>, so the program above has one <em>global scope</em> and the declared variables <strong>x</strong> and <strong>y</strong> are visible and accessible in the whole program. In the case above, the textual region starts with the keyword <em>program</em> and ends with the keyword <em>end</em> and a dot. In that textual region both names <strong>x</strong> and <strong>y</strong> can be used, so the scope of those variables (variable declarations) is the whole&nbsp;program:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img02.png"></p> <p>When you look at the source code above and specifically at the expression <strong>x := x + y</strong>, you intuitively know that it should compile (or get interpreted) without a problem, because the scope of the variables <strong>x</strong> and <strong>y</strong> in the expression is the <em>global scope</em> and the variable references <strong>x</strong> and <strong>y</strong> in the expression <strong>x := x + y</strong> resolve to the declared integer variables <strong>x</strong> and <strong>y</strong>. 
If you&#8217;ve programmed before in any mainstream programming language, there shouldn&#8217;t be any surprises&nbsp;here.</p> <p>When we talk about the scope of a variable, we actually talk about the scope of its&nbsp;declaration:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img03.png"></p> <p>In the picture above, the vertical lines show the scope of the declared variables, the textual region where the declared names <strong>x</strong> and <strong>y</strong> can be used, that is, the text area where they are visible. And as you can see, the scope of <strong>x</strong> and <strong>y</strong> is the whole program, as shown by the vertical&nbsp;lines.</p> <p>Pascal programs are said to be <em><strong>lexically scoped</strong></em> (or <em><strong>statically scoped</strong></em>) because you can look at the source code, and without even executing the program, determine purely based on the textual rules which names (references) resolve or refer to which declarations. 
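</p> <p>The same property can be seen in Python, which is also lexically scoped. Here is a small sketch (the names <em>outer</em>, <em>inner</em>, and <em>read_global</em> are made up for illustration and are not part of the interpreter) where you can tell which declaration every <em>x</em> refers to just by reading the source:</p>

```python
x = 'global x'          # declared at the module (global) level


def outer():
    x = 'outer x'       # a new declaration that hides the global x

    def inner():
        # No x is declared inside inner, so this reference resolves to the
        # nearest enclosing declaration in the source text: the x in outer.
        return x

    return inner()


def read_global():
    # No local x here, so this reference resolves to the global x.
    return x


print(outer())
print(read_global())
```

<p>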
In Pascal, for example, lexical keywords like <em>program</em> and <em>end</em> demarcate the textual boundaries of a&nbsp;scope:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img04.png"></p> <p>Why are scopes&nbsp;useful?</p> <ul> <li>Every scope creates an isolated name space, which means that variables declared in a scope cannot be accessed from outside of&nbsp;it.</li> <li>You can re-use the same name in different scopes and know exactly, just by looking at the program source code, what declaration the name refers to at every point in the&nbsp;program.</li> <li>In a nested scope you can re-declare a variable with the same name as in the outer scope, thus effectively hiding the outer declaration, which gives you control over access to different variables from the outer&nbsp;scope.</li> </ul> <p>In addition to the <em>global scope</em>, Pascal supports nested procedures, and every procedure declaration introduces a new scope, which means that Pascal supports nested&nbsp;scopes.</p> <p>When we talk about nested scopes, it&#8217;s convenient to talk about scope levels to show their nesting relationships. It&#8217;s also convenient to refer to scopes by name. 
We&#8217;ll use both scope levels and scope names when we start our discussion of nested&nbsp;scopes.</p> <p><br/> Let&#8217;s take a look at the following sample program and subscript every name in the program to make it&nbsp;clear:</p> <ol> <li>At what level each variable (symbol) is&nbsp;declared</li> <li>Which declaration, and at what level, a variable name refers&nbsp;to:</li> </ol> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img05.png"></p> <p>From the picture above we can see several&nbsp;things:</p> <ul> <li>We have a single scope, the <em>global scope</em>, introduced by the <span class="caps">PROGRAM</span>&nbsp;keyword</li> <li><em>Global scope</em> is at level&nbsp;1</li> <li>Variables (symbols) <strong>x</strong> and <strong>y</strong> are declared at level 1 (the <em>global scope</em>).</li> <li>The <em>integer</em> built-in type is also declared at level&nbsp;1</li> <li>The program name <strong>Main</strong> has a subscript 0. Why is the program&#8217;s name at level zero, you might wonder? This is to make it clear that the program&#8217;s name is not in the <em>global scope</em>; it is in some other, outer scope that has level&nbsp;zero.</li> <li>The scope of the variables <strong>x</strong> and <strong>y</strong> is the whole program, as shown by the vertical&nbsp;lines</li> <li>The <em>scope information table</em> shows, for every scope in the program, the corresponding scope level, scope name, and the names declared in that scope. The purpose of the table is to summarize and visually show different information about scopes in a&nbsp;program.</li> </ul> <p>How do we implement the concept of a scope in code? To represent a scope in code, we&#8217;ll need a <em>scoped symbol table</em>. We already know about symbol tables, but what is a <em>scoped symbol table</em>? 
A <em><strong>scoped symbol table</strong></em> is basically a symbol table with a few modifications, as you&#8217;ll see&nbsp;shortly.</p> <p>From now on, we&#8217;ll use the word <em>scope</em> both to mean the concept of a scope and to refer to the scoped symbol table, which is an implementation of the scope in&nbsp;code.</p> <p>Even though in our code a scope is represented by an instance of the <em>ScopedSymbolTable</em> class, we&#8217;ll use the variable named <em>scope</em> throughout the code for convenience. So when you see a variable <em>scope</em> in the code of our interpreter, you should know that it actually refers to a <em>scoped symbol table</em>.</p> <p>Okay, let&#8217;s enhance our <em>SymbolTable</em> class by renaming it to <em>ScopedSymbolTable</em>, adding two new fields, <em>scope_level</em> and <em>scope_name</em>, and updating the scoped symbol table&#8217;s constructor. At the same time, let&#8217;s update the <em>__str__</em> method to print additional information, namely the <em>scope_level</em> and <em>scope_name</em>. 
Here is a new version of the symbol table, the <em>ScopedSymbolTable</em>:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ScopedSymbolTable</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">scope_name</span><span class="p">,</span> <span class="n">scope_level</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span> <span class="o">=</span> <span class="n">OrderedDict</span><span class="p">()</span> <span class="bp">self</span><span class="o">.</span><span class="n">scope_name</span> <span class="o">=</span> <span class="n">scope_name</span> <span class="bp">self</span><span class="o">.</span><span class="n">scope_level</span> <span class="o">=</span> <span class="n">scope_level</span> <span class="bp">self</span><span class="o">.</span><span class="n">_init_builtins</span><span class="p">()</span> <span class="k">def</span> <span class="nf">_init_builtins</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">BuiltinTypeSymbol</span><span class="p">(</span><span class="s1">&#39;INTEGER&#39;</span><span class="p">))</span> <span class="bp">self</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">BuiltinTypeSymbol</span><span class="p">(</span><span class="s1">&#39;REAL&#39;</span><span class="p">))</span> <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="n">h1</span> <span class="o">=</span> <span class="s1">&#39;SCOPE (SCOPED SYMBOL TABLE)&#39;</span> <span class="n">lines</span> <span 
class="o">=</span> <span class="p">[</span><span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">,</span> <span class="n">h1</span><span class="p">,</span> <span class="s1">&#39;=&#39;</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">h1</span><span class="p">)]</span> <span class="k">for</span> <span class="n">header_name</span><span class="p">,</span> <span class="n">header_value</span> <span class="ow">in</span> <span class="p">(</span> <span class="p">(</span><span class="s1">&#39;Scope name&#39;</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">scope_name</span><span class="p">),</span> <span class="p">(</span><span class="s1">&#39;Scope level&#39;</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">scope_level</span><span class="p">),</span> <span class="p">):</span> <span class="n">lines</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">&#39;</span><span class="si">%-15s</span><span class="s1">: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">header_name</span><span class="p">,</span> <span class="n">header_value</span><span class="p">))</span> <span class="n">h2</span> <span class="o">=</span> <span class="s1">&#39;Scope (Scoped symbol table) contents&#39;</span> <span class="n">lines</span><span class="o">.</span><span class="n">extend</span><span class="p">([</span><span class="n">h2</span><span class="p">,</span> <span class="s1">&#39;-&#39;</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">h2</span><span class="p">)])</span> <span class="n">lines</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span> <span class="p">(</span><span class="s1">&#39;</span><span 
class="si">%7s</span><span class="s1">: </span><span class="si">%r</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">))</span> <span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="o">.</span><span class="n">items</span><span class="p">()</span> <span class="p">)</span> <span class="n">lines</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">)</span> <span class="n">s</span> <span class="o">=</span> <span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">lines</span><span class="p">)</span> <span class="k">return</span> <span class="n">s</span> <span class="fm">__repr__</span> <span class="o">=</span> <span class="fm">__str__</span> <span class="k">def</span> <span class="nf">insert</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">symbol</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Insert: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">symbol</span><span class="o">.</span><span class="n">name</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="p">[</span><span class="n">symbol</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">symbol</span> <span class="k">def</span> <span class="nf">lookup</span><span 
class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Lookup: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">name</span><span class="p">)</span> <span class="n">symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="c1"># &#39;symbol&#39; is either an instance of the Symbol class or None</span> <span class="k">return</span> <span class="n">symbol</span> </pre></div> <p><br/> Let&#8217;s also update the semantic analyzer&#8217;s code to use the variable <em>scope</em> instead of <em>symtab</em>, and remove the semantic check that was checking source programs for duplicate identifiers from the <em>visit_VarDecl</em> method to reduce the noise in the program&nbsp;output.</p> <p>Here is a piece of code that shows how our semantic analyzer instantiates the <em>ScopedSymbolTable</em>&nbsp;class:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">SemanticAnalyzer</span><span class="p">(</span><span class="n">NodeVisitor</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">scope</span> <span class="o">=</span> <span class="n">ScopedSymbolTable</span><span class="p">(</span><span class="n">scope_name</span><span class="o">=</span><span class="s1">&#39;global&#39;</span><span class="p">,</span> <span class="n">scope_level</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="o">...</span> </pre></div> <p>You can find 
all the changes in the file <a href="https://github.com/rspivak/lsbasi/blob/master/part14/scope01.py">scope01.py</a>. Download the file, run it on the command line, and inspect the output. Here is what I&nbsp;got:</p> <div class="highlight"><pre><span></span>$ python scope01.py Insert: INTEGER Insert: REAL Lookup: INTEGER Insert: x Lookup: INTEGER Insert: y Lookup: x Lookup: y Lookup: x SCOPE <span class="o">(</span>SCOPED SYMBOL TABLE<span class="o">)</span> <span class="o">===========================</span> Scope name : global Scope level : <span class="m">1</span> Scope <span class="o">(</span>Scoped symbol table<span class="o">)</span> contents ------------------------------------ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; x: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;x&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; </pre></div> <p>Most of the output should look very familiar to&nbsp;you.</p> <p>Now that you know about the concept of scope and how to implement the scope in code by using a scoped symbol table, it&#8217;s time we talked about nested scopes and more dramatic modifications to the scoped symbol table than just adding two simple&nbsp;fields.</p> <p><br/></p> <h3 id="procedure-declarations-with-formal-parameters">Procedure declarations with 
formal&nbsp;parameters</h3> <p>Let&#8217;s take a look at a sample program in the file <a href="https://github.com/rspivak/lsbasi/blob/master/part14/nestedscopes02.pas">nestedscopes02.pas</a> that contains a procedure&nbsp;declaration:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span><span class="o">:</span> <span class="kt">real</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Alpha</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="n">x</span> <span class="o">:=</span> <span class="n">a</span> <span class="o">+</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">;</span> <span class="k">end</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ Main }</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{ Main }</span> </pre></div> <p>The first thing that we notice here is that we have a procedure with a parameter, and we haven&#8217;t learned how to handle that yet. Let&#8217;s fill that gap by making a quick detour and learning how to handle formal procedure parameters before continuing with&nbsp;scopes.*</p> <blockquote> <p>*<span class="caps">ASIDE</span>: <em>Formal parameters</em> are parameters that show up in the declaration of a procedure. 
<em>Arguments</em> (also called <em>actual parameters</em>) are different variables and expressions passed to a procedure in a particular procedure&nbsp;call.</p> </blockquote> <p>Here is a list of changes we need to make to support procedure declarations with&nbsp;parameters:</p> <ol> <li> <p>Add the <em>Param</em> <span class="caps">AST</span>&nbsp;node</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Param</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">var_node</span><span class="p">,</span> <span class="n">type_node</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">var_node</span> <span class="o">=</span> <span class="n">var_node</span> <span class="bp">self</span><span class="o">.</span><span class="n">type_node</span> <span class="o">=</span> <span class="n">type_node</span> </pre></div> </li> <li> <p>Update the <em>ProcedureDecl</em> node&#8217;s constructor to take an additional argument: <em>params</em></p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ProcedureDecl</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">proc_name</span><span class="p">,</span> <span class="n">params</span><span class="p">,</span> <span class="n">block_node</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">proc_name</span> <span class="o">=</span> <span class="n">proc_name</span> <span class="bp">self</span><span class="o">.</span><span class="n">params</span> <span class="o">=</span> <span class="n">params</span> <span class="c1"># a 
list of Param nodes</span> <span class="bp">self</span><span class="o">.</span><span class="n">block_node</span> <span class="o">=</span> <span class="n">block_node</span> </pre></div> </li> <li> <p>Update the <em>declarations</em> rule to reflect changes in the procedure declaration&nbsp;sub-rule</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">declarations</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;declarations : (VAR (variable_declaration SEMI)+)*</span> <span class="sd"> | (PROCEDURE ID (LPAREN formal_parameter_list RPAREN)? SEMI block SEMI)*</span> <span class="sd"> | empty</span> <span class="sd"> &quot;&quot;&quot;</span> </pre></div> </li> <li> <p>Add the <em>formal_parameter_list</em> rule and&nbsp;method</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">formal_parameter_list</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot; formal_parameter_list : formal_parameters</span> <span class="sd"> | formal_parameters SEMI formal_parameter_list</span> <span class="sd"> &quot;&quot;&quot;</span> </pre></div> </li> <li> <p>Add the <em>formal_parameters</em> rule and&nbsp;method</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">formal_parameters</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot; formal_parameters : ID (COMMA ID)* COLON type_spec &quot;&quot;&quot;</span> <span class="n">param_nodes</span> <span class="o">=</span> <span class="p">[]</span> </pre></div> </li> </ol> <p>With the addition of the above methods and rules our parser will be able to parse procedure declarations like these (I&#8217;m not showing the body of declared procedures for&nbsp;brevity):</p> <div class="highlight"><pre><span></span><span 
class="k">procedure</span> <span class="nf">Foo</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Foo</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="p">)</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Foo</span><span class="p">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="p">)</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Foo</span><span class="p">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="n">c</span> <span class="o">:</span> <span class="kt">REAL</span><span class="p">)</span><span class="o">;</span> </pre></div> <p>Let&#8217;s generate an <span class="caps">AST</span> for our sample program. Download <a href="https://github.com/rspivak/lsbasi/blob/master/part14/genastdot.py">genastdot.py</a> and run the following command on the command&nbsp;line:</p> <div class="highlight"><pre><span></span>$ python genastdot.py nestedscopes02.pas &gt; ast.dot <span class="o">&amp;&amp;</span> dot -Tpng -o ast.png ast.dot </pre></div> <p>Here is a picture of the generated <span class="caps">AST</span>:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img06.png"></p> <p>You can see now that the <em>ProcedureDecl</em> node in the picture has the <em>Param</em> node as its&nbsp;child.</p> <p>You can find the complete changes in the <a href="https://github.com/rspivak/lsbasi/blob/master/part14/spi.py">spi.py</a> file. Spend some time and study the changes. 
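</p> <p>To give a feel for what the <em>formal_parameters</em> rule involves, here is a rough, self-contained Python 3 sketch of parsing <em>ID (COMMA ID)* COLON type_spec</em>. It is not the code from spi.py: the function name <em>parse_formal_parameters</em> and the plain list of (token type, value) tuples are assumptions made for illustration, standing in for the real lexer and <em>Param</em> nodes.</p>

```python
def parse_formal_parameters(tokens):
    """Parse one group: ID (COMMA ID)* COLON type_spec.

    `tokens` is a list of (token_type, value) tuples, a hypothetical and
    simplified stand-in for the interpreter's lexer output.
    Returns a list of (param_name, type_name) pairs.
    """
    pos = 0

    def eat(expected_type):
        # Consume the current token if it has the expected type.
        nonlocal pos
        if pos == len(tokens) or tokens[pos][0] != expected_type:
            raise SyntaxError('expected %s at position %d' % (expected_type, pos))
        value = tokens[pos][1]
        pos += 1
        return value

    # ID (COMMA ID)*
    names = [eat('ID')]
    while pos != len(tokens) and tokens[pos][0] == 'COMMA':
        eat('COMMA')
        names.append(eat('ID'))

    eat('COLON')

    # type_spec : INTEGER | REAL
    if pos == len(tokens) or tokens[pos][0] not in ('INTEGER', 'REAL'):
        raise SyntaxError('expected a type at position %d' % pos)
    type_name = eat(tokens[pos][0])

    # Every name in the group shares the same type.
    return [(name, type_name) for name in names]


tokens = [('ID', 'a'), ('COMMA', ','), ('ID', 'b'),
          ('COLON', ':'), ('INTEGER', 'INTEGER')]
print(parse_formal_parameters(tokens))
```

<p>In the real parser each pair would presumably become a <em>Param</em> node built from a <em>Var</em> node and a <em>Type</em> node, and the <em>formal_parameter_list</em> rule would call this method once per semicolon-separated group. 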
You&#8217;ve done similar changes before; they should be pretty easy to understand and you should be able to implement them by&nbsp;yourself.</p> <h3 id="procedure-symbols">Procedure&nbsp;symbols</h3> <p>While we&#8217;re on the topic of procedure declarations, let&#8217;s also talk about procedure&nbsp;symbols.</p> <p>As with variable declarations, and built-in type declarations, there is a separate category of symbols for procedures. Let&#8217;s create a separate symbol class for procedure&nbsp;symbols:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ProcedureSymbol</span><span class="p">(</span><span class="n">Symbol</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">params</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span> <span class="nb">super</span><span class="p">(</span><span class="n">ProcedureSymbol</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="c1"># a list of formal parameters</span> <span class="bp">self</span><span class="o">.</span><span class="n">params</span> <span class="o">=</span> <span class="n">params</span> <span class="k">if</span> <span class="n">params</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="k">else</span> <span class="p">[]</span> <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="s1">&#39;&lt;{class_name}(name={name}, parameters={params})&gt;&#39;</span><span class="o">.</span><span class="n">format</span><span 
class="p">(</span> <span class="n">class_name</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="n">params</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">,</span> <span class="p">)</span> <span class="fm">__repr__</span> <span class="o">=</span> <span class="fm">__str__</span> </pre></div> <p>Procedure symbols have a name (it&#8217;s a procedure&#8217;s name), their category is procedure (it&#8217;s encoded in the class name), and the type is <em>None</em> because in Pascal procedures don&#8217;t return&nbsp;anything.</p> <p>Procedure symbols also carry additional information about procedure declarations, namely they contain information about the procedure&#8217;s formal parameters as you can see in the code&nbsp;above.</p> <p>With the addition of procedure symbols, our new symbol hierarchy looks like&nbsp;this:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img07.png"></p> <p><br/></p> <h3 id="nested-scopes">Nested&nbsp;scopes</h3> <p>After that quick detour let&#8217;s get back to our program and the discussion of nested&nbsp;scopes:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span><span class="o">:</span> <span class="kt">real</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Alpha</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> 
<span class="k">var</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="n">x</span> <span class="o">:=</span> <span class="n">a</span> <span class="o">+</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">;</span> <span class="k">end</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ Main }</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{ Main }</span> </pre></div> <p>Things are actually getting more interesting here. By declaring a new procedure, we introduce a new scope, and this scope is nested within the <em>global scope</em> introduced by the <em><span class="caps">PROGRAM</span></em> statement, so this is a case where we have nested scopes in a Pascal&nbsp;program.</p> <p>The scope of a procedure is the whole body of the procedure. The beginning of the procedure scope is marked by the <em><span class="caps">PROCEDURE</span></em> keyword and the end is marked by the <em><span class="caps">END</span></em> keyword and a&nbsp;semicolon.</p> <p>Let&#8217;s subscript names in the program and show some additional&nbsp;information:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img08.png"> <img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img09.png"></p> <p>Some observations from the picture&nbsp;above:</p> <ul> <li>This Pascal program has two scope levels: level 1 and level&nbsp;2</li> <li>The <em>nesting relationships</em> diagram visually shows that the scope <em>Alpha</em> is nested within the <em>global scope</em>, hence there are two levels: the <em>global scope</em> at level 1, and the <em>Alpha</em> scope at level&nbsp;2.</li> <li>The scope level of the procedure declaration <em>Alpha</em> is one less than the level of the variables declared inside the procedure <em>Alpha</em>. 
You can see that the scope level of the procedure declaration <em>Alpha</em> is 1 and the scope level of the variables <strong>a</strong> and <strong>y</strong> inside the procedure is&nbsp;2.</li> <li>The variable declaration of <strong>y</strong> inside <em>Alpha</em> hides the declaration of <strong>y</strong> in the <em>global scope</em>. You can see the hole in the vertical bar for <strong>y1</strong> (by the way, the 1 is a subscript, not part of the variable name; the variable name is just <strong>y</strong>) and you can see that the scope of the <strong>y2</strong> variable declaration is the <em>Alpha</em> procedure&#8217;s whole&nbsp;body.</li> <li>The scope information table, as you are already aware, shows scope levels, scope names for those levels, and respective names declared in those scopes (at those&nbsp;levels).</li> <li>In the picture, you can also see that I omitted showing the scope of the <em>integer</em> and <em>real</em> types (except in the scope information table) because they are always declared at scope level 1, the <em>global scope</em>. To save visual space, I won&#8217;t be subscripting the <em>integer</em> and <em>real</em> types anymore, but you will see the types again and again in the contents of the scoped symbol table representing the <em>global scope</em>.</li> </ul> <p>The next step is to discuss implementation&nbsp;details.</p> <p>First, let&#8217;s focus on variable and procedure declarations. Then, we&#8217;ll discuss variable references and how <em>name resolution</em> works in the presence of nested&nbsp;scopes.</p> <p>For our discussion, we&#8217;ll use a stripped-down version of the program. 
The following version does not have variable references; it only has variable and procedure&nbsp;declarations:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span><span class="o">:</span> <span class="kt">real</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Alpha</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="k">end</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ Main }</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{ Main }</span> </pre></div> <p>You already know how to represent a scope in code with a scoped symbol table. Now we have two scopes: the <em>global scope</em> and the scope introduced by the procedure <em>Alpha</em>. Following our approach we should now have two scoped symbol tables: one for the <em>global scope</em> and one for the <em>Alpha</em> scope. How do we implement that in code? We&#8217;ll extend the semantic analyzer to create a separate scoped symbol table for every scope instead of just for the <em>global scope</em>. The scope construction will happen, as usual, when walking the <span class="caps">AST</span>.</p> <p>First, we need to decide where in the semantic analyzer we&#8217;re going to create our scoped symbol tables. Recall that the <em><span class="caps">PROGRAM</span></em> and <em><span class="caps">PROCEDURE</span></em> keywords each introduce a new scope. In the <span class="caps">AST</span>, the corresponding nodes are <em>Program</em> and <em>ProcedureDecl</em>. 
So we&#8217;re going to update our <em>visit_Program</em> method and add the <em>visit_ProcedureDecl</em> method to create scoped symbol tables. Let&#8217;s start with the <em>visit_Program</em>&nbsp;method:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Program</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;ENTER scope: global&#39;</span><span class="p">)</span> <span class="n">global_scope</span> <span class="o">=</span> <span class="n">ScopedSymbolTable</span><span class="p">(</span> <span class="n">scope_name</span><span class="o">=</span><span class="s1">&#39;global&#39;</span><span class="p">,</span> <span class="n">scope_level</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span> <span class="o">=</span> <span class="n">global_scope</span> <span class="c1"># visit subtree</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">block</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="n">global_scope</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;LEAVE scope: global&#39;</span><span class="p">)</span> </pre></div> <p>The method has quite a few&nbsp;changes:</p> <ol> <li>When visiting the node in <span class="caps">AST</span>, we first print what scope we&#8217;re entering, in this case <em>global</em>.</li> <li>We create a separate <em>scoped symbol table</em> to represent the <em>global scope</em>. 
When we construct an instance of <em>ScopedSymbolTable</em>, we explicitly pass the scope name and scope level arguments to the class&nbsp;constructor.</li> <li>We assign the newly created scope to the instance variable <em>current_scope</em>. Other visitor methods that insert and look up symbols in scoped symbol tables will use the <em>current_scope</em>.</li> <li>We visit a subtree (block). This part is&nbsp;unchanged.</li> <li>Before leaving the <em>global scope</em> we print the contents of the <em>global scope</em> (scoped symbol&nbsp;table).</li> <li>We also print the message that we&#8217;re leaving the <em>global&nbsp;scope</em>.</li> </ol> <p>Now let&#8217;s add the <em>visit_ProcedureDecl</em> method. Here is the complete source code for&nbsp;it:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_ProcedureDecl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">proc_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_name</span> <span class="n">proc_symbol</span> <span class="o">=</span> <span class="n">ProcedureSymbol</span><span class="p">(</span><span class="n">proc_name</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">proc_symbol</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;ENTER scope: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">proc_name</span><span class="p">)</span> <span class="c1"># Scope for parameters and local variables</span> <span class="n">procedure_scope</span> <span class="o">=</span> <span class="n">ScopedSymbolTable</span><span class="p">(</span> <span 
class="n">scope_name</span><span class="o">=</span><span class="n">proc_name</span><span class="p">,</span> <span class="n">scope_level</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span> <span class="o">=</span> <span class="n">procedure_scope</span> <span class="c1"># Insert parameters into the procedure scope</span> <span class="k">for</span> <span class="n">param</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">params</span><span class="p">:</span> <span class="n">param_type</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">param</span><span class="o">.</span><span class="n">type_node</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="n">param_name</span> <span class="o">=</span> <span class="n">param</span><span class="o">.</span><span class="n">var_node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="n">VarSymbol</span><span class="p">(</span><span class="n">param_name</span><span class="p">,</span> <span class="n">param_type</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">var_symbol</span><span class="p">)</span> <span class="n">proc_symbol</span><span class="o">.</span><span class="n">params</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">var_symbol</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span 
class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">block_node</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="n">procedure_scope</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;LEAVE scope: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">proc_name</span><span class="p">)</span> </pre></div> <p>Let&#8217;s go over the contents of the&nbsp;method:</p> <ol> <li>The first thing that the method does is create a procedure symbol and insert it into the current scope, which is the <em>global scope</em> for our sample&nbsp;program.</li> <li>Then the method prints the message about entering the procedure&nbsp;scope.</li> <li>Then we create a new scope for the procedure&#8217;s parameters and variable&nbsp;declarations.</li> <li>We assign the procedure scope to the <em>self.current_scope</em> variable indicating that this is our current scope and all symbol operations (<em>insert</em> and <em>lookup</em>) will use the current&nbsp;scope.</li> <li>Then we handle procedure formal parameters by inserting them into the current scope and adding them to the procedure&nbsp;symbol.</li> <li>Then we visit the rest of the <span class="caps">AST</span> subtree - the body of the&nbsp;procedure.</li> <li>And, finally, we print the message about leaving the scope before leaving the node and moving to another <span class="caps">AST</span> node, if&nbsp;any.</li> </ol> <p>Now, what we need to do is update other semantic analyzer visitor methods to use <em>self.current_scope</em> when inserting and looking up symbols. 
Let&#8217;s do&nbsp;that:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_VarDecl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">type_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">type_node</span><span class="o">.</span><span class="n">value</span> <span class="n">type_symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">type_name</span><span class="p">)</span> <span class="c1"># We have all the information we need to create a variable symbol.</span> <span class="c1"># Create the symbol and insert it into the symbol table.</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">var_node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="n">VarSymbol</span><span class="p">(</span><span class="n">var_name</span><span class="p">,</span> <span class="n">type_symbol</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">var_symbol</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_Var</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="bp">self</span><span 
class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">var_name</span><span class="p">)</span> <span class="k">if</span> <span class="n">var_symbol</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span> <span class="s2">&quot;Error: Symbol(identifier) not found &#39;</span><span class="si">%s</span><span class="s2">&#39;&quot;</span> <span class="o">%</span> <span class="n">var_name</span> <span class="p">)</span> </pre></div> <p>Both the <em>visit_VarDecl</em> and <em>visit_Var</em> will now use the <em>current_scope</em> to insert and/or look up symbols. Specifically, for our sample program, the <em>current_scope</em> can point either to the <em>global scope</em> or the <em>Alpha</em>&nbsp;scope.</p> <p>We also need to update the semantic analyzer and set the <em>current_scope</em> to <em>None</em> in the&nbsp;constructor:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">SemanticAnalyzer</span><span class="p">(</span><span class="n">NodeVisitor</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span> <span class="o">=</span> <span class="bp">None</span> </pre></div> <p>Clone the <a href="https://github.com/rspivak/lsbasi">GitHub repository for the article</a>, run <a href="https://github.com/rspivak/lsbasi/blob/master/part14/scope02.py">scope02.py</a> (it has all the changes we just discussed), inspect the output, and make sure you understand why every line is&nbsp;generated:</p> <div class="highlight"><pre><span></span>$ python scope02.py ENTER scope: global Insert: INTEGER Insert: REAL Lookup: REAL Insert: x Lookup: 
REAL Insert: y Insert: Alpha ENTER scope: Alpha Insert: INTEGER Insert: REAL Lookup: INTEGER Insert: a Lookup: INTEGER Insert: y SCOPE <span class="o">(</span>SCOPED SYMBOL TABLE<span class="o">)</span> <span class="o">===========================</span> Scope name : Alpha Scope level : <span class="m">2</span> Scope <span class="o">(</span>Scoped symbol table<span class="o">)</span> contents ------------------------------------ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; a: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;a&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; LEAVE scope: Alpha SCOPE <span class="o">(</span>SCOPED SYMBOL TABLE<span class="o">)</span> <span class="o">===========================</span> Scope name : global Scope level : <span class="m">1</span> Scope <span class="o">(</span>Scoped symbol table<span class="o">)</span> contents ------------------------------------ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; x: &lt;VarSymbol<span class="o">(</span><span 
class="nv">name</span><span class="o">=</span><span class="s1">&#39;x&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; Alpha: &lt;ProcedureSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span>Alpha, <span class="nv">parameters</span><span class="o">=[</span>&lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;a&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt;<span class="o">])</span>&gt; LEAVE scope: global </pre></div> <p>Some things about the output above that I think are worth&nbsp;mentioning:</p> <ol> <li>You can see that the two lines <em>Insert: <span class="caps">INTEGER</span></em> and <em>Insert: <span class="caps">REAL</span></em> each appear twice in the output and the keys <span class="caps">INTEGER</span> and <span class="caps">REAL</span> are present in both scopes (scoped symbol tables): <em>global</em> and <em>Alpha</em>. The reason is that we create a separate scoped symbol table for every scope and the table initializes the built-in type symbols every time we create its instance. We&#8217;ll change that later when we discuss nesting relationships and how they are expressed in&nbsp;code.</li> <li>See how the line <em>Insert: Alpha</em> is printed before the line <em><span class="caps">ENTER</span> scope: Alpha</em>. 
This is just a reminder that the name of a procedure is declared at a level that is one less than the level of the variables declared in the procedure&nbsp;itself.</li> <li>You can see by inspecting the printed contents of the scoped symbol tables above what declarations they contain. See, for example, that <em>global scope</em> has the <em>Alpha</em> symbol in&nbsp;it.</li> <li>From the contents of the <em>global scope</em> you can also see that the procedure symbol for the <em>Alpha</em> procedure also contains the procedure&#8217;s formal&nbsp;parameters.</li> </ol> <p>After we run the program, our scopes in memory would look something like this, just two separate scoped symbol&nbsp;tables:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img10.png"></p> <p><br/></p> <h3 id="scope-tree-chaining-scoped-symbol-tables">Scope tree: Chaining scoped symbol&nbsp;tables</h3> <p>Okay, now every scope is represented by a separate scoped symbol table, but how do we represent the nesting relationship between the <em>global scope</em> and the scope <em>Alpha</em> as we showed in the nesting relationship diagram before? In other words, how do we express in code that the scope <em>Alpha</em> is nested within the <em>global scope</em>? The answer is chaining the tables&nbsp;together.</p> <p>We&#8217;ll chain the scoped symbol tables together by creating a link between them. In a way it&#8217;ll be like a tree (we&#8217;ll call it a <em>scope tree</em>), just an unusual one, because in this tree a child will be pointing to a parent, and not the other way around. Let&#8217;s take a look at the following <em>scope tree</em>:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img11.png"></p> <p>In the <em>scope tree</em> above you can see that the scope <em>Alpha</em> is linked to the <em>global scope</em> by pointing to it. 
To put it differently, the scope <em>Alpha</em> is pointing to its <em>enclosing scope</em>, which is the <em>global scope</em>. It all means that the scope <em>Alpha</em> is nested within the <em>global scope</em>.</p> <p>How do we implement scope chaining/linking? There are two&nbsp;steps:</p> <ol> <li>We need to update the <em>ScopedSymbolTable</em> class and add a variable <em>enclosing_scope</em> that will hold a pointer to the scope&#8217;s enclosing scope. This will be the link between scopes in the picture&nbsp;above.</li> <li>We need to update the <em>visit_Program</em> and <em>visit_ProcedureDecl</em> methods to create an actual link to the scope&#8217;s enclosing scope using the updated version of the <em>ScopedSymbolTable</em>&nbsp;class.</li> </ol> <p>Let&#8217;s start with updating the <em>ScopedSymbolTable</em> class and adding the <em>enclosing_scope</em> field. Let&#8217;s also update the <em>__init__</em> and <em>__str__</em> methods. The <em>__init__</em> method will be modified to accept a new parameter, <em>enclosing_scope</em>, with the default value set to <em>None</em>. The <em>__str__</em> method will be updated to output the name of the enclosing scope. 
Here is the complete source code of the updated <em>ScopedSymbolTable</em>&nbsp;class:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ScopedSymbolTable</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">scope_name</span><span class="p">,</span> <span class="n">scope_level</span><span class="p">,</span> <span class="n">enclosing_scope</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span> <span class="o">=</span> <span class="n">OrderedDict</span><span class="p">()</span> <span class="bp">self</span><span class="o">.</span><span class="n">scope_name</span> <span class="o">=</span> <span class="n">scope_name</span> <span class="bp">self</span><span class="o">.</span><span class="n">scope_level</span> <span class="o">=</span> <span class="n">scope_level</span> <span class="bp">self</span><span class="o">.</span><span class="n">enclosing_scope</span> <span class="o">=</span> <span class="n">enclosing_scope</span> <span class="bp">self</span><span class="o">.</span><span class="n">_init_builtins</span><span class="p">()</span> <span class="k">def</span> <span class="nf">_init_builtins</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">BuiltinTypeSymbol</span><span class="p">(</span><span class="s1">&#39;INTEGER&#39;</span><span class="p">))</span> <span class="bp">self</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">BuiltinTypeSymbol</span><span class="p">(</span><span class="s1">&#39;REAL&#39;</span><span class="p">))</span> <span 
class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="n">h1</span> <span class="o">=</span> <span class="s1">&#39;SCOPE (SCOPED SYMBOL TABLE)&#39;</span> <span class="n">lines</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">,</span> <span class="n">h1</span><span class="p">,</span> <span class="s1">&#39;=&#39;</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">h1</span><span class="p">)]</span> <span class="k">for</span> <span class="n">header_name</span><span class="p">,</span> <span class="n">header_value</span> <span class="ow">in</span> <span class="p">(</span> <span class="p">(</span><span class="s1">&#39;Scope name&#39;</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">scope_name</span><span class="p">),</span> <span class="p">(</span><span class="s1">&#39;Scope level&#39;</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">scope_level</span><span class="p">),</span> <span class="p">(</span><span class="s1">&#39;Enclosing scope&#39;</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">enclosing_scope</span><span class="o">.</span><span class="n">scope_name</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">enclosing_scope</span> <span class="k">else</span> <span class="bp">None</span> <span class="p">)</span> <span class="p">):</span> <span class="n">lines</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">&#39;</span><span class="si">%-15s</span><span class="s1">: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="p">(</span><span 
class="n">header_name</span><span class="p">,</span> <span class="n">header_value</span><span class="p">))</span> <span class="n">h2</span> <span class="o">=</span> <span class="s1">&#39;Scope (Scoped symbol table) contents&#39;</span> <span class="n">lines</span><span class="o">.</span><span class="n">extend</span><span class="p">([</span><span class="n">h2</span><span class="p">,</span> <span class="s1">&#39;-&#39;</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">h2</span><span class="p">)])</span> <span class="n">lines</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span> <span class="p">(</span><span class="s1">&#39;</span><span class="si">%7s</span><span class="s1">: </span><span class="si">%r</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">))</span> <span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="o">.</span><span class="n">items</span><span class="p">()</span> <span class="p">)</span> <span class="n">lines</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">)</span> <span class="n">s</span> <span class="o">=</span> <span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">lines</span><span class="p">)</span> <span class="k">return</span> <span class="n">s</span> <span class="fm">__repr__</span> <span class="o">=</span> <span class="fm">__str__</span> <span class="k">def</span> <span class="nf">insert</span><span 
class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">symbol</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Insert: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">symbol</span><span class="o">.</span><span class="n">name</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="p">[</span><span class="n">symbol</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">symbol</span> <span class="k">def</span> <span class="nf">lookup</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Lookup: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">name</span><span class="p">)</span> <span class="n">symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="c1"># &#39;symbol&#39; is either an instance of the Symbol class or None</span> <span class="k">return</span> <span class="n">symbol</span> </pre></div> <p>Now let&#8217;s switch our attention to the <em>visit_Program</em>&nbsp;method:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Program</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;ENTER scope: global&#39;</span><span class="p">)</span> <span class="n">global_scope</span> 
<span class="o">=</span> <span class="n">ScopedSymbolTable</span><span class="p">(</span> <span class="n">scope_name</span><span class="o">=</span><span class="s1">&#39;global&#39;</span><span class="p">,</span> <span class="n">scope_level</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">enclosing_scope</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="p">,</span> <span class="c1"># None</span> <span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span> <span class="o">=</span> <span class="n">global_scope</span> <span class="c1"># visit subtree</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">block</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="n">global_scope</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">enclosing_scope</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;LEAVE scope: global&#39;</span><span class="p">)</span> </pre></div> <p>There are a couple of things here worth mentioning and&nbsp;repeating:</p> <ol> <li>We explicitly pass the <em>self.current_scope</em> as the <em>enclosing_scope</em> argument when creating a&nbsp;scope</li> <li>We assign the newly created global scope to the variable <em>self.current_scope</em></li> <li>We restore the variable <em>self.current_scope</em> to its previous value right before leaving the <em>Program</em> node. 
It&#8217;s important to restore the value of the <em>current_scope</em> after we&#8217;ve finished processing the node, otherwise the scope tree construction will be broken when we have more than two scopes in our program. We&#8217;ll see why&nbsp;shortly.</li> </ol> <p>And, finally, let&#8217;s update the <em>visit_ProcedureDecl</em>&nbsp;method:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_ProcedureDecl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">proc_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">proc_name</span> <span class="n">proc_symbol</span> <span class="o">=</span> <span class="n">ProcedureSymbol</span><span class="p">(</span><span class="n">proc_name</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">proc_symbol</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;ENTER scope: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">proc_name</span><span class="p">)</span> <span class="c1"># Scope for parameters and local variables</span> <span class="n">procedure_scope</span> <span class="o">=</span> <span class="n">ScopedSymbolTable</span><span class="p">(</span> <span class="n">scope_name</span><span class="o">=</span><span class="n">proc_name</span><span class="p">,</span> <span class="n">scope_level</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">scope_level</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span 
class="n">enclosing_scope</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span> <span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span> <span class="o">=</span> <span class="n">procedure_scope</span> <span class="c1"># Insert parameters into the procedure scope</span> <span class="k">for</span> <span class="n">param</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">params</span><span class="p">:</span> <span class="n">param_type</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">param</span><span class="o">.</span><span class="n">type_node</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="n">param_name</span> <span class="o">=</span> <span class="n">param</span><span class="o">.</span><span class="n">var_node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="n">VarSymbol</span><span class="p">(</span><span class="n">param_name</span><span class="p">,</span> <span class="n">param_type</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">var_symbol</span><span class="p">)</span> <span class="n">proc_symbol</span><span class="o">.</span><span class="n">params</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">var_symbol</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span 
class="n">block_node</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="n">procedure_scope</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">enclosing_scope</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;LEAVE scope: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">proc_name</span><span class="p">)</span> </pre></div> <p>Again, the main changes compared to the version in <a href="https://github.com/rspivak/lsbasi/blob/master/part14/scope02.py">scope02.py</a>&nbsp;are:</p> <ol> <li>We explicitly pass the <em>self.current_scope</em> as an <em>enclosing_scope</em> argument when creating a&nbsp;scope.</li> <li>We no longer hard code the scope level of a procedure declaration because we can calculate the level automatically based on the scope level of the procedure&#8217;s enclosing scope: it&#8217;s the enclosing scope&#8217;s level plus&nbsp;one.</li> <li>We restore the value of the <em>self.current_scope</em> to its previous value (for our sample program the previous value would be the <em>global scope</em>) right before leaving the <em>ProcedureDecl</em>&nbsp;node.</li> </ol> <p>Okay, let&#8217;s see what the contents of the scoped symbol tables look like with the above changes. You can find all the changes in <a href="https://github.com/rspivak/lsbasi/blob/master/part14/scope03a.py">scope03a.py</a>. 
Our sample program&nbsp;is:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span><span class="o">:</span> <span class="kt">real</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Alpha</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="k">end</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ Main }</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{ Main }</span> </pre></div> <p>Run scope03a.py on the command line and inspect the&nbsp;output:</p> <div class="highlight"><pre><span></span>$ python scope03a.py ENTER scope: global Insert: INTEGER Insert: REAL Lookup: REAL Insert: x Lookup: REAL Insert: y Insert: Alpha ENTER scope: Alpha Insert: INTEGER Insert: REAL Lookup: INTEGER Insert: a Lookup: INTEGER Insert: y SCOPE <span class="o">(</span>SCOPED SYMBOL TABLE<span class="o">)</span> <span class="o">===========================</span> Scope name : Alpha Scope level : <span class="m">2</span> Enclosing scope: global Scope <span class="o">(</span>Scoped symbol table<span class="o">)</span> contents ------------------------------------ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; a: &lt;VarSymbol<span class="o">(</span><span 
class="nv">name</span><span class="o">=</span><span class="s1">&#39;a&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; LEAVE scope: Alpha SCOPE <span class="o">(</span>SCOPED SYMBOL TABLE<span class="o">)</span> <span class="o">===========================</span> Scope name : global Scope level : <span class="m">1</span> Enclosing scope: None Scope <span class="o">(</span>Scoped symbol table<span class="o">)</span> contents ------------------------------------ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; x: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;x&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; Alpha: &lt;ProcedureSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span>Alpha, <span class="nv">parameters</span><span class="o">=[</span>&lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;a&#39;</span>, <span class="nv">type</span><span class="o">=</span><span 
class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt;<span class="o">])</span>&gt; LEAVE scope: global </pre></div> <p>You can see in the output above that the <em>global scope</em> doesn&#8217;t have an enclosing scope and, the <em>Alpha</em>&#8216;s enclosing scope is the <em>global scope</em>, which is what we would expect, because the <em>Alpha</em> scope is nested within the <em>global scope</em>.</p> <p><br/> Now, as promised, let&#8217;s consider why it is important to set and restore the value of the <em>self.current_scope</em> variable. Let&#8217;s take a look at the following program, where we have two procedure declarations in the <em>global scope</em>:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">real</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">AlphaA</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ AlphaA }</span> <span class="k">end</span><span class="o">;</span> <span class="cm">{ AlphaA }</span> <span class="k">procedure</span> <span class="nf">AlphaB</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">b</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ AlphaB }</span> <span class="k">end</span><span class="o">;</span> <span class="cm">{ AlphaB }</span> <span 
class="k">begin</span> <span class="cm">{ Main }</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{ Main }</span> </pre></div> <p>The nesting relationship diagram for the sample program looks like&nbsp;this:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img12.png"></p> <p>An <span class="caps">AST</span> for the program (I left only the nodes that are relevant to this example) is something like&nbsp;this:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img13.png" width="440"></p> <p>What is going to happen if we don&#8217;t restore the current scope when we leave the <em>Program</em> and <em>ProcedureDecl</em> nodes? Let&#8217;s&nbsp;see.</p> <p>Our semantic analyzer walks the tree depth-first, left to right, so it visits the <em>ProcedureDecl</em> node for <em>AlphaA</em> first and then the <em>ProcedureDecl</em> node for <em>AlphaB</em>. The problem is that if we don&#8217;t restore <em>self.current_scope</em> before leaving <em>AlphaA</em>, <em>self.current_scope</em> is left pointing to <em>AlphaA</em> instead of the <em>global scope</em>. As a result, the semantic analyzer creates the scope <em>AlphaB</em> at level 3, as if it were nested within the scope <em>AlphaA</em>, which is, of course,&nbsp;incorrect.</p> <p>To see the broken behavior when the current scope is not restored before leaving the <em>Program</em> and/or <em>ProcedureDecl</em> nodes, download and run <a href="https://github.com/rspivak/lsbasi/blob/master/part14/scope03b.py">scope03b.py</a> on the command&nbsp;line:</p> <div class="highlight"><pre><span></span>$ python scope03b.py ENTER scope: global Insert: INTEGER Insert: REAL Lookup: REAL Insert: x Lookup: REAL Insert: y Insert: AlphaA ENTER scope: AlphaA Insert: INTEGER Insert: REAL Lookup: INTEGER Insert: a Lookup: INTEGER Insert: y SCOPE <span class="o">(</span>SCOPED SYMBOL TABLE<span 
class="o">)</span> <span class="o">===========================</span> Scope name : AlphaA Scope level : <span class="m">2</span> Enclosing scope: global Scope <span class="o">(</span>Scoped symbol table<span class="o">)</span> contents ------------------------------------ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; a: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;a&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; LEAVE scope: AlphaA Insert: AlphaB ENTER scope: AlphaB Insert: INTEGER Insert: REAL Lookup: INTEGER Insert: a Lookup: INTEGER Insert: b SCOPE <span class="o">(</span>SCOPED SYMBOL TABLE<span class="o">)</span> <span class="o">===========================</span> Scope name : AlphaB Scope level : <span class="m">3</span> Enclosing scope: AlphaA Scope <span class="o">(</span>Scoped symbol table<span class="o">)</span> contents ------------------------------------ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; a: &lt;VarSymbol<span class="o">(</span><span 
class="nv">name</span><span class="o">=</span><span class="s1">&#39;a&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; b: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;b&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; LEAVE scope: AlphaB SCOPE <span class="o">(</span>SCOPED SYMBOL TABLE<span class="o">)</span> <span class="o">===========================</span> Scope name : global Scope level : <span class="m">1</span> Enclosing scope: None Scope <span class="o">(</span>Scoped symbol table<span class="o">)</span> contents ------------------------------------ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; x: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;x&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; AlphaA: &lt;ProcedureSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span>AlphaA, <span class="nv">parameters</span><span class="o">=[</span>&lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;a&#39;</span>, <span class="nv">type</span><span class="o">=</span><span 
class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt;<span class="o">])</span>&gt; LEAVE scope: global </pre></div> <p>As you can see, scope tree construction in our semantic analyzer is completely broken in the presence of more than two&nbsp;scopes:</p> <ol> <li>Instead of two scope levels as shown in the nesting relationships diagram, we have three&nbsp;levels</li> <li>The <em>global scope</em> contents doesn&#8217;t have <em>AlphaB</em> in it, only <em>AlphaA</em>.</li> </ol> <p><br/> To construct a scope tree correctly, we need to follow a really simple&nbsp;procedure:</p> <ol> <li>When we <span class="caps">ENTER</span> a <em>Program</em> or <em>ProcedureDecl</em> node, we create a new scope and assign it to the <em>self.current_scope</em>.</li> <li>When we are about to <span class="caps">LEAVE</span> the <em>Program</em> or <em>ProcedureDecl</em> node, we restore the value of the <em>self.current_scope</em>.</li> </ol> <p>You can think of the <em>self.current_scope</em> as a stack pointer and a <em>scope tree</em> as a collection of&nbsp;stacks:</p> <ol> <li>When you visit a <em>Program</em> or <em>ProcedureDecl</em> node, you push a new scope on the stack and adjust the stack pointer <em>self.current_scope</em> to point to the top of stack, which is now the most recently pushed&nbsp;scope.</li> <li>When you are about to leave the node, you pop the scope off the stack and you also adjust the stack pointer to point to the previous scope on the stack, which is now the new top of&nbsp;stack.</li> </ol> <p>To see the correct behavior in the presence of multiple scopes, download and run <a href="https://github.com/rspivak/lsbasi/blob/master/part14/scope03c.py">scope03c.py</a> on the command line. Study the output. 
Make sure you understand what is going&nbsp;on:</p> <div class="highlight"><pre><span></span>$ python scope03c.py ENTER scope: global Insert: INTEGER Insert: REAL Lookup: REAL Insert: x Lookup: REAL Insert: y Insert: AlphaA ENTER scope: AlphaA Insert: INTEGER Insert: REAL Lookup: INTEGER Insert: a Lookup: INTEGER Insert: y SCOPE <span class="o">(</span>SCOPED SYMBOL TABLE<span class="o">)</span> <span class="o">===========================</span> Scope name : AlphaA Scope level : <span class="m">2</span> Enclosing scope: global Scope <span class="o">(</span>Scoped symbol table<span class="o">)</span> contents ------------------------------------ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; a: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;a&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; LEAVE scope: AlphaA Insert: AlphaB ENTER scope: AlphaB Insert: INTEGER Insert: REAL Lookup: INTEGER Insert: a Lookup: INTEGER Insert: b SCOPE <span class="o">(</span>SCOPED SYMBOL TABLE<span class="o">)</span> <span class="o">===========================</span> Scope name : AlphaB Scope level : <span class="m">2</span> Enclosing scope: global Scope <span class="o">(</span>Scoped symbol table<span class="o">)</span> contents ------------------------------------ INTEGER: 
&lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; a: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;a&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; b: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;b&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; LEAVE scope: AlphaB SCOPE <span class="o">(</span>SCOPED SYMBOL TABLE<span class="o">)</span> <span class="o">===========================</span> Scope name : global Scope level : <span class="m">1</span> Enclosing scope: None Scope <span class="o">(</span>Scoped symbol table<span class="o">)</span> contents ------------------------------------ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; x: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;x&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span 
class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; AlphaA: &lt;ProcedureSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span>AlphaA, <span class="nv">parameters</span><span class="o">=[</span>&lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;a&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt;<span class="o">])</span>&gt; AlphaB: &lt;ProcedureSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span>AlphaB, <span class="nv">parameters</span><span class="o">=[</span>&lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;a&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt;<span class="o">])</span>&gt; LEAVE scope: global </pre></div> <p>This is how our scoped symbol tables look like after we&#8217;ve run <a href="https://github.com/rspivak/lsbasi/blob/master/part14/scope03c.py">scope03c.py</a> and correctly constructed the <em>scope tree</em>:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img14.png"></p> <p>Again, as I&#8217;ve mentioned above, you can think of the scope tree above as a collection of scope&nbsp;stacks.</p> <p>Now let&#8217;s continue and talk about how <em>name resolution</em> works when we have nested&nbsp;scopes.</p> <p><br/></p> <h3 id="nested-scopes-and-name-resolution">Nested scopes and name&nbsp;resolution</h3> <p>Our focus before was on variable and procedure declarations. 
Let&#8217;s add variable references to the&nbsp;mix.</p> <p>Here is a sample program with some variable references in&nbsp;it:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span><span class="o">:</span> <span class="kt">real</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Alpha</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="n">x</span> <span class="o">:=</span> <span class="n">a</span> <span class="o">+</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">;</span> <span class="k">end</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ Main }</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{ Main }</span> </pre></div> <p>Or visually with some additional information: <img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img08.png"> <img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img09.png"></p> <p><br/> Let&#8217;s turn our attention to the assignment statement <strong>x := a + x + y;</strong> Here it is with&nbsp;subscripts:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img15.png"></p> <p>We see that <strong>x</strong> resolves to a declaration at level 1, <strong>a</strong> resolves to a declaration at level 2 and <strong>y</strong> also resolves to a declaration at level 2. How does that resolution work? 
Let&#8217;s see&nbsp;how.</p> <p><em>Lexically (statically) scoped</em> languages like Pascal follow <strong><em>the most closely nested scope</em></strong> rule when it comes to name resolution. It means that, in every scope, a name refers to its lexically closest declaration. For our assignment statement, let&#8217;s go over every variable reference and see how the rule works in&nbsp;practice:</p> <ol> <li> <p>Because our semantic analyzer visits the right-hand side of the assignment first, we&#8217;ll start with the variable reference <strong>a</strong> from the arithmetic expression <strong>a + x + y</strong>. We begin our search for <strong>a</strong>&#8217;s declaration in the lexically closest scope, which is the <em>Alpha</em> scope. The <em>Alpha</em> scope contains variable declarations in the <em>Alpha</em> procedure including the procedure&#8217;s formal parameters. We find the declaration of <strong>a</strong> in the <em>Alpha</em> scope: it&#8217;s the formal parameter <strong>a</strong> of the <em>Alpha</em> procedure - a variable symbol that has type <strong>integer</strong>. We usually do the search by scanning the source code with our eyes when resolving names (remember, <strong>a2</strong> is not the name of a variable, 2 is the subscript here, the variable name is <strong>a</strong>):</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img16.png"></p> </li> <li> <p>Now onto the variable reference <strong>x</strong> from the arithmetic expression <strong>a + x + y</strong>. Again, first we search for the declaration of <strong>x</strong> in the lexically closest scope. The lexically closest scope is the <em>Alpha</em> scope at level 2. The scope contains declarations in the <em>Alpha</em> procedure including the procedure&#8217;s formal parameters. We don&#8217;t find <strong>x</strong> at this scope level (in the <em>Alpha</em> scope), so we go up the chain to the <em>global scope</em> and continue our search there. 
Our search succeeds because the <em>global scope</em> has a variable symbol with the name <strong>x</strong> in&nbsp;it:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img17.png"></p> </li> <li> <p>Now, let&#8217;s look at the variable reference <strong>y</strong> from the arithmetic expression <strong>a + x + y</strong>. We find its declaration in the lexically closest scope, which is the <em>Alpha</em> scope. In the <em>Alpha</em> scope the variable <strong>y</strong> has type <strong>integer</strong> (if there weren&#8217;t a declaration for <strong>y</strong> in the <em>Alpha</em> scope, we would scan the text and find <strong>y</strong> in the outer/global scope, and it would have <strong>real</strong> type in that&nbsp;case):</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img18.png"></p> </li> <li> <p>And, finally, the variable <strong>x</strong> from the left-hand side of the assignment statement <strong>x := a + x + y;</strong> It resolves to the same declaration as the variable reference <strong>x</strong> in the arithmetic expression on the right-hand&nbsp;side:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img19.png"></p> </li> </ol> <p>How do we implement that behavior of looking in the current scope, and then looking in the enclosing scope, and so on until we either find the symbol we&#8217;re looking for or we&#8217;ve reached the top of the scope tree and there are no more scopes left? 
We simply need to extend the <em>lookup</em> method in the <em>ScopedSymbolTable</em> class to continue its search up the chain in the scope&nbsp;tree:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">lookup</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Lookup: </span><span class="si">%s</span><span class="s1">. (Scope name: </span><span class="si">%s</span><span class="s1">)&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">scope_name</span><span class="p">))</span> <span class="c1"># &#39;symbol&#39; is either an instance of the Symbol class or None</span> <span class="n">symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">if</span> <span class="n">symbol</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span> <span class="k">return</span> <span class="n">symbol</span> <span class="c1"># recursively go up the chain and lookup the name</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">enclosing_scope</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">enclosing_scope</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> </pre></div> <p>The way the updated <em>lookup</em> 
method&nbsp;works:</p> <ol> <li>Search for a symbol by name in the current scope. If the symbol is found, then return&nbsp;it.</li> <li>If the symbol is not found, recursively traverse the tree and search for the symbol in the scopes up the chain. You don&#8217;t have to do the lookup recursively; you can rewrite it in an iterative form. The important part is to follow the link from a nested scope to its enclosing scope and search for the symbol there and up the tree until either the symbol is found or there are no more scopes left because you&#8217;ve reached the top of the scope&nbsp;tree.</li> <li>The <em>lookup</em> method also prints the scope name, in parentheses, where the lookup happens to make it clearer that lookup goes up the chain to search for a symbol, if it can&#8217;t find it in the current&nbsp;scope.</li> </ol> <p>Let&#8217;s see what our semantic analyzer outputs for our sample program now that we&#8217;ve modified the way the <em>lookup</em> method searches the scope tree for a symbol. Download <a href="https://github.com/rspivak/lsbasi/blob/master/part14/scope04a.py">scope04a.py</a> and run it on the command&nbsp;line:</p> <div class="highlight"><pre><span></span>$ python scope04a.py ENTER scope: global Insert: INTEGER Insert: REAL Lookup: REAL. <span class="o">(</span>Scope name: global<span class="o">)</span> Insert: x Lookup: REAL. <span class="o">(</span>Scope name: global<span class="o">)</span> Insert: y Insert: Alpha ENTER scope: Alpha Lookup: INTEGER. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: INTEGER. <span class="o">(</span>Scope name: global<span class="o">)</span> Insert: a Lookup: INTEGER. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: INTEGER. <span class="o">(</span>Scope name: global<span class="o">)</span> Insert: y Lookup: a. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: x. 
<span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: x. <span class="o">(</span>Scope name: global<span class="o">)</span> Lookup: y. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: x. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: x. <span class="o">(</span>Scope name: global<span class="o">)</span> SCOPE <span class="o">(</span>SCOPED SYMBOL TABLE<span class="o">)</span> <span class="o">===========================</span> Scope name : Alpha Scope level : <span class="m">2</span> Enclosing scope: global Scope <span class="o">(</span>Scoped symbol table<span class="o">)</span> contents ------------------------------------ a: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;a&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; LEAVE scope: Alpha SCOPE <span class="o">(</span>SCOPED SYMBOL TABLE<span class="o">)</span> <span class="o">===========================</span> Scope name : global Scope level : <span class="m">1</span> Enclosing scope: None Scope <span class="o">(</span>Scoped symbol table<span class="o">)</span> contents ------------------------------------ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; x: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span 
class="s1">&#39;x&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; Alpha: &lt;ProcedureSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span>Alpha, <span class="nv">parameters</span><span class="o">=[</span>&lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;a&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt;<span class="o">])</span>&gt; LEAVE scope: global </pre></div> <p>Inspect the output above and pay attention to the <em><span class="caps">ENTER</span></em> and <em>Lookup</em> messages. A couple of things worth mentioning&nbsp;here:</p> <ol> <li> <p>Notice how the semantic analyzer looks up the <em><span class="caps">INTEGER</span></em> built-in type symbol before inserting the variable symbol <strong>a</strong>. It searches <em><span class="caps">INTEGER</span></em> first in the current scope, <em>Alpha</em>, doesn&#8217;t find it, then goes up the tree all the way to the <em>global scope</em>, and finds the symbol&nbsp;there:</p> <div class="highlight"><pre><span></span>ENTER scope: Alpha Lookup: INTEGER. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: INTEGER. <span class="o">(</span>Scope name: global<span class="o">)</span> Insert: a </pre></div> </li> <li> <p>Notice also how the analyzer resolves variable references from the assignment statement <strong>x := a + x + y</strong>:</p> <div class="highlight"><pre><span></span>Lookup: a. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: x. 
<span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: x. <span class="o">(</span>Scope name: global<span class="o">)</span> Lookup: y. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: x. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: x. <span class="o">(</span>Scope name: global<span class="o">)</span> </pre></div> <p>The analyzer starts its search in the current scope and then goes up the tree all the way to the <em>global scope</em>.</p> </li> </ol> <p>Let&#8217;s also see what happens when a Pascal program has a variable reference that doesn&#8217;t resolve to a variable declaration as in the sample program&nbsp;below:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span><span class="o">:</span> <span class="kt">real</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Alpha</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="n">x</span> <span class="o">:=</span> <span class="n">b</span> <span class="o">+</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">;</span> <span class="cm">{ ERROR here! 
}</span> <span class="k">end</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ Main }</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{ Main }</span> </pre></div> <p>Download <a href="https://github.com/rspivak/lsbasi/blob/master/part14/scope04b.py">scope04b.py</a> and run it on the command&nbsp;line:</p> <div class="highlight"><pre><span></span>$ python scope04b.py ENTER scope: global Insert: INTEGER Insert: REAL Lookup: REAL. <span class="o">(</span>Scope name: global<span class="o">)</span> Insert: x Lookup: REAL. <span class="o">(</span>Scope name: global<span class="o">)</span> Insert: y Insert: Alpha ENTER scope: Alpha Lookup: INTEGER. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: INTEGER. <span class="o">(</span>Scope name: global<span class="o">)</span> Insert: a Lookup: INTEGER. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: INTEGER. <span class="o">(</span>Scope name: global<span class="o">)</span> Insert: y Lookup: b. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: b. <span class="o">(</span>Scope name: global<span class="o">)</span> Error: Symbol<span class="o">(</span>identifier<span class="o">)</span> not found <span class="s1">&#39;b&#39;</span> </pre></div> <p>As you can see, the analyzer tried to resolve the variable reference <strong>b</strong> and searched for it in the <em>Alpha</em> scope first, then the <em>global scope</em>, and, not being able to find a symbol with the name <strong>b</strong>, it threw a semantic&nbsp;error.</p> <p>Okay great, now we know how to write a semantic analyzer that can analyze a program for semantic errors when the program has nested&nbsp;scopes.</p> <p><br/></p> <h3 id="source-to-source-compiler">Source-to-source&nbsp;compiler</h3> <p>Now, onto something completely different. Let&#8217;s write a <em>source-to-source compiler</em>! Why would we do it? 
Aren&#8217;t we talking about interpreters and nested scopes? Yes, we are, but let me explain why I think it might be a good idea to learn how to write a source-to-source compiler right&nbsp;now.</p> <p>First, let&#8217;s talk about definitions. What is <em>a source-to-source compiler</em>? For the purpose of this article, let&#8217;s define a <strong><em>source-to-source compiler</em></strong> as a compiler that translates a program in some source language into a program in the same (or almost the same) source&nbsp;language.</p> <p>So, if you write a translator that takes as an input a Pascal program and outputs a Pascal program, possibly modified, or enhanced, the translator in this case is called a <em>source-to-source compiler</em>.</p> <p>A good example of a source-to-source compiler for us to study would be a compiler that takes a Pascal program as an input and outputs a Pascal-like program where every name is subscripted with a corresponding scope level, and, in addition to that, every variable reference also has a type indicator. 
So we want a source-to-source compiler that would take the following Pascal&nbsp;program:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span><span class="o">:</span> <span class="kt">real</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Alpha</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="n">x</span> <span class="o">:=</span> <span class="n">a</span> <span class="o">+</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">;</span> <span class="k">end</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ Main }</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{ Main }</span> </pre></div> <p>and turn it into the following Pascal-like&nbsp;program:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main0</span><span class="o">;</span> <span class="k">var</span> <span class="n">x1</span> <span class="o">:</span> <span class="kt">REAL</span><span class="o">;</span> <span class="k">var</span> <span class="n">y1</span> <span class="o">:</span> <span class="kt">REAL</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Alpha1</span><span class="p">(</span><span class="n">a2</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">y2</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span 
class="k">begin</span> <span class="o">&lt;</span><span class="n">x1</span><span class="o">:</span><span class="kt">REAL</span><span class="o">&gt;</span> <span class="o">:=</span> <span class="o">&lt;</span><span class="n">a2</span><span class="o">:</span><span class="kt">INTEGER</span><span class="o">&gt;</span> <span class="o">+</span> <span class="o">&lt;</span><span class="n">x1</span><span class="o">:</span><span class="kt">REAL</span><span class="o">&gt;</span> <span class="o">+</span> <span class="o">&lt;</span><span class="n">y2</span><span class="o">:</span><span class="kt">INTEGER</span><span class="o">&gt;;</span> <span class="k">end</span><span class="o">;</span> <span class="cm">{END OF Alpha}</span> <span class="k">begin</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{END OF Main}</span> </pre></div> <p>Here is the list of modifications our source-to-source compiler should make to an input Pascal&nbsp;program:</p> <ol> <li>Every declaration should be printed on a separate line, so if we have multiple declarations in the input Pascal program, the compiled output should have each declaration on a separate line. We can see in the text above, for example, how the line <em>var x, y : real;</em> gets converted into multiple&nbsp;lines.</li> <li>Every name should get subscripted with a number corresponding to the scope level of the respective&nbsp;declaration.</li> <li>Every variable reference, in addition to being subscripted, should also be printed in the following form: <em>&lt;var_name_with_subscript:type&gt;</em></li> <li>The compiler should also add a comment at the end of every block in the form <em>{<span class="caps">END</span> <span class="caps">OF</span> &#8230; }</em>, where the ellipsis will get substituted either with a program name or procedure name. 
That will help us identify the textual boundaries of procedures&nbsp;faster.</li> </ol> <p>As you can see from the generated output above, this source-to-source compiler could be a useful tool for understanding how name resolution works, especially when a program has nested scopes, because the output generated by the compiler would allow us to quickly see which declaration, and in which scope, a certain variable reference resolves to. This is a great help when learning about symbols, nested scopes, and name&nbsp;resolution.</p> <p>How can we implement a source-to-source compiler like that? We have actually covered all the necessary parts to do it. All we need to do now is extend our semantic analyzer a bit to generate the enhanced output. You can see the full source code of the compiler <a href="https://github.com/rspivak/lsbasi/blob/master/part14/src2srccompiler.py">here</a>. It is basically a semantic analyzer on drugs, modified to generate and return strings for certain <span class="caps">AST</span>&nbsp;nodes.</p> <p>Download <a href="https://github.com/rspivak/lsbasi/blob/master/part14/src2srccompiler.py">src2srccompiler.py</a>, study it, and experiment with it by passing it different Pascal programs as&nbsp;input.</p> <p>For the following program, for&nbsp;example:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">real</span><span class="o">;</span> <span class="k">var</span> <span class="n">z</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">AlphaA</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">y</span> 
<span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ AlphaA }</span> <span class="n">x</span> <span class="o">:=</span> <span class="n">a</span> <span class="o">+</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">;</span> <span class="k">end</span><span class="o">;</span> <span class="cm">{ AlphaA }</span> <span class="k">procedure</span> <span class="nf">AlphaB</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">b</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ AlphaB }</span> <span class="k">end</span><span class="o">;</span> <span class="cm">{ AlphaB }</span> <span class="k">begin</span> <span class="cm">{ Main }</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{ Main }</span> </pre></div> <p>The compiler generates the following&nbsp;output:</p> <div class="highlight"><pre><span></span><span class="err">$</span> <span class="n">python</span> <span class="n">src2srccompiler</span><span class="o">.</span><span class="n">py</span> <span class="n">nestedscopes03</span><span class="o">.</span><span class="n">pas</span> <span class="k">program</span> <span class="n">Main0</span><span class="o">;</span> <span class="k">var</span> <span class="n">x1</span> <span class="o">:</span> <span class="kt">REAL</span><span class="o">;</span> <span class="k">var</span> <span class="n">y1</span> <span class="o">:</span> <span class="kt">REAL</span><span class="o">;</span> <span class="k">var</span> <span class="n">z1</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">AlphaA1</span><span class="p">(</span><span 
class="n">a2</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">y2</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">begin</span> <span class="o">&lt;</span><span class="n">x1</span><span class="o">:</span><span class="kt">REAL</span><span class="o">&gt;</span> <span class="o">:=</span> <span class="o">&lt;</span><span class="n">a2</span><span class="o">:</span><span class="kt">INTEGER</span><span class="o">&gt;</span> <span class="o">+</span> <span class="o">&lt;</span><span class="n">x1</span><span class="o">:</span><span class="kt">REAL</span><span class="o">&gt;</span> <span class="o">+</span> <span class="o">&lt;</span><span class="n">y2</span><span class="o">:</span><span class="kt">INTEGER</span><span class="o">&gt;;</span> <span class="k">end</span><span class="o">;</span> <span class="cm">{END OF AlphaA}</span> <span class="k">procedure</span> <span class="nf">AlphaB1</span><span class="p">(</span><span class="n">a2</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">b2</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">begin</span> <span class="k">end</span><span class="o">;</span> <span class="cm">{END OF AlphaB}</span> <span class="k">begin</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{END OF Main}</span> </pre></div> <p>Cool beans and congratulations, now you know how to write a basic source-to-source&nbsp;compiler!</p> <p>Use it to further your understanding of nested scopes, name resolution, and what you can do when you have an <span class="caps">AST</span> and some extra information about the program in the form of symbol&nbsp;tables.</p> <p><br/> Now that we have a useful tool to subscript our programs for us, 
let&#8217;s take a look at a bigger example of nested scopes that you can find in <a href="https://github.com/rspivak/lsbasi/blob/master/part14/nestedscopes04.pas">nestedscopes04.pas</a>:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">var</span> <span class="n">b</span><span class="o">,</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">real</span><span class="o">;</span> <span class="k">var</span> <span class="n">z</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">AlphaA</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">b</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Beta</span><span class="p">(</span><span class="n">c</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Gamma</span><span class="p">(</span><span class="n">c</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ Gamma }</span> <span class="n">x</span> <span class="o">:=</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">+</span> <span class="n">c</span> <span class="o">+</span> <span 
class="n">x</span> <span class="o">+</span> <span class="n">y</span> <span class="o">+</span> <span class="n">z</span><span class="o">;</span> <span class="k">end</span><span class="o">;</span> <span class="cm">{ Gamma }</span> <span class="k">begin</span> <span class="cm">{ Beta }</span> <span class="k">end</span><span class="o">;</span> <span class="cm">{ Beta }</span> <span class="k">begin</span> <span class="cm">{ AlphaA }</span> <span class="k">end</span><span class="o">;</span> <span class="cm">{ AlphaA }</span> <span class="k">procedure</span> <span class="nf">AlphaB</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">c</span> <span class="o">:</span> <span class="kt">real</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ AlphaB }</span> <span class="n">c</span> <span class="o">:=</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="o">;</span> <span class="k">end</span><span class="o">;</span> <span class="cm">{ AlphaB }</span> <span class="k">begin</span> <span class="cm">{ Main }</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{ Main }</span> </pre></div> <p>Below you can see the declarations&#8217; scopes, nesting relationships diagram, and scope information&nbsp;table:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img20.png"> <img alt="" src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img21.png"></p> <p>Let&#8217;s run our source-to-source compiler and inspect the output. 
The subscripts should match the ones in the scope information table in the picture&nbsp;above:</p> <div class="highlight"><pre><span></span><span class="err">$</span> <span class="n">python</span> <span class="n">src2srccompiler</span><span class="o">.</span><span class="n">py</span> <span class="n">nestedscopes04</span><span class="o">.</span><span class="n">pas</span> <span class="k">program</span> <span class="n">Main0</span><span class="o">;</span> <span class="k">var</span> <span class="n">b1</span> <span class="o">:</span> <span class="kt">REAL</span><span class="o">;</span> <span class="k">var</span> <span class="n">x1</span> <span class="o">:</span> <span class="kt">REAL</span><span class="o">;</span> <span class="k">var</span> <span class="n">y1</span> <span class="o">:</span> <span class="kt">REAL</span><span class="o">;</span> <span class="k">var</span> <span class="n">z1</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">AlphaA1</span><span class="p">(</span><span class="n">a2</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">b2</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Beta2</span><span class="p">(</span><span class="n">c3</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">y3</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Gamma3</span><span class="p">(</span><span class="n">c4</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">x4</span> <span class="o">:</span> <span 
class="kt">INTEGER</span><span class="o">;</span> <span class="k">begin</span> <span class="o">&lt;</span><span class="n">x4</span><span class="o">:</span><span class="kt">INTEGER</span><span class="o">&gt;</span> <span class="o">:=</span> <span class="o">&lt;</span><span class="n">a2</span><span class="o">:</span><span class="kt">INTEGER</span><span class="o">&gt;</span> <span class="o">+</span> <span class="o">&lt;</span><span class="n">b2</span><span class="o">:</span><span class="kt">INTEGER</span><span class="o">&gt;</span> <span class="o">+</span> <span class="o">&lt;</span><span class="n">c4</span><span class="o">:</span><span class="kt">INTEGER</span><span class="o">&gt;</span> <span class="o">+</span> <span class="o">&lt;</span><span class="n">x4</span><span class="o">:</span><span class="kt">INTEGER</span><span class="o">&gt;</span> <span class="o">+</span> <span class="o">&lt;</span><span class="n">y3</span><span class="o">:</span><span class="kt">INTEGER</span><span class="o">&gt;</span> <span class="o">+</span> <span class="o">&lt;</span><span class="n">z1</span><span class="o">:</span><span class="kt">INTEGER</span><span class="o">&gt;;</span> <span class="k">end</span><span class="o">;</span> <span class="cm">{END OF Gamma}</span> <span class="k">begin</span> <span class="k">end</span><span class="o">;</span> <span class="cm">{END OF Beta}</span> <span class="k">begin</span> <span class="k">end</span><span class="o">;</span> <span class="cm">{END OF AlphaA}</span> <span class="k">procedure</span> <span class="nf">AlphaB1</span><span class="p">(</span><span class="n">a2</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">c2</span> <span class="o">:</span> <span class="kt">REAL</span><span class="o">;</span> <span class="k">begin</span> <span class="o">&lt;</span><span class="n">c2</span><span class="o">:</span><span class="kt">REAL</span><span 
class="o">&gt;</span> <span class="o">:=</span> <span class="o">&lt;</span><span class="n">a2</span><span class="o">:</span><span class="kt">INTEGER</span><span class="o">&gt;</span> <span class="o">+</span> <span class="o">&lt;</span><span class="n">b1</span><span class="o">:</span><span class="kt">REAL</span><span class="o">&gt;;</span> <span class="k">end</span><span class="o">;</span> <span class="cm">{END OF AlphaB}</span> <span class="k">begin</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{END OF Main}</span> </pre></div> <p>Spend some time studying both the pictures and the output of the source-to-source compiler. Make sure you understand the following main&nbsp;points:</p> <ul> <li>The way the vertical lines are drawn to show the scope of the&nbsp;declarations.</li> <li>That a hole in a scope indicates that a variable is re-declared in a nested&nbsp;scope.</li> <li>That <em>AlphaA</em> and <em>AlphaB</em> are declared in the global&nbsp;scope.</li> <li>That <em>AlphaA</em> and <em>AlphaB</em> declarations introduce new&nbsp;scopes.</li> <li>How scopes are nested within each other, and their nesting&nbsp;relationships.</li> <li>Why different names, including variable references in assignment statements, are subscripted the way they are. In other words, how name resolution and specifically the <em>lookup</em> method of chained scoped symbol tables&nbsp;works.</li> </ul> <p>Also run <a href="https://github.com/rspivak/lsbasi/blob/master/part14/scope05.py">the following program</a>:</p> <div class="highlight"><pre><span></span>$ python scope05.py nestedscopes04.pas </pre></div> <p>and inspect the contents of the chained scoped symbol tables and compare it with what you see in the scope information table in the picture above. 
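</p> <p>To see where the subscripts come from, here is a minimal sketch of name resolution over a chain of scopes. It is not the <em>ScopedSymbolTable</em> class from scope05.py: the <em>Scope</em> class and its <em>declare</em> and <em>resolve</em> methods are made-up names for illustration, and only variables and parameters are declared (procedure names are left out to keep it short).</p>
<div class="highlight"><pre><span></span># Illustrative sketch only: Scope, declare, and resolve are hypothetical names,
# not part of scope05.py or spi.py.
class Scope:
    def __init__(self, name, level, enclosing=None):
        self.name = name
        self.level = level
        self.enclosing = enclosing   # the scope one level up, or None
        self.symbols = {}            # declared name mapped to its type name

    def declare(self, name, type_name):
        self.symbols[name] = type_name

    def resolve(self, name):
        """Return the name subscripted with the level of the nearest
        enclosing scope that declares it, or None if nothing declares it."""
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return name + str(scope.level)
            scope = scope.enclosing
        return None


# Rebuild the chain of scopes from nestedscopes04.pas: global, AlphaA, Beta, Gamma
global_scope = Scope("global", 1)
for var, type_name in [("b", "REAL"), ("x", "REAL"), ("y", "REAL"), ("z", "INTEGER")]:
    global_scope.declare(var, type_name)

alpha_a = Scope("AlphaA", 2, global_scope)
alpha_a.declare("a", "INTEGER")
alpha_a.declare("b", "INTEGER")

beta = Scope("Beta", 3, alpha_a)
beta.declare("c", "INTEGER")
beta.declare("y", "INTEGER")

gamma = Scope("Gamma", 4, beta)
gamma.declare("c", "INTEGER")
gamma.declare("x", "INTEGER")

# Resolve the names used in the assignment statement inside Gamma
resolved = [gamma.resolve(n) for n in ["x", "a", "b", "c", "y", "z"]]
print(resolved)
</pre></div>
<p>Resolved from inside <em>Gamma</em>, the names come out as x4, a2, b2, c4, y3, and z1, the same subscripts the source-to-source compiler printed for the assignment statement in <em>Gamma</em>.</p> <p>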
And don&#8217;t forget about the <a href="https://github.com/rspivak/lsbasi/blob/master/part14/genastdot.py">genastdot.py</a>, which you can use to generate a visual diagram of an <span class="caps">AST</span> to see how procedures are nested within each other in the&nbsp;tree.</p> <p><br/> Before we wrap up our discussion of nested scopes for today, recall that earlier we removed the semantic check that was checking source programs for duplicate identifiers. Let&#8217;s put it back. For the check to work in the presence of nested scopes and the new behavior of the <em>lookup</em> method, though, we need to make some changes. First, we need to update the <em>lookup</em> method and add an extra parameter that will allow us to limit our search to the current scope&nbsp;only:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">lookup</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">current_scope_only</span><span class="o">=</span><span class="bp">False</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Lookup: </span><span class="si">%s</span><span class="s1">. 
(Scope name: </span><span class="si">%s</span><span class="s1">)&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">scope_name</span><span class="p">))</span> <span class="c1"># &#39;symbol&#39; is either an instance of the Symbol class or None</span> <span class="n">symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">if</span> <span class="n">symbol</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span> <span class="k">return</span> <span class="n">symbol</span> <span class="k">if</span> <span class="n">current_scope_only</span><span class="p">:</span> <span class="k">return</span> <span class="bp">None</span> <span class="c1"># recursively go up the chain and lookup the name</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">enclosing_scope</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">enclosing_scope</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> </pre></div> <p>And second, we need to modify the <em>visit_VarDecl</em> method and add the check using our new <em>current_scope_only</em> parameter in the <em>lookup</em>&nbsp;method:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_VarDecl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span 
class="n">type_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">type_node</span><span class="o">.</span><span class="n">value</span> <span class="n">type_symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">type_name</span><span class="p">)</span> <span class="c1"># We have all the information we need to create a variable symbol.</span> <span class="c1"># Create the symbol and insert it into the symbol table.</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">var_node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="n">VarSymbol</span><span class="p">(</span><span class="n">var_name</span><span class="p">,</span> <span class="n">type_symbol</span><span class="p">)</span> <span class="c1"># Signal an error if the table already has a symbol</span> <span class="c1"># with the same name</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">var_name</span><span class="p">,</span> <span class="n">current_scope_only</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span> <span class="s2">&quot;Error: Duplicate identifier &#39;</span><span class="si">%s</span><span class="s2">&#39; found&quot;</span> <span class="o">%</span> <span class="n">var_name</span> <span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_scope</span><span class="o">.</span><span class="n">insert</span><span
class="p">(</span><span class="n">var_symbol</span><span class="p">)</span> </pre></div> <p>If we don&#8217;t limit the search for a duplicate identifier to the current scope, the lookup might find a variable symbol with the same name in an outer scope, and we would raise an error even though the program has no semantic error to begin&nbsp;with.</p> <p>Here is the output from running <a href="https://github.com/rspivak/lsbasi/blob/master/part14/scope05.py">scope05.py</a> with a program that doesn&#8217;t have duplicate identifier errors. Notice that the output below has more lines than before, because our duplicate identifier check looks up each variable name in the current scope before inserting a new&nbsp;symbol:</p> <div class="highlight"><pre><span></span>$ python scope05.py nestedscopes02.pas ENTER scope: global Insert: INTEGER Insert: REAL Lookup: REAL. <span class="o">(</span>Scope name: global<span class="o">)</span> Lookup: x. <span class="o">(</span>Scope name: global<span class="o">)</span> Insert: x Lookup: REAL. <span class="o">(</span>Scope name: global<span class="o">)</span> Lookup: y. <span class="o">(</span>Scope name: global<span class="o">)</span> Insert: y Insert: Alpha ENTER scope: Alpha Lookup: INTEGER. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: INTEGER. <span class="o">(</span>Scope name: global<span class="o">)</span> Insert: a Lookup: INTEGER. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: INTEGER. <span class="o">(</span>Scope name: global<span class="o">)</span> Lookup: y. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Insert: y Lookup: a. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: x. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: x. <span class="o">(</span>Scope name: global<span class="o">)</span> Lookup: y. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: x. 
<span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: x. <span class="o">(</span>Scope name: global<span class="o">)</span> SCOPE <span class="o">(</span>SCOPED SYMBOL TABLE<span class="o">)</span> <span class="o">===========================</span> Scope name : Alpha Scope level : <span class="m">2</span> Enclosing scope: global Scope <span class="o">(</span>Scoped symbol table<span class="o">)</span> contents ------------------------------------ a: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;a&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; LEAVE scope: Alpha SCOPE <span class="o">(</span>SCOPED SYMBOL TABLE<span class="o">)</span> <span class="o">===========================</span> Scope name : global Scope level : <span class="m">1</span> Enclosing scope: None Scope <span class="o">(</span>Scoped symbol table<span class="o">)</span> contents ------------------------------------ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; x: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;x&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span 
class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; Alpha: &lt;ProcedureSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span>Alpha, <span class="nv">parameters</span><span class="o">=[</span>&lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;a&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt;<span class="o">])</span>&gt; LEAVE scope: global </pre></div> <p>Now, let&#8217;s take <a href="https://github.com/rspivak/lsbasi/blob/master/part14/scope05.py">scope05.py</a> for another test drive and see how it catches a duplicate identifier semantic&nbsp;error.</p> <p>For example, for <a href="https://github.com/rspivak/lsbasi/blob/master/part14/dupiderror.pas">the following erroneous program</a> with a duplicate declaration of <strong><em>a</em></strong> in the <em>Alpha</em>&nbsp;scope:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span><span class="o">:</span> <span class="kt">real</span><span class="o">;</span> <span class="k">procedure</span> <span class="nf">Alpha</span><span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">;</span> <span class="k">var</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">var</span> <span class="n">a</span> <span class="o">:</span> <span class="kt">real</span><span class="o">;</span> <span class="cm">{ ERROR here! 
}</span> <span class="k">begin</span> <span class="n">x</span> <span class="o">:=</span> <span class="n">a</span> <span class="o">+</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">;</span> <span class="k">end</span><span class="o">;</span> <span class="k">begin</span> <span class="cm">{ Main }</span> <span class="k">end</span><span class="o">.</span> <span class="cm">{ Main }</span> </pre></div> <p>the program generates the following&nbsp;output:</p> <div class="highlight"><pre><span></span>$ python scope05.py dupiderror.pas ENTER scope: global Insert: INTEGER Insert: REAL Lookup: REAL. <span class="o">(</span>Scope name: global<span class="o">)</span> Lookup: x. <span class="o">(</span>Scope name: global<span class="o">)</span> Insert: x Lookup: REAL. <span class="o">(</span>Scope name: global<span class="o">)</span> Lookup: y. <span class="o">(</span>Scope name: global<span class="o">)</span> Insert: y Insert: Alpha ENTER scope: Alpha Lookup: INTEGER. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: INTEGER. <span class="o">(</span>Scope name: global<span class="o">)</span> Insert: a Lookup: INTEGER. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: INTEGER. <span class="o">(</span>Scope name: global<span class="o">)</span> Lookup: y. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Insert: y Lookup: REAL. <span class="o">(</span>Scope name: Alpha<span class="o">)</span> Lookup: REAL. <span class="o">(</span>Scope name: global<span class="o">)</span> Lookup: a. 
<span class="o">(</span>Scope name: Alpha<span class="o">)</span> Error: Duplicate identifier <span class="s1">&#39;a&#39;</span> found </pre></div> <p>It caught the error as&nbsp;expected.</p> <p>On this positive note, let&#8217;s wrap up our discussion of scopes, scoped symbol tables, and nested scopes for&nbsp;today.</p> <p><br/></p> <h3 id="summary">Summary</h3> <p>We&#8217;ve covered a lot of ground. Let&#8217;s quickly recap what we learned in this&nbsp;article:</p> <ul> <li>We learned about <em>scopes</em>, why they are useful, and how to implement them in&nbsp;code.</li> <li>We learned about <em>nested scopes</em> and how <em>chained scoped symbol tables</em> are used to implement nested&nbsp;scopes.</li> <li>We learned how to code a semantic analyzer that walks an <span class="caps">AST</span>, builds <em>scoped symbol tables</em>, chains them together, and does various semantic&nbsp;checks.</li> <li>We learned about <em>name resolution</em> and how the semantic analyzer resolves names to their declarations using <em>chained scoped symbol tables (scopes)</em> and how the <em>lookup</em> method recursively goes up the chain in a <em>scope tree</em> to find a declaration corresponding to a certain&nbsp;name.</li> <li>We learned that building a <em>scope tree</em> in the semantic analyzer involves walking an <span class="caps">AST</span>, &#8220;pushing&#8221; a new scope on top of a scoped symbol table stack when ENTERing a certain <span class="caps">AST</span> node and &#8220;popping&#8221; the scope off the stack when LEAVing the node, making a <em>scope tree</em> look like a collection of scoped symbol table&nbsp;stacks.</li> <li>We learned how to write a <em>source-to-source compiler</em>, which can be a useful tool when learning about nested scopes, scope levels, and name&nbsp;resolution.</li> </ul> <p><br/></p> <h3 id="exercises">Exercises</h3> <p>Time for exercises, oh&nbsp;yeah!</p> <p><img alt=""
src="https://ruslanspivak.com/lsbasi-part14/lsbasi_part14_img22.png" width="180"></p> <ol> <li> <p>You&#8217;ve seen in the pictures throughout the article that the <em>Main</em> name in a program statement had subscript zero. I also mentioned that the program&#8217;s name is not in the <em>global scope</em> and it&#8217;s in some other outer scope that has level zero. Extend <a href="https://github.com/rspivak/lsbasi/blob/master/part14/spi.py">spi.py</a> and create a <em>builtins</em> scope, a new scope at level 0, and move the built-in types <span class="caps">INTEGER</span> and <span class="caps">REAL</span> into that scope. For fun and practice, you can also update the code to put the program name into that scope as&nbsp;well.</p> </li> <li> <p>For the source program in <a href="https://github.com/rspivak/lsbasi/blob/master/part14/nestedscopes04.pas">nestedscopes04.pas</a> do the&nbsp;following:</p> <ol> <li>Write down the source Pascal program on a piece of&nbsp;paper</li> <li>Subscript every name in the program indicating the scope level of the declaration the name resolves&nbsp;to.</li> <li>Draw vertical lines for every name declaration (variable and procedure) to visually show its scope. 
Don&#8217;t forget about scope holes and their meaning when&nbsp;drawing.</li> <li>Write a source-to-source compiler for the program without looking at the example source-to-source compiler in this&nbsp;article.</li> <li>Use the original <a href="https://github.com/rspivak/lsbasi/blob/master/part14/src2srccompiler.py">src2srccompiler.py</a> program to verify the output from your compiler and whether you subscripted the names correctly in the exercise&nbsp;(2.2).</li> </ol> </li> <li> <p>Modify the source-to-source compiler to add subscripts to the built-in types <span class="caps">INTEGER</span> and <span class="caps">REAL</span></p> </li> <li> <p>Uncomment the following block in the <a href="https://github.com/rspivak/lsbasi/blob/master/part14/spi.py">spi.py</a></p> <div class="highlight"><pre><span></span><span class="c1"># interpreter = Interpreter(tree)</span> <span class="c1"># result = interpreter.interpret()</span> <span class="c1"># print(&#39;&#39;)</span> <span class="c1"># print(&#39;Run-time GLOBAL_MEMORY contents:&#39;)</span> <span class="c1"># for k, v in sorted(interpreter.GLOBAL_MEMORY.items()):</span> <span class="c1"># print(&#39;%s = %s&#39; % (k, v))</span> </pre></div> <p>Run the interpreter with the <a href="https://github.com/rspivak/lsbasi/blob/master/part10/python/part10.pas">part10.pas</a> file as an&nbsp;input:</p> <div class="highlight"><pre><span></span>$ python spi.py part10.pas </pre></div> <p>Spot the problems and add the missing methods to the semantic&nbsp;analyzer.</p> </li> </ol> <p><br/> That&#8217;s it for today. In the next article we&#8217;ll learn about runtime, call stack, implement procedure calls, and write our first version of a recursive factorial function. 
Stay tuned and see you&nbsp;soon!</p> <p><br/> If you&#8217;re interested, here is a list of books (affiliate links) I referred to most when preparing the&nbsp;article:</p> <ol> <li> <p><a target="_blank" href="https://www.amazon.com/gp/product/193435645X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=193435645X&linkCode=as2&tag=russblo0b-20&linkId=5d5ca8c07bff5452ea443d8319e7703d">Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (Pragmatic Programmers)</a></p> </li> <li> <p><a target="_blank" href="https://www.amazon.com/gp/product/012088478X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=012088478X&linkCode=as2&tag=russblo0b-20&linkId=74578959d7d04bee4050c7bff1b7d02e">Engineering a Compiler, Second Edition</a></p> </li> <li> <p><a target="_blank" href="https://www.amazon.com/gp/product/0124104096/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0124104096&linkCode=as2&tag=russblo0b-20&linkId=8db1da254b12fe6da1379957dda717fc">Programming Language Pragmatics, Fourth Edition</a></p> </li> <li> <p><a target="_blank" href="https://www.amazon.com/gp/product/0321486811/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321486811&linkCode=as2&tag=russblo0b-20&linkId=31743d76157ef1377153dba78c54e177">Compilers: Principles, Techniques, and Tools (2nd Edition)</a></p> </li> <li> <p><a target="_blank" href="https://www.amazon.com/gp/product/0470177071/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0470177071&linkCode=as2&tag=russblo0b-20&linkId=542d1267e34a529e0f69027af20e27f3">Writing Compilers and Interpreters: A Software Engineering Approach</a></p> </li> </ol> <h1>Let’s Build A Simple Interpreter. Part 13: Semantic Analysis.</h1> <p>Ruslan Spivak, 2017-04-27</p> <blockquote> <p><em>Anything worth doing is worth&nbsp;overdoing.</em></p> </blockquote> <p>Before doing a deep dive into the topic of scopes, I&#8217;d like to make a &#8220;quick&#8221; detour and talk in more detail about symbols, symbol tables, and semantic analysis. In the spirit of <em>&#8220;Anything worth doing is worth overdoing&#8221;</em>, I hope you&#8217;ll find the material useful for building a more solid foundation before tackling nested scopes. Today we will continue to increase our knowledge of how to write interpreters and compilers. 
You will see that some of the material covered in this article has parts that are much more extended versions of what you saw in <a href="/lsbasi-part11/">Part 11</a>, where we discussed symbols and symbol&nbsp;tables.</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part13/lsbasi_part13_img01.png"></p> <p><br/></p> <p>Okay, let&#8217;s get&nbsp;started!</p> <p><br/></p> <h2 id="introduction-to-semantic-analysis">Introduction to semantic&nbsp;analysis</h2> <p>While our Pascal program can be grammatically correct and the parser can successfully build an <em>abstract syntax tree</em>, the program still can contain some pretty serious errors. To catch those errors we need to use the <em>abstract syntax tree</em> and the information from the <em>symbol table</em>.</p> <p>Why can&#8217;t we check for those errors during parsing, that is, during <em>syntax analysis</em>? Why do we have to build an <em><span class="caps">AST</span></em> and something called the symbol table to do&nbsp;that?</p> <p>In a nutshell, for convenience and the separation of concerns. By moving those extra checks into a separate phase, we can focus on one task at a time without making our parser and interpreter do more work than they are supposed to&nbsp;do.</p> <p>When the parser has finished building the <span class="caps">AST</span>, we know that the program is grammatically correct; that is, that its syntax is correct according to our grammar rules and now we can separately focus on checking for errors that require additional context and information that the parser did not have at the time of building the <span class="caps">AST</span>. 
To make it more concrete, let&#8217;s take a look at the following Pascal assignment&nbsp;statement:</p> <div class="highlight"><pre><span></span><span class="n">x</span> <span class="o">:=</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">;</span> </pre></div> <p><br/> The parser will handle it all right because, grammatically, the statement is correct (according to our previously defined grammar rules for assignment statements and expressions). But that&#8217;s not the end of the story yet, because Pascal has a requirement that variables must be declared with their corresponding types before they are used. How does the parser know whether <strong>x</strong> and <strong>y</strong> have been declared&nbsp;yet?</p> <p>Well, it doesn&#8217;t, and that&#8217;s why we need a separate semantic analysis phase to answer the question (among many others) of whether the variables have been declared prior to their&nbsp;use.</p> <p>What is <em><strong>semantic analysis</strong></em>? Basically, it&#8217;s just a process to help us determine whether a program makes sense, that is, whether it has meaning according to a language&nbsp;definition.</p> <p>What does it even mean for a program to make sense? It depends in large part on a language definition and language&nbsp;requirements.</p> <p>The Pascal language and, specifically, the Free Pascal compiler have certain requirements that, if not followed in a program, lead to an error from the <em>fpc</em> compiler indicating that the program doesn&#8217;t &#8220;make sense&#8221;, that it is incorrect, even though the syntax might look okay. 
Here are some of those&nbsp;requirements:</p> <ul> <li>The variables must be declared before they are&nbsp;used</li> <li>The variables must have matching types when used in arithmetic expressions (this is a big part of <em>semantic analysis</em> called <em>type checking</em> that we&#8217;ll cover&nbsp;separately)</li> <li>There should be no duplicate declarations (Pascal prohibits, for example, having a local variable in a procedure with the same name as one of the procedure&#8217;s formal&nbsp;parameters)</li> <li>A name reference in a call to a procedure must refer to the actual declared procedure (It doesn&#8217;t make sense in Pascal if, in the procedure call <strong>foo()</strong>, the name <em>foo</em> refers to a variable foo of a primitive type <span class="caps">INTEGER</span>)</li> <li>A procedure call must have the correct number of arguments and the arguments&#8217; types must match those of formal parameters in the procedure&nbsp;declaration</li> </ul> <p>It is much easier to enforce the above requirements when we have enough context about the program, namely, an intermediate representation in the form of an <span class="caps">AST</span> that we can walk and the symbol table with information about different program entities like variables, procedures, and&nbsp;functions.</p> <p>After we implement the semantic analysis phase, the structure of our Pascal interpreter will look something like&nbsp;this:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part13/lsbasi_part13_img03.png" width="400"></p> <p>From the picture above you can see that our lexer will get source code as an input, transform that into tokens that the parser will consume and use to verify that the program is grammatically correct, and then it will generate an abstract syntax tree that our new semantic analysis phase will use to enforce different Pascal language requirements. During the semantic analysis phase, the semantic analyzer will also build and use the symbol table. 
After the semantic analysis, our interpreter will take the <span class="caps">AST</span>, evaluate the program by walking the <span class="caps">AST</span>, and produce the program&nbsp;output.</p> <p>Let&#8217;s get into the details of the semantic analysis&nbsp;phase.</p> <h2 id="symbols-and-symbol-tables">Symbols and symbol&nbsp;tables</h2> <p>In the following section, we&#8217;re going to discuss how to implement some of the semantic checks and how to build the symbol table: in other words, we are going to discuss how to perform a <em>semantic analysis</em> of our Pascal programs. Keep in mind that even though <em>semantic analysis</em> sounds fancy and deep, it&#8217;s just another step after parsing our program and creating an <span class="caps">AST</span> to check the source program for some additional errors that the parser couldn&#8217;t catch due to a lack of additional information&nbsp;(context).</p> <p>Today we&#8217;re going to focus on the following two <em>static semantic checks</em>*:</p> <ol> <li>That variables are declared before they are&nbsp;used</li> <li>That there are no duplicate variable&nbsp;declarations</li> </ol> <blockquote> <p>*<span class="caps">ASIDE</span>: <em>Static semantic checks</em> are the checks that we can make before interpreting (evaluating) the program, that is, before calling the interpret method on an instance of the Interpreter class. All the Pascal requirements mentioned before can be enforced with <em>static semantic checks</em> by walking an <span class="caps">AST</span> and using information from the symbol&nbsp;table.</p> <p><em>Dynamic semantic checks</em>, on the other hand, would require checks to be performed during the interpretation (evaluation) of the program. For example, a check that there is no division by zero, and that an array index is not out of bounds would be a <em>dynamic semantic check</em>. 
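</p> <p>As a rough illustration of the difference, here is a small sketch. Both helpers, <em>undeclared_names</em> and <em>checked_divide</em>, are hypothetical and are not part of spi.py: the first needs only what the source text declares and references, while the second can only do its job once an actual divisor value exists during evaluation.</p>
<div class="highlight"><pre><span></span># Hypothetical helpers for illustration only; they are not part of spi.py.
def undeclared_names(declared, referenced):
    """Static check: works from the names in the source text alone,
    nothing is evaluated."""
    return [name for name in referenced if name not in declared]


def checked_divide(left, right):
    """Dynamic check: the divisor is only known while the program runs."""
    if right == 0:
        raise ZeroDivisionError("division by zero")
    return left / right


# x is declared, y is not, so the static check reports y before anything runs
print(undeclared_names({"x"}, ["x", "y"]))
</pre></div>
<p>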
Our focus today is on <em>static semantic checks</em>.</p> </blockquote> <p>Let&#8217;s start with our first check and make sure that in our Pascal programs variables are declared before they are used. Take a look at the following syntactically correct but semantically incorrect program (ugh&#8230; too many hard to pronounce words in one sentence.&nbsp;:)</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="n">x</span> <span class="o">:=</span> <span class="n">y</span><span class="o">;</span> <span class="k">end</span><span class="o">.</span> </pre></div> <p>The program above has one variable declaration and two variable references. You can see that in the picture&nbsp;below:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part13/lsbasi_part13_img04.png" width="400"></p> <p>Let&#8217;s actually verify that our program is syntactically correct and that our parser doesn&#8217;t throw an error when parsing it. As they say, trust but verify. 
:) Download <a href="https://github.com/rspivak/lsbasi/blob/master/part13/spi.py">spi.py</a>, fire off a Python shell, and see for&nbsp;yourself:</p> <div class="highlight"><pre><span></span>&gt;&gt;&gt; from spi import Lexer, Parser &gt;&gt;&gt; <span class="nv">text</span> <span class="o">=</span> <span class="s2">&quot;&quot;&quot;</span> <span class="s2">program Main;</span> <span class="s2"> var x : integer;</span> <span class="s2">begin</span> <span class="s2"> x := y;</span> <span class="s2">end.</span> <span class="s2">&quot;&quot;&quot;</span> &gt;&gt;&gt; &gt;&gt;&gt; <span class="nv">lexer</span> <span class="o">=</span> Lexer<span class="o">(</span>text<span class="o">)</span> &gt;&gt;&gt; <span class="nv">parser</span> <span class="o">=</span> Parser<span class="o">(</span>lexer<span class="o">)</span> &gt;&gt;&gt; <span class="nv">tree</span> <span class="o">=</span> parser.parse<span class="o">()</span> &gt;&gt;&gt; </pre></div> <p>You see? No errors. We can even generate an <span class="caps">AST</span> diagram for that program using <a href="https://github.com/rspivak/lsbasi/blob/master/part13/genastdot.py">genastdot.py</a>. First, save the source code into a file, let&#8217;s say semanticerror01.pas, and run the following&nbsp;commands:</p> <div class="highlight"><pre><span></span>$ python genastdot.py semanticerror01.pas &gt; semanticerror01.dot $ dot -Tpng -o ast.png semanticerror01.dot </pre></div> <p>Here is the <span class="caps">AST</span>&nbsp;diagram:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part13/lsbasi_part13_img05.png"></p> <p>So, it is a grammatically (syntactically) correct program, but the program doesn&#8217;t make sense because we don&#8217;t even know what type the variable <strong>y</strong> has (that&#8217;s why we need declarations) and if it will make sense to assign <strong>y</strong> to <strong>x</strong>. What if <strong>y</strong> is a string, does it make sense to assign a string to an integer? 
It does not, at least not in&nbsp;Pascal.</p> <p>So the program above has a semantic error because the variable <strong>y</strong> is not declared and we don&#8217;t know its type. In order for us to be able to catch errors like that, we need to learn how to check that variables are declared before they are used. So let&#8217;s learn how to do&nbsp;it.</p> <p>Let&#8217;s take a closer look at the following syntactically and semantically correct sample&nbsp;program:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="n">x</span> <span class="o">:=</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">;</span> <span class="k">end</span><span class="o">.</span> </pre></div> <ul> <li>It has two variable declarations: <strong>x</strong> and <strong>y</strong></li> <li>It also has three variable references (<strong>x</strong>, another <strong>x</strong>, and <strong>y</strong>) in the assignment statement <strong>x := x + y</strong>;</li> </ul> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part13/lsbasi_part13_img06.png" width="440"></p> <p>The program is grammatically correct, all the variables are declared, and we can see that adding two integers and assigning the result to an integer makes perfect sense. 
That&#8217;s great, but how do we programmatically check that the variables (variable references) <strong>x</strong> and <strong>y</strong> in the assignment statement <strong>x := x + y;</strong> have been&nbsp;declared?</p> <p>We can do this in several steps by implementing the following&nbsp;algorithm:</p> <ol> <li>Go over all variable&nbsp;declarations</li> <li>For every variable declaration you encounter, collect all necessary information about the declared&nbsp;variable</li> <li>Store the collected information in some stash for future reference by using the variable&#8217;s name as a&nbsp;key</li> <li>When you see a variable reference, such as in the assignment statement <strong>x := x + y</strong>, search the stash by the variable&#8217;s name to see if the stash has any information about the variable. If it does, the variable has been declared. If it doesn&#8217;t, the variable hasn&#8217;t been declared yet, which is a semantic&nbsp;error.</li> </ol> <p>This is what a flowchart of our algorithm could look&nbsp;like:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part13/lsbasi_part13_img07.png" width="800"></p> <p>Before we can implement the algorithm, we need to answer several&nbsp;questions:</p> <ul> <li>A. What information about variables do we need to&nbsp;collect?</li> <li>B. Where and how should we store the collected&nbsp;information?</li> <li>C. 
How do we implement the &#8220;go over all variable declarations&#8221;&nbsp;step?</li> </ul> <p>Our plan of attack will be the&nbsp;following:</p> <ol> <li>Figure out answers to the questions A, B, and C&nbsp;above.</li> <li>Use the answers to A, B, and C to implement the steps in the algorithm for our first static semantic check: a check that variables are declared before they are&nbsp;used.</li> </ol> <p>Okay, let&#8217;s get&nbsp;started.</p> <p><em><strong>Let&#8217;s find an answer to the question &#8220;What information about variables do we need to&nbsp;collect?&#8221;</strong></em></p> <p>So, what necessary information do we need to collect about a variable? Here are the important&nbsp;parts:</p> <ul> <li><em>Name</em> (we need to know the name of a declared variable because later we will be looking up variables by their&nbsp;names)</li> <li><em>Category</em> (we need to know what kind of an identifier it is: <em>variable</em>, <em>type</em>, <em>procedure</em>, and so&nbsp;on)</li> <li><em>Type</em> (we&#8217;ll need this information for type&nbsp;checking)</li> </ul> <p>Symbols will hold that information (name, category, and type) about our variables. What&#8217;s a <em>symbol</em>? 
A <strong>symbol</strong> is an identifier of some program entity like a variable, subroutine, or built-in&nbsp;type.</p> <p>In the following sample program we have two variable declarations that we will use to create two variable symbols: <strong>x</strong>, and <strong>y</strong>.</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part13/lsbasi_part13_img08.png" width="440"></p> <p>In the code, we&#8217;ll represent symbols with a class called <em>Symbol</em> that has fields <em>name</em> and <em>type</em>&nbsp;:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Symbol</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span> <span class="bp">self</span><span class="o">.</span><span class="n">type</span> <span class="o">=</span> <span class="nb">type</span> </pre></div> <p>As you can see, the class takes the <em>name</em> parameter and an optional <em>type</em> parameter (not all symbols have type information associated with them, as we&#8217;ll see&nbsp;shortly).</p> <p>What about the <em>category</em>? We will encode <em>category</em> into the class name. 
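</p> <p>As a rough illustration of what encoding the category into the class name can mean, here is a minimal sketch. The empty subclasses are placeholders (the article&#8217;s real versions appear further down), and <em>category_of</em> is a hypothetical helper introduced only for this example:</p>

```python
class Symbol:
    def __init__(self, name, type=None):
        self.name = name
        self.type = type


class BuiltinTypeSymbol(Symbol):
    """Category: built-in type (placeholder body for this sketch)."""


class VarSymbol(Symbol):
    """Category: variable (placeholder body for this sketch)."""


def category_of(symbol):
    # Hypothetical helper: the category is simply the name of the class.
    return type(symbol).__name__


int_type = BuiltinTypeSymbol('integer')
x = VarSymbol('x', int_type)

print(category_of(int_type))  # BuiltinTypeSymbol
print(category_of(x))         # VarSymbol
```

<p>With this approach no extra field is needed: the class of a symbol already answers the question &#8220;what kind of identifier is this?&#8221;.</p> <p>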
Alternatively, we could store the category of a symbol in the dedicated <em>category</em> field of the <em>Symbol</em> class as&nbsp;in:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Symbol</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">category</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">type</span> <span class="o">=</span> <span class="nb">type</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">category</span> <span class="o">=</span> <span class="n">category</span>
</pre></div> <p>However, it&#8217;s more explicit to create a hierarchy of classes where the name of the class indicates its&nbsp;category.</p> <p>Up until now I&#8217;ve sort of skirted around one topic, that of built-in types. If you look at our sample program&nbsp;again:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">Main</span><span class="o">;</span>
    <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span>
<span class="k">begin</span>
    <span class="n">x</span> <span class="o">:=</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">;</span>
<span class="k">end</span><span class="o">.</span>
</pre></div> <p>You can see that variables <strong>x</strong> and <strong>y</strong> are declared as <em>integers</em>. What is the <em>integer</em> type? 
The integer type is another kind of symbol, a <em>built-in type symbol</em>. It&#8217;s called built-in because it doesn&#8217;t have to be declared explicitly in a Pascal program. It&#8217;s our interpreter&#8217;s responsibility to declare that type symbol and make it available to&nbsp;programmers:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part13/lsbasi_part13_img09.png" width="500"></p> <p>We are going to make a separate class for built-in types called <em>BuiltinTypeSymbol</em>. Here is the class definition for our built-in&nbsp;types:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">BuiltinTypeSymbol</span><span class="p">(</span><span class="n">Symbol</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="s2">&quot;&lt;{class_name}(name=&#39;{name}&#39;)&gt;&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span> <span class="n">class_name</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span 
class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="p">)</span> </pre></div> <p>The class <em>BuiltinTypeSymbol</em> inherits from the <em>Symbol</em> class, and its constructor requires only the <em>name</em> of the type, like <em>integer</em> or <em>real</em>. The &#8216;builtin type&#8217; category is encoded in the class name, as we discussed earlier, and the <em>type</em> parameter from the base class is automatically set to <em>None</em> when we create a new instance of the <em>BuiltinTypeSymbol</em>&nbsp;class.</p> <blockquote> <p><span class="caps">ASIDE</span></p> <p>The double underscore or <em>dunder</em> (as in &#8220;<strong>D</strong>ouble <strong><span class="caps">UNDER</span></strong>score&#8221;) methods <em>__str__</em> and <em>__repr__</em> are special Python methods. We&#8217;ve defined them to have a nice formatted message when we print a symbol object to standard&nbsp;output.</p> </blockquote> <p>By the way, built-in types are the reason why the type parameter in the Symbol class constructor is an optional&nbsp;parameter.</p> <p>Here is our symbol class hierarchy so&nbsp;far:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part13/lsbasi_part13_img10.png"></p> <p>Let&#8217;s play with the builtin types in a Python shell. 
Download the <a href="https://github.com/rspivak/lsbasi/blob/master/part13/spi.py">interpreter file</a> and save it as spi.py; launch a python shell from the same directory where you saved the spi.py file, and play with the class we&#8217;ve just defined&nbsp;interactively:</p> <div class="highlight"><pre><span></span>$ python &gt;&gt;&gt; from spi import BuiltinTypeSymbol &gt;&gt;&gt; <span class="nv">int_type</span> <span class="o">=</span> BuiltinTypeSymbol<span class="o">(</span><span class="s1">&#39;integer&#39;</span><span class="o">)</span> &gt;&gt;&gt; int_type &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;integer&#39;</span><span class="o">)</span>&gt; &gt;&gt;&gt; &gt;&gt;&gt; <span class="nv">real_type</span> <span class="o">=</span> BuiltinTypeSymbol<span class="o">(</span><span class="s1">&#39;real&#39;</span><span class="o">)</span> &gt;&gt;&gt; real_type &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;real&#39;</span><span class="o">)</span>&gt; </pre></div> <p>That&#8217;s all there is to built-in type symbols for now. Now back to our variable&nbsp;symbols.</p> <p>How can we represent them in code? 
Let&#8217;s create a <em>VarSymbol</em>&nbsp;class:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">VarSymbol</span><span class="p">(</span><span class="n">Symbol</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="nb">type</span><span class="p">):</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="nb">type</span><span class="p">)</span> <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="s2">&quot;&lt;{class_name}(name=&#39;{name}&#39;, type=&#39;{type}&#39;)&gt;&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span> <span class="n">class_name</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="vm">__class__</span><span class="o">.</span><span class="vm">__name__</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">type</span><span class="p">,</span> <span class="p">)</span> <span class="fm">__repr__</span> <span class="o">=</span> <span class="fm">__str__</span> </pre></div> <p>In this class, we made both the <em>name</em> and the <em>type</em> parameters required and the class name <em>VarSymbol</em> clearly indicates that an instance of the class will identify a variable symbol (the category is <em>variable</em>). 
The <em>type</em> parameter is an instance of the <em>BuiltinTypeSymbol</em>&nbsp;class.</p> <p>Let&#8217;s go back to the interactive Python shell to see how we can manually construct instances of our variable symbols now that we know how to construct <em>BuiltinTypeSymbol</em> class&nbsp;instances:</p> <div class="highlight"><pre><span></span>$ python
&gt;&gt;&gt; from spi import BuiltinTypeSymbol, VarSymbol
&gt;&gt;&gt; <span class="nv">int_type</span> <span class="o">=</span> BuiltinTypeSymbol<span class="o">(</span><span class="s1">&#39;integer&#39;</span><span class="o">)</span>
&gt;&gt;&gt; <span class="nv">real_type</span> <span class="o">=</span> BuiltinTypeSymbol<span class="o">(</span><span class="s1">&#39;real&#39;</span><span class="o">)</span>
&gt;&gt;&gt;
&gt;&gt;&gt; <span class="nv">var_x_symbol</span> <span class="o">=</span> VarSymbol<span class="o">(</span><span class="s1">&#39;x&#39;</span>, int_type<span class="o">)</span>
&gt;&gt;&gt; var_x_symbol
&lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;x&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;integer&#39;</span><span class="o">)</span>&gt;
&gt;&gt;&gt;
&gt;&gt;&gt; <span class="nv">var_y_symbol</span> <span class="o">=</span> VarSymbol<span class="o">(</span><span class="s1">&#39;y&#39;</span>, real_type<span class="o">)</span>
&gt;&gt;&gt; var_y_symbol
&lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;real&#39;</span><span class="o">)</span>&gt;
&gt;&gt;&gt;
</pre></div> <p>As you can see, we first create an instance of the built-in type symbol and then pass it as the second parameter to <em>VarSymbol</em>&#8217;s constructor: variable symbols must have both a name and a type associated with them, as you&#8217;ve seen in various variable 
declarations like <strong>var x :&nbsp;integer;</strong></p> <p>And here is the complete hierarchy of symbols we&#8217;ve defined so far, in visual&nbsp;form:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part13/lsbasi_part13_img11.png"></p> <p><br/> <strong><em>Okay, now onto answering the question &#8220;Where and how should we store the collected&nbsp;information?&#8221;</em></strong></p> <p>Now that we have all the symbols representing all our variable declarations, where should we store those symbols so that we can search for them later when we encounter variable references&nbsp;(names)?</p> <p>The answer is, as you probably already know, in <em>the symbol table</em>.</p> <p>What is a <em>symbol table</em>? A <strong>symbol table</strong> is an abstract data type for tracking various symbols in source code. Think of it as a dictionary where the key is the symbol&#8217;s name and the value is an instance of the symbol class (or one of its subclasses). To represent the symbol table in code we&#8217;ll use a dedicated class for it aptly named <em>SymbolTable</em>. :) To store symbols in the symbol table we&#8217;ll add the <em>insert</em> method to our symbol table class. 
The method <em>insert</em> will take a symbol as a parameter and store it internally in the <em>_symbols</em> dictionary (regular Python dictionaries preserve insertion order as of Python&nbsp;3.7) using the symbol&#8217;s name as a key and the symbol instance as a&nbsp;value:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">SymbolTable</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span> <span class="o">=</span> <span class="p">{}</span> <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="n">symtab_header</span> <span class="o">=</span> <span class="s1">&#39;Symbol table contents&#39;</span> <span class="n">lines</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">,</span> <span class="n">symtab_header</span><span class="p">,</span> <span class="s1">&#39;_&#39;</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">symtab_header</span><span class="p">)]</span> <span class="n">lines</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span> <span class="p">(</span><span class="s1">&#39;</span><span class="si">%7s</span><span class="s1">: </span><span class="si">%r</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">))</span> <span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span 
class="o">.</span><span class="n">items</span><span class="p">()</span> <span class="p">)</span> <span class="n">lines</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">)</span> <span class="n">s</span> <span class="o">=</span> <span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">lines</span><span class="p">)</span> <span class="k">return</span> <span class="n">s</span> <span class="fm">__repr__</span> <span class="o">=</span> <span class="fm">__str__</span> <span class="k">def</span> <span class="nf">insert</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">symbol</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Insert: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">symbol</span><span class="o">.</span><span class="n">name</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="p">[</span><span class="n">symbol</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">symbol</span> </pre></div> <p>Let&#8217;s manually populate our symbol table for the following sample program. 
Because we don&#8217;t know how to search our symbol table yet, our program won&#8217;t contain any variable references, only variable&nbsp;declarations:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">SymTab1</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="k">end</span><span class="o">.</span> </pre></div> <p>Download <a href="https://github.com/rspivak/lsbasi/blob/master/part13/symtab01.py">symtab01.py</a>, which contains our new <em>SymbolTable</em> class and run it on the command line. This is what the output looks like for our program&nbsp;above:</p> <div class="highlight"><pre><span></span>$ python symtab01.py Insert: INTEGER Insert: x Insert: y Symbol table contents _____________________ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; x: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;x&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; </pre></div> <p><br/> And now let&#8217;s build and populate the symbol table manually in a Python&nbsp;shell:</p> <div class="highlight"><pre><span></span>$ python &gt;&gt;&gt; from symtab01 import SymbolTable, BuiltinTypeSymbol, VarSymbol &gt;&gt;&gt; <span class="nv">symtab</span> <span class="o">=</span> SymbolTable<span class="o">()</span> &gt;&gt;&gt; <span 
class="nv">int_type</span> <span class="o">=</span> BuiltinTypeSymbol<span class="o">(</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span> &gt;&gt;&gt; <span class="c1"># now let&#39;s store the built-in type symbol in the symbol table</span> ... &gt;&gt;&gt; symtab.insert<span class="o">(</span>int_type<span class="o">)</span> Insert: INTEGER &gt;&gt;&gt; &gt;&gt;&gt; symtab Symbol table contents _____________________ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; &gt;&gt;&gt; <span class="nv">var_x_symbol</span> <span class="o">=</span> VarSymbol<span class="o">(</span><span class="s1">&#39;x&#39;</span>, int_type<span class="o">)</span> &gt;&gt;&gt; symtab.insert<span class="o">(</span>var_x_symbol<span class="o">)</span> Insert: x &gt;&gt;&gt; symtab Symbol table contents _____________________ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; x: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;x&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; &gt;&gt;&gt; <span class="nv">var_y_symbol</span> <span class="o">=</span> VarSymbol<span class="o">(</span><span class="s1">&#39;y&#39;</span>, int_type<span class="o">)</span> &gt;&gt;&gt; symtab.insert<span class="o">(</span>var_y_symbol<span class="o">)</span> Insert: y &gt;&gt;&gt; symtab Symbol table contents _____________________ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; x: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span 
class="s1">&#39;x&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; &gt;&gt;&gt; </pre></div> <p><br/> At this point we have answers to two questions that we asked&nbsp;earlier:</p> <ul> <li> <p>A. What information about variables do we need to&nbsp;collect?</p> <p>Name, category, and type. And we use symbols to hold that&nbsp;information.</p> </li> <li> <p>B. Where and how should we store the collected&nbsp;information?</p> <p>We store collected symbols in the symbol table by using its insert&nbsp;method.</p> </li> </ul> <p><br/> Now let&#8217;s find the answer to our third question: <em><strong>&#8216;How do we implement the &#8220;go over all variable declarations&#8221;&nbsp;step?&#8217;</strong></em></p> <p>This is a really easy one. Because we already have an <span class="caps">AST</span> built by our parser, we just need to create a new <span class="caps">AST</span> visitor class that will be responsible for walking over the tree and doing different actions when visiting <em>VarDecl</em> <span class="caps">AST</span>&nbsp;nodes!</p> <p>Now we have answers to all three&nbsp;questions:</p> <ul> <li> <p>A. What information about variables do we need to&nbsp;collect?</p> <p>Name, category, and type. And we use symbols to hold that&nbsp;information.</p> </li> <li> <p>B. Where and how should we store the collected&nbsp;information?</p> <p>We store collected symbols in the symbol table by using its <em>insert</em>&nbsp;method.</p> </li> <li> <p>C. 
How do we implement the &#8220;go over all variable declarations&#8221;&nbsp;step?</p> <p>We will create a new <span class="caps">AST</span> visitor that will do some actions on visiting <em>VarDecl</em> <span class="caps">AST</span>&nbsp;nodes.</p> </li> </ul> <p><br/> Let&#8217;s create a new tree visitor class and give it the name <em>SemanticAnalyzer</em>. Take a look at the following sample program, for&nbsp;example:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">SymTab2</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="k">end</span><span class="o">.</span> </pre></div> <p>To be able to analyze the program above, we don&#8217;t need to implement all <em>visit_xxx</em> methods, just a subset of them. Below is the skeleton for the <em>SemanticAnalyzer</em> class with enough <em>visit_xxx</em> methods to be able to successfully walk the <span class="caps">AST</span> of the sample program&nbsp;above:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">SemanticAnalyzer</span><span class="p">(</span><span class="n">NodeVisitor</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span> <span class="o">=</span> <span class="n">SymbolTable</span><span class="p">()</span> <span class="k">def</span> <span class="nf">visit_Block</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">for</span> <span class="n">declaration</span> <span class="ow">in</span> <span class="n">node</span><span 
class="o">.</span><span class="n">declarations</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">declaration</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">compound_statement</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_Program</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">block</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_Compound</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">for</span> <span class="n">child</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">children</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">child</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_NoOp</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">pass</span> <span class="k">def</span> <span class="nf">visit_VarDecl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="c1"># Actions go here</span> <span class="k">pass</span> </pre></div> <p>Now, we have all the pieces to implement the first three steps of our algorithm for our first 
static semantic check, the check that verifies that variables are declared before they are&nbsp;used.</p> <p>Here are the steps of the algorithm&nbsp;again:</p> <ol> <li>Go over all variable&nbsp;declarations</li> <li>For every variable declaration you encounter, collect all necessary information about the declared&nbsp;variable</li> <li>Store the collected information in some stash for future references by using the variable&#8217;s name as a&nbsp;key</li> <li>When you see a variable reference such as in the assignment statement <strong>x := x + y</strong>, search the stash by the variable&#8217;s name to see if the stash has any information about the variable. If it does, the variable has been declared. If it doesn&#8217;t, the variable hasn&#8217;t been declared yet, which is a semantic&nbsp;error.</li> </ol> <p><br/> Let&#8217;s implement those steps. Actually, the only thing that we need to do is fill in the <em>visit_VarDecl</em> method of the <em>SemanticAnalyzer</em> class. Here it is, filled&nbsp;in:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_VarDecl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="c1"># For now, manually create a symbol for the INTEGER built-in type</span> <span class="c1"># and insert the type symbol in the symbol table.</span> <span class="n">type_symbol</span> <span class="o">=</span> <span class="n">BuiltinTypeSymbol</span><span class="p">(</span><span class="s1">&#39;INTEGER&#39;</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">type_symbol</span><span class="p">)</span> <span class="c1"># We have all the information we need to create a variable symbol.</span> <span class="c1"># Create the symbol and insert it into the symbol 
table.</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">var_node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="n">VarSymbol</span><span class="p">(</span><span class="n">var_name</span><span class="p">,</span> <span class="n">type_symbol</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">var_symbol</span><span class="p">)</span> </pre></div> <p>If you look at the contents of the method, you can see that it actually incorporates all three&nbsp;steps:</p> <ol> <li>The method will be called for <em>every</em> variable declaration once we&#8217;ve invoked the <em>visit</em> method of the <em>SemanticAnalyzer</em> instance. That covers Step 1 of the algorithm: <em>&#8220;Go over all variable&nbsp;declarations&#8221;</em></li> <li>For every variable declaration, the method <em>visit_VarDecl</em> will collect the necessary information and create a variable symbol instance. That covers Step 2 of the algorithm: <em>&#8220;For every variable declaration you encounter, collect all necessary information about the declared&nbsp;variable&#8221;</em></li> <li>The method <em>visit_VarDecl</em> will store the collected information about the variable declaration in the symbol table using the symbol table&#8217;s <em>insert</em> method. This covers Step 3 of the algorithm: <em>&#8220;Store the collected information in some stash for future references by using the variable&#8217;s name as a&nbsp;key&#8221;</em></li> </ol> <p>To see all of those steps in action, download file <a href="https://github.com/rspivak/lsbasi/blob/master/part13/symtab02.py">symtab02.py</a> and study its source code first. 
Then run it on the command line and inspect the&nbsp;output:</p> <div class="highlight"><pre><span></span>$ python symtab02.py Insert: INTEGER Insert: x Insert: INTEGER Insert: y Symbol table contents _____________________ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; x: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;x&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; </pre></div> <p>You might have noticed that there are two lines that say <em>Insert: <span class="caps">INTEGER</span></em>. We will fix this situation in the following section where we&#8217;ll discuss the implementation of the final step (Step 4) of the semantic check&nbsp;algorithm.</p> <p><br/> Okay, let&#8217;s implement Step 4 of our algorithm. Here is an updated version of Step 4 to reflect the introduction of symbols and the symbol table: <em>When you see a variable reference (name) such as in the assignment statement <strong>x := x + y</strong>, search the symbol table by the variable&#8217;s name to see if the table has a variable symbol associated with the name. If it does, the variable has been declared. 
If it doesn&#8217;t, the variable hasn&#8217;t been declared yet, which is a semantic&nbsp;error.</em></p> <p>To implement Step 4, we need to make some changes to the symbol table and semantic&nbsp;analyzer:</p> <ol> <li>We need to add a method to our symbol table that will be able to look up a symbol by&nbsp;name.</li> <li>We need to update our semantic analyzer to look up a name in the symbol table every time it encounters a variable&nbsp;reference.</li> </ol> <p>First, let&#8217;s update our <em>SymbolTable</em> class by adding the <em>lookup</em> method that will be responsible for searching for a symbol by name. In other words, the <em>lookup</em> method will be responsible for resolving a variable name (a variable reference) to its declaration. The process of mapping a variable reference to its declaration is called <strong>name resolution</strong>. And here is our <em>lookup</em> method that does just that, <em>name resolution</em>:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">lookup</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Lookup: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">name</span><span class="p">)</span> <span class="n">symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="c1"># &#39;symbol&#39; is either an instance of the Symbol class or None</span> <span class="k">return</span> <span class="n">symbol</span> </pre></div> <p>The method takes a symbol name as a parameter and returns a symbol if it finds it or <em>None</em> if it doesn&#8217;t. 
As simple as&nbsp;that.</p> <p>While we&#8217;re at it, let&#8217;s also update our <em>SymbolTable</em> class to initialize built-in types. We&#8217;ll do that by adding a method <em>_init_builtins</em> and calling it in the <em>SymbolTable</em>&#8216;s constructor. The <em>_init_builtins</em> method will insert a type symbol for <em>integer</em> and a type symbol for <em>real</em> into the symbol&nbsp;table.</p> <p>Here is the full code for our updated <em>SymbolTable</em>&nbsp;class:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">SymbolTable</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span> <span class="o">=</span> <span class="p">{}</span> <span class="bp">self</span><span class="o">.</span><span class="n">_init_builtins</span><span class="p">()</span> <span class="k">def</span> <span class="nf">_init_builtins</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">BuiltinTypeSymbol</span><span class="p">(</span><span class="s1">&#39;INTEGER&#39;</span><span class="p">))</span> <span class="bp">self</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">BuiltinTypeSymbol</span><span class="p">(</span><span class="s1">&#39;REAL&#39;</span><span class="p">))</span> <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="n">symtab_header</span> <span class="o">=</span> <span class="s1">&#39;Symbol table contents&#39;</span> <span class="n">lines</span> <span class="o">=</span> <span 
class="p">[</span><span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">,</span> <span class="n">symtab_header</span><span class="p">,</span> <span class="s1">&#39;_&#39;</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">symtab_header</span><span class="p">)]</span> <span class="n">lines</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span> <span class="p">(</span><span class="s1">&#39;</span><span class="si">%7s</span><span class="s1">: </span><span class="si">%r</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">))</span> <span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="o">.</span><span class="n">items</span><span class="p">()</span> <span class="p">)</span> <span class="n">lines</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">)</span> <span class="n">s</span> <span class="o">=</span> <span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">lines</span><span class="p">)</span> <span class="k">return</span> <span class="n">s</span> <span class="fm">__repr__</span> <span class="o">=</span> <span class="fm">__str__</span> <span class="k">def</span> <span class="nf">insert</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">symbol</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Insert: 
</span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">symbol</span><span class="o">.</span><span class="n">name</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="p">[</span><span class="n">symbol</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">symbol</span> <span class="k">def</span> <span class="nf">lookup</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Lookup: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">name</span><span class="p">)</span> <span class="n">symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="c1"># &#39;symbol&#39; is either an instance of the Symbol class or None</span> <span class="k">return</span> <span class="n">symbol</span> </pre></div> <p><br/> Now that we have built-in type symbols and the <em>lookup</em> method to search our symbol table when we encounter variable names (and other names like type names), let&#8217;s update the <em>SemanticAnalyzer</em>&#8216;s <em>visit_VarDecl</em> method and replace the two lines where we were manually creating the <span class="caps">INTEGER</span> built-in type symbol and manually inserting it into the symbol table with code to look up the <span class="caps">INTEGER</span> type&nbsp;symbol.</p> <p>The change will also fix the issue with that double output of the <em>Insert: <span class="caps">INTEGER</span></em> line we&#8217;ve seen&nbsp;before.</p> <p>Here is the 
<em>visit_VarDecl</em> method before the&nbsp;change:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_VarDecl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="c1"># For now, manually create a symbol for the INTEGER built-in type</span> <span class="c1"># and insert the type symbol in the symbol table.</span> <span class="n">type_symbol</span> <span class="o">=</span> <span class="n">BuiltinTypeSymbol</span><span class="p">(</span><span class="s1">&#39;INTEGER&#39;</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">type_symbol</span><span class="p">)</span> <span class="c1"># We have all the information we need to create a variable symbol.</span> <span class="c1"># Create the symbol and insert it into the symbol table.</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">var_node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="n">VarSymbol</span><span class="p">(</span><span class="n">var_name</span><span class="p">,</span> <span class="n">type_symbol</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">var_symbol</span><span class="p">)</span> </pre></div> <p>and after the&nbsp;change:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_VarDecl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">type_name</span> <span 
class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">type_node</span><span class="o">.</span><span class="n">value</span> <span class="n">type_symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">type_name</span><span class="p">)</span> <span class="c1"># We have all the information we need to create a variable symbol.</span> <span class="c1"># Create the symbol and insert it into the symbol table.</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">var_node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="n">VarSymbol</span><span class="p">(</span><span class="n">var_name</span><span class="p">,</span> <span class="n">type_symbol</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">var_symbol</span><span class="p">)</span> </pre></div> <p>Let&#8217;s apply the changes to the familiar Pascal program that has only variable&nbsp;declarations:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">SymTab3</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="k">end</span><span class="o">.</span> </pre></div> <p>Download the <a href="https://github.com/rspivak/lsbasi/blob/master/part13/symtab03.py">symtab03.py</a> file that has all the changes we&#8217;ve just discussed, run it on the command line, and see that there is no longer a 
duplicate <em>Insert: <span class="caps">INTEGER</span></em> line in the program&nbsp;output:</p> <div class="highlight"><pre><span></span>$ python symtab03.py Insert: INTEGER Insert: REAL Lookup: INTEGER Insert: x Lookup: INTEGER Insert: y Symbol table contents _____________________ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; x: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;x&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; </pre></div> <p>You can also see in the output above that our semantic analyzer looks up the <em><span class="caps">INTEGER</span></em> built-in type twice: first for the declaration of the variable <strong>x</strong>, and the second time for the declaration of the variable <strong>y</strong>.</p> <p><br/> Now let&#8217;s switch our attention to variable references (names) and how we can resolve a variable name, let&#8217;s say in an arithmetic expression, to its variable declaration (variable symbol). 
Take a look at the following sample program, which has an assignment statement <strong>x := x + y;</strong> with three variable references: <strong>x</strong>, another <strong>x</strong>, and <strong>y</strong>:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">SymTab4</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="n">x</span> <span class="o">:=</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">;</span> <span class="k">end</span><span class="o">.</span> </pre></div> <p>We already have the <em>lookup</em> method in our symbol table implementation. What we need to do now is extend our semantic analyzer so that every time it encounters a variable reference, it searches the symbol table for that name using the symbol table&#8217;s <em>lookup</em> method. What method of the <em>SemanticAnalyzer</em> gets called every time a variable reference is encountered when the analyzer walks the <span class="caps">AST</span>? It&#8217;s the method <em>visit_Var</em>. Let&#8217;s add it to our class. 
It&#8217;s very simple: all it does is look up the variable symbol by&nbsp;name:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Var</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">var_name</span><span class="p">)</span> </pre></div> <p><br/> Because our sample program <em>SymTab4</em> has an assignment statement with arithmetic addition in its right hand side, we need to add two more methods to our <em>SemanticAnalyzer</em> so that it could actually walk the <span class="caps">AST</span> of the <em>SymTab4</em> program and call the <em>visit_Var</em> method for all <em>Var</em> nodes. The methods we need to add are <em>visit_Assign</em> and <em>visit_BinOp</em>. They are nothing new: you&#8217;ve seen these methods before. 
Here they&nbsp;are:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Assign</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="c1"># right-hand side</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="c1"># left-hand side</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_BinOp</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> </pre></div> <p><br/> You can find the full source code with the changes we&#8217;ve just discussed in the file <a href="https://github.com/rspivak/lsbasi/blob/master/part13/symtab04.py">symtab04.py</a>. 
Download the file, run it on the command line, and inspect the output produced for our sample program <em>SymTab4</em> with an assignment&nbsp;statement.</p> <p>Here is the output on my&nbsp;laptop:</p> <div class="highlight"><pre><span></span>$ python symtab04.py Insert: INTEGER Insert: REAL Lookup: INTEGER Insert: x Lookup: INTEGER Insert: y Lookup: x Lookup: y Lookup: x Symbol table contents _____________________ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; x: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;x&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; </pre></div> <p>Spend some time analyzing the output and making sure you understand how and why the output is generated in that&nbsp;order.</p> <p>At this point, we have implemented all of the steps of our algorithm for a static semantic check that verifies that all variables in the program are declared before they are&nbsp;used!</p> <h3 id="semantic-errors">Semantic&nbsp;errors</h3> <p>So far we&#8217;ve looked at the programs that had their variables declared, but what if our program has a variable reference that doesn&#8217;t resolve to any declaration; that is, it&#8217;s not declared? That&#8217;s a semantic error and we need to extend our semantic analyzer to signal that error. 
<br/></p> <p>Take a look at the following semantically incorrect program, where the variable <strong>y</strong> is used in the assignment statement but never&nbsp;declared:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">SymTab5</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="n">x</span> <span class="o">:=</span> <span class="n">y</span><span class="o">;</span> <span class="k">end</span><span class="o">.</span> </pre></div> <p>To signal the error, we need to modify our <em>SemanticAnalyzer</em>&#8217;s <em>visit_Var</em> method to raise an exception if the <em>lookup</em> method cannot resolve a name to a symbol and returns <em>None</em>. Here is the updated code for <em>visit_Var</em>:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Var</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">var_name</span><span class="p">)</span> <span class="k">if</span> <span class="n">var_symbol</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span> <span class="s2">&quot;Error: Symbol(identifier) not found &#39;</span><span class="si">%s</span><span class="s2">&#39;&quot;</span> <span class="o">%</span> <span class="n">var_name</span> <span class="p">)</span> </pre></div> 
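<p>To see the check in isolation, here is a minimal, self-contained sketch. It is not the code from <em>symtab05.py</em>: the stripped-down <em>SymbolTable</em> below stores plain type names instead of symbol objects, and the <em>check_reference</em> helper is a hypothetical stand-in for <em>visit_Var</em> that takes a name instead of an <span class="caps">AST</span>&nbsp;node:</p>

```python
class SymbolTable:
    """Tiny stand-in for the article's SymbolTable (no logging, no built-ins)."""

    def __init__(self):
        self._symbols = {}

    def insert(self, name, type_name):
        # Assumption: store the type name as a plain string, not a Symbol object.
        self._symbols[name] = type_name

    def lookup(self, name):
        # Returns the stored entry, or None when the name was never declared.
        return self._symbols.get(name)


def check_reference(symtab, var_name):
    """Hypothetical helper mirroring visit_Var: resolve a name or raise."""
    symbol = symtab.lookup(var_name)
    if symbol is None:
        raise Exception("Error: Symbol(identifier) not found '%s'" % var_name)
    return symbol


symtab = SymbolTable()
symtab.insert('x', 'INTEGER')      # var x : integer;

check_reference(symtab, 'x')       # x is declared, so the name resolves
try:
    check_reference(symtab, 'y')   # x := y; but y was never declared
except Exception as exc:
    message = str(exc)
    print(message)
```

<p>The first call resolves because <em>x</em> was inserted into the table; the second raises because <em>lookup</em> returns <em>None</em> for <em>y</em>.</p>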
<p>Download <a href="https://github.com/rspivak/lsbasi/blob/master/part13/symtab05.py">symtab05.py</a>, run it on the command line, and see what&nbsp;happens:</p> <div class="highlight"><pre><span></span>$ python symtab05.py Insert: INTEGER Insert: REAL Lookup: INTEGER Insert: x Lookup: y Error: Symbol<span class="o">(</span>identifier<span class="o">)</span> not found <span class="s1">&#39;y&#39;</span> Symbol table contents _____________________ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; x: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;x&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; </pre></div> <p>You can see the error message <strong><em>Error: Symbol(identifier) not found &#8216;y&#8217;</em></strong> and the contents of the symbol&nbsp;table.</p> <p>Congratulations on finishing the current version of our semantic analyzer that can statically check if variables in a program are declared before they are used, and if they are not, throws an exception indicating a semantic&nbsp;error!</p> <p>Let&#8217;s pause for a second and celebrate this important milestone. Okay, the second is over and we need to move on to another static semantic check. 
For fun and profit let&#8217;s extend our semantic analyzer to check for duplicate identifiers in&nbsp;declarations.</p> <p>Let&#8217;s take a look at the following program,&nbsp;SymTab6:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">SymTab6</span><span class="o">;</span> <span class="k">var</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">var</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">real</span><span class="o">;</span> <span class="k">begin</span> <span class="n">x</span> <span class="o">:=</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="o">;</span> <span class="k">end</span><span class="o">.</span> </pre></div> <p>Variable <strong>y</strong> has been declared twice: the first time as <em>integer</em> and the second time as <em>real</em>.</p> <p>To catch that semantic error we need to modify our <em>visit_VarDecl</em> method to check whether the symbol table already has a symbol with the same name before inserting a new symbol. 
Here is our new version of the&nbsp;method:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_VarDecl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">type_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">type_node</span><span class="o">.</span><span class="n">value</span> <span class="n">type_symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">type_name</span><span class="p">)</span> <span class="c1"># We have all the information we need to create a variable symbol.</span> <span class="c1"># Create the symbol and insert it into the symbol table.</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">var_node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="n">VarSymbol</span><span class="p">(</span><span class="n">var_name</span><span class="p">,</span> <span class="n">type_symbol</span><span class="p">)</span> <span class="c1"># Signal an error if the table already has a symbol</span> <span class="c1"># with the same name</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">var_name</span><span class="p">)</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span> <span class="s2">&quot;Error: Duplicate identifier &#39;</span><span class="si">%s</span><span 
class="s2">&#39; found&quot;</span> <span class="o">%</span> <span class="n">var_name</span> <span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">var_symbol</span><span class="p">)</span> </pre></div> <p><br/> File <a href="https://github.com/rspivak/lsbasi/blob/master/part13/symtab06.py">symtab06.py</a> has all the changes. Download it and run it on the command&nbsp;line:</p> <div class="highlight"><pre><span></span>$ python symtab06.py Insert: INTEGER Insert: REAL Lookup: INTEGER Lookup: x Insert: x Lookup: INTEGER Lookup: y Insert: y Lookup: REAL Lookup: y Error: Duplicate identifier <span class="s1">&#39;y&#39;</span> found Symbol table contents _____________________ INTEGER: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; REAL: &lt;BuiltinTypeSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>&gt; x: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;x&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; y: &lt;VarSymbol<span class="o">(</span><span class="nv">name</span><span class="o">=</span><span class="s1">&#39;y&#39;</span>, <span class="nv">type</span><span class="o">=</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>&gt; </pre></div> <p>Study the output and the contents of the symbol table. 
Make sure you understand what&#8217;s going&nbsp;on.</p> <p><br/></p> <h2 id="summary">Summary</h2> <p>Let&#8217;s quickly recap what we learned&nbsp;today:</p> <ul> <li>We learned more about symbols, symbol tables, and semantic analysis in&nbsp;general</li> <li>We learned about name resolution and how the semantic analyzer resolves names to their&nbsp;declarations</li> <li>We learned how to code a semantic analyzer that walks an <span class="caps">AST</span>, builds the symbol table, and does basic semantic&nbsp;checks</li> </ul> <p><br/> And, as a reminder, the structure of our interpreter now looks like&nbsp;this:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part13/lsbasi_part13_img03.png" width="400"></p> <p><br/> We&#8217;re done with semantic checks for today and we&#8217;re finally ready to tackle the topic of scopes, how they relate to symbol tables, and the topic of semantic checks in the presence of nested scopes. Those will be central topics of the next article. Stay tuned and see you&nbsp;soon!</p> <p></p> <p></p>Let’s Build A Simple Interpreter. Part 12.2016-12-01T16:20:00-05:002016-12-01T16:20:00-05:00Ruslan Spivaktag:ruslanspivak.com,2016-12-01:/lsbasi-part12/
<blockquote> <p><em><span class="dquo">&#8220;</span>Be not afraid of going slowly; be afraid only of standing still.&#8221; - Chinese&nbsp;proverb.</em></p> </blockquote> <p>Hello, and welcome&nbsp;back!</p> <p>Today we are going to take a few more baby steps and learn how to parse Pascal procedure&nbsp;declarations.</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part12/lsbasi_part12_babysteps.png" width="700"></p> <p>What is a <em><strong>procedure declaration</strong></em>? A <em><strong>procedure declaration</strong></em> is a language construct that defines an identifier (a procedure name) and associates it with a block of Pascal&nbsp;code.</p> <p>Before we dive in, a few words about Pascal procedures and their&nbsp;declarations:</p> <ul> <li>Pascal procedures don’t have return statements. They exit when they reach the end of their corresponding&nbsp;block.</li> <li>Pascal procedures can be nested within each&nbsp;other.</li> <li>For simplicity, procedure declarations in this article won&#8217;t have any formal parameters. 
But, don&#8217;t worry, we&#8217;ll cover that later in the&nbsp;series.</li> </ul> <p>This is our test program for&nbsp;today:</p> <div class="highlight"><pre><span></span><span class="k">PROGRAM</span> <span class="n">Part12</span><span class="o">;</span> <span class="k">VAR</span> <span class="n">a</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">PROCEDURE</span> <span class="nf">P1</span><span class="o">;</span> <span class="k">VAR</span> <span class="n">a</span> <span class="o">:</span> <span class="kt">REAL</span><span class="o">;</span> <span class="n">k</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">PROCEDURE</span> <span class="nf">P2</span><span class="o">;</span> <span class="k">VAR</span> <span class="n">a</span><span class="o">,</span> <span class="n">z</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">BEGIN</span> <span class="cm">{P2}</span> <span class="n">z</span> <span class="o">:=</span> <span class="mi">777</span><span class="o">;</span> <span class="k">END</span><span class="o">;</span> <span class="cm">{P2}</span> <span class="k">BEGIN</span> <span class="cm">{P1}</span> <span class="k">END</span><span class="o">;</span> <span class="cm">{P1}</span> <span class="k">BEGIN</span> <span class="cm">{Part12}</span> <span class="n">a</span> <span class="o">:=</span> <span class="mi">10</span><span class="o">;</span> <span class="k">END</span><span class="o">.</span> <span class="cm">{Part12}</span> </pre></div> <p>As you can see above, we have defined two procedures (<em>P1</em> and <em>P2</em>) and <em>P2</em> is nested within <em>P1</em>. 
In the code above, I used comments with each procedure’s name to mark clearly where the body of that procedure begins and where it&nbsp;ends.</p> <p>Our objective for today is pretty clear: learn how to parse code like&nbsp;that.</p> <p><br/></p> <p>First, we need to make some changes to our grammar to add procedure declarations. Well, let’s just do&nbsp;that!</p> <p>Here is the updated <em>declarations</em> grammar&nbsp;rule:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part12/lsbasi_part12_grammar.png" width="640"></p> <p>The procedure declaration sub-rule consists of the reserved keyword <strong><span class="caps">PROCEDURE</span></strong> followed by an identifier (a procedure name), followed by a semicolon, which in turn is followed by a <em>block</em> rule, which is terminated by a semicolon. Whoa! This is a case where I think the picture is actually worth however many words I just put in the previous sentence!&nbsp;:)</p> <p>Here is the updated syntax diagram for the <em>declarations</em>&nbsp;rule:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part12/lsbasi_part12_syntaxdiagram.png" width="640"></p> <p>The grammar and the diagram above show that you can have as many procedure declarations on the same level as you want. 
For example, in the code snippet below we define two procedure declarations, <em>P1</em> and <em><span class="caps">P1A</span></em>, on the same&nbsp;level:</p> <div class="highlight"><pre><span></span><span class="k">PROGRAM</span> <span class="n">Test</span><span class="o">;</span> <span class="k">VAR</span> <span class="n">a</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">PROCEDURE</span> <span class="nf">P1</span><span class="o">;</span> <span class="k">BEGIN</span> <span class="cm">{P1}</span> <span class="k">END</span><span class="o">;</span> <span class="cm">{P1}</span> <span class="k">PROCEDURE</span> <span class="nf">P1A</span><span class="o">;</span> <span class="k">BEGIN</span> <span class="cm">{P1A}</span> <span class="k">END</span><span class="o">;</span> <span class="cm">{P1A}</span> <span class="k">BEGIN</span> <span class="cm">{Test}</span> <span class="n">a</span> <span class="o">:=</span> <span class="mi">10</span><span class="o">;</span> <span class="k">END</span><span class="o">.</span> <span class="cm">{Test}</span> </pre></div> <p>The diagram and the grammar rule above also indicate that procedure declarations can be nested because the <em>procedure declaration</em> sub-rule references the <em>block</em> rule which contains the <em>declarations</em> rule, which in turn contains the <em>procedure declaration</em> sub-rule. 
As a reminder, here is the syntax diagram and the grammar for the block rule from <a href="/lsbasi-part10/">Part10</a>:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part12/lsbasi_part12_block_rule_from_part10.png" width="800"></p> <p><br/> Okay, now let&#8217;s focus on the interpreter components that need to be updated to support procedure&nbsp;declarations:</p> <p><em><strong>Updating the&nbsp;Lexer</strong></em></p> <p>All we need to do is add a new token named <strong><span class="caps">PROCEDURE</span></strong>:</p> <div class="highlight"><pre><span></span><span class="n">PROCEDURE</span> <span class="o">=</span> <span class="s1">&#39;PROCEDURE&#39;</span> </pre></div> <p>And add <em>&#8216;<span class="caps">PROCEDURE</span>&#8217;</em> to the reserved keywords. Here is the complete mapping of reserved keywords to&nbsp;tokens:</p> <div class="highlight"><pre><span></span><span class="n">RESERVED_KEYWORDS</span> <span class="o">=</span> <span class="p">{</span> <span class="s1">&#39;PROGRAM&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;PROGRAM&#39;</span><span class="p">,</span> <span class="s1">&#39;PROGRAM&#39;</span><span class="p">),</span> <span class="s1">&#39;VAR&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;VAR&#39;</span><span class="p">,</span> <span class="s1">&#39;VAR&#39;</span><span class="p">),</span> <span class="s1">&#39;DIV&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;INTEGER_DIV&#39;</span><span class="p">,</span> <span class="s1">&#39;DIV&#39;</span><span class="p">),</span> <span class="s1">&#39;INTEGER&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;INTEGER&#39;</span><span class="p">,</span> <span class="s1">&#39;INTEGER&#39;</span><span class="p">),</span> <span 
class="s1">&#39;REAL&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;REAL&#39;</span><span class="p">,</span> <span class="s1">&#39;REAL&#39;</span><span class="p">),</span> <span class="s1">&#39;BEGIN&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;BEGIN&#39;</span><span class="p">,</span> <span class="s1">&#39;BEGIN&#39;</span><span class="p">),</span> <span class="s1">&#39;END&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;END&#39;</span><span class="p">,</span> <span class="s1">&#39;END&#39;</span><span class="p">),</span> <span class="s1">&#39;PROCEDURE&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;PROCEDURE&#39;</span><span class="p">,</span> <span class="s1">&#39;PROCEDURE&#39;</span><span class="p">),</span> <span class="p">}</span> </pre></div> <p><br/> <em><strong>Updating the&nbsp;Parser</strong></em></p> <p>Here is a summary of the parser&nbsp;changes:</p> <ol> <li>New <em>ProcedureDecl</em> <span class="caps">AST</span>&nbsp;node</li> <li>Update to the parser’s <em>declarations</em> method to support procedure&nbsp;declarations</li> </ol> <p>Let’s go over the&nbsp;changes.</p> <ol> <li> <p>The <em>ProcedureDecl</em> <span class="caps">AST</span> node represents a <em>procedure declaration</em>. 
The class constructor takes as parameters the name of the procedure and the <span class="caps">AST</span> node of the block of code that the procedure’s name refers&nbsp;to.</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ProcedureDecl</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">proc_name</span><span class="p">,</span> <span class="n">block_node</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">proc_name</span> <span class="o">=</span> <span class="n">proc_name</span> <span class="bp">self</span><span class="o">.</span><span class="n">block_node</span> <span class="o">=</span> <span class="n">block_node</span> </pre></div> </li> <li> <p>Here is the updated <em>declarations</em> method of the <em>Parser</em>&nbsp;class</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">declarations</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;declarations : VAR (variable_declaration SEMI)+</span> <span class="sd"> | (PROCEDURE ID SEMI block SEMI)*</span> <span class="sd"> | empty</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="n">declarations</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">VAR</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">VAR</span><span class="p">)</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span 
class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">ID</span><span class="p">:</span> <span class="n">var_decl</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">variable_declaration</span><span class="p">()</span> <span class="n">declarations</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">var_decl</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">SEMI</span><span class="p">)</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">PROCEDURE</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">PROCEDURE</span><span class="p">)</span> <span class="n">proc_name</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">value</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">ID</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">SEMI</span><span class="p">)</span> <span class="n">block_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">block</span><span class="p">()</span> <span class="n">proc_decl</span> <span class="o">=</span> <span class="n">ProcedureDecl</span><span class="p">(</span><span class="n">proc_name</span><span class="p">,</span> <span class="n">block_node</span><span class="p">)</span> <span 
class="n">declarations</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">proc_decl</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">SEMI</span><span class="p">)</span> <span class="k">return</span> <span class="n">declarations</span> </pre></div> <p>Hopefully, the code above is pretty self-explanatory. It follows the grammar/syntax diagram for procedure declarations that you’ve seen earlier in the&nbsp;article.</p> </li> </ol> <p><br/> <em><strong>Updating the SymbolTable&nbsp;builder</strong></em></p> <p>Because we’re not ready yet to handle nested procedure scopes, we’ll simply add an empty <em>visit_ProcedureDecl</em> method to the <em>SymbolTableBuilder</em> <span class="caps">AST</span> visitor class. We&#8217;ll fill it out in the next&nbsp;article.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_ProcedureDecl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">pass</span> </pre></div> <p><br/> <em><strong>Updating the&nbsp;Interpreter</strong></em></p> <p>We also need to add an empty <em>visit_ProcedureDecl</em> method to the <em>Interpreter</em> class, which will cause our interpreter to silently ignore all our procedure&nbsp;declarations.</p> <p>So far, so&nbsp;good.</p> <p><br/> Now that we&#8217;ve made all the necessary changes, let&#8217;s see what the <em>Abstract Syntax Tree</em> looks like with the new <em>ProcedureDecl</em>&nbsp;nodes.</p> <p>Here is our Pascal program again (you can download it directly from <a href="https://github.com/rspivak/lsbasi/blob/master/part12/python/part12.pas">GitHub</a>):</p> <div class="highlight"><pre><span></span><span class="k">PROGRAM</span> <span class="n">Part12</span><span class="o">;</span> <span class="k">VAR</span> 
<span class="n">a</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">PROCEDURE</span> <span class="nf">P1</span><span class="o">;</span> <span class="k">VAR</span> <span class="n">a</span> <span class="o">:</span> <span class="kt">REAL</span><span class="o">;</span> <span class="n">k</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">PROCEDURE</span> <span class="nf">P2</span><span class="o">;</span> <span class="k">VAR</span> <span class="n">a</span><span class="o">,</span> <span class="n">z</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">BEGIN</span> <span class="cm">{P2}</span> <span class="n">z</span> <span class="o">:=</span> <span class="mi">777</span><span class="o">;</span> <span class="k">END</span><span class="o">;</span> <span class="cm">{P2}</span> <span class="k">BEGIN</span> <span class="cm">{P1}</span> <span class="k">END</span><span class="o">;</span> <span class="cm">{P1}</span> <span class="k">BEGIN</span> <span class="cm">{Part12}</span> <span class="n">a</span> <span class="o">:=</span> <span class="mi">10</span><span class="o">;</span> <span class="k">END</span><span class="o">.</span> <span class="cm">{Part12}</span> </pre></div> <p><br/> Let&#8217;s generate an <span class="caps">AST</span> and visualize it with the <a href="https://github.com/rspivak/lsbasi/blob/master/part12/python/genastdot.py">genastdot.py</a>&nbsp;utility:</p> <div class="highlight"><pre><span></span>$ python genastdot.py part12.pas &gt; ast.dot <span class="o">&amp;&amp;</span> dot -Tpng -o ast.png ast.dot </pre></div> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part12/lsbasi_part12_procdecl_ast.png"></p> <p>In the picture above you can see two <em>ProcedureDecl</em> nodes: <em>ProcDecl:P1</em> and <em>ProcDecl:P2</em> that correspond to procedures <em>P1</em> and <em>P2</em>. 
Mission accomplished.&nbsp;:)</p> <p>As a last item for today, let&#8217;s quickly check that our updated interpreter works as before when a Pascal program has procedure declarations in it. Download <a href="https://github.com/rspivak/lsbasi/blob/master/part12/python/spi.py">the interpreter</a> and <a href="https://github.com/rspivak/lsbasi/blob/master/part12/python/part12.pas">the test program</a> if you haven&#8217;t done so yet, and run it on the command line. Your output should look similar to&nbsp;this:</p> <div class="highlight"><pre><span></span>$ python spi.py part12.pas Define: INTEGER Define: REAL Lookup: INTEGER Define: &lt;a:INTEGER&gt; Lookup: a Symbol Table contents: Symbols: <span class="o">[</span>INTEGER, REAL, &lt;a:INTEGER&gt;<span class="o">]</span> Run-time GLOBAL_MEMORY contents: <span class="nv">a</span> <span class="o">=</span> <span class="m">10</span> </pre></div> <p><br/> Okay, with all that knowledge and experience under our belt, we’re ready to tackle the topic of nested scopes that we need to understand in order to be able to analyze nested procedures and prepare ourselves to handle procedure and function calls. And that&#8217;s exactly what we are going to do in the next article: dive deep into nested scopes. So don&#8217;t forget to bring your swimming gear next time! Stay tuned and see you&nbsp;soon!</p> <p></p> <p></p>Let’s Build A Simple Interpreter. Part 11.2016-09-20T21:15:00-04:002016-09-20T21:15:00-04:00Ruslan Spivaktag:ruslanspivak.com,2016-09-20:/lsbasi-part11/<p>I was sitting in my room the other day and thinking about how much we had covered, and I thought I would recap what we’ve learned so far and what lies ahead of&nbsp;us.</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part11/lsbasi_part11_recap.png" width="700"></p> <p>Up until now we&#8217;ve&nbsp;learned:</p> <ul> <li>How to break sentences into tokens. The process is called <em><strong>lexical analysis</strong></em> and the part of the interpreter that does it is called a <em><strong>lexical analyzer</strong></em>, <em><strong>lexer</strong></em>, <em><strong>scanner</strong></em>, or <em><strong>tokenizer</strong></em>. We&#8217;ve learned how to write our own <em><strong>lexer</strong></em> from the ground up without using regular expressions or any other tools like <a href="https://en.wikipedia.org/wiki/Lex_(software)">Lex</a>.</li> <li>How to recognize a phrase in the stream of tokens. The process of recognizing a phrase in the stream of tokens or, to put it differently, the process of finding structure in the stream of tokens is called <em><strong>parsing</strong></em> or <em><strong>syntax analysis</strong></em>. The part of an interpreter or compiler that performs that job is called a <em><strong>parser</strong></em> or <em><strong>syntax analyzer</strong></em>.</li> <li>How to represent a programming language&#8217;s syntax rules with <em><strong>syntax diagrams</strong></em>, which are a graphical representation of a programming language’s syntax rules. <em><strong>Syntax diagrams</strong></em> visually show us which statements are allowed in our programming language and which are&nbsp;not.</li> <li>How to use another widely used notation for specifying the syntax of a programming language. 
It’s called <em><strong>context-free grammars</strong></em> (<em><strong>grammars</strong></em>, for short) or <em><strong><span class="caps">BNF</span></strong></em> (Backus-Naur&nbsp;Form).</li> <li>How to map a <em><strong>grammar</strong></em> to code and how to write a <em><strong>recursive-descent parser</strong></em>.</li> <li>How to write a really basic <em><strong>interpreter</strong></em>.</li> <li>How <em><strong>associativity</strong></em> and <em><strong>precedence</strong></em> of operators work and how to construct a grammar using a precedence&nbsp;table.</li> <li>How to build an <em><strong>Abstract Syntax Tree</strong></em> (<span class="caps">AST</span>) of a parsed sentence and how to represent the whole source program in Pascal as one big <em><strong><span class="caps">AST</span></strong></em>.</li> <li>How to walk an <span class="caps">AST</span> and how to implement our interpreter as an <span class="caps">AST</span> node&nbsp;visitor.</li> </ul> <p>With all that knowledge and experience under our belt, we’ve built an interpreter that can scan, parse, and build an <span class="caps">AST</span> and interpret, by walking the <span class="caps">AST</span>, our very first complete Pascal program. Ladies and gentlemen, I honestly think if you’ve reached this far, you deserve a pat on the back. But don’t let it go to your head. Keep going. 
Even though we’ve covered a lot of ground, there are even more exciting parts coming our&nbsp;way.</p> <p><br/></p> <p>With everything we&#8217;ve covered so far, we are almost ready to tackle topics&nbsp;like:</p> <ul> <li>Nested procedures and&nbsp;functions</li> <li>Procedure and function&nbsp;calls</li> <li>Semantic analysis (type checking, making sure variables are declared before they are used, and basically checking if a program makes&nbsp;sense)</li> <li>Control flow elements (like <span class="caps">IF</span>&nbsp;statements)</li> <li>Aggregate data types&nbsp;(Records)</li> <li>More built-in&nbsp;types</li> <li>Source-level&nbsp;debugger</li> <li>Miscellanea (All the other goodness not mentioned above&nbsp;:)</li> </ul> <p>But before we cover those topics, we need to build a solid foundation and&nbsp;infrastructure.</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part11/lsbasi_part11_foundation.png" width="500"></p> <p>This is where we start diving deeper into the super important topic of symbols, symbol tables, and scopes. The topic itself will span several articles. It’s that important and you’ll see why. Okay, let’s start building that foundation and infrastructure, then, shall&nbsp;we?</p> <p><br/> First, let&#8217;s talk about symbols and why we need to track them. What is a <em><strong>symbol</strong></em>? For our purposes, we’ll informally define a <em><strong>symbol</strong></em> as an identifier of some program entity like a variable, subroutine, or built-in type. For symbols to be useful they need to have at least the following information about the program entities they&nbsp;identify:</p> <ul> <li>Name (for example, ‘x’, ‘y’,&nbsp;‘number’)</li> <li>Category (Is it a variable, subroutine, or built-in&nbsp;type?)</li> <li>Type (<span class="caps">INTEGER</span>, <span class="caps">REAL</span>)</li> </ul> <p>Today we’ll tackle variable symbols and built-in type symbols because we’ve already used variables and types before. 
By the way, a “built-in” type just means a type that you haven’t defined yourself and that is available right out of the box, like the <span class="caps">INTEGER</span> and <span class="caps">REAL</span> types that you’ve seen and used&nbsp;before.</p> <p>Let’s take a look at the following Pascal program, specifically at the variable declaration part. You can see in the picture below that there are four symbols in that section: two variable symbols (<em>x</em> and <em>y</em>) and two built-in type symbols (<em><span class="caps">INTEGER</span></em> and <em><span class="caps">REAL</span></em>).</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part11/lsbasi_part11_prog_symbols.png" width="640"></p> <p>How can we represent symbols in code? Let’s create a base <em>Symbol</em> class in&nbsp;Python:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Symbol</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">type</span> <span class="o">=</span> <span class="nb">type</span>
</pre></div> <p>As you can see, the class takes the <em>name</em> parameter and an optional <em>type</em> parameter (not every symbol has a type associated with it). What about the category of a symbol? We’ll encode the category of a symbol in the class name itself, which means we’ll create separate classes to represent different symbol&nbsp;categories.</p> <p>Let’s start with basic built-in types. 
We’ve seen two built-in types so far, when we declared variables: <span class="caps">INTEGER</span> and <span class="caps">REAL</span>. How do we represent a built-in type symbol in code? Here is one&nbsp;option:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">BuiltinTypeSymbol</span><span class="p">(</span><span class="n">Symbol</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">name</span>

    <span class="fm">__repr__</span> <span class="o">=</span> <span class="fm">__str__</span>
</pre></div> <p>The class inherits from the <em>Symbol</em> class and the constructor requires only the name of the type. The category is encoded in the class name, and the <em>type</em> parameter from the base class for a built-in type symbol is <em>None</em>. 
The double underscore or <em>dunder</em> (as in “Double UNDERscore”) methods <em>__str__</em> and <em>__repr__</em> are special Python methods and we’ve defined them to have a nicely formatted message when you print a symbol&nbsp;object.</p> <p>Download the <a href="https://github.com/rspivak/lsbasi/blob/master/part11/python/spi.py">interpreter file</a> and save it as <em>spi.py</em>; launch a Python shell from the same directory where you saved the spi.py file, and play with the class we’ve just defined&nbsp;interactively:</p> <div class="highlight"><pre><span></span>$ python
&gt;&gt;&gt; from spi import BuiltinTypeSymbol
&gt;&gt;&gt; <span class="nv">int_type</span> <span class="o">=</span> BuiltinTypeSymbol<span class="o">(</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span>
&gt;&gt;&gt; int_type
INTEGER
&gt;&gt;&gt; <span class="nv">real_type</span> <span class="o">=</span> BuiltinTypeSymbol<span class="o">(</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span>
&gt;&gt;&gt; real_type
REAL
</pre></div> <p><br/> How can we represent a variable symbol? 
Let’s create a <em>VarSymbol</em>&nbsp;class:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">VarSymbol</span><span class="p">(</span><span class="n">Symbol</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="nb">type</span><span class="p">):</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="nb">type</span><span class="p">)</span> <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="s1">&#39;&lt;{name}:{type}&gt;&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">type</span><span class="p">)</span> <span class="fm">__repr__</span> <span class="o">=</span> <span class="fm">__str__</span> </pre></div> <p>In the class we made both the <em>name</em> and the <em>type</em> parameters required parameters and the class name <em>VarSymbol</em> clearly indicates that an instance of the class will identify a variable symbol (the category is <em>variable</em>.)</p> <p>Back to the interactive python shell to see how we can manually construct instances for our variable symbols now that we know how to construct <em>BuiltinTypeSymbol</em> class&nbsp;instances:</p> <div class="highlight"><pre><span></span>$ python &gt;&gt;&gt; from spi import BuiltinTypeSymbol, VarSymbol 
&gt;&gt;&gt; <span class="nv">int_type</span> <span class="o">=</span> BuiltinTypeSymbol<span class="o">(</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span> &gt;&gt;&gt; <span class="nv">real_type</span> <span class="o">=</span> BuiltinTypeSymbol<span class="o">(</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span> &gt;&gt;&gt; &gt;&gt;&gt; <span class="nv">var_x_symbol</span> <span class="o">=</span> VarSymbol<span class="o">(</span><span class="s1">&#39;x&#39;</span>, int_type<span class="o">)</span> &gt;&gt;&gt; var_x_symbol &lt;x:INTEGER&gt; &gt;&gt;&gt; <span class="nv">var_y_symbol</span> <span class="o">=</span> VarSymbol<span class="o">(</span><span class="s1">&#39;y&#39;</span>, real_type<span class="o">)</span> &gt;&gt;&gt; var_y_symbol &lt;y:REAL&gt; </pre></div> <p>As you can see, we first create an instance of a built-in type symbol and then pass it as a parameter to <em>VarSymbol</em>&#8216;s&nbsp;constructor.</p> <p>Here is the hierarchy of symbols we’ve defined in visual&nbsp;form:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part11/lsbasi_part11_symbol_hierarchy.png" width="500"></p> <p>So far so good, but we haven’t answered the question yet as to why we even need to track those symbols in the first&nbsp;place.</p> <p>Here are some of the&nbsp;reasons:</p> <ul> <li>To make sure that when we assign a value to a variable the types are correct (type&nbsp;checking)</li> <li>To make sure that a variable is declared before it is&nbsp;used</li> </ul> <p>Take a look at the following incorrect Pascal program, for&nbsp;example:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part11/lsbasi_part11_symtracking.png" width="640"></p> <p>There are two problems with the program above (you can compile it with <a href="http://www.freepascal.org/"><em>fpc</em></a> to see it for&nbsp;yourself):</p> <ol> <li>In the expression <em>&#8220;x := 2 + y;&#8221;</em> we assigned a decimal value to the variable 
&#8220;x&#8221; that was declared as integer. That wouldn&#8217;t compile because the types are&nbsp;incompatible.</li> <li>In the assignment statement <em>&#8220;x := a;&#8221;</em> we referenced the variable &#8220;a&#8221; that wasn&#8217;t declared -&nbsp;wrong!</li> </ol> <p>To be able to identify cases like that even before interpreting/evaluating the source code of the program at run-time, we need to track program symbols. And where do we store the symbols that we track? I think you’ve guessed it right - in the symbol&nbsp;table!</p> <p></br> What is a <em><strong>symbol table</strong></em>? A <em><strong>symbol table</strong></em> is an abstract data type (<em><strong><span class="caps">ADT</span></strong></em>) for tracking various symbols in source code. Today we’re going to implement our symbol table as a separate class with some helper&nbsp;methods:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">SymbolTable</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span> <span class="o">=</span> <span class="p">{}</span> <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="n">s</span> <span class="o">=</span> <span class="s1">&#39;Symbols: {symbols}&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span> <span class="n">symbols</span><span class="o">=</span><span class="p">[</span><span class="n">value</span> <span class="k">for</span> <span class="n">value</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="o">.</span><span class="n">values</span><span 
class="p">()]</span> <span class="p">)</span> <span class="k">return</span> <span class="n">s</span> <span class="fm">__repr__</span> <span class="o">=</span> <span class="fm">__str__</span> <span class="k">def</span> <span class="nf">define</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">symbol</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Define: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">symbol</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="p">[</span><span class="n">symbol</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">symbol</span> <span class="k">def</span> <span class="nf">lookup</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Lookup: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">name</span><span class="p">)</span> <span class="n">symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="c1"># &#39;symbol&#39; is either an instance of the Symbol class or &#39;None&#39;</span> <span class="k">return</span> <span class="n">symbol</span> </pre></div> <p>There are two main operations that we will be performing with the symbol table: storing symbols and looking them up by name: hence, we need two helper methods - <em>define</em> and <em>lookup</em>.</p> <p>The method <em>define</em> takes a symbol as a parameter and stores it 
internally in its <em>_symbols</em> dictionary using the symbol’s name as a key and the symbol instance as a value. The method <em>lookup</em> takes a symbol name as a parameter and returns a symbol if it finds it or “None” if it&nbsp;doesn’t.</p> <p>Let’s manually populate our symbol table for the same Pascal program we used earlier, when we manually created variable and built-in type&nbsp;symbols:</p> <div class="highlight"><pre><span></span><span class="k">PROGRAM</span> <span class="n">Part11</span><span class="o">;</span> <span class="k">VAR</span> <span class="n">x</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">REAL</span><span class="o">;</span> <span class="k">BEGIN</span> <span class="k">END</span><span class="o">.</span> </pre></div> <p>Launch a Python shell again and follow&nbsp;along:</p> <div class="highlight"><pre><span></span>$ python &gt;&gt;&gt; from spi import SymbolTable, BuiltinTypeSymbol, VarSymbol &gt;&gt;&gt; <span class="nv">symtab</span> <span class="o">=</span> SymbolTable<span class="o">()</span> &gt;&gt;&gt; <span class="nv">int_type</span> <span class="o">=</span> BuiltinTypeSymbol<span class="o">(</span><span class="s1">&#39;INTEGER&#39;</span><span class="o">)</span> &gt;&gt;&gt; symtab.define<span class="o">(</span>int_type<span class="o">)</span> Define: INTEGER &gt;&gt;&gt; symtab Symbols: <span class="o">[</span>INTEGER<span class="o">]</span> &gt;&gt;&gt; &gt;&gt;&gt; <span class="nv">var_x_symbol</span> <span class="o">=</span> VarSymbol<span class="o">(</span><span class="s1">&#39;x&#39;</span>, int_type<span class="o">)</span> &gt;&gt;&gt; symtab.define<span class="o">(</span>var_x_symbol<span class="o">)</span> Define: &lt;x:INTEGER&gt; &gt;&gt;&gt; symtab Symbols: <span class="o">[</span>INTEGER, &lt;x:INTEGER&gt;<span class="o">]</span> &gt;&gt;&gt; &gt;&gt;&gt; <span class="nv">real_type</span> 
<span class="o">=</span> BuiltinTypeSymbol<span class="o">(</span><span class="s1">&#39;REAL&#39;</span><span class="o">)</span> &gt;&gt;&gt; symtab.define<span class="o">(</span>real_type<span class="o">)</span> Define: REAL &gt;&gt;&gt; symtab Symbols: <span class="o">[</span>INTEGER, &lt;x:INTEGER&gt;, REAL<span class="o">]</span> &gt;&gt;&gt; &gt;&gt;&gt; <span class="nv">var_y_symbol</span> <span class="o">=</span> VarSymbol<span class="o">(</span><span class="s1">&#39;y&#39;</span>, real_type<span class="o">)</span> &gt;&gt;&gt; symtab.define<span class="o">(</span>var_y_symbol<span class="o">)</span> Define: &lt;y:REAL&gt; &gt;&gt;&gt; symtab Symbols: <span class="o">[</span>INTEGER, &lt;x:INTEGER&gt;, REAL, &lt;y:REAL&gt;<span class="o">]</span> </pre></div> <p></br> If you looked at the contents of the <em>_symbols</em> dictionary it would look something like&nbsp;this:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part11/lsbasi_part11_symtab.png" width="360"></p> <p>How do we automate the process of building the symbol table? We’ll just write another node visitor that walks the <span class="caps">AST</span> built by our parser! This is another example of how useful it is to have an intermediary form like <span class="caps">AST</span>. Instead of extending our parser to deal with the symbol table, we separate concerns and write a new node visitor class. Nice and clean.&nbsp;:)</p> <p>Before doing that, though, let’s extend our <em>SymbolTable</em> class to initialize the built-in types when the symbol table instance is created. 
Here is the full source code for today’s <em>SymbolTable</em>&nbsp;class:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">SymbolTable</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span> <span class="o">=</span> <span class="n">OrderedDict</span><span class="p">()</span> <span class="bp">self</span><span class="o">.</span><span class="n">_init_builtins</span><span class="p">()</span> <span class="k">def</span> <span class="nf">_init_builtins</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">define</span><span class="p">(</span><span class="n">BuiltinTypeSymbol</span><span class="p">(</span><span class="s1">&#39;INTEGER&#39;</span><span class="p">))</span> <span class="bp">self</span><span class="o">.</span><span class="n">define</span><span class="p">(</span><span class="n">BuiltinTypeSymbol</span><span class="p">(</span><span class="s1">&#39;REAL&#39;</span><span class="p">))</span> <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="n">s</span> <span class="o">=</span> <span class="s1">&#39;Symbols: {symbols}&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span> <span class="n">symbols</span><span class="o">=</span><span class="p">[</span><span class="n">value</span> <span class="k">for</span> <span class="n">value</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="o">.</span><span class="n">values</span><span class="p">()]</span> <span class="p">)</span> <span 
class="k">return</span> <span class="n">s</span> <span class="fm">__repr__</span> <span class="o">=</span> <span class="fm">__str__</span> <span class="k">def</span> <span class="nf">define</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">symbol</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Define: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">symbol</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="p">[</span><span class="n">symbol</span><span class="o">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">symbol</span> <span class="k">def</span> <span class="nf">lookup</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Lookup: </span><span class="si">%s</span><span class="s1">&#39;</span> <span class="o">%</span> <span class="n">name</span><span class="p">)</span> <span class="n">symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_symbols</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="c1"># &#39;symbol&#39; is either an instance of the Symbol class or &#39;None&#39;</span> <span class="k">return</span> <span class="n">symbol</span> </pre></div> <p></br> Now onto the <em>SymbolTableBuilder</em> <span class="caps">AST</span> node&nbsp;visitor:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">SymbolTableBuilder</span><span class="p">(</span><span class="n">NodeVisitor</span><span class="p">):</span> <span class="k">def</span> <span 
class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span> <span class="o">=</span> <span class="n">SymbolTable</span><span class="p">()</span> <span class="k">def</span> <span class="nf">visit_Block</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">for</span> <span class="n">declaration</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">declarations</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">declaration</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">compound_statement</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_Program</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">block</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_BinOp</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span 
class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_Num</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">pass</span> <span class="k">def</span> <span class="nf">visit_UnaryOp</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">expr</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_Compound</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">for</span> <span class="n">child</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">children</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">child</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_NoOp</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">pass</span> <span class="k">def</span> <span class="nf">visit_VarDecl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">type_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">type_node</span><span class="o">.</span><span class="n">value</span> <span class="n">type_symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span 
class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">type_name</span><span class="p">)</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">var_node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="n">VarSymbol</span><span class="p">(</span><span class="n">var_name</span><span class="p">,</span> <span class="n">type_symbol</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span class="o">.</span><span class="n">define</span><span class="p">(</span><span class="n">var_symbol</span><span class="p">)</span> </pre></div> <p></br> You’ve seen most of those methods before in the <em>Interpreter</em> class, but the <em>visit_VarDecl</em> method deserves some special attention. Here it is&nbsp;again:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_VarDecl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">type_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">type_node</span><span class="o">.</span><span class="n">value</span> <span class="n">type_symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">type_name</span><span class="p">)</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">var_node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="n">VarSymbol</span><span class="p">(</span><span 
class="n">var_name</span><span class="p">,</span> <span class="n">type_symbol</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span class="o">.</span><span class="n">define</span><span class="p">(</span><span class="n">var_symbol</span><span class="p">)</span> </pre></div> <p>This method is responsible for visiting (walking) a <em>VarDecl</em> <span class="caps">AST</span> node and storing the corresponding symbol in the symbol table. First, the method looks up the built-in type symbol by name in the symbol table, then it creates an instance of the <em>VarSymbol</em> class and stores (defines) it in the symbol&nbsp;table.</p> <p></br> Let’s take our <em>SymbolTableBuilder</em> <span class="caps">AST</span> walker for a test drive and see it in&nbsp;action:</p> <div class="highlight"><pre><span></span>$ python &gt;&gt;&gt; from spi import Lexer, Parser, SymbolTableBuilder &gt;&gt;&gt; <span class="nv">text</span> <span class="o">=</span> <span class="s2">&quot;&quot;&quot;</span> <span class="s2">... PROGRAM Part11;</span> <span class="s2">... VAR</span> <span class="s2">... x : INTEGER;</span> <span class="s2">... y : REAL;</span> <span class="s2">...</span> <span class="s2">... BEGIN</span> <span class="s2">...</span> <span class="s2">... END.</span> <span class="s2">... 
&quot;&quot;&quot;</span> &gt;&gt;&gt; <span class="nv">lexer</span> <span class="o">=</span> Lexer<span class="o">(</span>text<span class="o">)</span> &gt;&gt;&gt; <span class="nv">parser</span> <span class="o">=</span> Parser<span class="o">(</span>lexer<span class="o">)</span> &gt;&gt;&gt; <span class="nv">tree</span> <span class="o">=</span> parser.parse<span class="o">()</span> &gt;&gt;&gt; <span class="nv">symtab_builder</span> <span class="o">=</span> SymbolTableBuilder<span class="o">()</span> Define: INTEGER Define: REAL &gt;&gt;&gt; symtab_builder.visit<span class="o">(</span>tree<span class="o">)</span> Lookup: INTEGER Define: &lt;x:INTEGER&gt; Lookup: REAL Define: &lt;y:REAL&gt; &gt;&gt;&gt; <span class="c1"># Let’s examine the contents of our symbol table</span> ... &gt;&gt;&gt; symtab_builder.symtab Symbols: <span class="o">[</span>INTEGER, REAL, &lt;x:INTEGER&gt;, &lt;y:REAL&gt;<span class="o">]</span> </pre></div> <p>In the interactive session above, you can see the sequence of “Define: …” and “Lookup: …” messages that indicate the order in which symbols are defined and looked up in the symbol table. The last command in the session prints the contents of the symbol table, and you can see that it contains the same symbols as the table we built manually before; only the order differs, because the built-in types are now defined first, when the table is created. The magic of <span class="caps">AST</span> node visitors is that they pretty much do all the work for you.&nbsp;:)</p> <p></br> We can already put our symbol table and symbol table builder to good use: we can use them to verify that variables are declared before they are used in assignments and expressions. 
All we need to do is just extend the visitor with two more methods: <em>visit_Assign</em> and <em>visit_Var</em>:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Assign</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">var_name</span><span class="p">)</span> <span class="k">if</span> <span class="n">var_symbol</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">NameError</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">var_name</span><span class="p">))</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_Var</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">value</span> <span class="n">var_symbol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">symtab</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">var_name</span><span class="p">)</span> <span class="k">if</span> <span 
class="n">var_symbol</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">NameError</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">var_name</span><span class="p">))</span> </pre></div> <p>These methods will raise a <em>NameError</em> exception if they cannot find the symbol in the symbol&nbsp;table.</p> <p></br> Take a look at the following program, where we reference the variable “b” that hasn’t been declared&nbsp;yet:</p> <div class="highlight"><pre><span></span><span class="k">PROGRAM</span> <span class="n">NameError1</span><span class="o">;</span> <span class="k">VAR</span> <span class="n">a</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">BEGIN</span> <span class="n">a</span> <span class="o">:=</span> <span class="mi">2</span> <span class="o">+</span> <span class="n">b</span><span class="o">;</span> <span class="k">END</span><span class="o">.</span> </pre></div> <p>Let’s see what happens if we construct an <span class="caps">AST</span> for the program and pass it to our symbol table builder to&nbsp;visit:</p> <div class="highlight"><pre><span></span>$ python &gt;&gt;&gt; from spi import Lexer, Parser, SymbolTableBuilder &gt;&gt;&gt; <span class="nv">text</span> <span class="o">=</span> <span class="s2">&quot;&quot;&quot;</span> <span class="s2">... PROGRAM NameError1;</span> <span class="s2">... VAR</span> <span class="s2">... a : INTEGER;</span> <span class="s2">...</span> <span class="s2">... BEGIN</span> <span class="s2">... a := 2 + b;</span> <span class="s2">... END.</span> <span class="s2">... 
&quot;&quot;&quot;</span> &gt;&gt;&gt; <span class="nv">lexer</span> <span class="o">=</span> Lexer<span class="o">(</span>text<span class="o">)</span> &gt;&gt;&gt; <span class="nv">parser</span> <span class="o">=</span> Parser<span class="o">(</span>lexer<span class="o">)</span> &gt;&gt;&gt; <span class="nv">tree</span> <span class="o">=</span> parser.parse<span class="o">()</span> &gt;&gt;&gt; <span class="nv">symtab_builder</span> <span class="o">=</span> SymbolTableBuilder<span class="o">()</span> Define: INTEGER Define: REAL &gt;&gt;&gt; symtab_builder.visit<span class="o">(</span>tree<span class="o">)</span> Lookup: INTEGER Define: &lt;a:INTEGER&gt; Lookup: a Lookup: b Traceback <span class="o">(</span>most recent call last<span class="o">)</span>: ... File <span class="s2">&quot;spi.py&quot;</span>, line <span class="m">674</span>, in visit_Var raise NameError<span class="o">(</span>repr<span class="o">(</span>var_name<span class="o">))</span> NameError: <span class="s1">&#39;b&#39;</span> </pre></div> <p>Exactly what we were&nbsp;expecting!</p> <p></br> Here is another error case where we try to assign a value to a variable that hasn’t been defined yet, in this case the variable&nbsp;‘a’:</p> <div class="highlight"><pre><span></span><span class="k">PROGRAM</span> <span class="n">NameError2</span><span class="o">;</span> <span class="k">VAR</span> <span class="n">b</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">BEGIN</span> <span class="n">b</span> <span class="o">:=</span> <span class="mi">1</span><span class="o">;</span> <span class="n">a</span> <span class="o">:=</span> <span class="n">b</span> <span class="o">+</span> <span class="mi">2</span><span class="o">;</span> <span class="k">END</span><span class="o">.</span> </pre></div> <p>Meanwhile, in the Python&nbsp;shell:</p> <div class="highlight"><pre><span></span>&gt;&gt;&gt; from spi import Lexer, Parser, SymbolTableBuilder &gt;&gt;&gt; <span 
class="nv">text</span> <span class="o">=</span> <span class="s2">&quot;&quot;&quot;</span> <span class="s2">... PROGRAM NameError2;</span> <span class="s2">... VAR</span> <span class="s2">... b : INTEGER;</span> <span class="s2">...</span> <span class="s2">... BEGIN</span> <span class="s2">... b := 1;</span> <span class="s2">... a := b + 2;</span> <span class="s2">... END.</span> <span class="s2">... &quot;&quot;&quot;</span> &gt;&gt;&gt; <span class="nv">lexer</span> <span class="o">=</span> Lexer<span class="o">(</span>text<span class="o">)</span> &gt;&gt;&gt; <span class="nv">parser</span> <span class="o">=</span> Parser<span class="o">(</span>lexer<span class="o">)</span> &gt;&gt;&gt; <span class="nv">tree</span> <span class="o">=</span> parser.parse<span class="o">()</span> &gt;&gt;&gt; <span class="nv">symtab_builder</span> <span class="o">=</span> SymbolTableBuilder<span class="o">()</span> Define: INTEGER Define: REAL &gt;&gt;&gt; symtab_builder.visit<span class="o">(</span>tree<span class="o">)</span> Lookup: INTEGER Define: &lt;b:INTEGER&gt; Lookup: b Lookup: a Traceback <span class="o">(</span>most recent call last<span class="o">)</span>: ... File <span class="s2">&quot;spi.py&quot;</span>, line <span class="m">665</span>, in visit_Assign raise NameError<span class="o">(</span>repr<span class="o">(</span>var_name<span class="o">))</span> NameError: <span class="s1">&#39;a&#39;</span> </pre></div> <p>Great, our new visitor caught this problem&nbsp;too!</p> <p>I would like to emphasize that all the checks our <em>SymbolTableBuilder</em> <span class="caps">AST</span> visitor makes happen before run-time, that is, before our interpreter actually evaluates the source program. 
To drive the point home, suppose we were to interpret the following&nbsp;program:</p> <div class="highlight"><pre><span></span><span class="k">PROGRAM</span> <span class="n">Part11</span><span class="o">;</span> <span class="k">VAR</span> <span class="n">x</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">BEGIN</span> <span class="n">x</span> <span class="o">:=</span> <span class="mi">2</span><span class="o">;</span> <span class="k">END</span><span class="o">.</span> </pre></div> <p>The contents of the symbol table and the run-time GLOBAL_MEMORY right before the program exited would look something like&nbsp;this:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part11/lsbasi_part11_symtab_vs_globmem.png" width="700"></p> <p>Do you see the difference? Can you see that the symbol table doesn’t hold the value 2 for variable “x”? That’s solely the interpreter’s job&nbsp;now.</p> <p></br> Remember the picture from <a href="/lsbasi-part9/">Part 9</a> where the Symbol Table was used as global&nbsp;memory?</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part11/lsbasi_part9_ast_st02.png" width="700"></p> <p>No more! 
We effectively got rid of the hack where the symbol table did double duty as global&nbsp;memory.</p> <p></br> Let’s put it all together and test our new interpreter with the following&nbsp;program:</p> <div class="highlight"><pre><span></span><span class="k">PROGRAM</span> <span class="n">Part11</span><span class="o">;</span> <span class="k">VAR</span> <span class="n">number</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="n">a</span><span class="o">,</span> <span class="n">b</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">REAL</span><span class="o">;</span> <span class="k">BEGIN</span> <span class="cm">{Part11}</span> <span class="n">number</span> <span class="o">:=</span> <span class="mi">2</span><span class="o">;</span> <span class="n">a</span> <span class="o">:=</span> <span class="n">number</span> <span class="o">;</span> <span class="n">b</span> <span class="o">:=</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">a</span> <span class="o">+</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">number</span> <span class="k">DIV</span> <span class="mi">4</span><span class="o">;</span> <span class="n">y</span> <span class="o">:=</span> <span class="mi">20</span> <span class="o">/</span> <span class="mi">7</span> <span class="o">+</span> <span class="mf">3.14</span> <span class="k">END</span><span class="o">.</span> <span class="cm">{Part11}</span> </pre></div> <p></br> Save the program as part11.pas and fire up the&nbsp;interpreter:</p> <div class="highlight"><pre><span></span>$ python spi.py part11.pas Define: INTEGER Define: REAL Lookup: INTEGER Define: &lt;number:INTEGER&gt; Lookup: INTEGER Define: &lt;a:INTEGER&gt; Lookup: INTEGER Define: &lt;b:INTEGER&gt; Lookup: REAL Define: &lt;y:REAL&gt; Lookup: number Lookup: a Lookup: number Lookup: b Lookup: a Lookup: 
number Lookup: y Symbol Table contents: Symbols: <span class="o">[</span>INTEGER, REAL, &lt;number:INTEGER&gt;, &lt;a:INTEGER&gt;, &lt;b:INTEGER&gt;, &lt;y:REAL&gt;<span class="o">]</span> Run-time GLOBAL_MEMORY contents: <span class="nv">a</span> <span class="o">=</span> <span class="m">2</span> <span class="nv">b</span> <span class="o">=</span> <span class="m">25</span> <span class="nv">number</span> <span class="o">=</span> <span class="m">2</span> <span class="nv">y</span> <span class="o">=</span> <span class="m">5</span>.99714285714 </pre></div> <p></br> I’d like to draw your attention again to the fact that the <em>Interpreter</em> class has nothing to do with building the symbol table and it relies on the <em>SymbolTableBuilder</em> to make sure that the variables in the source code are properly declared before they are used by the <em>Interpreter</em>.</p> <p></br> <strong>Check your&nbsp;understanding</strong></p> <ul> <li>What is a&nbsp;symbol?</li> <li>Why do we need to track&nbsp;symbols?</li> <li>What is a symbol&nbsp;table?</li> <li>What is the difference between defining a symbol and resolving/looking up the&nbsp;symbol?</li> <li>Given the following small Pascal program, what would be the contents of the symbol table, the global memory (the GLOBAL_MEMORY dictionary that is part of the <em>Interpreter</em>)?<div class="highlight"><pre><span></span><span class="k">PROGRAM</span> <span class="n">Part11</span><span class="o">;</span> <span class="k">VAR</span> <span class="n">x</span><span class="o">,</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">BEGIN</span> <span class="n">x</span> <span class="o">:=</span> <span class="mi">2</span><span class="o">;</span> <span class="n">y</span> <span class="o">:=</span> <span class="mi">3</span> <span class="o">+</span> <span class="n">x</span><span class="o">;</span> <span class="k">END</span><span class="o">.</span> </pre></div> 
</li> </ul> <p></br> That’s all for today. In the next article, I’ll talk about scopes and we’ll get our hands dirty with parsing nested procedures. Stay tuned and see you soon! And remember that no matter what, &#8220;Keep&nbsp;going!&#8221;</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part11/lsbasi_part11_keep_going.png" width="300"></p> <p></br></p> <p><span class="caps">P.S.</span> My explanation of the topic of symbols and symbol table management is heavily influenced by the book <em><a href="http://amzn.to/2cHsHT1">Language Implementation Patterns</a></em> by Terence Parr. It’s a terrific book. I think it has the clearest explanation of the topic I’ve ever seen and it also covers class scopes, a subject that I’m not going to cover in the series because we will not be discussing object-oriented&nbsp;Pascal.</p> <p><span class="caps">P.P.</span>S.: If you can’t wait and want to start digging into compilers, I highly recommend the freely available classic by Jack Crenshaw <a href="http://compilers.iecc.com/crenshaw/">&#8220;Let’s Build a&nbsp;Compiler.&#8221;</a></p> <p></p> <p></p>Let’s Build A Simple Interpreter. Part 10.2016-08-04T09:15:00-04:002016-08-04T09:15:00-04:00Ruslan Spivaktag:ruslanspivak.com,2016-08-04:/lsbasi-part10/<p>Today we will continue closing the gap between where we are right now and where we want to be: <a href="/lsbasi-part1/">a fully functional interpreter for a subset of the Pascal programming language</a>.</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part10/lsbasi_part10_intro.png"></p> <p>In this article we will update our interpreter to parse and interpret our very first complete Pascal program. The program can also be compiled by the <a href="http://www.freepascal.org/">Free Pascal compiler, <em>fpc</em></a>.</p> <p>Here is the program&nbsp;itself:</p> <div class="highlight"><pre><span></span><span class="k">PROGRAM</span> <span class="n">Part10</span><span class="o">;</span>
<span class="k">VAR</span>
   <span class="n">number</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span>
   <span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">,</span> <span class="n">c</span><span class="o">,</span> <span class="n">x</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span>
   <span class="n">y</span> <span class="o">:</span> <span class="kt">REAL</span><span class="o">;</span>

<span class="k">BEGIN</span> <span class="cm">{Part10}</span>
   <span class="k">BEGIN</span>
      <span class="n">number</span> <span class="o">:=</span> <span class="mi">2</span><span class="o">;</span>
      <span class="n">a</span> <span class="o">:=</span> <span class="n">number</span><span class="o">;</span>
      <span class="n">b</span> <span class="o">:=</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">a</span> <span class="o">+</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">number</span> <span class="k">DIV</span> <span class="mi">4</span><span class="o">;</span>
      <span class="n">c</span> <span class="o">:=</span> <span class="n">a</span> <span class="o">-</span> <span class="o">-</span> <span class="n">b</span>
   <span class="k">END</span><span class="o">;</span>
   <span class="n">x</span> <span class="o">:=</span> <span class="mi">11</span><span class="o">;</span>
   <span class="n">y</span> <span class="o">:=</span> <span class="mi">20</span> <span class="o">/</span> <span class="mi">7</span> <span class="o">+</span> <span class="mf">3.14</span><span class="o">;</span>
   <span class="cm">{ writeln(&#39;a = &#39;, a); }</span>
   <span class="cm">{ writeln(&#39;b = &#39;, b); }</span>
   <span class="cm">{ writeln(&#39;c = &#39;, c); }</span>
   <span class="cm">{ writeln(&#39;number = &#39;, number); }</span>
   <span class="cm">{ writeln(&#39;x = &#39;, x); }</span>
   <span class="cm">{ writeln(&#39;y = &#39;, y); }</span>
<span class="k">END</span><span class="o">.</span> <span class="cm">{Part10}</span>
</pre></div> <p>Before we start digging into the details, download the source code of the interpreter from <a href="https://github.com/rspivak/lsbasi/blob/master/part10/python/spi.py">GitHub</a> and the <a href="https://github.com/rspivak/lsbasi/blob/master/part10/python/part10.pas">Pascal source code above</a>, and try it on the command&nbsp;line:</p> <div class="highlight"><pre><span></span>$ python spi.py part10.pas
<span class="nv">a</span> <span class="o">=</span> <span class="m">2</span>
<span class="nv">b</span> <span class="o">=</span> <span class="m">25</span>
<span class="nv">c</span> <span class="o">=</span> <span class="m">27</span>
<span class="nv">number</span> <span class="o">=</span> <span class="m">2</span>
<span class="nv">x</span> <span class="o">=</span> <span class="m">11</span>
<span class="nv">y</span> <span class="o">=</span> <span class="m">5</span>.99714285714
</pre></div> <p></br> If I remove the comments around the <em>writeln</em> statements in the <a href="https://github.com/rspivak/lsbasi/blob/master/part10/python/part10.pas">part10.pas</a> file, compile the source code with <a href="http://www.freepascal.org/"><em>fpc</em></a> and then run the produced executable, this is what I get on my&nbsp;laptop:</p> <div class="highlight"><pre><span></span>$ fpc part10.pas
$ ./part10
<span class="nv">a</span> <span class="o">=</span> <span class="m">2</span>
<span class="nv">b</span> <span class="o">=</span> <span class="m">25</span>
<span class="nv">c</span> <span class="o">=</span> <span class="m">27</span>
<span class="nv">number</span> <span class="o">=</span> <span class="m">2</span>
<span class="nv">x</span> <span class="o">=</span> <span class="m">11</span>
<span class="nv">y</span> <span class="o">=</span> <span class="m">5</span>.99714285714286E+000
</pre></div> <p></br> Okay, let’s see what we’re going to cover&nbsp;today:</p> <ol> <li>We will learn how to parse and interpret the Pascal <strong><em><span class="caps">PROGRAM</span></em></strong>&nbsp;header</li> <li>We will learn how to parse Pascal variable&nbsp;declarations</li> <li>We will update our interpreter to use the <strong><em><span class="caps">DIV</span></em></strong> keyword for integer division and a forward slash / for float&nbsp;division</li> <li>We will add support for Pascal&nbsp;comments</li> </ol> <p></br> Let’s dive in and look at the grammar changes first. Today we will add some new rules and update some of the existing rules. <img alt="" src="https://ruslanspivak.com/lsbasi-part10/lsbasi_part10_grammar1.png"> <img alt="" src="https://ruslanspivak.com/lsbasi-part10/lsbasi_part10_grammar2.png"></p> <ol> <li> <p>The <em><strong>program</strong></em> definition grammar rule is updated to include the <em><strong><span class="caps">PROGRAM</span></strong></em> reserved keyword, the program name, and a block that ends with a dot. 
Here is an example of a complete Pascal&nbsp;program:</p> <div class="highlight"><pre><span></span><span class="k">PROGRAM</span> <span class="n">Part10</span><span class="o">;</span> <span class="k">BEGIN</span> <span class="k">END</span><span class="o">.</span> </pre></div> </li> <li> <p>The <em><strong>block</strong></em> rule combines a <em>declarations</em> rule and a <em>compound_statement</em> rule. We’ll also use the rule later in the series when we add procedure declarations. Here is an example of a&nbsp;block:</p> <div class="highlight"><pre><span></span><span class="k">VAR</span> <span class="n">number</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="k">BEGIN</span> <span class="k">END</span> </pre></div> <p>Here is another&nbsp;example:</p> <div class="highlight"><pre><span></span><span class="k">BEGIN</span> <span class="k">END</span> </pre></div> </li> <li> <p>Pascal declarations have several parts and each part is optional. In this article, we’ll cover the variable declaration part only. The <em><strong>declarations</strong></em> rule has either a variable declaration sub-rule or it’s&nbsp;empty.</p> </li> <li> <p>Pascal is a statically typed language, which means that every variable needs a variable declaration that explicitly specifies its type. In Pascal, variables must be declared before they are used. This is achieved by declaring variables in the program variable declaration section using the <em><strong><span class="caps">VAR</span></strong></em> reserved keyword. 
You can define variables like&nbsp;this:</p> <div class="highlight"><pre><span></span><span class="k">VAR</span> <span class="n">number</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">,</span> <span class="n">c</span><span class="o">,</span> <span class="n">x</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">REAL</span><span class="o">;</span> </pre></div> </li> <li> <p>The <em><strong>type_spec</strong></em> rule is for handling <em><span class="caps">INTEGER</span></em> and <em><span class="caps">REAL</span></em> types and is used in variable declarations. In the example&nbsp;below</p> <div class="highlight"><pre><span></span><span class="k">VAR</span> <span class="n">a</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="n">b</span> <span class="o">:</span> <span class="kt">REAL</span><span class="o">;</span> </pre></div> <p>the variable &#8220;a&#8221; is declared with the type <em><span class="caps">INTEGER</span></em> and the variable &#8220;b&#8221; is declared with the type <em><span class="caps">REAL</span></em> (float). 
In this article we won’t enforce type checking, but we will add type checking later in the&nbsp;series.</p> </li> <li> <p>The <em><strong>term</strong></em> rule is updated to use the <em><strong><span class="caps">DIV</span></strong></em> keyword for integer division and a forward slash / for float&nbsp;division.</p> <p>Before, dividing 20 by 7 using a forward slash would produce an <span class="caps">INTEGER</span>&nbsp;2:</p> <div class="highlight"><pre><span></span>20 / 7 = 2 </pre></div> <p>Now, dividing 20 by 7 using a forward slash will produce a <span class="caps">REAL</span> (floating point number) 2.85714285714&nbsp;:</p> <div class="highlight"><pre><span></span>20 / 7 = 2.85714285714 </pre></div> <p>From now on, to get an <span class="caps">INTEGER</span> instead of a <span class="caps">REAL</span>, you need to use the <em><strong><span class="caps">DIV</span></strong></em>&nbsp;keyword:</p> <div class="highlight"><pre><span></span>20 DIV 7 = 2 </pre></div> </li> <li> <p>The <em><strong>factor</strong></em> rule is updated to handle both integer and real (float) constants. I also removed the <span class="caps">INTEGER</span> sub-rule because the constants will be represented by <em><strong>INTEGER_CONST</strong></em> and <em><strong>REAL_CONST</strong></em> tokens and the <em><strong><span class="caps">INTEGER</span></strong></em> token will be used to represent the integer type. 
In the example below the lexer will generate an <em>INTEGER_CONST</em> token for 20 and 7 and a <em>REAL_CONST</em> token for 3.14&nbsp;:</p> <div class="highlight"><pre><span></span><span class="n">y</span> <span class="o">:=</span> <span class="mi">20</span> <span class="o">/</span> <span class="mi">7</span> <span class="o">+</span> <span class="mf">3.14</span><span class="o">;</span> </pre></div> </li> </ol> <p></br> Here is our complete grammar for&nbsp;today:</p> <div class="highlight"><pre><span></span>    program : PROGRAM variable SEMI block DOT

    block : declarations compound_statement

    declarations : VAR (variable_declaration SEMI)+
                 | empty

    variable_declaration : ID (COMMA ID)* COLON type_spec

    type_spec : INTEGER | REAL

    compound_statement : BEGIN statement_list END

    statement_list : statement
                   | statement SEMI statement_list

    statement : compound_statement
              | assignment_statement
              | empty

    assignment_statement : variable ASSIGN expr

    empty :

    expr : term ((PLUS | MINUS) term)*

    term : factor ((MUL | INTEGER_DIV | FLOAT_DIV) factor)*

    factor : PLUS factor
           | MINUS factor
           | INTEGER_CONST
           | REAL_CONST
           | LPAREN expr RPAREN
           | variable

    variable: ID
</pre></div> <p>In the rest of the article we’ll go through the same drill we went through last&nbsp;time:</p> <ol> <li>Update the&nbsp;lexer</li> <li>Update the&nbsp;parser</li> <li>Update the&nbsp;interpreter</li> </ol> <p></br> <strong>Updating the&nbsp;Lexer</strong></p> <p>Here is a summary of the lexer&nbsp;changes:</p> <ol> <li>New&nbsp;tokens</li> <li>New and updated reserved&nbsp;keywords</li> <li>New <em>skip_comment</em> method to handle Pascal&nbsp;comments</li> <li>Rename the <em>integer</em> method and make some changes to the method&nbsp;itself</li> <li>Update the <em>get_next_token</em> method to return new&nbsp;tokens</li> </ol> <p>Let’s dig into the changes mentioned&nbsp;above:</p> <ol> <li> <p>To handle a program header, variable declarations, integer and float constants as well as integer and float division, we 
need to add some new tokens - some of which are reserved keywords - and we also need to update the meaning of the <span class="caps">INTEGER</span> token to represent the integer type and not an integer constant. Here is a complete list of new and updated&nbsp;tokens:</p> <ul> <li><span class="caps">PROGRAM</span> (reserved&nbsp;keyword)</li> <li><span class="caps">VAR</span> (reserved&nbsp;keyword)</li> <li><span class="caps">COLON</span>&nbsp;(:)</li> <li><span class="caps">COMMA</span>&nbsp;(,)</li> <li><span class="caps">INTEGER</span> (we change it to mean integer type and not integer constant like 3 or&nbsp;5)</li> <li><span class="caps">REAL</span> (for Pascal <span class="caps">REAL</span>&nbsp;type)</li> <li>INTEGER_CONST (for example, 3 or&nbsp;5)</li> <li>REAL_CONST (for example, 3.14 and so&nbsp;on)</li> <li>INTEGER_DIV for integer division (the <em><strong><span class="caps">DIV</span></strong></em> reserved&nbsp;keyword)</li> <li>FLOAT_DIV for float division (forward slash&nbsp;/)</li> </ul> </li> <li> <p>Here is the complete mapping of reserved keywords to&nbsp;tokens:</p> <div class="highlight"><pre><span></span><span class="n">RESERVED_KEYWORDS</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s1">&#39;PROGRAM&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;PROGRAM&#39;</span><span class="p">,</span> <span class="s1">&#39;PROGRAM&#39;</span><span class="p">),</span>
    <span class="s1">&#39;VAR&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;VAR&#39;</span><span class="p">,</span> <span class="s1">&#39;VAR&#39;</span><span class="p">),</span>
    <span class="s1">&#39;DIV&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;INTEGER_DIV&#39;</span><span class="p">,</span> <span class="s1">&#39;DIV&#39;</span><span class="p">),</span>
    <span class="s1">&#39;INTEGER&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;INTEGER&#39;</span><span class="p">,</span> <span class="s1">&#39;INTEGER&#39;</span><span class="p">),</span>
    <span class="s1">&#39;REAL&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;REAL&#39;</span><span class="p">,</span> <span class="s1">&#39;REAL&#39;</span><span class="p">),</span>
    <span class="s1">&#39;BEGIN&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;BEGIN&#39;</span><span class="p">,</span> <span class="s1">&#39;BEGIN&#39;</span><span class="p">),</span>
    <span class="s1">&#39;END&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;END&#39;</span><span class="p">,</span> <span class="s1">&#39;END&#39;</span><span class="p">),</span>
<span class="p">}</span>
</pre></div> </li> <li> <p>We’re adding the <em>skip_comment</em> method to handle Pascal comments. The method is pretty basic: all it does is discard characters until it finds the closing curly&nbsp;brace:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">skip_comment</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">!=</span> <span class="s1">&#39;}&#39;</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>  <span class="c1"># the closing curly brace</span>
</pre></div> </li> <li> <p>We are renaming the <em>integer</em> method to <em>number</em>. 
It can handle both integer constants and float constants like 3 and&nbsp;3.14:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">number</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;Return a (multidigit) integer or float consumed from the input.&quot;&quot;&quot;</span>
    <span class="n">result</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span>
    <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span>
        <span class="n">result</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>

    <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;.&#39;</span><span class="p">:</span>
        <span class="n">result</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>

        <span class="k">while</span> <span class="p">(</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">()</span>
        <span class="p">):</span>
            <span class="n">result</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>

        <span class="n">token</span> <span class="o">=</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;REAL_CONST&#39;</span><span class="p">,</span> <span class="nb">float</span><span class="p">(</span><span class="n">result</span><span class="p">))</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">token</span> <span class="o">=</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;INTEGER_CONST&#39;</span><span class="p">,</span> <span class="nb">int</span><span class="p">(</span><span class="n">result</span><span class="p">))</span>

    <span class="k">return</span> <span class="n">token</span>
</pre></div> </li> <li> <p>We&#8217;re also updating the <em>get_next_token</em> method to return new&nbsp;tokens:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">get_next_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
        <span class="o">...</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;{&#39;</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">skip_comment</span><span class="p">()</span>
            <span class="k">continue</span>
        <span class="o">...</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">number</span><span class="p">()</span>

        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;:&#39;</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">COLON</span><span class="p">,</span> <span class="s1">&#39;:&#39;</span><span class="p">)</span>

        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;,&#39;</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">COMMA</span><span class="p">,</span> <span class="s1">&#39;,&#39;</span><span class="p">)</span>
        <span class="o">...</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;/&#39;</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">FLOAT_DIV</span><span class="p">,</span> <span class="s1">&#39;/&#39;</span><span class="p">)</span>
        <span class="o">...</span>
</pre></div> </li> </ol> <p></br> <strong>Updating the&nbsp;Parser</strong></p> <p>Now onto the 
parser&nbsp;changes.</p> <p>Here is a summary of the&nbsp;changes:</p> <ol> <li>New <span class="caps">AST</span> nodes: <em>Program</em>, <em>Block</em>, <em>VarDecl</em>, <em>Type</em></li> <li>New methods corresponding to new grammar rules: <em>block</em>, <em>declarations</em>, <em>variable_declaration</em>, and <em>type_spec</em>.</li> <li>Updates to the existing parser methods: <em>program</em>, <em>term</em>, and <em>factor</em></li> </ol> <p>Let&#8217;s go over the changes one by&nbsp;one:</p> <ol> <li> <p>We&#8217;ll start with new <span class="caps">AST</span> nodes first. There are four new&nbsp;nodes:</p> <ul> <li> <p>The <em>Program</em> <span class="caps">AST</span> node represents a program and will be our root&nbsp;node</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Program</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">block</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span> <span class="bp">self</span><span class="o">.</span><span class="n">block</span> <span class="o">=</span> <span class="n">block</span> </pre></div> </li> <li> <p>The <em>Block</em> <span class="caps">AST</span> node holds declarations and a compound&nbsp;statement:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Block</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">declarations</span><span class="p">,</span> <span class="n">compound_statement</span><span 
class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">declarations</span> <span class="o">=</span> <span class="n">declarations</span> <span class="bp">self</span><span class="o">.</span><span class="n">compound_statement</span> <span class="o">=</span> <span class="n">compound_statement</span> </pre></div> </li> <li> <p>The <em>VarDecl</em> <span class="caps">AST</span> node represents a variable declaration. It holds a variable node and a type&nbsp;node:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">VarDecl</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">var_node</span><span class="p">,</span> <span class="n">type_node</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">var_node</span> <span class="o">=</span> <span class="n">var_node</span> <span class="bp">self</span><span class="o">.</span><span class="n">type_node</span> <span class="o">=</span> <span class="n">type_node</span> </pre></div> </li> <li> <p>The <em>Type</em> <span class="caps">AST</span> node represents a variable type (<span class="caps">INTEGER</span> or <span class="caps">REAL</span>):</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Type</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">token</span> <span class="o">=</span> <span class="n">token</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> 
<span class="n">token</span><span class="o">.</span><span class="n">value</span> </pre></div> </li> </ul> </li> <li> <p>As you probably remember, each rule from the grammar has a corresponding method in our recursive-descent parser. Today we’re adding four new methods: <em>block</em>, <em>declarations</em>, <em>variable_declaration</em>, and <em>type_spec</em>. These methods are responsible for parsing new language constructs and constructing new <span class="caps">AST</span>&nbsp;nodes:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">block</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;block : declarations compound_statement&quot;&quot;&quot;</span> <span class="n">declaration_nodes</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">declarations</span><span class="p">()</span> <span class="n">compound_statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">compound_statement</span><span class="p">()</span> <span class="n">node</span> <span class="o">=</span> <span class="n">Block</span><span class="p">(</span><span class="n">declaration_nodes</span><span class="p">,</span> <span class="n">compound_statement_node</span><span class="p">)</span> <span class="k">return</span> <span class="n">node</span> <span class="k">def</span> <span class="nf">declarations</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;declarations : VAR (variable_declaration SEMI)+</span> <span class="sd"> | empty</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="n">declarations</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span 
class="n">type</span> <span class="o">==</span> <span class="n">VAR</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">VAR</span><span class="p">)</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">ID</span><span class="p">:</span> <span class="n">var_decl</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">variable_declaration</span><span class="p">()</span> <span class="n">declarations</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">var_decl</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">SEMI</span><span class="p">)</span> <span class="k">return</span> <span class="n">declarations</span> <span class="k">def</span> <span class="nf">variable_declaration</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;variable_declaration : ID (COMMA ID)* COLON type_spec&quot;&quot;&quot;</span> <span class="n">var_nodes</span> <span class="o">=</span> <span class="p">[</span><span class="n">Var</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="p">)]</span> <span class="c1"># first ID</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">ID</span><span class="p">)</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">COMMA</span><span 
class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">COMMA</span><span class="p">)</span> <span class="n">var_nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">Var</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="p">))</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">ID</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">COLON</span><span class="p">)</span> <span class="n">type_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">type_spec</span><span class="p">()</span> <span class="n">var_declarations</span> <span class="o">=</span> <span class="p">[</span> <span class="n">VarDecl</span><span class="p">(</span><span class="n">var_node</span><span class="p">,</span> <span class="n">type_node</span><span class="p">)</span> <span class="k">for</span> <span class="n">var_node</span> <span class="ow">in</span> <span class="n">var_nodes</span> <span class="p">]</span> <span class="k">return</span> <span class="n">var_declarations</span> <span class="k">def</span> <span class="nf">type_spec</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;type_spec : INTEGER</span> <span class="sd"> | REAL</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span 
class="o">==</span> <span class="n">INTEGER</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">REAL</span><span class="p">)</span> <span class="n">node</span> <span class="o">=</span> <span class="n">Type</span><span class="p">(</span><span class="n">token</span><span class="p">)</span> <span class="k">return</span> <span class="n">node</span> </pre></div> </li> <li> <p>We also need to update the <em>program</em>, <em>term</em>, and <em>factor</em> methods to accommodate our grammar&nbsp;changes:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">program</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;program : PROGRAM variable SEMI block DOT&quot;&quot;&quot;</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">PROGRAM</span><span class="p">)</span> <span class="n">var_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">variable</span><span class="p">()</span> <span class="n">prog_name</span> <span class="o">=</span> <span class="n">var_node</span><span class="o">.</span><span class="n">value</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">SEMI</span><span class="p">)</span> <span class="n">block_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">block</span><span class="p">()</span> <span class="n">program_node</span> <span class="o">=</span> <span class="n">Program</span><span class="p">(</span><span 
class="n">prog_name</span><span class="p">,</span> <span class="n">block_node</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">DOT</span><span class="p">)</span> <span class="k">return</span> <span class="n">program_node</span> <span class="k">def</span> <span class="nf">term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;term : factor ((MUL | INTEGER_DIV | FLOAT_DIV) factor)*&quot;&quot;&quot;</span> <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">()</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="ow">in</span> <span class="p">(</span><span class="n">MUL</span><span class="p">,</span> <span class="n">INTEGER_DIV</span><span class="p">,</span> <span class="n">FLOAT_DIV</span><span class="p">):</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MUL</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">MUL</span><span class="p">)</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">INTEGER_DIV</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">INTEGER_DIV</span><span class="p">)</span> <span class="k">elif</span> <span 
class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">FLOAT_DIV</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">FLOAT_DIV</span><span class="p">)</span> <span class="n">node</span> <span class="o">=</span> <span class="n">BinOp</span><span class="p">(</span><span class="n">left</span><span class="o">=</span><span class="n">node</span><span class="p">,</span> <span class="n">op</span><span class="o">=</span><span class="n">token</span><span class="p">,</span> <span class="n">right</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">())</span> <span class="k">return</span> <span class="n">node</span> <span class="k">def</span> <span class="nf">factor</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;factor : PLUS factor</span> <span class="sd"> | MINUS factor</span> <span class="sd"> | INTEGER_CONST</span> <span class="sd"> | REAL_CONST</span> <span class="sd"> | LPAREN expr RPAREN</span> <span class="sd"> | variable</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">PLUS</span><span class="p">)</span> <span class="n">node</span> <span class="o">=</span> <span class="n">UnaryOp</span><span class="p">(</span><span class="n">token</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span 
class="n">factor</span><span class="p">())</span> <span class="k">return</span> <span class="n">node</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MINUS</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">MINUS</span><span class="p">)</span> <span class="n">node</span> <span class="o">=</span> <span class="n">UnaryOp</span><span class="p">(</span><span class="n">token</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">())</span> <span class="k">return</span> <span class="n">node</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">INTEGER_CONST</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">INTEGER_CONST</span><span class="p">)</span> <span class="k">return</span> <span class="n">Num</span><span class="p">(</span><span class="n">token</span><span class="p">)</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">REAL_CONST</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">REAL_CONST</span><span class="p">)</span> <span class="k">return</span> <span class="n">Num</span><span class="p">(</span><span class="n">token</span><span class="p">)</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">LPAREN</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span 
class="n">eat</span><span class="p">(</span><span class="n">LPAREN</span><span class="p">)</span> <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">expr</span><span class="p">()</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">RPAREN</span><span class="p">)</span> <span class="k">return</span> <span class="n">node</span> <span class="k">else</span><span class="p">:</span> <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">variable</span><span class="p">()</span> <span class="k">return</span> <span class="n">node</span> </pre></div> </li> </ol> <p></br> Now, let&#8217;s see what the <em><strong>Abstract Syntax Tree</strong></em> looks like with the new nodes. Here is a small working Pascal&nbsp;program:</p> <div class="highlight"><pre><span></span><span class="k">PROGRAM</span> <span class="n">Part10AST</span><span class="o">;</span> <span class="k">VAR</span> <span class="n">a</span><span class="o">,</span> <span class="n">b</span> <span class="o">:</span> <span class="kt">INTEGER</span><span class="o">;</span> <span class="n">y</span> <span class="o">:</span> <span class="kt">REAL</span><span class="o">;</span> <span class="k">BEGIN</span> <span class="cm">{Part10AST}</span> <span class="n">a</span> <span class="o">:=</span> <span class="mi">2</span><span class="o">;</span> <span class="n">b</span> <span class="o">:=</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">a</span> <span class="o">+</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">a</span> <span class="k">DIV</span> <span class="mi">4</span><span class="o">;</span> <span class="n">y</span> <span class="o">:=</span> <span class="mi">20</span> <span class="o">/</span> <span class="mi">7</span> <span class="o">+</span> <span 
class="mf">3.14</span><span class="o">;</span> <span class="k">END</span><span class="o">.</span> <span class="cm">{Part10AST}</span> </pre></div> <p>Let&#8217;s generate an <span class="caps">AST</span> and visualize it with the <a href="https://github.com/rspivak/lsbasi/blob/master/part10/python/genastdot.py">genastdot.py</a>:</p> <div class="highlight"><pre><span></span>$ python genastdot.py part10ast.pas &gt; ast.dot <span class="o">&amp;&amp;</span> dot -Tpng -o ast.png ast.dot </pre></div> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part10/lsbasi_part10_ast.png"></p> <p>In the picture you can see the new nodes that we have&nbsp;added.</p> <p></br> <strong>Updating the&nbsp;Interpreter</strong></p> <p>We&#8217;re done with the lexer and parser changes. What&#8217;s left is to add new visitor methods to our <em>Interpreter</em> class. There will be four new methods to visit our new&nbsp;nodes:</p> <ul> <li><em>visit_Program</em></li> <li><em>visit_Block</em></li> <li><em>visit_VarDecl</em></li> <li><em>visit_Type</em></li> </ul> <p>They are pretty straightforward. 
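</p> <p>The names of these methods are not arbitrary: they follow the node class names. Here is a minimal sketch of that dispatch idea, assuming a <em>NodeVisitor</em> base class that looks methods up by name with <em>getattr</em>. The <em>Program</em>, <em>Block</em>, and <em>NoOp</em> classes below are simplified stand-ins for the real <span class="caps">AST</span> nodes, and <em>TraceVisitor</em> is a hypothetical visitor that only records the order of the visits:</p>

<div class="highlight"><pre>
```python
class NodeVisitor:
    def visit(self, node):
        # Build the method name from the node's class name, e.g. visit_Block
        method_name = 'visit_' + type(node).__name__
        visitor = getattr(self, method_name, self.generic_visit)
        return visitor(node)

    def generic_visit(self, node):
        raise Exception('No visit_{} method'.format(type(node).__name__))


# Simplified stand-ins for the AST node classes (assumptions for this sketch)
class Program:
    def __init__(self, name, block):
        self.name = name
        self.block = block


class Block:
    def __init__(self, declarations, compound_statement):
        self.declarations = declarations
        self.compound_statement = compound_statement


class NoOp:
    pass


class TraceVisitor(NodeVisitor):
    """Hypothetical visitor that records which visit_* methods were called."""

    def __init__(self):
        self.trace = []

    def visit_Program(self, node):
        self.trace.append('Program')
        self.visit(node.block)

    def visit_Block(self, node):
        self.trace.append('Block')
        for declaration in node.declarations:
            self.visit(declaration)
        self.visit(node.compound_statement)

    def visit_NoOp(self, node):
        self.trace.append('NoOp')


tree = Program('Demo', Block([], NoOp()))
visitor = TraceVisitor()
visitor.visit(tree)
print(visitor.trace)
```
</pre></div>

<p>Running the sketch should print the visit order, <em>Program</em>, then <em>Block</em>, then <em>NoOp</em>, which shows that supporting a new node type only takes a method with the matching name.</p> <p>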
You can also see that the <em>Interpreter</em> does nothing with <em>VarDecl</em> and <em>Type</em>&nbsp;nodes:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Program</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">block</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_Block</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">for</span> <span class="n">declaration</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">declarations</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">declaration</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">compound_statement</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_VarDecl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="c1"># Do nothing</span> <span class="k">pass</span> <span class="k">def</span> <span class="nf">visit_Type</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="c1"># Do nothing</span> <span class="k">pass</span> </pre></div> <p>We also need to update the <em>visit_BinOp</em> method to properly interpret integer and float&nbsp;divisions:</p> <div 
class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_BinOp</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">if</span> <span class="n">node</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="k">elif</span> <span class="n">node</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MINUS</span><span class="p">:</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="k">elif</span> <span class="n">node</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MUL</span><span class="p">:</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span 
class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="k">elif</span> <span class="n">node</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">INTEGER_DIV</span><span class="p">:</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="o">//</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="k">elif</span> <span class="n">node</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">FLOAT_DIV</span><span class="p">:</span> <span class="k">return</span> <span class="nb">float</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="p">))</span> <span class="o">/</span> <span class="nb">float</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">))</span> </pre></div> <p></br> Let’s sum up what we had to do to extend the Pascal interpreter in this&nbsp;article:</p> <ul> 
<li>Add new rules to the grammar and update some existing&nbsp;rules</li> <li>Add new tokens and supporting methods to the lexer, update and modify some existing&nbsp;methods</li> <li>Add new <span class="caps">AST</span> nodes to the parser for new language&nbsp;constructs</li> <li>Add new methods corresponding to the new grammar rules to our recursive-descent parser and update some existing&nbsp;methods</li> <li>Add new visitor methods to the interpreter and update one existing visitor&nbsp;method</li> </ul> <p>As a result of our changes we also got rid of some of the hacks I introduced in <a href="/lsbasi-part9/">Part 9</a>,&nbsp;namely:</p> <ul> <li>Our interpreter can now handle the <em><strong><span class="caps">PROGRAM</span></strong></em>&nbsp;header</li> <li>Variables can now be declared using the <em><strong><span class="caps">VAR</span></strong></em>&nbsp;keyword</li> <li>The <em><strong><span class="caps">DIV</span></strong></em> keyword is used for integer division and a forward slash / is used for float&nbsp;division</li> </ul> <p></br> If you haven&#8217;t done so yet, then, as an exercise, re-implement the interpreter in this article without looking at the source code and use <a href="https://github.com/rspivak/lsbasi/blob/master/part10/python/part10.pas">part10.pas</a> as your test input&nbsp;file.</p> <p></br> That&#8217;s all for today. In the next article, I’ll talk in greater detail about symbol table management. Stay tuned and see you&nbsp;soon!</p>
<p>Let’s Build A Simple Interpreter. Part 9. Ruslan Spivak, 2016-05-01T06:10:00-04:00</p>
<p>I remember when I was in university (a long time ago) and learning systems programming, I believed that the only &#8220;real&#8221; languages were Assembly and C. And Pascal was - how to put it nicely - a very high-level language used by application developers who didn’t want to know what was going on under the&nbsp;hood.</p> <p>Little did I know back then that I would be writing almost everything in Python (and love every bit of it) to pay my bills and that I would also be writing an interpreter and compiler for Pascal for the reasons I stated in <a href="/lsbasi-part1/">the very first article of the series</a>.</p> <p>These days, I consider myself a programming languages enthusiast, and I’m fascinated by all languages and their unique features. Having said that, I have to note that I enjoy using certain languages way more than others. I am biased and I’ll be the first one to admit that.&nbsp;:)</p> <p>This is me&nbsp;before:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part9/lsbasi_part9_story_before.png" width="720"></p> <p>And&nbsp;now:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part9/lsbasi_part9_story_now.png" width="720"></p> <p>Okay, let’s get down to business. 
Here is what you’re going to learn&nbsp;today:</p> <ol> <li>How to parse and interpret a Pascal program&nbsp;definition.</li> <li>How to parse and interpret compound&nbsp;statements.</li> <li>How to parse and interpret assignment statements, including&nbsp;variables.</li> <li>A bit about symbol tables and how to store and look up&nbsp;variables.</li> </ol> <p>I’ll use the following sample Pascal-like program to introduce new&nbsp;concepts:</p> <div class="highlight"><pre><span></span><span class="k">BEGIN</span>
    <span class="k">BEGIN</span>
        <span class="n">number</span> <span class="o">:=</span> <span class="mi">2</span><span class="o">;</span>
        <span class="n">a</span> <span class="o">:=</span> <span class="n">number</span><span class="o">;</span>
        <span class="n">b</span> <span class="o">:=</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">a</span> <span class="o">+</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">number</span> <span class="o">/</span> <span class="mi">4</span><span class="o">;</span>
        <span class="n">c</span> <span class="o">:=</span> <span class="n">a</span> <span class="o">-</span> <span class="o">-</span> <span class="n">b</span>
    <span class="k">END</span><span class="o">;</span>
    <span class="n">x</span> <span class="o">:=</span> <span class="mi">11</span><span class="o">;</span>
<span class="k">END</span><span class="o">.</span>
</pre></div> <p>You could say that that’s quite a jump from the command line interpreter you have written so far by following the previous articles in the series, but it’s a jump that I hope will bring excitement. It’s not &#8220;just&#8221; a calculator anymore, we’re getting serious here, Pascal serious.&nbsp;:)</p> <p>Let’s dive in and look at syntax diagrams for new language constructs and their corresponding grammar&nbsp;rules.</p> <p>On your marks: Ready. 
Set.&nbsp;Go!</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part9/lsbasi_part9_syntax_diagram_01.png" width="720"> <img alt="" src="https://ruslanspivak.com/lsbasi-part9/lsbasi_part9_syntax_diagram_02.png" width="720"> <img alt="" src="https://ruslanspivak.com/lsbasi-part9/lsbasi_part9_syntax_diagram_03.png" width="720"></p> <ol> <li> <p>I’ll start with describing what a Pascal <em>program</em> is. A Pascal <em><strong>program</strong></em> consists of a <em>compound statement</em> that ends with a dot. Here is an example of a&nbsp;program:</p> <div class="highlight"><pre><span></span>“BEGIN END.”
</pre></div> <p>I have to note that this is not a complete program definition, and we’ll extend it later in the&nbsp;series.</p> </li> <li> <p>What is a <em>compound statement</em>? A <em><strong>compound statement</strong></em> is a block marked with <span class="caps">BEGIN</span> and <span class="caps">END</span> that can contain a list (possibly empty) of statements including other compound statements. Every statement inside the compound statement, except for the last one, must terminate with a semicolon. The last statement in the block may or may not have a terminating semicolon. Here are some examples of valid compound&nbsp;statements:</p> <div class="highlight"><pre><span></span>“BEGIN END”
“BEGIN a := 5; x := 11 END”
“BEGIN a := 5; x := 11; END”
“BEGIN BEGIN a := 5 END; x := 11 END”
</pre></div> </li> <li> <p>A <em><strong>statement list</strong></em> is a list of zero or more statements inside a compound statement. 
See above for some&nbsp;examples.</p> </li> <li> <p>A <em><strong>statement</strong></em> can be a <em>compound statement</em>, an <em>assignment statement</em>, or it can be an <em>empty</em>&nbsp;statement.</p> </li> <li> <p>An <em><strong>assignment statement</strong></em> is a variable followed by an <span class="caps">ASSIGN</span> token (two characters, ‘:’ and ‘=’) followed by an&nbsp;expression.</p> <div class="highlight"><pre><span></span>“a := 11”
“b := a + 9 - 5 * 2”
</pre></div> </li> <li> <p>A <em><strong>variable</strong></em> is an identifier. We’ll use the <span class="caps">ID</span> token for variables. The value of the token will be a variable’s name like ‘a’, ‘number’, and so on. In the following code block ‘a’ and ‘b’ are&nbsp;variables:</p> <div class="highlight"><pre><span></span>“BEGIN a := 11; b := a + 9 - 5 * 2 END”
</pre></div> </li> <li> <p>An <em><strong>empty</strong></em> statement represents a grammar rule with no further productions. We use the <em>empty_statement</em> grammar rule to indicate the end of the <em>statement_list</em> in the parser and also to allow for empty compound statements as in ‘<span class="caps">BEGIN</span> <span class="caps">END</span>’.</p> </li> <li> <p>The <em><strong>factor</strong></em> rule is updated to handle&nbsp;variables.</p> </li> </ol> <p><br/> Now let’s take a look at our complete&nbsp;grammar:</p> <div class="highlight"><pre><span></span>    program : compound_statement DOT

    compound_statement : BEGIN statement_list END

    statement_list : statement
                   | statement SEMI statement_list

    statement : compound_statement
              | assignment_statement
              | empty

    assignment_statement : variable ASSIGN expr

    empty :

    expr: term ((PLUS | MINUS) term)*

    term: factor ((MUL | DIV) factor)*

    factor : PLUS factor
           | MINUS factor
           | INTEGER
           | LPAREN expr RPAREN
           | variable

    variable: ID
</pre></div> <p>You probably noticed that I didn’t use the star <strong>‘*’</strong> symbol in the <em>compound_statement</em> rule to represent zero or 
more repetitions, but instead explicitly specified the <em>statement_list</em> rule. This is another way to represent the ‘zero or more’ operation, and it will come in handy when we look at parser generators like <a href="http://www.dabeaz.com/ply/"><span class="caps">PLY</span></a> later in the series. I also split the “(<span class="caps">PLUS</span> | <span class="caps">MINUS</span>) factor” sub-rule into two separate&nbsp;rules.</p> <p><br/> In order to support the updated grammar, we need to make a number of changes to our lexer, parser, and interpreter. Let’s go over those changes one by&nbsp;one.</p> <p>Here is the summary of the changes in our lexer: <img alt="" src="https://ruslanspivak.com/lsbasi-part9/lsbasi_part9_lexer.png" width="720"></p> <ol> <li> <p>To support a Pascal program’s definition, compound statements, assignment statements, and variables, our lexer needs to return new&nbsp;tokens:</p> <ul> <li><span class="caps">BEGIN</span> (to mark the beginning of a compound&nbsp;statement)</li> <li><span class="caps">END</span> (to mark the end of the compound&nbsp;statement)</li> <li><span class="caps">DOT</span> (a token for a dot character ‘.’ required by a Pascal program’s&nbsp;definition)</li> <li><span class="caps">ASSIGN</span> (a token for a two character sequence ‘:=’). In Pascal, the assignment operator differs from that of many other languages, such as C, Python, Java, Rust, or Go, where you would use a single character ‘=’ to indicate&nbsp;assignment</li> <li><span class="caps">SEMI</span> (a token for a semicolon character ‘;’ that is used to mark the end of a statement inside a compound&nbsp;statement)</li> <li><span class="caps">ID</span> (a token for a valid identifier. 
Identifiers start with an alphabetical character followed by any number of alphanumerical&nbsp;characters)</li> </ul> </li> <li> <p>Sometimes, to differentiate between tokens that start with the same character (‘:’ vs ‘:=’ or ‘==’ vs ‘=&gt;’), we need to peek into the input buffer without actually consuming the next character. For this particular purpose, I introduced a <em>peek</em> method that will help us tokenize assignment statements. The method is not strictly required, but I thought I would introduce it earlier in the series and it will also make the <em>get_next_token</em> method a bit cleaner. All it does is return the next character from the text buffer without incrementing the <em>self.pos</em> variable. Here is the method&nbsp;itself:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">peek</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="n">peek_pos</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">+</span> <span class="mi">1</span>
    <span class="k">if</span> <span class="n">peek_pos</span> <span class="o">&gt;</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:</span>
        <span class="k">return</span> <span class="bp">None</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">[</span><span class="n">peek_pos</span><span class="p">]</span>
</pre></div> </li> <li> <p>Because Pascal variables and reserved keywords are both identifiers, we will combine their handling into one method called <em>_id</em>. 
The way it works is that the lexer consumes a sequence of alphanumerical characters and then checks if the character sequence is a reserved word. If it is, it returns a pre-constructed token for that reserved keyword. And if it’s not a reserved keyword, it returns a new <span class="caps">ID</span> token whose value is the character string (lexeme). I bet at this point you think, “Gosh, just show me the code.” :) Here it&nbsp;is:</p> <div class="highlight"><pre><span></span><span class="n">RESERVED_KEYWORDS</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s1">&#39;BEGIN&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;BEGIN&#39;</span><span class="p">,</span> <span class="s1">&#39;BEGIN&#39;</span><span class="p">),</span>
    <span class="s1">&#39;END&#39;</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">&#39;END&#39;</span><span class="p">,</span> <span class="s1">&#39;END&#39;</span><span class="p">),</span>
<span class="p">}</span>

<span class="k">def</span> <span class="nf">_id</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;Handle identifiers and reserved keywords&quot;&quot;&quot;</span>
    <span class="n">result</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span>
    <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isalnum</span><span class="p">():</span>
        <span class="n">result</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>

    <span class="n">token</span> <span class="o">=</span> <span class="n">RESERVED_KEYWORDS</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="n">Token</span><span class="p">(</span><span class="n">ID</span><span class="p">,</span> <span class="n">result</span><span class="p">))</span>
    <span class="k">return</span> <span class="n">token</span>
</pre></div> </li> <li> <p>And now let’s take a look at the changes in the main lexer method <em>get_next_token</em>:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">get_next_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
        <span class="o">...</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isalpha</span><span class="p">():</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_id</span><span class="p">()</span>

        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;:&#39;</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">peek</span><span class="p">()</span> <span class="o">==</span> <span class="s1">&#39;=&#39;</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">ASSIGN</span><span class="p">,</span> <span class="s1">&#39;:=&#39;</span><span class="p">)</span>

        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;;&#39;</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">SEMI</span><span class="p">,</span> <span class="s1">&#39;;&#39;</span><span class="p">)</span>

        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;.&#39;</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">DOT</span><span class="p">,</span> <span class="s1">&#39;.&#39;</span><span class="p">)</span>
        <span class="o">...</span>
</pre></div> </li> </ol> <p>It’s time to see our shiny new lexer in all its glory and action. 
Download the source code from <a href="https://github.com/rspivak/lsbasi/blob/master/part9/python">GitHub</a> and launch your Python shell from the same directory where you saved the <a href="https://github.com/rspivak/lsbasi/blob/master/part9/python/spi.py">spi.py</a>&nbsp;file:</p> <div class="highlight"><pre><span></span>&gt;&gt;&gt; from spi import Lexer &gt;&gt;&gt; <span class="nv">lexer</span> <span class="o">=</span> Lexer<span class="o">(</span><span class="s1">&#39;BEGIN a := 2; END.&#39;</span><span class="o">)</span> &gt;&gt;&gt; lexer.get_next_token<span class="o">()</span> Token<span class="o">(</span>BEGIN, <span class="s1">&#39;BEGIN&#39;</span><span class="o">)</span> &gt;&gt;&gt; lexer.get_next_token<span class="o">()</span> Token<span class="o">(</span>ID, <span class="s1">&#39;a&#39;</span><span class="o">)</span> &gt;&gt;&gt; lexer.get_next_token<span class="o">()</span> Token<span class="o">(</span>ASSIGN, <span class="s1">&#39;:=&#39;</span><span class="o">)</span> &gt;&gt;&gt; lexer.get_next_token<span class="o">()</span> Token<span class="o">(</span>INTEGER, <span class="m">2</span><span class="o">)</span> &gt;&gt;&gt; lexer.get_next_token<span class="o">()</span> Token<span class="o">(</span>SEMI, <span class="s1">&#39;;&#39;</span><span class="o">)</span> &gt;&gt;&gt; lexer.get_next_token<span class="o">()</span> Token<span class="o">(</span>END, <span class="s1">&#39;END&#39;</span><span class="o">)</span> &gt;&gt;&gt; lexer.get_next_token<span class="o">()</span> Token<span class="o">(</span>DOT, <span class="s1">&#39;.&#39;</span><span class="o">)</span> &gt;&gt;&gt; lexer.get_next_token<span class="o">()</span> Token<span class="o">(</span>EOF, None<span class="o">)</span> &gt;&gt;&gt; </pre></div> <p><br/> Moving on to parser&nbsp;changes.</p> <p>Here is the summary of changes in our parser: <img alt="" src="https://ruslanspivak.com/lsbasi-part9/lsbasi_part9_parser.png" width="720"></p> <ol> <li> <p>Let’s start with new <span 
class="caps">AST</span>&nbsp;nodes:</p> <ul> <li> <p><em>Compound</em> <span class="caps">AST</span> node represents a compound statement. It contains a list of statement nodes in its <em>children</em>&nbsp;variable.</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Compound</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Represents a &#39;BEGIN ... END&#39; block&quot;&quot;&quot;</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">children</span> <span class="o">=</span> <span class="p">[]</span> </pre></div> </li> <li> <p><em>Assign</em> <span class="caps">AST</span> node represents an assignment statement. Its <em>left</em> variable is for storing a <em>Var</em> node and its <em>right</em> variable is for storing a node returned by the expr parser&nbsp;method:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Assign</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">left</span><span class="p">,</span> <span class="n">op</span><span class="p">,</span> <span class="n">right</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">left</span> <span class="o">=</span> <span class="n">left</span> <span class="bp">self</span><span class="o">.</span><span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">op</span> <span class="o">=</span> <span class="n">op</span> <span class="bp">self</span><span class="o">.</span><span class="n">right</span> <span class="o">=</span> <span 
class="n">right</span> </pre></div> </li> <li> <p><em>Var</em> <span class="caps">AST</span> node (you guessed it) represents a variable. The <em>self.value</em> holds the variable’s&nbsp;name.</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Var</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;The Var node is constructed out of ID token.&quot;&quot;&quot;</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">token</span> <span class="o">=</span> <span class="n">token</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">value</span> </pre></div> </li> <li> <p><em>NoOp</em> node is used to represent an <em>empty</em> statement. For example ‘<span class="caps">BEGIN</span> <span class="caps">END</span>’ is a valid compound statement that has no&nbsp;statements.</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">NoOp</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">pass</span> </pre></div> </li> </ul> </li> <li> <p>As you remember, each rule from the grammar has a corresponding method in our recursive-descent parser. This time we’re adding seven new methods. These methods are responsible for parsing new language constructs and constructing new <span class="caps">AST</span> nodes. 
They are pretty&nbsp;straightforward:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">program</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;program : compound_statement DOT&quot;&quot;&quot;</span> <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">compound_statement</span><span class="p">()</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">DOT</span><span class="p">)</span> <span class="k">return</span> <span class="n">node</span> <span class="k">def</span> <span class="nf">compound_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;</span> <span class="sd"> compound_statement: BEGIN statement_list END</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">BEGIN</span><span class="p">)</span> <span class="n">nodes</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">statement_list</span><span class="p">()</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">END</span><span class="p">)</span> <span class="n">root</span> <span class="o">=</span> <span class="n">Compound</span><span class="p">()</span> <span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">nodes</span><span class="p">:</span> <span class="n">root</span><span class="o">.</span><span class="n">children</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">node</span><span class="p">)</span> <span class="k">return</span> <span class="n">root</span> <span 
class="k">def</span> <span class="nf">statement_list</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;</span> <span class="sd"> statement_list : statement</span> <span class="sd"> | statement SEMI statement_list</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">statement</span><span class="p">()</span> <span class="n">results</span> <span class="o">=</span> <span class="p">[</span><span class="n">node</span><span class="p">]</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">SEMI</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">SEMI</span><span class="p">)</span> <span class="n">results</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">statement</span><span class="p">())</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">ID</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span> <span class="k">return</span> <span class="n">results</span> <span class="k">def</span> <span class="nf">statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;</span> <span class="sd"> statement : compound_statement</span> <span class="sd"> | assignment_statement</span> <span class="sd"> | empty</span> <span class="sd"> 
&quot;&quot;&quot;</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">BEGIN</span><span class="p">:</span> <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">compound_statement</span><span class="p">()</span> <span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">ID</span><span class="p">:</span> <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">assignment_statement</span><span class="p">()</span> <span class="k">else</span><span class="p">:</span> <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">empty</span><span class="p">()</span> <span class="k">return</span> <span class="n">node</span> <span class="k">def</span> <span class="nf">assignment_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;</span> <span class="sd"> assignment_statement : variable ASSIGN expr</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">variable</span><span class="p">()</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">ASSIGN</span><span class="p">)</span> <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span 
class="o">.</span><span class="n">expr</span><span class="p">()</span> <span class="n">node</span> <span class="o">=</span> <span class="n">Assign</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">token</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span> <span class="k">return</span> <span class="n">node</span> <span class="k">def</span> <span class="nf">variable</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;</span> <span class="sd"> variable : ID</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="n">node</span> <span class="o">=</span> <span class="n">Var</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">ID</span><span class="p">)</span> <span class="k">return</span> <span class="n">node</span> <span class="k">def</span> <span class="nf">empty</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;An empty production&quot;&quot;&quot;</span> <span class="k">return</span> <span class="n">NoOp</span><span class="p">()</span> </pre></div> </li> <li> <p>We also need to update the existing <em>factor</em> method to parse&nbsp;variables:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">factor</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;factor : PLUS factor</span> <span class="sd"> | MINUS factor</span> <span class="sd"> | INTEGER</span> <span class="sd"> | LPAREN expr RPAREN</span> <span class="sd"> | variable</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="n">token</span> <span class="o">=</span> <span 
class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">PLUS</span><span class="p">)</span> <span class="n">node</span> <span class="o">=</span> <span class="n">UnaryOp</span><span class="p">(</span><span class="n">token</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">())</span> <span class="k">return</span> <span class="n">node</span> <span class="o">...</span> <span class="k">else</span><span class="p">:</span> <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">variable</span><span class="p">()</span> <span class="k">return</span> <span class="n">node</span> </pre></div> </li> <li> <p>The parser’s <em>parse</em> method is updated to start the parsing process by parsing a program&nbsp;definition:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">program</span><span class="p">()</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">!=</span> <span class="n">EOF</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span> <span class="k">return</span> <span class="n">node</span> </pre></div> </li> </ol> <p>Here is our sample 
program&nbsp;again:</p> <div class="highlight"><pre><span></span><span class="k">BEGIN</span> <span class="k">BEGIN</span> <span class="n">number</span> <span class="o">:=</span> <span class="mi">2</span><span class="o">;</span> <span class="n">a</span> <span class="o">:=</span> <span class="n">number</span><span class="o">;</span> <span class="n">b</span> <span class="o">:=</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">a</span> <span class="o">+</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">number</span> <span class="o">/</span> <span class="mi">4</span><span class="o">;</span> <span class="n">c</span> <span class="o">:=</span> <span class="n">a</span> <span class="o">-</span> <span class="o">-</span> <span class="n">b</span> <span class="k">END</span><span class="o">;</span> <span class="n">x</span> <span class="o">:=</span> <span class="mi">11</span><span class="o">;</span> <span class="k">END</span><span class="o">.</span> </pre></div> <p>Let’s visualize it with <a href="https://github.com/rspivak/lsbasi/blob/master/part9/python/genastdot.py">genastdot.py</a> (For brevity, when displaying a <em>Var</em> node, it just shows the node’s variable name and when displaying an Assign node it shows ‘:=’ instead of showing ‘Assign’&nbsp;text):</p> <div class="highlight"><pre><span></span>$ python genastdot.py assignments.txt &gt; ast.dot <span class="o">&amp;&amp;</span> dot -Tpng -o ast.png ast.dot </pre></div> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part9/lsbasi_part9_full_ast.png" width="640"></p> <p><br/> And finally, here are the required interpreter changes: <img alt="" src="https://ruslanspivak.com/lsbasi-part9/lsbasi_part9_interpreter.png" width="720"></p> <p>To interpret new <span class="caps">AST</span> nodes, we need to add corresponding visitor methods to the interpreter. 
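</p> <p>As a rough sketch of how a visitor method can be picked for a node, here is a tiny stand-alone example. It assumes the interpreter dispatches on the node’s class name, which is what names like <em>visit_Compound</em> suggest. The <em>Demo</em> class and the trimmed-down node classes are made up for illustration and are not the code from spi.py:</p>

```python
class NodeVisitor:
    """Picks a method by node class name: Compound -> visit_Compound."""

    def visit(self, node):
        method_name = 'visit_' + type(node).__name__
        visitor = getattr(self, method_name, self.generic_visit)
        return visitor(node)

    def generic_visit(self, node):
        raise Exception('No visit_{} method'.format(type(node).__name__))


# Trimmed-down, hypothetical node classes (no AST base class here).
class Compound:
    def __init__(self):
        self.children = []


class NoOp:
    pass


class Demo(NodeVisitor):
    """Hypothetical visitor that records which methods were called."""

    def __init__(self):
        self.trace = []

    def visit_Compound(self, node):
        self.trace.append('Compound')
        for child in node.children:
            self.visit(child)

    def visit_NoOp(self, node):
        self.trace.append('NoOp')


root = Compound()
root.children.append(NoOp())   # the AST for 'BEGIN END'

demo = Demo()
demo.visit(root)
print(demo.trace)   # ['Compound', 'NoOp']
```

<p>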
There are four new visitor&nbsp;methods:</p> <ul> <li>visit_Compound</li> <li>visit_Assign</li> <li>visit_Var</li> <li>visit_NoOp</li> </ul> <p><em>Compound</em> and <em>NoOp</em> visitor methods are pretty straightforward. The <em>visit_Compound</em> method iterates over its children and visits each one in turn, and the <em>visit_NoOp</em> method does&nbsp;nothing.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Compound</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">for</span> <span class="n">child</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">children</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">child</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_NoOp</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">pass</span> </pre></div> <p><br/> The <em>Assign</em> and <em>Var</em> visitor methods deserve a closer&nbsp;examination.</p> <p>When we assign a value to a variable, we need to store that value somewhere for when we need it later, and that’s exactly what the <em>visit_Assign</em> method&nbsp;does:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Assign</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="o">.</span><span class="n">value</span> <span class="bp">self</span><span class="o">.</span><span class="n">GLOBAL_SCOPE</span><span 
class="p">[</span><span class="n">var_name</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> </pre></div> <p>The method stores a key-value pair (a variable name and a value associated with the variable) in a <em>symbol table</em> GLOBAL_SCOPE. What is a <em>symbol table</em>? A <em><strong>symbol table</strong></em> is an abstract data type (<strong><span class="caps">ADT</span></strong>) for tracking various symbols in source code. The only symbol category we have right now is variables and we use the Python dictionary to implement the symbol table <span class="caps">ADT</span>. For now I’ll just say that the way the symbol table is used in this article is pretty “hacky”: it’s not a separate class with special methods but a simple Python dictionary and it also does double duty as a memory space. 
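</p> <p>To make the “double duty” concrete, here is a minimal sketch of one dictionary acting both as a symbol table and as a memory space. The <em>store</em> and <em>load</em> helpers are hypothetical names used only for this illustration; in the interpreter that work is done inside the visitor methods:</p>

```python
# A plain dictionary plays both roles: it records which names exist
# (symbol table) and holds their current values (memory space).
GLOBAL_SCOPE = {}


def store(var_name, value):
    """Hypothetical helper: roughly what visit_Assign does with its result."""
    GLOBAL_SCOPE[var_name] = value


def load(var_name):
    """Hypothetical helper: read a value back, failing on unknown names."""
    if var_name not in GLOBAL_SCOPE:
        raise NameError(repr(var_name))
    return GLOBAL_SCOPE[var_name]


store('a', 3)               # a := 3;
store('b', load('a') + 7)   # b := a + 7;
print(GLOBAL_SCOPE)         # {'a': 3, 'b': 10}
```

<p>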
In future articles, I will be talking about symbol tables in much greater detail, and together we’ll also remove all the&nbsp;hacks.</p> <p>Let’s take a look at an <span class="caps">AST</span> for the statement “a := 3;” and the symbol table before and after the <em>visit_Assign</em> method does its&nbsp;job:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part9/lsbasi_part9_ast_st01.png" width="720"></p> <p>Now let’s take a look at an <span class="caps">AST</span> for the statement “b := a +&nbsp;7;”</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part9/lsbasi_part9_ast_only_st02.png" width="280"></p> <p>As you can see, the right-hand side of the assignment statement - “a + 7” - references the variable ‘a’, so before we can evaluate the expression “a + 7” we need to find out what the value of ‘a’ is and that’s the responsibility of the <em>visit_Var</em>&nbsp;method:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Var</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">value</span> <span class="n">val</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">GLOBAL_SCOPE</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">var_name</span><span class="p">)</span> <span class="k">if</span> <span class="n">val</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">NameError</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">var_name</span><span class="p">))</span> <span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="n">val</span> </pre></div> <p>When 
the method visits a <em>Var</em> node as in the above <span class="caps">AST</span> picture, it first gets the variable’s name and then uses that name as a key into the <em>GLOBAL_SCOPE</em> dictionary to get the variable’s value. If it finds the value, it returns it; if not, it raises a <em>NameError</em> exception. Here are the contents of the symbol table before evaluating the assignment statement “b := a +&nbsp;7;”:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part9/lsbasi_part9_ast_st02.png" width="720"></p> <p>These are all the changes that we need to make today to get our interpreter ticking. At the end of the main program, we simply print the contents of the symbol table GLOBAL_SCOPE to standard&nbsp;output.</p> <p>Let’s take our updated interpreter for a drive both from a Python interactive shell and from the command line. Make sure that you have downloaded both the source code for the interpreter and the <a href="https://github.com/rspivak/lsbasi/blob/master/part9/python/assignments.txt">assignments.txt</a> file before&nbsp;testing:</p> <p>Launch your Python&nbsp;shell:</p> <div class="highlight"><pre><span></span>$ python &gt;&gt;&gt; from spi import Lexer, Parser, Interpreter &gt;&gt;&gt; <span class="nv">text</span> <span class="o">=</span> <span class="s2">&quot;&quot;&quot;\</span> <span class="s2">... BEGIN</span> <span class="s2">...</span> <span class="s2">... BEGIN</span> <span class="s2">... number := 2;</span> <span class="s2">... a := number;</span> <span class="s2">... b := 10 * a + 10 * number / 4;</span> <span class="s2">... c := a - - b</span> <span class="s2">... END;</span> <span class="s2">...</span> <span class="s2">... x := 11;</span> <span class="s2">... END.</span> <span class="s2">... 
&quot;&quot;&quot;</span> &gt;&gt;&gt; <span class="nv">lexer</span> <span class="o">=</span> Lexer<span class="o">(</span>text<span class="o">)</span> &gt;&gt;&gt; <span class="nv">parser</span> <span class="o">=</span> Parser<span class="o">(</span>lexer<span class="o">)</span> &gt;&gt;&gt; <span class="nv">interpreter</span> <span class="o">=</span> Interpreter<span class="o">(</span>parser<span class="o">)</span> &gt;&gt;&gt; interpreter.interpret<span class="o">()</span> &gt;&gt;&gt; print<span class="o">(</span>interpreter.GLOBAL_SCOPE<span class="o">)</span> <span class="o">{</span><span class="s1">&#39;a&#39;</span>: <span class="m">2</span>, <span class="s1">&#39;x&#39;</span>: <span class="m">11</span>, <span class="s1">&#39;c&#39;</span>: <span class="m">27</span>, <span class="s1">&#39;b&#39;</span>: <span class="m">25</span>, <span class="s1">&#39;number&#39;</span>: <span class="m">2</span><span class="o">}</span> </pre></div> <p>And from the command line, using a source file as input to our&nbsp;interpreter:</p> <div class="highlight"><pre><span></span>$ python spi.py assignments.txt <span class="o">{</span><span class="s1">&#39;a&#39;</span>: <span class="m">2</span>, <span class="s1">&#39;x&#39;</span>: <span class="m">11</span>, <span class="s1">&#39;c&#39;</span>: <span class="m">27</span>, <span class="s1">&#39;b&#39;</span>: <span class="m">25</span>, <span class="s1">&#39;number&#39;</span>: <span class="m">2</span><span class="o">}</span> </pre></div> <p>If you haven’t tried it yet, try it now and see for yourself that the interpreter is doing its job&nbsp;properly.</p> <p><br/> Let’s sum up what you had to do to extend the Pascal interpreter in this&nbsp;article:</p> <ol> <li>Add new rules to the&nbsp;grammar</li> <li>Add new tokens and supporting methods to the lexer and update the <em>get_next_token</em>&nbsp;method</li> <li>Add new <span class="caps">AST</span> nodes to the parser for new language&nbsp;constructs</li> <li>Add new methods 
corresponding to the new grammar rules to our recursive-descent parser and update any existing methods, if necessary (<em>factor</em> method, I’m looking at you.&nbsp;:)</li> <li>Add new visitor methods to the&nbsp;interpreter</li> <li>Add a dictionary for storing variables and for looking them&nbsp;up</li> </ol> <p><br/> In this part I had to introduce a number of “hacks” that we’ll remove as we move forward with the&nbsp;series:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part9/lsbasi_part9_hacks.png" width="720"></p> <ol> <li>The <em>program</em> grammar rule is incomplete. We’ll extend it later with additional&nbsp;elements.</li> <li>Pascal is a statically typed language, and you must declare a variable and its type before using it. But, as you saw, that was not the case in this&nbsp;article.</li> <li>No type checking so far. It’s not a big deal at this point, but I just wanted to mention it explicitly. Once we add more types to our interpreter we’ll need to report an error when you try to add a string and an integer, for&nbsp;example.</li> <li>A symbol table in this part is a simple Python dictionary that does double duty as a memory space. Worry not: symbol tables are such an important topic that I’ll have several articles dedicated just to them. And memory space (runtime management) is a topic of its&nbsp;own.</li> <li>In our simple calculator from previous articles, we used a forward slash character ‘/’ for denoting integer division. 
In Pascal, though, you have to use the keyword <em>div</em> to specify integer division (see Exercise&nbsp;2).</li> <li>There is also one hack that I introduced on purpose so that you could fix it in Exercise 1: in Pascal all reserved keywords and identifiers are case insensitive, but the interpreter in this article treats them as case&nbsp;sensitive.</li> </ol> <p><br/> To keep you fit, here are new exercises for&nbsp;you:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part9/lsbasi_part9_exercises.png" width="320"></p> <ol> <li> <p>Pascal variables and reserved keywords are case insensitive, unlike in many other programming languages, so <em><span class="caps">BEGIN</span></em>, <em>begin</em>, and <em>BeGin</em> all refer to the same reserved keyword. Update the interpreter so that variables and reserved keywords are case insensitive. Use the following program to test&nbsp;it:</p> <div class="highlight"><pre><span></span><span class="k">BEGIN</span> <span class="k">BEGIN</span> <span class="n">number</span> <span class="o">:=</span> <span class="mi">2</span><span class="o">;</span> <span class="n">a</span> <span class="o">:=</span> <span class="n">NumBer</span><span class="o">;</span> <span class="n">B</span> <span class="o">:=</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">a</span> <span class="o">+</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">NUMBER</span> <span class="o">/</span> <span class="mi">4</span><span class="o">;</span> <span class="n">c</span> <span class="o">:=</span> <span class="n">a</span> <span class="o">-</span> <span class="o">-</span> <span class="n">b</span> <span class="k">end</span><span class="o">;</span> <span class="n">x</span> <span class="o">:=</span> <span class="mi">11</span><span class="o">;</span> <span class="k">END</span><span class="o">.</span> </pre></div> </li> <li> <p>I mentioned in the “hacks” section above that our interpreter is using the forward slash 
character ‘/’ to denote integer division, but instead it should be using Pascal’s reserved keyword <em>div</em> for integer division. Update the interpreter to use the <em>div</em> keyword for integer division, thus eliminating one of the&nbsp;hacks.</p> </li> <li> <p>Update the interpreter so that variables could also start with an underscore as in ‘_num :=&nbsp;5’.</p> </li> </ol> <p><br/> That’s all for today. Stay tuned and see you&nbsp;soon.</p> <p><br/> Here is a list of books I recommend that will help you in your study of interpreters and&nbsp;compilers:</p> <ol> <li> <p><a href="http://www.amazon.com/gp/product/193435645X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=193435645X&linkCode=as2&tag=russblo0b-20&linkId=MP4DCXDV6DJMEJBL">Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (Pragmatic Programmers)</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=193435645X" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0321486811/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321486811&linkCode=as2&tag=russblo0b-20&linkId=GOEGDQG4HIHU56FQ">Compilers: Principles, Techniques, and Tools (2nd Edition)</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0321486811" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> </ol> <p></p> <p></p>Let’s Build A Simple Interpreter. 
Part 8.2016-01-18T06:10:00-05:002016-01-18T06:10:00-05:00Ruslan Spivaktag:ruslanspivak.com,2016-01-18:/lsbasi-part8/<p>Today we&#8217;ll talk about <strong>unary operators</strong>, namely unary plus (+) and unary minus (-)&nbsp;operators.</p> <p>A lot of today&#8217;s material is based on the material from the previous article, so if you need a refresher just head back to <a href="http://ruslanspivak.com/lsbasi-part7/" title="Part 7">Part 7</a> and go over it again. Remember: repetition is the mother of all&nbsp;learning.</p> <p>Having said that, this is what you are going to do&nbsp;today:</p> <ul> <li>extend the grammar to handle unary plus and unary minus&nbsp;operators</li> <li>add a new <em>UnaryOp</em> <span class="caps">AST</span> node&nbsp;class</li> <li>extend the parser to generate an <span class="caps">AST</span> with <em>UnaryOp</em>&nbsp;nodes</li> <li>extend the interpreter and add a new <em>visit_UnaryOp</em> method to interpret unary&nbsp;operators</li> </ul> <p>Let’s get started, shall&nbsp;we?</p> <p>So far we&#8217;ve worked with binary operators only (+, -, *, /), that is, the operators that operate on two&nbsp;operands.</p> <p>What is a unary operator then? 
A <em>unary operator</em> is an operator that operates on one <em>operand</em>&nbsp;only.</p> <p>Here are the rules for unary plus and unary minus&nbsp;operators:</p> <ul> <li>The unary minus (-) operator produces the negation of its numeric&nbsp;operand</li> <li>The unary plus (+) operator yields its numeric operand without&nbsp;change</li> <li>The unary operators have higher precedence than the binary operators +, -, *, and&nbsp;/</li> </ul> <p>In the expression &#8220;+ - 3&#8221; the first &#8216;+&#8217; operator represents the unary plus operation and the second &#8216;-&#8217; operator represents the unary minus operation. The expression &#8220;+ - 3&#8221; is equivalent to &#8220;+ (- (3))&#8221;, which is equal to -3. One could also say that <strong>-3</strong> in the expression is a negative integer, but in our case we treat it as a unary minus operator with 3 as its positive integer&nbsp;operand:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part8/lsbasi_part8_exp1.png" width="640"></p> <p>Let’s take a look at another expression, &#8220;5 - -&nbsp;2&#8221;:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part8/lsbasi_part8_exp2.png" width="640"></p> <p>In the expression &#8220;5 - - 2&#8221; the first &#8216;-&#8217; represents the <em>binary</em> subtraction operation and the second &#8216;-&#8217; represents the <em>unary</em> minus operation, the&nbsp;negation.</p> <p>And some more&nbsp;examples:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part8/lsbasi_part8_exp3.png" width="640"></p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part8/lsbasi_part8_exp4.png" width="640"></p> <p>Now let’s update our grammar to include unary plus and unary minus operators. 
We&#8217;ll modify the <em>factor</em> rule and add unary operators there because unary operators have higher precedence than binary +, -, * and /&nbsp;operators.</p> <p>This is our current <em>factor</em>&nbsp;rule:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part8/lsbasi_part8_factor_before.png" width="640"></p> <p>And this is our updated <em>factor</em> rule to handle unary plus and unary minus&nbsp;operators:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part8/lsbasi_part8_factor_after.png" width="640"></p> <p>As you can see, I extended the <em>factor</em> rule to reference itself, which allows us to derive expressions like &#8220;- - - + - 3&#8221;, a legitimate expression with a lot of unary&nbsp;operators.</p> <p>Here is the full grammar that can now derive expressions with unary plus and unary minus&nbsp;operators:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part8/lsbasi_part8_grammar.png" width="640"></p> <p>The next step is to add an <span class="caps">AST</span> node class to represent unary&nbsp;operators.</p> <p>This one will&nbsp;do:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">UnaryOp</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">op</span><span class="p">,</span> <span class="n">expr</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">op</span> <span class="o">=</span> <span class="n">op</span> <span class="bp">self</span><span class="o">.</span><span class="n">expr</span> <span class="o">=</span> <span class="n">expr</span> </pre></div> <p>The constructor takes two parameters: <em>op</em>, which represents the unary operator token 
(plus or minus) and <em>expr</em>, which represents an <span class="caps">AST</span>&nbsp;node.</p> <p>Our updated grammar had changes to the <em>factor</em> rule, so that’s what we’re going to modify in our parser - the <em>factor</em> method. We will add code to the method to handle the &#8220;(<span class="caps">PLUS</span> | <span class="caps">MINUS</span>) factor&#8221;&nbsp;sub-rule:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">factor</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;factor : (PLUS | MINUS) factor | INTEGER | LPAREN expr RPAREN&quot;&quot;&quot;</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">PLUS</span><span class="p">)</span> <span class="n">node</span> <span class="o">=</span> <span class="n">UnaryOp</span><span class="p">(</span><span class="n">token</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">())</span> <span class="k">return</span> <span class="n">node</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MINUS</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">MINUS</span><span class="p">)</span> <span class="n">node</span> <span class="o">=</span> <span class="n">UnaryOp</span><span class="p">(</span><span class="n">token</span><span 
class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">())</span> <span class="k">return</span> <span class="n">node</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">INTEGER</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">)</span> <span class="k">return</span> <span class="n">Num</span><span class="p">(</span><span class="n">token</span><span class="p">)</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">LPAREN</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">LPAREN</span><span class="p">)</span> <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">expr</span><span class="p">()</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">RPAREN</span><span class="p">)</span> <span class="k">return</span> <span class="n">node</span> </pre></div> <p><br/> And now we need to extend the <em>Interpreter</em> class and add a <em>visit_UnaryOp</em> method to interpret unary&nbsp;nodes:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_UnaryOp</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">op</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">type</span> <span class="k">if</span> <span class="n">op</span> 
<span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span> <span class="k">return</span> <span class="o">+</span><span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">expr</span><span class="p">)</span> <span class="k">elif</span> <span class="n">op</span> <span class="o">==</span> <span class="n">MINUS</span><span class="p">:</span> <span class="k">return</span> <span class="o">-</span><span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">expr</span><span class="p">)</span> </pre></div> <p>Onward!</p> <p>Let&#8217;s manually build an <span class="caps">AST</span> for the expression &#8220;5 - - - 2&#8221; and pass it to our interpreter to verify that the new <em>visit_UnaryOp</em> method works. Here is how you can do it from the Python&nbsp;shell:</p> <div class="highlight"><pre><span></span><span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">spi</span> <span class="kn">import</span> <span class="n">BinOp</span><span class="p">,</span> <span class="n">UnaryOp</span><span class="p">,</span> <span class="n">Num</span><span class="p">,</span> <span class="n">MINUS</span><span class="p">,</span> <span class="n">INTEGER</span><span class="p">,</span> <span class="n">Token</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">five_tok</span> <span class="o">=</span> <span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">two_tok</span> <span class="o">=</span> <span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span> <span 
class="o">&gt;&gt;&gt;</span> <span class="n">minus_tok</span> <span class="o">=</span> <span class="n">Token</span><span class="p">(</span><span class="n">MINUS</span><span class="p">,</span> <span class="s1">&#39;-&#39;</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">expr_node</span> <span class="o">=</span> <span class="n">BinOp</span><span class="p">(</span> <span class="o">...</span> <span class="n">Num</span><span class="p">(</span><span class="n">five_tok</span><span class="p">),</span> <span class="o">...</span> <span class="n">minus_tok</span><span class="p">,</span> <span class="o">...</span> <span class="n">UnaryOp</span><span class="p">(</span><span class="n">minus_tok</span><span class="p">,</span> <span class="n">UnaryOp</span><span class="p">(</span><span class="n">minus_tok</span><span class="p">,</span> <span class="n">Num</span><span class="p">(</span><span class="n">two_tok</span><span class="p">)))</span> <span class="o">...</span> <span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">spi</span> <span class="kn">import</span> <span class="n">Interpreter</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">inter</span> <span class="o">=</span> <span class="n">Interpreter</span><span class="p">(</span><span class="bp">None</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">inter</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">expr_node</span><span class="p">)</span> <span class="mi">3</span> </pre></div> <p>Visually the above <span class="caps">AST</span> looks like&nbsp;this:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part8/lsbasi_part8_ast.png" width="420"></p> <p>Download the full source code of the interpreter for this article directly from <a href="https://github.com/rspivak/lsbasi/blob/master/part8/python/spi.py">GitHub</a>. 
Try it out and see for yourself that your updated tree-based interpreter properly evaluates arithmetic expressions containing unary&nbsp;operators.</p> <p>Here is a sample&nbsp;session:</p> <div class="highlight"><pre><span></span>$ python spi.py spi&gt; - <span class="m">3</span> -3 spi&gt; + <span class="m">3</span> <span class="m">3</span> spi&gt; <span class="m">5</span> - - - + - <span class="m">3</span> <span class="m">8</span> spi&gt; <span class="m">5</span> - - - + - <span class="o">(</span><span class="m">3</span> + <span class="m">4</span><span class="o">)</span> - +2 <span class="m">10</span> </pre></div> <p><br/> I also updated the <a href="https://github.com/rspivak/lsbasi/blob/master/part8/python/genastdot.py">genastdot.py</a> utility to handle unary operators. Here are some of the examples of the generated <span class="caps">AST</span> images for expressions with unary&nbsp;operators:</p> <div class="highlight"><pre><span></span>$ python genastdot.py <span class="s2">&quot;- 3&quot;</span> &gt; ast.dot <span class="o">&amp;&amp;</span> dot -Tpng -o ast.png ast.dot </pre></div> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part8/lsbasi_part8_genastdot_01.png"></p> <div class="highlight"><pre><span></span>$ python genastdot.py <span class="s2">&quot;+ 3&quot;</span> &gt; ast.dot <span class="o">&amp;&amp;</span> dot -Tpng -o ast.png ast.dot </pre></div> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part8/lsbasi_part8_genastdot_02.png"></p> <div class="highlight"><pre><span></span>$ python genastdot.py <span class="s2">&quot;5 - - - + - 3&quot;</span> &gt; ast.dot <span class="o">&amp;&amp;</span> dot -Tpng -o ast.png ast.dot </pre></div> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part8/lsbasi_part8_genastdot_03.png"></p> <div class="highlight"><pre><span></span>$ python genastdot.py <span class="s2">&quot;5 - - - + - (3 + 4) - +2&quot;</span> <span class="se">\</span> &gt; ast.dot <span class="o">&amp;&amp;</span> dot -Tpng -o 
ast.png ast.dot </pre></div> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part8/lsbasi_part8_genastdot_04.png"></p> <p><br/> <br/> And here is a new exercise for&nbsp;you:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part8/lsbasi_part8_exercises.png" width="280"></p> <ul> <li>Install <a href="http://www.freepascal.org/">Free Pascal</a>, compile and run <a href="https://github.com/rspivak/lsbasi/blob/master/part8/python/testunary.pas">testunary.pas</a>, and verify that the results are the same as produced with your <a href="https://github.com/rspivak/lsbasi/blob/master/part8/python/spi.py">spi</a>&nbsp;interpreter.</li> </ul> <p><br/> That’s all for today. In the next article, we’ll tackle assignment statements. Stay tuned and see you&nbsp;soon.</p> <p><br/> Here is a list of books I recommend that will help you in your study of interpreters and&nbsp;compilers:</p> <ol> <li> <p><a href="http://www.amazon.com/gp/product/193435645X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=193435645X&linkCode=as2&tag=russblo0b-20&linkId=MP4DCXDV6DJMEJBL">Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (Pragmatic Programmers)</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=193435645X" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0470177071/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0470177071&linkCode=as2&tag=russblo0b-20&linkId=UCLGQTPIYSWYKRRM">Writing Compilers and Interpreters: A Software Engineering Approach</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0470177071" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a 
href="http://www.amazon.com/gp/product/052182060X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=052182060X&linkCode=as2&tag=russblo0b-20&linkId=ZSKKZMV7YWR22NMW">Modern Compiler Implementation in Java</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=052182060X" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/1461446988/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1461446988&linkCode=as2&tag=russblo0b-20&linkId=PAXWJP5WCPZ7RKRD">Modern Compiler Design</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=1461446988" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0321486811/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321486811&linkCode=as2&tag=russblo0b-20&linkId=GOEGDQG4HIHU56FQ">Compilers: Principles, Techniques, and Tools (2nd Edition)</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0321486811" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> </ol> <p></p> <p></p>Let’s Build A Simple Interpreter. 
Part 7: Abstract Syntax Trees2015-12-15T07:00:00-05:002015-12-15T07:00:00-05:00Ruslan Spivaktag:ruslanspivak.com,2015-12-15:/lsbasi-part7/<p>As I promised you last time, today I will talk about one of the central data structures that we’ll use throughout the rest of the series, so buckle up and let&#8217;s&nbsp;go.</p> <p>Up until now, we had our interpreter and parser code mixed together and the interpreter would evaluate an expression as soon as the parser recognized a certain language construct like addition, subtraction, multiplication, or division. Such interpreters are called <em>syntax-directed interpreters</em>. They usually make a single pass over the input and are suitable for basic language applications. In order to analyze more complex Pascal programming language constructs, we need to build an <em>intermediate representation</em> (<em><span class="caps">IR</span></em>). Our parser will be responsible for building an <em><span class="caps">IR</span></em> and our interpreter will use it to interpret the input represented as the <em><span class="caps">IR</span></em>.</p> <p>It turns out that a tree is a very suitable data structure for an <span class="caps">IR</span>.</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_realtree.png" width="500"></p> <p>Let’s quickly talk about tree&nbsp;terminology.</p> <ul> <li>A <em>tree</em> is a data structure that consists of one or more nodes organized into a&nbsp;hierarchy.</li> <li>The tree has one <em>root</em>, which is the top&nbsp;node.</li> <li>All nodes except the root have a unique <em>parent</em>.</li> <li>The node labeled <strong>*</strong> in the picture below is a <em>parent</em>. 
Nodes labeled <strong>2</strong> and <strong>7</strong> are its <em>children</em>; children are ordered from left to&nbsp;right.</li> <li>A node with no children is called a <em>leaf</em>&nbsp;node.</li> <li>A node that has one or more children and that is not the root is called an <em>interior</em>&nbsp;node.</li> <li>The children can also be complete <em>subtrees</em>. In the picture below the left child (labeled <strong>*</strong>) of the <strong>+</strong> node is a complete <em>subtree</em> with its own&nbsp;children.</li> <li>In computer science we draw trees upside down starting with the root node at the top and branches growing&nbsp;downward.</li> </ul> <p>Here is a tree for the expression 2 * 7 + 3 with&nbsp;explanations:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_tree_terminology.png" width="640"></p> <p>The <span class="caps">IR</span> we’ll use throughout the series is called an <em>abstract-syntax tree</em> (<em><span class="caps">AST</span></em>). But before we dig deeper into ASTs let’s talk about <em>parse trees</em> briefly. Though we’re not going to use parse trees for our interpreter and compiler, they can help you understand how your parser interpreted the input by visualizing the execution trace of the parser. We’ll also compare them with ASTs to see why ASTs are better suited for intermediate representation than parse&nbsp;trees.</p> <p>So, what is a parse tree? A <em>parse-tree</em> (sometimes called a <em>concrete syntax tree</em>) is a tree that represents the syntactic structure of a language construct according to our grammar definition. 
It basically shows how your parser recognized the language construct or, in other words, it shows how the start symbol of your grammar derives a certain string in the programming&nbsp;language.</p> <p>The call stack of the parser implicitly represents a parse tree and it’s automatically built in memory by your parser as it is trying to recognize a certain language&nbsp;construct.</p> <p>Let’s take a look at a parse tree for the expression 2 * 7 +&nbsp;3:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_parsetree_01.png" width="520"></p> <p>In the picture above you can see&nbsp;that:</p> <ul> <li>The parse tree records a sequence of rules the parser applies to recognize the&nbsp;input.</li> <li>The root of the parse tree is labeled with the grammar start&nbsp;symbol.</li> <li>Each interior node represents a non-terminal, that is, it represents a grammar rule application, like <em>expr</em>, <em>term</em>, or <em>factor</em> in our&nbsp;case.</li> <li>Each leaf node represents a&nbsp;token.</li> </ul> <p>As I&#8217;ve already mentioned, we&#8217;re not going to manually construct parse trees and use them for our interpreter, but parse trees can help you understand how the parser interpreted the input by visualizing the parser call&nbsp;sequence.</p> <p>You can see what parse trees look like for different arithmetic expressions by trying out a small utility called <a href="https://github.com/rspivak/lsbasi/blob/master/part7/python/genptdot.py">genptdot.py</a> that I quickly wrote to help you visualize them. 
To use the utility you first need to install the <a href="http://graphviz.org">Graphviz</a> package. After you&#8217;ve run the following command, you can open the generated image file parsetree.png and see a parse tree for the expression you passed as a command line&nbsp;argument:</p> <div class="highlight"><pre><span></span>$ python genptdot.py <span class="s2">&quot;14 + 2 * 3 - 6 / 2&quot;</span> &gt; <span class="se">\</span> parsetree.dot <span class="o">&amp;&amp;</span> dot -Tpng -o parsetree.png parsetree.dot </pre></div> <p>Here is the generated image parsetree.png for the expression 14 + 2 * 3 - 6 /&nbsp;2:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_genptdot_01.png"></p> <p>Play with the utility a bit by passing it different arithmetic expressions and see what a parse tree looks like for a particular&nbsp;expression.</p> <p>Now, let&#8217;s talk about <em>abstract-syntax trees</em> (<span class="caps">AST</span>). This is the <em>intermediate representation</em> (<span class="caps">IR</span>) that we’ll heavily use throughout the rest of the series. 
It is one of the central data structures for our interpreter and future compiler&nbsp;projects.</p> <p>Let&#8217;s start our discussion by taking a look at both the <span class="caps">AST</span> and the parse tree for the expression 2 * 7 +&nbsp;3:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_ast_01.png" width="640"></p> <p>As you can see from the picture above, the <span class="caps">AST</span> captures the essence of the input while being&nbsp;smaller.</p> <p>Here are the main differences between ASTs and parse&nbsp;trees:</p> <ul> <li>ASTs use operators/operations as root and interior nodes and operands as their&nbsp;children.</li> <li>ASTs do not use interior nodes to represent a grammar rule, unlike parse&nbsp;trees.</li> <li>ASTs don’t represent every detail from the real syntax (that’s why they’re called <em>abstract</em>) - no rule nodes and no parentheses, for&nbsp;example.</li> <li>ASTs are more compact than parse trees for the same language&nbsp;construct.</li> </ul> <p>So, what is an abstract syntax tree? An <em>abstract syntax tree</em> (<em><span class="caps">AST</span></em>) is a tree that represents the abstract syntactic structure of a language construct where each interior node and the root node represent an operator, and the children of the node represent the operands of that&nbsp;operator.</p> <p>I’ve already mentioned that ASTs are more compact than parse trees. Let’s take a look at an <span class="caps">AST</span> and a parse tree for the expression 7 + ((2 + 3)). You can see that the following <span class="caps">AST</span> is much smaller than the parse tree, but still captures the essence of the&nbsp;input:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_ast_02.png"></p> <p>So far so good, but how do you encode operator precedence in an <span class="caps">AST</span>? 
In order to encode the operator precedence in an <span class="caps">AST</span>, that is, to represent that “X happens before Y”, you just need to put X lower in the tree than Y. And you’ve already seen that in the previous&nbsp;pictures.</p> <p>Let’s take a look at some more&nbsp;examples.</p> <p>In the picture below, on the left, you can see an <span class="caps">AST</span> for the expression 2 * 7 + 3. Let’s change the precedence by putting 7 + 3 inside the parentheses. You can see, on the right, what an <span class="caps">AST</span> looks like for the modified expression 2 * (7 +&nbsp;3):</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_astprecedence_01.png" width="640"></p> <p>Here is an <span class="caps">AST</span> for the expression 1 + 2 + 3 + 4 +&nbsp;5:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_astprecedence_02.png" width="480"></p> <p>From the pictures above you can see that operators with higher precedence end up being lower in the&nbsp;tree.</p> <p>Okay, let’s write some code to implement different <span class="caps">AST</span> node types and modify our parser to generate an <span class="caps">AST</span> composed of those&nbsp;nodes.</p> <p>First, we’ll create a base node class called <span class="caps">AST</span> that other classes will inherit&nbsp;from:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">AST</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">pass</span> </pre></div> <p>Not much there, actually. Recall that ASTs represent the operator-operand model. So far, we have four operators and integer operands. The operators are addition, subtraction, multiplication, and division. 
We could have created a separate class to represent each operator like AddNode, SubNode, MulNode, and DivNode, but instead we’re going to have only one <em>BinOp</em> class to represent all four binary operators (a <em>binary operator</em> is an operator that operates on two&nbsp;operands):</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">BinOp</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">left</span><span class="p">,</span> <span class="n">op</span><span class="p">,</span> <span class="n">right</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">left</span> <span class="o">=</span> <span class="n">left</span> <span class="bp">self</span><span class="o">.</span><span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">op</span> <span class="o">=</span> <span class="n">op</span> <span class="bp">self</span><span class="o">.</span><span class="n">right</span> <span class="o">=</span> <span class="n">right</span> </pre></div> <p>The parameters to the constructor are <em>left</em>, <em>op</em>, and <em>right</em>, where <em>left</em> and <em>right</em> point correspondingly to the node of the left operand and to the node of the right operand. 
<em>Op</em> holds a token for the operator itself: Token(<span class="caps">PLUS</span>, ‘+’) for the plus operator, Token(<span class="caps">MINUS</span>, ‘-‘) for the minus operator, and so&nbsp;on.</p> <p>To represent integers in our <span class="caps">AST</span>, we’ll define a class <em>Num</em> that will hold an <span class="caps">INTEGER</span> token and the token’s&nbsp;value:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Num</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">token</span> <span class="o">=</span> <span class="n">token</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">value</span> </pre></div> <p>As you’ve noticed, all nodes store the token used to create the node. This is mostly for convenience and it will come in handy in the&nbsp;future.</p> <p>Recall the <span class="caps">AST</span> for the expression 2 * 7 + 3. 
We’re going to manually create it in code for that&nbsp;expression:</p> <div class="highlight"><pre><span></span><span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">spi</span> <span class="kn">import</span> <span class="n">Token</span><span class="p">,</span> <span class="n">MUL</span><span class="p">,</span> <span class="n">PLUS</span><span class="p">,</span> <span class="n">INTEGER</span><span class="p">,</span> <span class="n">Num</span><span class="p">,</span> <span class="n">BinOp</span> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">mul_token</span> <span class="o">=</span> <span class="n">Token</span><span class="p">(</span><span class="n">MUL</span><span class="p">,</span> <span class="s1">&#39;*&#39;</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">plus_token</span> <span class="o">=</span> <span class="n">Token</span><span class="p">(</span><span class="n">PLUS</span><span class="p">,</span> <span class="s1">&#39;+&#39;</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">mul_node</span> <span class="o">=</span> <span class="n">BinOp</span><span class="p">(</span> <span class="o">...</span> <span class="n">left</span><span class="o">=</span><span class="n">Num</span><span class="p">(</span><span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="mi">2</span><span class="p">)),</span> <span class="o">...</span> <span class="n">op</span><span class="o">=</span><span class="n">mul_token</span><span class="p">,</span> <span class="o">...</span> <span class="n">right</span><span class="o">=</span><span class="n">Num</span><span class="p">(</span><span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="mi">7</span><span class="p">))</span> <span class="o">...</span> <span class="p">)</span> <span 
class="o">&gt;&gt;&gt;</span> <span class="n">add_node</span> <span class="o">=</span> <span class="n">BinOp</span><span class="p">(</span> <span class="o">...</span> <span class="n">left</span><span class="o">=</span><span class="n">mul_node</span><span class="p">,</span> <span class="o">...</span> <span class="n">op</span><span class="o">=</span><span class="n">plus_token</span><span class="p">,</span> <span class="o">...</span> <span class="n">right</span><span class="o">=</span><span class="n">Num</span><span class="p">(</span><span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span> <span class="o">...</span> <span class="p">)</span> </pre></div> <p>Here is how an <span class="caps">AST</span> will look with our new node classes defined. The picture below also follows the manual construction process&nbsp;above:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_astimpl_01.png" width="600"></p> <p>Here is our modified parser code that builds and returns an <span class="caps">AST</span> as a result of recognizing the input (an arithmetic&nbsp;expression):</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">AST</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">pass</span> <span class="k">class</span> <span class="nc">BinOp</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">left</span><span class="p">,</span> <span class="n">op</span><span class="p">,</span> <span class="n">right</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">left</span> <span class="o">=</span> <span class="n">left</span> <span 
class="bp">self</span><span class="o">.</span><span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">op</span> <span class="o">=</span> <span class="n">op</span> <span class="bp">self</span><span class="o">.</span><span class="n">right</span> <span class="o">=</span> <span class="n">right</span> <span class="k">class</span> <span class="nc">Num</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">token</span> <span class="o">=</span> <span class="n">token</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">value</span> <span class="k">class</span> <span class="nc">Parser</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">lexer</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span> <span class="o">=</span> <span class="n">lexer</span> <span class="c1"># set current token to the first token taken from the input</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">raise</span> 
<span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Invalid syntax&#39;</span><span class="p">)</span> <span class="k">def</span> <span class="nf">eat</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token_type</span><span class="p">):</span> <span class="c1"># compare the current token type with the passed token</span> <span class="c1"># type and if they match then &quot;eat&quot; the current token</span> <span class="c1"># and assign the next token to the self.current_token,</span> <span class="c1"># otherwise raise an exception.</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">token_type</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="k">else</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span> <span class="k">def</span> <span class="nf">factor</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;factor : INTEGER | LPAREN expr RPAREN&quot;&quot;&quot;</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">INTEGER</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span 
class="n">INTEGER</span><span class="p">)</span> <span class="k">return</span> <span class="n">Num</span><span class="p">(</span><span class="n">token</span><span class="p">)</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">LPAREN</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">LPAREN</span><span class="p">)</span> <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">expr</span><span class="p">()</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">RPAREN</span><span class="p">)</span> <span class="k">return</span> <span class="n">node</span> <span class="k">def</span> <span class="nf">term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;term : factor ((MUL | DIV) factor)*&quot;&quot;&quot;</span> <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">()</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="ow">in</span> <span class="p">(</span><span class="n">MUL</span><span class="p">,</span> <span class="n">DIV</span><span class="p">):</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MUL</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span 
class="n">eat</span><span class="p">(</span><span class="n">MUL</span><span class="p">)</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">DIV</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">DIV</span><span class="p">)</span> <span class="n">node</span> <span class="o">=</span> <span class="n">BinOp</span><span class="p">(</span><span class="n">left</span><span class="o">=</span><span class="n">node</span><span class="p">,</span> <span class="n">op</span><span class="o">=</span><span class="n">token</span><span class="p">,</span> <span class="n">right</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">())</span> <span class="k">return</span> <span class="n">node</span> <span class="k">def</span> <span class="nf">expr</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;</span> <span class="sd"> expr : term ((PLUS | MINUS) term)*</span> <span class="sd"> term : factor ((MUL | DIV) factor)*</span> <span class="sd"> factor : INTEGER | LPAREN expr RPAREN</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="ow">in</span> <span class="p">(</span><span class="n">PLUS</span><span class="p">,</span> <span class="n">MINUS</span><span class="p">):</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span 
class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">PLUS</span><span class="p">)</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MINUS</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">MINUS</span><span class="p">)</span> <span class="n">node</span> <span class="o">=</span> <span class="n">BinOp</span><span class="p">(</span><span class="n">left</span><span class="o">=</span><span class="n">node</span><span class="p">,</span> <span class="n">op</span><span class="o">=</span><span class="n">token</span><span class="p">,</span> <span class="n">right</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">())</span> <span class="k">return</span> <span class="n">node</span> <span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">expr</span><span class="p">()</span> </pre></div> <p>Let&#8217;s go over the process of constructing an <span class="caps">AST</span> for some arithmetic&nbsp;expressions.</p> <p>If you look at the parser code above, you can see how it builds the nodes of an <span class="caps">AST</span>: each BinOp node adopts the current value of the <em>node</em> variable as its left child and the result of a call to <em>term</em> or <em>factor</em> as its right child. This effectively pushes nodes down to the left, and the tree for the expression 1 + 2 + 3 + 4 + 5 below is a good example of that. Here is a visual representation of how the parser gradually builds an <span class="caps">AST</span> for the expression 1 + 2 + 3 + 4 +&nbsp;5:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_astimpl_02.png" width="780"></p> <p>To help you visualize ASTs for different arithmetic expressions, I wrote a small utility that takes an arithmetic expression as its first argument and generates a <span class="caps">DOT</span> file that is then processed by the <em>dot</em> utility to actually draw an <span class="caps">AST</span> for you (<em>dot</em> is part of the <a href="http://graphviz.org">Graphviz</a> package that you need to install to run the <em>dot</em> command). Here is the command and the generated <span class="caps">AST</span> image for the expression 7 + 3 * (10 / (12 / (3 + 1) -&nbsp;1)):</p> <div class="highlight"><pre><span></span>$ python genastdot.py <span class="s2">&quot;7 + 3 * (10 / (12 / (3 + 1) - 1))&quot;</span> &gt; <span class="se">\</span>
  ast.dot <span class="o">&amp;&amp;</span> dot -Tpng -o ast.png ast.dot
</pre></div> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_genastdot_01.png"></p> <p>It’s worth your while to write some arithmetic expressions, manually draw ASTs for the expressions, and then verify them by generating <span class="caps">AST</span> images for the same expressions with the <a href="https://github.com/rspivak/lsbasi/blob/master/part7/python/genastdot.py">genastdot.py</a> tool. That will help you better understand how ASTs are constructed by the parser for different arithmetic&nbsp;expressions.</p> <p>Okay, here is an <span class="caps">AST</span> for the expression 2 * 7 +&nbsp;3:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_ast_walking_01.png" width="360"></p> <p>How do you navigate the tree to properly evaluate the expression represented by that tree? 
You do that by using a <em>postorder traversal</em>, a special case of <em>depth-first traversal</em>, which starts at the root node and recursively visits the children of each node from left to right before acting on the node itself. The postorder traversal visits nodes as far away from the root as fast as it&nbsp;can.</p> <p>Here is the pseudocode for the postorder traversal, where <em>&lt;&lt;postorder actions&gt;&gt;</em> is a placeholder for actions like addition, subtraction, multiplication, or division for a <em>BinOp</em> node, or a simpler action like returning the integer value of a <em>Num</em>&nbsp;node:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_ast_visit_postorder.png" width="640"></p> <p>There are two reasons we’re going to use a postorder traversal for our interpreter: first, we need to evaluate interior nodes lower in the tree because they represent operators with higher precedence, and second, we need to evaluate the operands of an operator before applying the operator to those operands. In the picture below, you can see that with postorder traversal we first evaluate the expression 2 * 7 and only after that evaluate 14 + 3, which gives us the correct result,&nbsp;17:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_ast_walking_02.png" width="360"></p> <p>For the sake of completeness, I&#8217;ll mention that there are three types of depth-first traversal: <em>preorder traversal</em>, <em>inorder traversal</em>, and <em>postorder traversal</em>. The name of the traversal method comes from the place where you put actions in the visitation&nbsp;code:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_ast_visit_generic.png" width="560"></p> <p>Sometimes you might have to execute certain actions at all those points (preorder, inorder, and postorder). 
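</p> <p>To make the three action points concrete, here is a minimal sketch that walks the tree for 2 * 7 + 3 and records each node&#8217;s label at the chosen point. The <em>Node</em> class and the <em>walk</em> function are hypothetical names made up for this illustration; they are not part of the interpreter. Notice that in the postorder result both operands of each operator are recorded before the operator itself, which is what expression evaluation needs:</p>

```python
class Node(object):
    """Hypothetical binary tree node used only for this illustration."""
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right


def walk(node, order, out):
    """Depth-first walk that records labels at the chosen action point."""
    if node is None:
        return out
    if order == 'preorder':
        out.append(node.label)    # action before visiting the children
    walk(node.left, order, out)
    if order == 'inorder':
        out.append(node.label)    # action between the two children
    walk(node.right, order, out)
    if order == 'postorder':
        out.append(node.label)    # action after visiting the children
    return out


# Tree for 2 * 7 + 3: '+' is the root and '*' is its left child
tree = Node('+', Node('*', Node('2'), Node('7')), Node('3'))

preorder = walk(tree, 'preorder', [])
inorder = walk(tree, 'inorder', [])
postorder = walk(tree, 'postorder', [])

print(preorder)   # ['+', '*', '2', '7', '3']
print(inorder)    # ['2', '*', '7', '+', '3']
print(postorder)  # ['2', '7', '*', '3', '+']
```

<p>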
You’ll see some examples of that in the source code repository for this&nbsp;article.</p> <p>Okay, let’s write some code to visit and interpret the abstract syntax trees built by our parser, shall&nbsp;we?</p> <p>Here is the source code that implements the <a href="https://en.wikipedia.org/wiki/Visitor_pattern">Visitor pattern</a>:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">NodeVisitor</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="nf">visit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">method_name</span> <span class="o">=</span> <span class="s1">&#39;visit_&#39;</span> <span class="o">+</span> <span class="nb">type</span><span class="p">(</span><span class="n">node</span><span class="p">)</span><span class="o">.</span><span class="vm">__name__</span> <span class="n">visitor</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">method_name</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">generic_visit</span><span class="p">)</span> <span class="k">return</span> <span class="n">visitor</span><span class="p">(</span><span class="n">node</span><span class="p">)</span> <span class="k">def</span> <span class="nf">generic_visit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;No visit_{} method&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">type</span><span class="p">(</span><span class="n">node</span><span class="p">)</span><span 
class="o">.</span><span class="vm">__name__</span><span class="p">))</span> </pre></div> <p>And here is the source code of our <em>Interpreter</em> class that inherits from the <em>NodeVisitor</em> class and implements different methods that have the form <em>visit_NodeType</em>, where <em>NodeType</em> is replaced with the node&#8217;s class name like <em>BinOp</em>, <em>Num</em> and so&nbsp;on:</p> <div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Interpreter</span><span class="p">(</span><span class="n">NodeVisitor</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">parser</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">parser</span> <span class="o">=</span> <span class="n">parser</span> <span class="k">def</span> <span class="nf">visit_BinOp</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">if</span> <span class="n">node</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="k">elif</span> <span class="n">node</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> 
<span class="n">MINUS</span><span class="p">:</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="k">elif</span> <span class="n">node</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MUL</span><span class="p">:</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="k">elif</span> <span class="n">node</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">DIV</span><span class="p">:</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="o">/</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_Num</span><span 
class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">return</span> <span class="n">node</span><span class="o">.</span><span class="n">value</span> </pre></div> <p>There are two interesting things about the code that are worth mentioning here. First, the visitor code that manipulates <span class="caps">AST</span> nodes is decoupled from the <span class="caps">AST</span> nodes themselves. You can see that none of the <span class="caps">AST</span> node classes (BinOp and Num) provide any code to manipulate the data stored in those nodes. That logic is encapsulated in the <em>Interpreter</em> class that inherits from the <em>NodeVisitor</em>&nbsp;class.</p> <p>Second, instead of a giant <em>if</em> statement in the NodeVisitor’s <em>visit</em> method like&nbsp;this:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
    <span class="n">node_type</span> <span class="o">=</span> <span class="nb">type</span><span class="p">(</span><span class="n">node</span><span class="p">)</span><span class="o">.</span><span class="vm">__name__</span>
    <span class="k">if</span> <span class="n">node_type</span> <span class="o">==</span> <span class="s1">&#39;BinOp&#39;</span><span class="p">:</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit_BinOp</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">node_type</span> <span class="o">==</span> <span class="s1">&#39;Num&#39;</span><span class="p">:</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit_Num</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
    <span class="k">elif</span> <span class="o">...</span>
        <span class="c1"># ...</span>
</pre></div> <p>or like&nbsp;this:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
    <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">BinOp</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit_BinOp</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
    <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">Num</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit_Num</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
    <span class="k">elif</span> <span class="o">...</span>
</pre></div> <p>the NodeVisitor&#8217;s <em>visit</em> method is very generic and dispatches calls to the appropriate method based on the node type passed to it. As I’ve mentioned before, in order to make use of it, our interpreter inherits from the <em>NodeVisitor</em> class and implements the necessary methods. 
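</p> <p>To watch the dispatch happen in isolation, here is a minimal sketch that reuses the <em>NodeVisitor</em> class from above with two simplified stand-in node classes. <em>Str</em> and <em>Printer</em> are hypothetical names invented for this illustration, and this <em>Num</em> holds a plain value rather than a token:</p>

```python
class NodeVisitor(object):
    def visit(self, node):
        # Build the method name from the node's class name, e.g. 'visit_Num'
        method_name = 'visit_' + type(node).__name__
        visitor = getattr(self, method_name, self.generic_visit)
        return visitor(node)

    def generic_visit(self, node):
        raise Exception('No visit_{} method'.format(type(node).__name__))


class Num(object):
    """Simplified stand-in for the article's Num node (holds a plain value)."""
    def __init__(self, value):
        self.value = value


class Str(object):
    """Hypothetical node type that the visitor below does not handle."""
    def __init__(self, value):
        self.value = value


class Printer(NodeVisitor):
    """Hypothetical visitor that only knows about Num nodes."""
    def visit_Num(self, node):
        return 'Num({})'.format(node.value)


printer = Printer()
handled = printer.visit(Num(7))      # dispatched to visit_Num

try:
    printer.visit(Str('seven'))      # no visit_Str, so generic_visit runs
    fallback_message = None
except Exception as exc:
    fallback_message = str(exc)

print(handled)            # Num(7)
print(fallback_message)   # No visit_Str method
```

<p>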
So if the type of a node passed to the <em>visit</em> method is BinOp, then the <em>visit</em> method will dispatch the call to the <em>visit_BinOp</em> method, and if the type of a node is Num, then the <em>visit</em> method will dispatch the call to the <em>visit_Num</em> method, and so&nbsp;on.</p> <p>Spend some time studying this approach (the standard Python module <a href="https://docs.python.org/2.7/library/ast.html#module-ast">ast</a> uses the same mechanism for node traversal), as we will be extending our interpreter with many new <em>visit_NodeType</em> methods in the&nbsp;future.</p> <p>The <em>generic_visit</em> method is a fallback that raises an exception to indicate that it encountered a node for which the implementation class has no corresponding <em>visit_NodeType</em>&nbsp;method.</p> <p>Now, let&#8217;s manually build an <span class="caps">AST</span> for the expression 2 * 7 + 3 and pass it to our interpreter to see the <em>visit</em> method in action as it evaluates the expression. Here is how you can do it from the Python&nbsp;shell:</p> <div class="highlight"><pre><span></span><span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">spi</span> <span class="kn">import</span> <span class="n">Token</span><span class="p">,</span> <span class="n">MUL</span><span class="p">,</span> <span class="n">PLUS</span><span class="p">,</span> <span class="n">INTEGER</span><span class="p">,</span> <span class="n">Num</span><span class="p">,</span> <span class="n">BinOp</span> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">mul_token</span> <span class="o">=</span> <span class="n">Token</span><span class="p">(</span><span class="n">MUL</span><span class="p">,</span> <span class="s1">&#39;*&#39;</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">plus_token</span> <span class="o">=</span> <span class="n">Token</span><span class="p">(</span><span class="n">PLUS</span><span 
class="p">,</span> <span class="s1">&#39;+&#39;</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">mul_node</span> <span class="o">=</span> <span class="n">BinOp</span><span class="p">(</span> <span class="o">...</span> <span class="n">left</span><span class="o">=</span><span class="n">Num</span><span class="p">(</span><span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="mi">2</span><span class="p">)),</span> <span class="o">...</span> <span class="n">op</span><span class="o">=</span><span class="n">mul_token</span><span class="p">,</span> <span class="o">...</span> <span class="n">right</span><span class="o">=</span><span class="n">Num</span><span class="p">(</span><span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="mi">7</span><span class="p">))</span> <span class="o">...</span> <span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">add_node</span> <span class="o">=</span> <span class="n">BinOp</span><span class="p">(</span> <span class="o">...</span> <span class="n">left</span><span class="o">=</span><span class="n">mul_node</span><span class="p">,</span> <span class="o">...</span> <span class="n">op</span><span class="o">=</span><span class="n">plus_token</span><span class="p">,</span> <span class="o">...</span> <span class="n">right</span><span class="o">=</span><span class="n">Num</span><span class="p">(</span><span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span> <span class="o">...</span> <span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">spi</span> <span class="kn">import</span> <span class="n">Interpreter</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">inter</span> <span class="o">=</span> <span 
class="n">Interpreter</span><span class="p">(</span><span class="bp">None</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">inter</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">add_node</span><span class="p">)</span> <span class="mi">17</span> </pre></div> <p>As you can see, I passed the root of the expression tree to the <em>visit</em> method, and that triggered a traversal of the tree by dispatching calls to the correct methods of the <em>Interpreter</em> class (<em>visit_BinOp</em> and <em>visit_Num</em>) and generating the&nbsp;result.</p> <p>Okay, here is the complete code of our new interpreter for your&nbsp;convenience:</p> <div class="highlight"><pre><span></span><span class="sd">&quot;&quot;&quot; SPI - Simple Pascal Interpreter &quot;&quot;&quot;</span> <span class="c1">###############################################################################</span> <span class="c1"># #</span> <span class="c1"># LEXER #</span> <span class="c1"># #</span> <span class="c1">###############################################################################</span> <span class="c1"># Token types</span> <span class="c1">#</span> <span class="c1"># EOF (end-of-file) token is used to indicate that</span> <span class="c1"># there is no more input left for lexical analysis</span> <span class="n">INTEGER</span><span class="p">,</span> <span class="n">PLUS</span><span class="p">,</span> <span class="n">MINUS</span><span class="p">,</span> <span class="n">MUL</span><span class="p">,</span> <span class="n">DIV</span><span class="p">,</span> <span class="n">LPAREN</span><span class="p">,</span> <span class="n">RPAREN</span><span class="p">,</span> <span class="n">EOF</span> <span class="o">=</span> <span class="p">(</span> <span class="s1">&#39;INTEGER&#39;</span><span class="p">,</span> <span class="s1">&#39;PLUS&#39;</span><span class="p">,</span> <span class="s1">&#39;MINUS&#39;</span><span 
class="p">,</span> <span class="s1">&#39;MUL&#39;</span><span class="p">,</span> <span class="s1">&#39;DIV&#39;</span><span class="p">,</span> <span class="s1">&#39;(&#39;</span><span class="p">,</span> <span class="s1">&#39;)&#39;</span><span class="p">,</span> <span class="s1">&#39;EOF&#39;</span> <span class="p">)</span> <span class="k">class</span> <span class="nc">Token</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">type</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">type</span> <span class="o">=</span> <span class="nb">type</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span> <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;String representation of the class instance.</span> <span class="sd"> Examples:</span> <span class="sd"> Token(INTEGER, 3)</span> <span class="sd"> Token(PLUS, &#39;+&#39;)</span> <span class="sd"> Token(MUL, &#39;*&#39;)</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="k">return</span> <span class="s1">&#39;Token({type}, {value})&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span> <span class="nb">type</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">type</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="nb">repr</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="p">)</span> <span 
class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="fm">__str__</span><span class="p">()</span> <span class="k">class</span> <span class="nc">Lexer</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span> <span class="c1"># client string input, e.g. &quot;4 + 2 * 3 - 6 / 2&quot;</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">text</span> <span class="c1"># self.pos is an index into self.text</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">=</span> <span class="mi">0</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">pos</span><span class="p">]</span> <span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Invalid character&#39;</span><span class="p">)</span> <span class="k">def</span> <span class="nf">advance</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Advance the `pos` pointer and set the `current_char` variable.&quot;&quot;&quot;</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">+=</span> <span 
class="mi">1</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">&gt;</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">None</span> <span class="c1"># Indicates end of input</span> <span class="k">else</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">pos</span><span class="p">]</span> <span class="k">def</span> <span class="nf">skip_whitespace</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isspace</span><span class="p">():</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">def</span> <span class="nf">integer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Return a (multidigit) integer consumed from the input.&quot;&quot;&quot;</span> <span class="n">result</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span> <span class="k">while</span> <span class="bp">self</span><span 
class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span> <span class="n">result</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="k">def</span> <span class="nf">get_next_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Lexical analyzer (also known as scanner or tokenizer)</span> <span class="sd"> This method is responsible for breaking a sentence</span> <span class="sd"> apart into tokens. 
One token at a time.</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isspace</span><span class="p">():</span> <span class="bp">self</span><span class="o">.</span><span class="n">skip_whitespace</span><span class="p">()</span> <span class="k">continue</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">integer</span><span class="p">())</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;+&#39;</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">PLUS</span><span class="p">,</span> <span class="s1">&#39;+&#39;</span><span class="p">)</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;-&#39;</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span 
class="n">MINUS</span><span class="p">,</span> <span class="s1">&#39;-&#39;</span><span class="p">)</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;*&#39;</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">MUL</span><span class="p">,</span> <span class="s1">&#39;*&#39;</span><span class="p">)</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;/&#39;</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">DIV</span><span class="p">,</span> <span class="s1">&#39;/&#39;</span><span class="p">)</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;(&#39;</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">LPAREN</span><span class="p">,</span> <span class="s1">&#39;(&#39;</span><span class="p">)</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;)&#39;</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span class="n">Token</span><span 
class="p">(</span><span class="n">RPAREN</span><span class="p">,</span> <span class="s1">&#39;)&#39;</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">EOF</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span> <span class="c1">###############################################################################</span> <span class="c1"># #</span> <span class="c1"># PARSER #</span> <span class="c1"># #</span> <span class="c1">###############################################################################</span> <span class="k">class</span> <span class="nc">AST</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">pass</span> <span class="k">class</span> <span class="nc">BinOp</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">left</span><span class="p">,</span> <span class="n">op</span><span class="p">,</span> <span class="n">right</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">left</span> <span class="o">=</span> <span class="n">left</span> <span class="bp">self</span><span class="o">.</span><span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">op</span> <span class="o">=</span> <span class="n">op</span> <span class="bp">self</span><span class="o">.</span><span class="n">right</span> <span class="o">=</span> <span class="n">right</span> <span class="k">class</span> <span class="nc">Num</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span> <span class="k">def</span> <span 
class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">token</span> <span class="o">=</span> <span class="n">token</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">value</span> <span class="k">class</span> <span class="nc">Parser</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">lexer</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span> <span class="o">=</span> <span class="n">lexer</span> <span class="c1"># set current token to the first token taken from the input</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Invalid syntax&#39;</span><span class="p">)</span> <span class="k">def</span> <span class="nf">eat</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token_type</span><span class="p">):</span> <span class="c1"># compare the current token type with the passed token</span> <span class="c1"># type and if they match then &quot;eat&quot; the current token</span> <span class="c1"># and assign the next 
token to the self.current_token,</span> <span class="c1"># otherwise raise an exception.</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">token_type</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="k">else</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span> <span class="k">def</span> <span class="nf">factor</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;factor : INTEGER | LPAREN expr RPAREN&quot;&quot;&quot;</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">INTEGER</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">)</span> <span class="k">return</span> <span class="n">Num</span><span class="p">(</span><span class="n">token</span><span class="p">)</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">LPAREN</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">LPAREN</span><span class="p">)</span> <span 
class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">expr</span><span class="p">()</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">RPAREN</span><span class="p">)</span> <span class="k">return</span> <span class="n">node</span> <span class="k">def</span> <span class="nf">term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;term : factor ((MUL | DIV) factor)*&quot;&quot;&quot;</span> <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">()</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="ow">in</span> <span class="p">(</span><span class="n">MUL</span><span class="p">,</span> <span class="n">DIV</span><span class="p">):</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MUL</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">MUL</span><span class="p">)</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">DIV</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">DIV</span><span class="p">)</span> <span class="n">node</span> <span class="o">=</span> <span class="n">BinOp</span><span 
class="p">(</span><span class="n">left</span><span class="o">=</span><span class="n">node</span><span class="p">,</span> <span class="n">op</span><span class="o">=</span><span class="n">token</span><span class="p">,</span> <span class="n">right</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">())</span> <span class="k">return</span> <span class="n">node</span> <span class="k">def</span> <span class="nf">expr</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;</span> <span class="sd"> expr : term ((PLUS | MINUS) term)*</span> <span class="sd"> term : factor ((MUL | DIV) factor)*</span> <span class="sd"> factor : INTEGER | LPAREN expr RPAREN</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="ow">in</span> <span class="p">(</span><span class="n">PLUS</span><span class="p">,</span> <span class="n">MINUS</span><span class="p">):</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">PLUS</span><span class="p">)</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MINUS</span><span class="p">:</span> 
<span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">MINUS</span><span class="p">)</span> <span class="n">node</span> <span class="o">=</span> <span class="n">BinOp</span><span class="p">(</span><span class="n">left</span><span class="o">=</span><span class="n">node</span><span class="p">,</span> <span class="n">op</span><span class="o">=</span><span class="n">token</span><span class="p">,</span> <span class="n">right</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">())</span> <span class="k">return</span> <span class="n">node</span> <span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">expr</span><span class="p">()</span> <span class="c1">###############################################################################</span> <span class="c1"># #</span> <span class="c1"># INTERPRETER #</span> <span class="c1"># #</span> <span class="c1">###############################################################################</span> <span class="k">class</span> <span class="nc">NodeVisitor</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="nf">visit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="n">method_name</span> <span class="o">=</span> <span class="s1">&#39;visit_&#39;</span> <span class="o">+</span> <span class="nb">type</span><span class="p">(</span><span class="n">node</span><span class="p">)</span><span class="o">.</span><span class="vm">__name__</span> <span class="n">visitor</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span 
class="bp">self</span><span class="p">,</span> <span class="n">method_name</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">generic_visit</span><span class="p">)</span> <span class="k">return</span> <span class="n">visitor</span><span class="p">(</span><span class="n">node</span><span class="p">)</span> <span class="k">def</span> <span class="nf">generic_visit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;No visit_{} method&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">type</span><span class="p">(</span><span class="n">node</span><span class="p">)</span><span class="o">.</span><span class="vm">__name__</span><span class="p">))</span> <span class="k">class</span> <span class="nc">Interpreter</span><span class="p">(</span><span class="n">NodeVisitor</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">parser</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">parser</span> <span class="o">=</span> <span class="n">parser</span> <span class="k">def</span> <span class="nf">visit_BinOp</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">if</span> <span class="n">node</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span 
class="n">node</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="k">elif</span> <span class="n">node</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MINUS</span><span class="p">:</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="k">elif</span> <span class="n">node</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MUL</span><span class="p">:</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="k">elif</span> <span class="n">node</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">DIV</span><span class="p">:</span> <span 
class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="o">//</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="k">def</span> <span class="nf">visit_Num</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span> <span class="k">return</span> <span class="n">node</span><span class="o">.</span><span class="n">value</span> <span class="k">def</span> <span class="nf">interpret</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="n">tree</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parser</span><span class="o">.</span><span class="n">parse</span><span class="p">()</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">tree</span><span class="p">)</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span> <span class="k">while</span> <span class="bp">True</span><span class="p">:</span> <span class="k">try</span><span class="p">:</span> <span class="k">try</span><span class="p">:</span> <span class="n">text</span> <span class="o">=</span> <span class="nb">raw_input</span><span class="p">(</span><span class="s1">&#39;spi&gt; &#39;</span><span class="p">)</span> <span class="k">except</span> <span class="ne">NameError</span><span class="p">:</span> <span class="c1"># Python3</span> <span class="n">text</span> <span class="o">=</span> <span class="nb">input</span><span class="p">(</span><span 
class="s1">&#39;spi&gt; &#39;</span><span class="p">)</span> <span class="k">except</span> <span class="ne">EOFError</span><span class="p">:</span> <span class="k">break</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">text</span><span class="p">:</span> <span class="k">continue</span> <span class="n">lexer</span> <span class="o">=</span> <span class="n">Lexer</span><span class="p">(</span><span class="n">text</span><span class="p">)</span> <span class="n">parser</span> <span class="o">=</span> <span class="n">Parser</span><span class="p">(</span><span class="n">lexer</span><span class="p">)</span> <span class="n">interpreter</span> <span class="o">=</span> <span class="n">Interpreter</span><span class="p">(</span><span class="n">parser</span><span class="p">)</span> <span class="n">result</span> <span class="o">=</span> <span class="n">interpreter</span><span class="o">.</span><span class="n">interpret</span><span class="p">()</span> <span class="k">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span> <span class="n">main</span><span class="p">()</span> </pre></div> <p>Save the above code into the <em>spi.py</em> file or download it directly from <a href="https://github.com/rspivak/lsbasi/blob/master/part7/python/spi.py">GitHub</a>. 
Try it out and see for yourself that your new tree-based interpreter properly evaluates arithmetic&nbsp;expressions.</p> <p>Here is a sample&nbsp;session:</p> <div class="highlight"><pre><span></span>$ python spi.py spi&gt; <span class="m">7</span> + <span class="m">3</span> * <span class="o">(</span><span class="m">10</span> / <span class="o">(</span><span class="m">12</span> / <span class="o">(</span><span class="m">3</span> + <span class="m">1</span><span class="o">)</span> - <span class="m">1</span><span class="o">))</span> <span class="m">22</span> spi&gt; <span class="m">7</span> + <span class="m">3</span> * <span class="o">(</span><span class="m">10</span> / <span class="o">(</span><span class="m">12</span> / <span class="o">(</span><span class="m">3</span> + <span class="m">1</span><span class="o">)</span> - <span class="m">1</span><span class="o">))</span> / <span class="o">(</span><span class="m">2</span> + <span class="m">3</span><span class="o">)</span> - <span class="m">5</span> - <span class="m">3</span> + <span class="o">(</span><span class="m">8</span><span class="o">)</span> <span class="m">10</span> spi&gt; <span class="m">7</span> + <span class="o">(((</span><span class="m">3</span> + <span class="m">2</span><span class="o">)))</span> <span class="m">12</span> </pre></div> <p><br/> Today you’ve learned about parse trees, ASTs, how to construct ASTs and how to traverse them to interpret the input represented by those ASTs. You’ve also modified the parser and the interpreter and split them apart. 
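</p> <p>As a quick recap of the traversal idea, here is a minimal standalone sketch of the visitor dispatch. It is not the <em>spi.py</em> code above: the node classes are simplified to hold plain values and operator strings instead of tokens, the <em>Evaluator</em> class and its <em>OPS</em> table are hypothetical names chosen for illustration, and integer division is an assumption made to keep the results integral:</p>

```python
import operator


class Num:
    def __init__(self, value):
        self.value = value


class BinOp:
    def __init__(self, left, op, right):
        self.left = left
        self.op = op  # simplification: a plain string such as '+', not a Token
        self.right = right


class NodeVisitor:
    def visit(self, node):
        # Dispatch on the node's class name: a BinOp node goes to visit_BinOp
        method_name = 'visit_' + type(node).__name__
        visitor = getattr(self, method_name, self.generic_visit)
        return visitor(node)

    def generic_visit(self, node):
        raise Exception('No visit_{} method'.format(type(node).__name__))


class Evaluator(NodeVisitor):
    # Hypothetical operator table; '/' is mapped to integer division (assumption)
    OPS = {
        '+': operator.add,
        '-': operator.sub,
        '*': operator.mul,
        '/': operator.floordiv,
    }

    def visit_BinOp(self, node):
        # Postorder traversal: evaluate both children, then apply the operator
        left = self.visit(node.left)
        right = self.visit(node.right)
        return self.OPS[node.op](left, right)

    def visit_Num(self, node):
        return node.value


# AST for 2 * 7 + 3, built by hand
tree = BinOp(BinOp(Num(2), '*', Num(7)), '+', Num(3))
result = Evaluator().visit(tree)
print(result)  # 17
```

<p>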
The current interface between the lexer, the parser, and the interpreter now looks like&nbsp;this:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_pipeline.png" width="640"></p> <p>You can read that as &#8220;The parser gets tokens from the lexer and then returns the generated <span class="caps">AST</span> for the interpreter to traverse and interpret the&nbsp;input.&#8221;</p> <p>That&#8217;s it for today, but before wrapping up I&#8217;d like to talk briefly about recursive-descent parsers, namely, to give them a definition, because I promised last time to talk about them in more detail. So here you go: a <em>recursive-descent parser</em> is a top-down parser that uses a set of recursive procedures to process the input. Top-down reflects the fact that the parser begins by constructing the top node of the parse tree and then gradually constructs lower&nbsp;nodes.</p> <p><br/> And now it’s time for exercises&nbsp;:)</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part7/lsbasi_part7_exercise.png" width="280"></p> <ul> <li>Write a translator (hint: node visitor) that takes as input an arithmetic expression and prints it out in postfix notation, also known as Reverse Polish Notation (<span class="caps">RPN</span>). For example, if the input to the translator is the expression (5 + 3) * 12 / 3, then the output should be 5 3 + 12 * 3 /. See the answer <a href="https://github.com/rspivak/lsbasi/blob/master/part7/python/ex1.py">here</a> but try to solve it first on your&nbsp;own.</li> <li>Write a translator (node visitor) that takes as input an arithmetic expression and prints it out in <span class="caps">LISP</span>-style notation; that is, 2 + 3 would become (+ 2 3) and (2 + 3 * 5) would become (+ 2 (* 3 5)). 
You can find the answer <a href="https://github.com/rspivak/lsbasi/blob/master/part7/python/ex2.py">here</a> but again try to solve it first before looking at the provided&nbsp;solution.</li> </ul> <p><br/> In the next article, we’ll add assignment and unary operators to our growing Pascal interpreter. Until then, have fun and see you&nbsp;soon.</p> <p><br/> <span class="caps">P.S.</span> I&#8217;ve also provided a Rust implementation of the interpreter that you can find on <a href="https://github.com/rspivak/lsbasi/blob/master/part7/rust/spi/src/main.rs">GitHub</a>. This is a way for me to learn <a href="https://www.rust-lang.org/">Rust</a> so keep in mind that the code might not be &#8220;idiomatic&#8221; yet. Comments and suggestions as to how to make the code better are always&nbsp;welcome.</p> <p><br/> Here is a list of books I recommend that will help you in your study of interpreters and&nbsp;compilers:</p> <ol> <li> <p><a href="http://www.amazon.com/gp/product/193435645X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=193435645X&linkCode=as2&tag=russblo0b-20&linkId=MP4DCXDV6DJMEJBL">Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (Pragmatic Programmers)</a></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0470177071/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0470177071&linkCode=as2&tag=russblo0b-20&linkId=UCLGQTPIYSWYKRRM">Writing Compilers and Interpreters: A Software Engineering Approach</a></p> </li> <li> <p><a 
href="http://www.amazon.com/gp/product/052182060X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=052182060X&linkCode=as2&tag=russblo0b-20&linkId=ZSKKZMV7YWR22NMW">Modern Compiler Implementation in Java</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=052182060X" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/1461446988/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1461446988&linkCode=as2&tag=russblo0b-20&linkId=PAXWJP5WCPZ7RKRD">Modern Compiler Design</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=1461446988" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0321486811/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321486811&linkCode=as2&tag=russblo0b-20&linkId=GOEGDQG4HIHU56FQ">Compilers: Principles, Techniques, and Tools (2nd Edition)</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0321486811" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> </ol> <p></p> <p></p>Let’s Build A Simple Interpreter. Part 6.2015-11-02T07:00:00-05:002015-11-02T07:00:00-05:00Ruslan Spivaktag:ruslanspivak.com,2015-11-02:/lsbasi-part6/<p>Today is <em>the</em> day :) “Why?” you might ask. The reason is that today we’re wrapping up our discussion of arithmetic expressions (well, almost) by adding parenthesized expressions to our grammar and implementing an interpreter that will be able to evaluate parenthesized expressions with arbitrarily deep nesting, like the expression …</p><p>Today is <em>the</em> day :) “Why?” you might ask. 
The reason is that today we’re wrapping up our discussion of arithmetic expressions (well, almost) by adding parenthesized expressions to our grammar and implementing an interpreter that will be able to evaluate parenthesized expressions with arbitrarily deep nesting, like the expression 7 + 3 * (10 / (12 / (3 + 1) -&nbsp;1)).</p> <p>Let’s get started, shall&nbsp;we?</p> <p>First, let’s modify the grammar to support expressions inside parentheses. As you remember from <a href="http://ruslanspivak.com/lsbasi-part5/" title="Part 5">Part 5</a>, the <em>factor</em> rule is used for basic units in expressions. In that article, the only basic unit we had was an integer. Today we’re adding another basic unit - a parenthesized expression. Let’s do&nbsp;it.</p> <p>Here is our updated&nbsp;grammar:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part6/lsbasi_part6_grammar.png" width="640"></p> <p>The <em>expr</em> and the <em>term</em> productions are exactly the same as in <a href="http://ruslanspivak.com/lsbasi-part5/" title="Part 5">Part 5</a> and the only change is in the <em>factor</em> production where the terminal <span class="caps">LPAREN</span> represents a left parenthesis &#8216;(&#8217;, the terminal <span class="caps">RPAREN</span> represents a right parenthesis &#8216;)&#8217;, and the non-terminal <em>expr</em> between the parentheses refers to the <em>expr</em>&nbsp;rule.</p> <p>Here is the updated syntax diagram for the <em>factor</em>, which now includes&nbsp;alternatives:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part6/lsbasi_part6_factor_diagram.png" width="640"></p> <p>Because the grammar rules for the <em>expr</em> and the <em>term</em> haven’t changed, their syntax diagrams look the same as in <a href="http://ruslanspivak.com/lsbasi-part5/" title="Part 5">Part 5</a>:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part6/lsbasi_part6_expr_term_diagram.png" width="640"></p> <p>Here is an interesting feature of our new 
grammar - it is recursive. If you try to derive the expression 2 * (7 + 3), you will start with the <em>expr</em> start symbol and eventually you will get to a point where you will recursively use the <em>expr</em> rule again to derive the (7 + 3) portion of the original arithmetic&nbsp;expression.</p> <p>Let’s decompose the expression 2 * (7 + 3) according to the grammar and see how it&nbsp;looks:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part6/lsbasi_part6_decomposition.png" width="640"></p> <p>A little aside: if you need a refresher on recursion, take a look at Daniel P. Friedman and Matthias Felleisen&#8217;s <a rel="nofollow" href="http://www.amazon.com/gp/product/0262560992/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0262560992&linkCode=as2&tag=russblo0b-20&linkId=IM7CT7RLWNGJ7J54">The Little Schemer</a> book - it’s really&nbsp;good.</p> <p>Okay, let’s get moving and translate our new updated grammar to&nbsp;code.</p> <p>The following are the main changes to the code from the previous&nbsp;article:</p> <ol> <li>The <em>Lexer</em> has been modified to return two more tokens: <span class="caps">LPAREN</span> for a left parenthesis and <span class="caps">RPAREN</span> for a right&nbsp;parenthesis.</li> <li>The <em>Interpreter</em>&#8217;s <em>factor</em> method has been slightly updated to parse parenthesized expressions in addition to&nbsp;integers.</li> </ol> <p>Here is the complete code of a calculator that can evaluate arithmetic expressions containing integers; any number of addition, subtraction, multiplication and division operators; and parenthesized expressions with arbitrarily deep&nbsp;nesting:</p>
<div class="highlight"><pre><span></span><span class="c1"># Token types</span>
<span class="c1">#</span>
<span class="c1"># EOF (end-of-file) token is used to indicate that</span>
<span class="c1"># there is no more input left for lexical analysis</span>
<span class="n">INTEGER</span><span class="p">,</span> <span class="n">PLUS</span><span class="p">,</span> <span class="n">MINUS</span><span class="p">,</span> <span class="n">MUL</span><span class="p">,</span> <span class="n">DIV</span><span class="p">,</span> <span class="n">LPAREN</span><span class="p">,</span> <span class="n">RPAREN</span><span class="p">,</span> <span class="n">EOF</span> <span class="o">=</span> <span class="p">(</span>
    <span class="s1">&#39;INTEGER&#39;</span><span class="p">,</span> <span class="s1">&#39;PLUS&#39;</span><span class="p">,</span> <span class="s1">&#39;MINUS&#39;</span><span class="p">,</span> <span class="s1">&#39;MUL&#39;</span><span class="p">,</span> <span class="s1">&#39;DIV&#39;</span><span class="p">,</span> <span class="s1">&#39;(&#39;</span><span class="p">,</span> <span class="s1">&#39;)&#39;</span><span class="p">,</span> <span class="s1">&#39;EOF&#39;</span>
<span class="p">)</span>


<span class="k">class</span> <span class="nc">Token</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">type</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">type</span> <span class="o">=</span> <span class="nb">type</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">&quot;&quot;&quot;String representation of the class instance.</span>

<span class="sd">        Examples:</span>
<span class="sd">            Token(INTEGER, 3)</span>
<span class="sd">            Token(PLUS, &#39;+&#39;)</span>
<span class="sd">            Token(MUL, &#39;*&#39;)</span>
<span class="sd">        &quot;&quot;&quot;</span>
        <span class="k">return</span> <span class="s1">&#39;Token({type}, {value})&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
            <span class="nb">type</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">type</span><span class="p">,</span>
            <span class="n">value</span><span class="o">=</span><span class="nb">repr</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
        <span class="p">)</span>

    <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="fm">__str__</span><span class="p">()</span>


<span class="k">class</span> <span class="nc">Lexer</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span>
        <span class="c1"># client string input, e.g. &quot;4 + 2 * 3 - 6 / 2&quot;</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">text</span>
        <span class="c1"># self.pos is an index into self.text</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">pos</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Invalid character&#39;</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">advance</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">&quot;&quot;&quot;Advance the `pos` pointer and set the `current_char` variable.&quot;&quot;&quot;</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">&gt;</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">None</span>  <span class="c1"># Indicates end of input</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">pos</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">skip_whitespace</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isspace</span><span class="p">():</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">integer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">&quot;&quot;&quot;Return a (multidigit) integer consumed from the input.&quot;&quot;&quot;</span>
        <span class="n">result</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span>
        <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span>
            <span class="n">result</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
        <span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">get_next_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">&quot;&quot;&quot;Lexical analyzer (also known as scanner or tokenizer)</span>

<span class="sd">        This method is responsible for breaking a sentence</span>
<span class="sd">        apart into tokens. One token at a time.</span>
<span class="sd">        &quot;&quot;&quot;</span>
        <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>

            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isspace</span><span class="p">():</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">skip_whitespace</span><span class="p">()</span>
                <span class="k">continue</span>

            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span>
                <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">integer</span><span class="p">())</span>

            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;+&#39;</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
                <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">PLUS</span><span class="p">,</span> <span class="s1">&#39;+&#39;</span><span class="p">)</span>

            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;-&#39;</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
                <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">MINUS</span><span class="p">,</span> <span class="s1">&#39;-&#39;</span><span class="p">)</span>

            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;*&#39;</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
                <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">MUL</span><span class="p">,</span> <span class="s1">&#39;*&#39;</span><span class="p">)</span>

            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;/&#39;</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
                <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">DIV</span><span class="p">,</span> <span class="s1">&#39;/&#39;</span><span class="p">)</span>

            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;(&#39;</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
                <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">LPAREN</span><span class="p">,</span> <span class="s1">&#39;(&#39;</span><span class="p">)</span>

            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;)&#39;</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
                <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">RPAREN</span><span class="p">,</span> <span class="s1">&#39;)&#39;</span><span class="p">)</span>

            <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span>

        <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">EOF</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span>


<span class="k">class</span> <span class="nc">Interpreter</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">lexer</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span> <span class="o">=</span> <span class="n">lexer</span>
        <span class="c1"># set current token to the first token taken from the input</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Invalid syntax&#39;</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">eat</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token_type</span><span class="p">):</span>
        <span class="c1"># compare the current token type with the passed token</span>
        <span class="c1"># type and if they match then &quot;eat&quot; the current token</span>
        <span class="c1"># and assign the next token to the self.current_token,</span>
        <span class="c1"># otherwise raise an exception.</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">token_type</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">factor</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">&quot;&quot;&quot;factor : INTEGER | LPAREN expr RPAREN&quot;&quot;&quot;</span>
        <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span>
        <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">INTEGER</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">)</span>
            <span class="k">return</span> <span class="n">token</span><span class="o">.</span><span class="n">value</span>
        <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">LPAREN</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">LPAREN</span><span class="p">)</span>
            <span class="n">result</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">expr</span><span class="p">()</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">RPAREN</span><span class="p">)</span>
            <span class="k">return</span> <span class="n">result</span>

    <span class="k">def</span> <span class="nf">term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">&quot;&quot;&quot;term : factor ((MUL | DIV) factor)*&quot;&quot;&quot;</span>
        <span class="n">result</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">()</span>

        <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="ow">in</span> <span class="p">(</span><span class="n">MUL</span><span class="p">,</span> <span class="n">DIV</span><span class="p">):</span>
            <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span>
            <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MUL</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">MUL</span><span class="p">)</span>
                <span class="n">result</span> <span class="o">=</span> <span class="n">result</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">()</span>
            <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">DIV</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">DIV</span><span class="p">)</span>
                <span class="n">result</span> <span class="o">=</span> <span class="n">result</span> <span class="o">/</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">()</span>

        <span class="k">return</span> <span class="n">result</span>

    <span class="k">def</span> <span class="nf">expr</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">&quot;&quot;&quot;Arithmetic expression parser / interpreter.</span>

<span class="sd">        calc&gt; 7 + 3 * (10 / (12 / (3 + 1) - 1))</span>
<span class="sd">        22</span>

<span class="sd">        expr   : term ((PLUS | MINUS) term)*</span>
<span class="sd">        term   : factor ((MUL | DIV) factor)*</span>
<span class="sd">        factor : INTEGER | LPAREN expr RPAREN</span>
<span class="sd">        &quot;&quot;&quot;</span>
        <span class="n">result</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span>

        <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="ow">in</span> <span class="p">(</span><span class="n">PLUS</span><span class="p">,</span> <span class="n">MINUS</span><span class="p">):</span>
            <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span>
            <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">PLUS</span><span class="p">)</span>
                <span class="n">result</span> <span class="o">=</span> <span class="n">result</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span>
            <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MINUS</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">MINUS</span><span class="p">)</span>
                <span class="n">result</span> <span class="o">=</span> <span class="n">result</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span>

        <span class="k">return</span> <span class="n">result</span>


<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="c1"># To run under Python3 replace &#39;raw_input&#39; call</span>
            <span class="c1"># with &#39;input&#39;</span>
            <span class="n">text</span> <span class="o">=</span> <span class="nb">raw_input</span><span class="p">(</span><span class="s1">&#39;calc&gt; &#39;</span><span class="p">)</span>
        <span class="k">except</span> <span class="ne">EOFError</span><span class="p">:</span>
            <span class="k">break</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">text</span><span class="p">:</span>
            <span class="k">continue</span>
        <span class="n">lexer</span> <span class="o">=</span> <span class="n">Lexer</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
        <span class="n">interpreter</span> <span class="o">=</span> <span class="n">Interpreter</span><span class="p">(</span><span class="n">lexer</span><span class="p">)</span>
        <span class="n">result</span> <span class="o">=</span> <span class="n">interpreter</span><span class="o">.</span><span class="n">expr</span><span class="p">()</span>
        <span class="k">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>


<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</pre></div>
<p>Save the above code into the <a href="https://github.com/rspivak/lsbasi/blob/master/part6/calc6.py">calc6.py</a> file, try it out and see for yourself that your new interpreter properly evaluates arithmetic expressions that have different operators and&nbsp;parentheses.</p> <p>Here is a sample&nbsp;session:</p>
<div class="highlight"><pre><span></span>$ python calc6.py
calc&gt; <span class="m">3</span>
<span class="m">3</span>
calc&gt; <span class="m">2</span> + <span class="m">7</span> * <span class="m">4</span>
<span class="m">30</span>
calc&gt; <span class="m">7</span> - <span class="m">8</span> / <span class="m">4</span>
<span class="m">5</span>
calc&gt; <span class="m">14</span> + <span class="m">2</span> * <span class="m">3</span> - <span class="m">6</span> / <span class="m">2</span>
<span class="m">17</span>
calc&gt; <span class="m">7</span> + <span class="m">3</span> * <span class="o">(</span><span class="m">10</span> / <span class="o">(</span><span class="m">12</span> / <span class="o">(</span><span class="m">3</span> + <span class="m">1</span><span class="o">)</span> - <span class="m">1</span><span class="o">))</span>
<span class="m">22</span>
calc&gt; <span class="m">7</span> + <span class="m">3</span> * <span class="o">(</span><span class="m">10</span> / <span class="o">(</span><span class="m">12</span> / <span class="o">(</span><span class="m">3</span> + <span class="m">1</span><span class="o">)</span> - <span class="m">1</span><span class="o">))</span> / <span class="o">(</span><span class="m">2</span> + <span class="m">3</span><span class="o">)</span> - <span class="m">5</span> - <span class="m">3</span> + <span class="o">(</span><span class="m">8</span><span class="o">)</span>
<span class="m">10</span>
calc&gt; <span class="m">7</span> + <span class="o">(((</span><span class="m">3</span> + <span class="m">2</span><span class="o">)))</span>
<span class="m">12</span>
</pre></div>
<p><br/> And here is a new exercise for you for&nbsp;today:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part6/lsbasi_part6_exercises.png" width="280"></p> <ul> <li>Write your own version of the interpreter of arithmetic expressions as described in this article. Remember: repetition is the mother of all&nbsp;learning.</li> </ul> <p><br/> Hey, you read all the way to the end! Congratulations, you&#8217;ve just learned how to create (and if you&#8217;ve done the exercise - you&#8217;ve actually written) a basic <em>recursive-descent parser / interpreter</em> that can evaluate pretty complex arithmetic&nbsp;expressions.</p> <p>In the next article I will talk in a lot more detail about <em>recursive-descent parsers</em>. I will also introduce an important and widely used data structure in interpreter and compiler construction that we’ll use throughout the&nbsp;series.</p> <p>Stay tuned and see you soon. 
Until then, keep working on your interpreter and most importantly: have fun and enjoy the&nbsp;process!</p> <p><br/> Here is a list of books I recommend that will help you in your study of interpreters and&nbsp;compilers:</p> <ol> <li> <p><a href="http://www.amazon.com/gp/product/193435645X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=193435645X&linkCode=as2&tag=russblo0b-20&linkId=MP4DCXDV6DJMEJBL">Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (Pragmatic Programmers)</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=193435645X" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0470177071/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0470177071&linkCode=as2&tag=russblo0b-20&linkId=UCLGQTPIYSWYKRRM">Writing Compilers and Interpreters: A Software Engineering Approach</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0470177071" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/052182060X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=052182060X&linkCode=as2&tag=russblo0b-20&linkId=ZSKKZMV7YWR22NMW">Modern Compiler Implementation in Java</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=052182060X" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/1461446988/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1461446988&linkCode=as2&tag=russblo0b-20&linkId=PAXWJP5WCPZ7RKRD">Modern Compiler Design</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=1461446988" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" 
/></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0321486811/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321486811&linkCode=as2&tag=russblo0b-20&linkId=GOEGDQG4HIHU56FQ">Compilers: Principles, Techniques, and Tools (2nd Edition)</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0321486811" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> </ol> <p></p> <p></p>Let’s Build A Simple Interpreter. Part 5.2015-10-14T07:00:00-04:002015-10-14T07:00:00-04:00Ruslan Spivaktag:ruslanspivak.com,2015-10-14:/lsbasi-part5/<p>How do you tackle something as complex as understanding how to create an interpreter or compiler? In the beginning it all looks pretty much like a tangled mess of yarn that you need to untangle to get that perfect&nbsp;ball.</p> <p>The way to get there is to just untangle it one thread, one knot at a time. Sometimes, though, you might feel like you don’t understand something right away, but you have to keep going. It will eventually &#8220;click&#8221; if you’re persistent enough, I promise you (Gee, if I put aside 25 cents every time I didn’t understand something right away I would have become rich a long time ago&nbsp;:).</p> <p>Probably one of the best pieces of advice I could give you on your way to understanding how to create an interpreter and compiler is to read the explanations in the articles, read the code, and then write code yourself, and even write the same code several times over a period of time to make the material and code feel natural to you, and only then move on to learn new topics. 
Do not rush, just slow down and take your time to understand the basic ideas deeply. This approach, while seemingly slow, will pay off down the road. Trust&nbsp;me.</p> <p>You will eventually get your perfect ball of yarn in the end. And, you know what? Even if it is not that perfect it is still better than the alternative, which is to do nothing and not learn the topic or quickly skim over it and forget it in a couple of&nbsp;days.</p> <p>Remember - just keep untangling: one thread, one knot at a time and practice what you’ve learned by writing code, a lot of&nbsp;it:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part5/lsbasi_part5_ballofyarn.png" width="640"></p> <p>Today you’re going to use all the knowledge you’ve gained from previous articles in the series and learn how to parse and interpret arithmetic expressions that have any number of addition, subtraction, multiplication, and division operators. You will write an interpreter that will be able to evaluate expressions like &#8220;14 + 2 * 3 - 6 /&nbsp;2&#8221;.</p> <p>Before diving in and writing some code let’s talk about the <strong>associativity</strong> and <strong>precedence</strong> of&nbsp;operators.</p> <p>By convention 7 + 3 + 1 is the same as (7 + 3) + 1 and 7 - 3 - 1 is equivalent to (7 - 3) - 1. No surprises here. We all learned that at some point and have been taking it for granted since then. 
If we treated 7 - 3 - 1 as 7 - (3 - 1), the result would be an unexpected 5 instead of the expected&nbsp;3.</p> <p>In ordinary arithmetic and most programming languages addition, subtraction, multiplication, and division are <em>left-associative</em>:</p> <div class="highlight"><pre><span></span>7 + 3 + 1 is equivalent to (7 + 3) + 1
7 - 3 - 1 is equivalent to (7 - 3) - 1
8 * 4 * 2 is equivalent to (8 * 4) * 2
8 / 4 / 2 is equivalent to (8 / 4) / 2
</pre></div> <p>What does it mean for an operator to be <em>left-associative</em>?</p> <p>When an operand like 3 in the expression 7 + 3 + 1 has plus signs on both sides, we need a convention to decide which operator applies to 3. Is it the one to the left or the one to the right of the operand 3? The operator + <em>associates</em> to the left because an operand that has plus signs on both sides belongs to the operator to its left and so we say that the operator + is <em>left-associative</em>. That’s why 7 + 3 + 1 is equivalent to (7 + 3) + 1 by the <em>associativity</em>&nbsp;convention.</p> <p>Okay, what about an expression like 7 + 5 * 2 where we have different kinds of operators on both sides of the operand 5? Is the expression equivalent to 7 + (5 * 2) or (7 + 5) * 2? How do we resolve this&nbsp;ambiguity?</p> <p>In this case, the associativity convention is of no help to us because it applies only to operators of one kind, either additive (+, -) or multiplicative (*, /). We need another convention to resolve the ambiguity when we have different kinds of operators in the same expression. We need a convention that defines relative <em>precedence</em> of&nbsp;operators.</p> <p>And here it is: we say that if the operator * takes its operands before + does, then it has <em>higher precedence</em>. In the arithmetic that we know and use, multiplication and division have <em>higher precedence</em> than addition and subtraction. 
As a result the expression 7 + 5 * 2 is equivalent to 7 + (5 * 2) and the expression 7 - 8 / 4 is equivalent to 7 - (8 /&nbsp;4).</p> <p>In a case where we have an expression with operators that have the same <em>precedence</em>, we just use the <em>associativity</em> convention and execute the operators from left to&nbsp;right:</p> <div class="highlight"><pre><span></span>7 + 3 - 1 is equivalent to (7 + 3) - 1
8 / 4 * 2 is equivalent to (8 / 4) * 2
</pre></div> <p>I hope you didn’t think I wanted to bore you to death by talking so much about the associativity and precedence of operators. The nice thing about those conventions is that we can construct a grammar for arithmetic expressions from a table that shows the associativity and precedence of arithmetic operators. Then, we can translate the grammar into code by following the guidelines I outlined in <a href="http://ruslanspivak.com/lsbasi-part4/" title="Part 4">Part 4</a>, and our interpreter will be able to handle the precedence of operators in addition to&nbsp;associativity.</p> <p>Okay, here is our precedence&nbsp;table:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part5/lsbasi_part5_precedence.png" width="640"></p> <p>From the table, you can tell that operators + and - have the same precedence level and they are both left-associative. You can also see that operators * and / are left-associative too and share one precedence level, which is higher than that of the addition and subtraction&nbsp;operators.</p> <p>Here are the rules for how to construct a grammar from the precedence&nbsp;table:</p> <ol> <li>For each level of precedence define a non-terminal. The body of a production for the non-terminal should contain arithmetic operators from that level and non-terminals for the next higher level of&nbsp;precedence.</li> <li>Create an additional non-terminal <em>factor</em> for basic units of expression, in our case, integers. 
The general rule is that if you have N levels of precedence, you will need N + 1 non-terminals in total: one non-terminal for each level plus one non-terminal for basic units of&nbsp;expression.</li> </ol> <p>Onward!</p> <p>Let’s follow the rules and construct our&nbsp;grammar.</p> <p>According to Rule 1 we will define two non-terminals: a non-terminal called <em>expr</em> for level 2 and a non-terminal called <em>term</em> for level 1. And by following Rule 2 we will define a <em>factor</em> non-terminal for basic units of arithmetic expressions,&nbsp;integers.</p> <p>The <em>start symbol</em> of our new grammar will be <em>expr</em> and the <em>expr</em> production will contain a body representing the use of operators from level 2, which in our case are operators + and - , and will contain <em>term</em> non-terminals for the next higher level of precedence, level&nbsp;1:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part5/lsbasi_part5_cfg_expr.png" width="640"></p> <p>The <em>term</em> production will have a body representing the use of operators from level 1, which are operators * and / , and it will contain the non-terminal <em>factor</em> for the basic units of expression,&nbsp;integers:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part5/lsbasi_part5_cfg_term.png" width="640"></p> <p>And the production for the non-terminal <em>factor</em> will&nbsp;be:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part5/lsbasi_part5_cfg_factor.png" width="340"></p> <p>You’ve already seen the productions above as part of grammars and syntax diagrams from previous articles, but here we combine them into one grammar that takes care of the associativity and the precedence of&nbsp;operators:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part5/lsbasi_part5_grammar.png" width="640"></p> <p>Here is a syntax diagram that corresponds to the grammar&nbsp;above:</p> <p><img alt="" 
src="https://ruslanspivak.com/lsbasi-part5/lsbasi_part5_syntaxdiagram.png" width="640"></p> <p>Each rectangular box in the diagram is a &#8220;method call&#8221; to another diagram. If you take the expression 7 + 5 * 2 and start with the top diagram <em>expr</em> and walk your way down to the bottommost diagram <em>factor</em>, you should be able to see that <em>higher-precedence</em> operators * and / in the lower diagram execute before operators + and - in the higher&nbsp;diagram.</p> <p>To drive the precedence of operators point home, let’s take a look at the decomposition of the same arithmetic expression 7 + 5 * 2 done in accordance with our grammar and syntax diagrams above. This is just another way to show that <em>higher-precedence</em> operators execute before operators with <em>lower precedence</em>:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part5/lsbasi_part5_exprdecomp.png" width="640"></p> <p>Okay, let’s convert the grammar to code following guidelines from <a href="http://ruslanspivak.com/lsbasi-part4/" title="Part 4">Part 4</a> and see how our new interpreter works, shall&nbsp;we?</p> <p>Here is the grammar&nbsp;again:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part5/lsbasi_part5_grammar.png" width="640"></p> <p>And here is the complete code of a calculator that can handle valid arithmetic expressions containing integers and any number of addition, subtraction, multiplication, and division&nbsp;operators.</p> <p>The following are the main changes compared with the code from <a href="http://ruslanspivak.com/lsbasi-part4/" title="Part 4">Part 4</a>:</p> <ul> <li>The <em>Lexer</em> class can now tokenize +, -, *, and / (Nothing new here, we just combined code from previous articles into one class that supports all those&nbsp;tokens)</li> <li>Recall that each rule (production), <strong>R</strong>, defined in the grammar, becomes a method with the same name, and references to that rule become a method call: <strong>R()</strong>. 
As a result the <em>Interpreter</em> class now has three methods that correspond to non-terminals in the grammar: <em>expr</em>, <em>term</em>, and <em>factor</em>.</li> </ul> <p>Source&nbsp;code:</p> <div class="highlight"><pre><span></span><span class="c1"># Token types</span>
<span class="c1">#</span>
<span class="c1"># EOF (end-of-file) token is used to indicate that</span>
<span class="c1"># there is no more input left for lexical analysis</span>
<span class="n">INTEGER</span><span class="p">,</span> <span class="n">PLUS</span><span class="p">,</span> <span class="n">MINUS</span><span class="p">,</span> <span class="n">MUL</span><span class="p">,</span> <span class="n">DIV</span><span class="p">,</span> <span class="n">EOF</span> <span class="o">=</span> <span class="p">(</span>
    <span class="s1">&#39;INTEGER&#39;</span><span class="p">,</span> <span class="s1">&#39;PLUS&#39;</span><span class="p">,</span> <span class="s1">&#39;MINUS&#39;</span><span class="p">,</span> <span class="s1">&#39;MUL&#39;</span><span class="p">,</span> <span class="s1">&#39;DIV&#39;</span><span class="p">,</span> <span class="s1">&#39;EOF&#39;</span>
<span class="p">)</span>


<span class="k">class</span> <span class="nc">Token</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">type</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="c1"># token type: INTEGER, PLUS, MINUS, MUL, DIV, or EOF</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">type</span> <span class="o">=</span> <span class="nb">type</span>
        <span class="c1"># token value: non-negative integer value, &#39;+&#39;, &#39;-&#39;, &#39;*&#39;, &#39;/&#39;, or None</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span>

    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">&quot;&quot;&quot;String representation of the class instance.</span>

<span class="sd">        Examples:</span>
<span class="sd">            Token(INTEGER, 3)</span>
<span class="sd">            Token(PLUS, &#39;+&#39;)</span>
<span class="sd">            Token(MUL, &#39;*&#39;)</span>
<span class="sd">        &quot;&quot;&quot;</span>
        <span class="k">return</span> <span class="s1">&#39;Token({type}, {value})&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
            <span class="nb">type</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">type</span><span class="p">,</span>
            <span class="n">value</span><span class="o">=</span><span class="nb">repr</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
        <span class="p">)</span>

    <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="fm">__str__</span><span class="p">()</span>


<span class="k">class</span> <span class="nc">Lexer</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span>
        <span class="c1"># client string input, e.g. &quot;3 * 5&quot;, &quot;12 / 3 * 4&quot;, etc</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">text</span>
        <span class="c1"># self.pos is an index into self.text</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">pos</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Invalid character&#39;</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">advance</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">&quot;&quot;&quot;Advance the `pos` pointer and set the `current_char` variable.&quot;&quot;&quot;</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">&gt;</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">None</span>  <span class="c1"># Indicates end of input</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">pos</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">skip_whitespace</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isspace</span><span class="p">():</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">integer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">&quot;&quot;&quot;Return a (multidigit) integer consumed from the input.&quot;&quot;&quot;</span>
        <span class="n">result</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span>
        <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span>
            <span class="n">result</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
        <span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">get_next_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">&quot;&quot;&quot;Lexical analyzer (also known as scanner or tokenizer)</span>

<span class="sd">        This method is responsible for breaking a sentence</span>
<span class="sd">        apart into tokens. One token at a time.</span>
<span class="sd">        &quot;&quot;&quot;</span>
        <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>

            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isspace</span><span class="p">():</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">skip_whitespace</span><span class="p">()</span>
                <span class="k">continue</span>

            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span>
                <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">integer</span><span class="p">())</span>

            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;+&#39;</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
                <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">PLUS</span><span class="p">,</span> <span class="s1">&#39;+&#39;</span><span class="p">)</span>

            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;-&#39;</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
                <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">MINUS</span><span class="p">,</span> <span class="s1">&#39;-&#39;</span><span class="p">)</span>

            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;*&#39;</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
                <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">MUL</span><span class="p">,</span> <span class="s1">&#39;*&#39;</span><span class="p">)</span>

            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;/&#39;</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
                <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">DIV</span><span class="p">,</span> <span class="s1">&#39;/&#39;</span><span class="p">)</span>

            <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span>

        <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">EOF</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span>


<span class="k">class</span> <span class="nc">Interpreter</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">lexer</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span> <span class="o">=</span> <span class="n">lexer</span>
        <span class="c1"># set current token to the first token taken from the input</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Invalid syntax&#39;</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">eat</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token_type</span><span class="p">):</span>
        <span class="c1"># compare the current token type with the passed token</span>
        <span class="c1"># type and if they match then &quot;eat&quot; the current token</span>
        <span class="c1"># and assign the next token to the self.current_token,</span>
        <span class="c1"># otherwise raise an exception.</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">token_type</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">factor</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">&quot;&quot;&quot;factor : INTEGER&quot;&quot;&quot;</span>
        <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">token</span><span class="o">.</span><span class="n">value</span>

    <span class="k">def</span> <span class="nf">term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">&quot;&quot;&quot;term : factor ((MUL | DIV) factor)*&quot;&quot;&quot;</span>
        <span class="n">result</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">()</span>

        <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="ow">in</span> <span class="p">(</span><span class="n">MUL</span><span class="p">,</span> <span class="n">DIV</span><span class="p">):</span>
            <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span>
            <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MUL</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">MUL</span><span class="p">)</span>
                <span class="n">result</span> <span class="o">=</span> <span class="n">result</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">()</span>
            <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">DIV</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">DIV</span><span class="p">)</span>
                <span class="n">result</span> <span class="o">=</span> <span class="n">result</span> <span class="o">/</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">()</span>

        <span class="k">return</span> <span class="n">result</span>

    <span class="k">def</span> <span class="nf">expr</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="sd">&quot;&quot;&quot;Arithmetic expression parser / interpreter.</span>

<span class="sd">        calc&gt; 14 + 2 * 3 - 6 / 2</span>
<span class="sd">        17</span>

<span class="sd">        expr : term ((PLUS | MINUS) term)*</span>
<span class="sd">        term : factor ((MUL | DIV) factor)*</span>
<span class="sd">        factor : INTEGER</span>
<span class="sd">        &quot;&quot;&quot;</span>
        <span class="n">result</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span>

        <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="ow">in</span> <span class="p">(</span><span class="n">PLUS</span><span class="p">,</span> <span class="n">MINUS</span><span class="p">):</span>
            <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span>
            <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">PLUS</span><span class="p">)</span>
                <span class="n">result</span> <span class="o">=</span> <span class="n">result</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span>
            <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MINUS</span><span class="p">:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">MINUS</span><span class="p">)</span>
                <span class="n">result</span> <span class="o">=</span> <span class="n">result</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span>

        <span class="k">return</span> <span class="n">result</span>


<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="c1"># To run under Python3 replace &#39;raw_input&#39; call</span>
            <span class="c1"># with &#39;input&#39;</span>
            <span class="n">text</span> <span class="o">=</span> <span class="nb">raw_input</span><span class="p">(</span><span class="s1">&#39;calc&gt; &#39;</span><span class="p">)</span>
        <span class="k">except</span> <span class="ne">EOFError</span><span class="p">:</span>
            <span class="k">break</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">text</span><span class="p">:</span>
            <span class="k">continue</span>
        <span class="n">lexer</span> <span class="o">=</span> <span class="n">Lexer</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
        <span class="n">interpreter</span> <span class="o">=</span> <span class="n">Interpreter</span><span class="p">(</span><span class="n">lexer</span><span class="p">)</span>
        <span class="n">result</span> <span class="o">=</span> <span class="n">interpreter</span><span class="o">.</span><span class="n">expr</span><span class="p">()</span>
        <span class="k">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>


<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</pre></div> <p>Save the above code into the <em>calc5.py</em> file or download it directly from <a href="https://github.com/rspivak/lsbasi/blob/master/part5/calc5.py">GitHub</a>. As usual, try it out and see for yourself that the interpreter properly evaluates arithmetic expressions that have operators with different&nbsp;precedence.</p> <p>Here is a sample session on my&nbsp;laptop:</p> <div class="highlight"><pre><span></span>$ python calc5.py
calc&gt; <span class="m">3</span>
<span class="m">3</span>
calc&gt; <span class="m">2</span> + <span class="m">7</span> * <span class="m">4</span>
<span class="m">30</span>
calc&gt; <span class="m">7</span> - <span class="m">8</span> / <span class="m">4</span>
<span class="m">5</span>
calc&gt; <span class="m">14</span> + <span class="m">2</span> * <span class="m">3</span> - <span class="m">6</span> / <span class="m">2</span>
<span class="m">17</span>
</pre></div> <p><br/> Here are new exercises for&nbsp;today:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part5/lsbasi_part5_exercises.png" width="240"></p> <ul> <li> <p>Write an interpreter as described in this article off the top of your head, without peeking into the code from the article. 
Write some tests for your interpreter, and make sure they&nbsp;pass.</p> </li> <li> <p>Extend the interpreter to handle arithmetic expressions containing parentheses so that your interpreter could evaluate deeply nested arithmetic expressions like: 7 + 3 * (10 / (12 / (3 + 1) -&nbsp;1))</p> </li> </ul> <p><br/> <strong>Check your&nbsp;understanding.</strong></p> <ol> <li>What does it mean for an operator to be <em>left-associative</em>?</li> <li>Are operators + and - <em>left-associative</em> or <em>right-associative</em>? What about * and /&nbsp;?</li> <li>Does operator + have <em>higher precedence</em> than operator *&nbsp;?</li> </ol> <p><br/> Hey, you read all the way to the end! That’s really great. I’ll be back next time with a new article - stay tuned, be brilliant, and, as usual, don’t forget to do the&nbsp;exercises.</p> <p><br/> Here is a list of books I recommend that will help you in your study of interpreters and&nbsp;compilers:</p> <ol> <li> <p><a href="http://www.amazon.com/gp/product/193435645X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=193435645X&linkCode=as2&tag=russblo0b-20&linkId=MP4DCXDV6DJMEJBL">Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (Pragmatic Programmers)</a></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0470177071/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0470177071&linkCode=as2&tag=russblo0b-20&linkId=UCLGQTPIYSWYKRRM">Writing Compilers and Interpreters: A Software Engineering Approach</a></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/052182060X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=052182060X&linkCode=as2&tag=russblo0b-20&linkId=ZSKKZMV7YWR22NMW">Modern Compiler Implementation in Java</a></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/1461446988/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1461446988&linkCode=as2&tag=russblo0b-20&linkId=PAXWJP5WCPZ7RKRD">Modern Compiler Design</a></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0321486811/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321486811&linkCode=as2&tag=russblo0b-20&linkId=GOEGDQG4HIHU56FQ">Compilers: Principles, Techniques, and Tools (2nd Edition)</a></p> </li> </ol> Let’s Build A Simple Interpreter. Part 4.2015-09-11T07:00:00-04:002015-09-11T07:00:00-04:00Ruslan Spivaktag:ruslanspivak.com,2015-09-11:/lsbasi-part4/<p>Have you been passively learning the material in these articles or have you been actively practicing it? I hope you’ve been actively practicing it. I really do&nbsp;:)</p> <p><br/> Remember what Confucius&nbsp;said?</p> <blockquote> <p><em><span class="dquo">&#8220;</span>I hear and I&nbsp;forget.&#8221;</em></p> </blockquote> <p><img alt="Hear" src="https://ruslanspivak.com/lsbasi-part4/LSBAWS_confucius_hear.png" width="640"></p> <blockquote> <p><em><span class="dquo">&#8220;</span>I see and I&nbsp;remember.&#8221;</em></p> </blockquote> <p><img alt="See" src="https://ruslanspivak.com/lsbasi-part4/LSBAWS_confucius_see.png" width="640"></p> <blockquote> <p><em><span class="dquo">&#8220;</span>I do and I&nbsp;understand.&#8221;</em></p> </blockquote> <p><img alt="Do" src="https://ruslanspivak.com/lsbasi-part4/LSBAWS_confucius_do.png" width="640"></p> <p><br/> In the previous article you learned how to parse (recognize) and interpret arithmetic expressions with any number of plus or minus operators in them, for example &#8220;7 - 3 + 2 - 1&#8221;. 
You also learned about syntax diagrams and how they can be used to specify the syntax of a programming&nbsp;language.</p> <p>Today you’re going to learn how to parse and interpret arithmetic expressions with any number of multiplication and division operators in them, for example &#8220;7 * 4 / 2 * 3&#8221;. The division in this article will be an integer division, so if the expression is &#8220;9 / 4&#8221;, then the answer will be an integer:&nbsp;2.</p> <p>I will also talk quite a bit today about another widely used notation for specifying the syntax of a programming language. It’s called <em><strong>context-free grammars</strong></em> (<em><strong>grammars</strong></em>, for short) or <em><strong><span class="caps">BNF</span></strong></em> (Backus-Naur Form). For the purpose of this article I will not use pure <a href="https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form"><span class="caps">BNF</span></a> notation but something closer to a modified <a href="https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form"><span class="caps">EBNF</span></a>&nbsp;notation.</p> <p>Here are a few reasons to use&nbsp;grammars:</p> <ol> <li>A grammar specifies the syntax of a programming language in a concise manner. Unlike syntax diagrams, grammars are very compact. You will see me using grammars more and more in future&nbsp;articles.</li> <li>A grammar can serve as great&nbsp;documentation.</li> <li>A grammar is a good starting point even if you manually write your parser from scratch. Quite often you can just convert the grammar to code by following a set of simple&nbsp;rules.</li> <li>There is a set of tools, called <em>parser generators</em>, which accept a grammar as an input and automatically generate a parser for you based on that grammar. 
I will talk about those tools later on in the&nbsp;series.</li> </ol> <p>Now, let’s talk about the mechanical aspects of grammars, shall&nbsp;we?</p> <p>Here is a grammar that describes arithmetic expressions like &#8220;7 * 4 / 2 * 3&#8221; (it’s just one of the many expressions that can be generated by the&nbsp;grammar):</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part4/lsbasi_part4_bnf1.png" width="640"></p> <p>A grammar consists of a sequence of <em>rules</em>, also known as <em>productions</em>. There are two rules in our&nbsp;grammar:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part4/lsbasi_part4_bnf2.png" width="640"></p> <p>A rule consists of a <em>non-terminal</em>, called the <em><strong>head</strong></em> or <em><strong>left-hand side</strong></em> of the production, a colon, and a sequence of terminals and/or non-terminals, called the <em><strong>body</strong></em> or <em><strong>right-hand side</strong></em> of the&nbsp;production:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part4/lsbasi_part4_bnf3.png" width="640"></p> <p>In the grammar I showed above, tokens like <span class="caps">MUL</span>, <span class="caps">DIV</span>, and <span class="caps">INTEGER</span> are called <em><strong>terminals</strong></em> and variables like <em>expr</em> and <em>factor</em> are called <em><strong>non-terminals</strong></em>. Non-terminals usually consist of a sequence of terminals and/or&nbsp;non-terminals:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part4/lsbasi_part4_bnf4.png" width="640"></p> <p>The non-terminal symbol on the left side of the first rule is called the <em><strong>start symbol</strong></em>. 
In the case of our grammar, the start symbol is <em>expr</em>:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part4/lsbasi_part4_bnf5.png" width="640"></p> <p>You can read the rule <em>expr</em> as &#8220;An <em>expr</em> can be a <em>factor</em> optionally followed by a <em>multiplication</em> or <em>division</em> operator followed by another <em>factor</em>, which in turn is optionally followed by a <em>multiplication</em> or <em>division</em> operator followed by another <em>factor</em> and so on and so&nbsp;forth.&#8221;</p> <p>What is a <em>factor</em>? For the purpose of this article a <em>factor</em> is just an&nbsp;integer.</p> <p>Let’s quickly go over the symbols used in the grammar and their&nbsp;meaning.</p> <ul> <li><strong>|</strong> - Alternatives. A bar means “or”. So (<span class="caps">MUL</span> | <span class="caps">DIV</span>) means either <span class="caps">MUL</span> or <span class="caps">DIV</span>.</li> <li><strong>( &#8230; )</strong> - Opening and closing parentheses group terminals and/or non-terminals, as in (<span class="caps">MUL</span> | <span class="caps">DIV</span>).</li> <li><strong>( &#8230; )</strong>* - Match contents within the group zero or more&nbsp;times.</li> </ul> <p>If you have worked with regular expressions in the past, then the symbols <strong>|</strong>, <strong>()</strong>, and <strong>(&#8230;)</strong>* should be pretty familiar to&nbsp;you.</p> <p>A grammar defines a <em>language</em> by specifying which sentences it can form. This is how you can <em>derive</em> an arithmetic expression using the grammar: first you begin with the start symbol <em>expr</em> and then repeatedly replace a non-terminal with the body of a rule for that non-terminal until you have generated a sentence consisting solely of terminals. 
Those sentences form a <em>language</em> defined by the&nbsp;grammar.</p> <p>If the grammar cannot derive a certain arithmetic expression, then it doesn’t support that expression and the parser will generate a syntax error when it tries to recognize the&nbsp;expression.</p> <p>I think a couple of examples are in order. This is how the grammar derives the expression <em>3</em>:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part4/lsbasi_part4_derive1.png" width="600"></p> <p>This is how the grammar derives the expression <em>3 * 7</em>:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part4/lsbasi_part4_derive2.png" width="600"></p> <p>And this is how the grammar derives the expression <em>3 * 7 / 2</em>:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part4/lsbasi_part4_derive3.png" width="600"></p> <p>Whoa, quite a bit of theory right&nbsp;there!</p> <p>I think when I first read about grammars, the related terminology, and all that jazz, I felt something like&nbsp;this:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part4/lsbasi_part4_bnf_hmm.png" width="280"></p> <p>I can assure you that I definitely was not like&nbsp;this:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part4/lsbasi_part4_bnf_yes.png" width="280"></p> <p>It took me some time to get comfortable with the notation, how it works, and its relationship with parsers and lexers, but I have to tell you that it pays to learn it in the long run because it’s so widely used in practice and compiler literature that you’re bound to run into it at some point. So, why not sooner rather than later?&nbsp;:)</p> <p>Now, let’s map that grammar to code,&nbsp;okay?</p> <p>Here are the guidelines that we will use to convert the grammar to source code. 
By following them, you can literally translate the grammar to a working&nbsp;parser:</p> <ol> <li>Each rule, <strong>R</strong>, defined in the grammar, becomes a method with the same name, and references to that rule become a method call: <em><strong>R()</strong></em>. The body of the method follows the flow of the body of the rule using the very same&nbsp;guidelines.</li> <li>Alternatives <strong>(a1 | a2 | aN)</strong> become an <em><strong>if-elif-else</strong></em>&nbsp;statement.</li> <li>An optional grouping <strong>(&#8230;)*</strong> becomes a <em><strong>while</strong></em> statement that can loop zero or more&nbsp;times.</li> <li>Each token reference <strong>T</strong> becomes a call to the method <em><strong>eat</strong></em>: <em><strong>eat(T)</strong></em>. The <em>eat</em> method checks that the current <em>lookahead</em> token has type <em>T</em>. If it does, <em>eat</em> consumes it by getting a new token from the lexer and assigning that token to the <em>current_token</em> internal variable; otherwise it raises a syntax&nbsp;error.</li> </ol> <p>Visually the guidelines look like&nbsp;this:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part4/lsbasi_part4_rules.png" width="780"></p> <p>Let’s get moving and convert our grammar to code following the above&nbsp;guidelines.</p> <p>There are two rules in our grammar: one <em>expr</em> rule and one <em>factor</em> rule. Let’s start with the <em>factor</em> rule (production). 
According to the guidelines, you need to create a method called <em>factor</em> (guideline 1) that has a single call to the <em>eat</em> method to consume the <span class="caps">INTEGER</span> token (guideline&nbsp;4):</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">factor</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">)</span>
</pre></div> <p>That was easy, wasn’t&nbsp;it?</p> <p>Onward!</p> <p>The rule <em>expr</em> becomes the <em>expr</em> method (again according to guideline 1). The body of the rule starts with a reference to <em>factor</em> that becomes a <em>factor()</em> method call. The optional grouping <em>(&#8230;)*</em> becomes a <em>while</em> loop and the <em>(<span class="caps">MUL</span> | <span class="caps">DIV</span>)</em> alternatives become an <em>if-elif-else</em> statement. 
By combining those pieces together we get the following <em>expr</em>&nbsp;method:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">expr</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">()</span>

    <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="ow">in</span> <span class="p">(</span><span class="n">MUL</span><span class="p">,</span> <span class="n">DIV</span><span class="p">):</span>
        <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span>
        <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MUL</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">MUL</span><span class="p">)</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">()</span>
        <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">DIV</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">DIV</span><span class="p">)</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">()</span>
</pre></div> <p>Please spend some time and study how I mapped the grammar to the source code. 
Make sure you understand that part because it’ll come in handy later&nbsp;on.</p> <p>For your convenience I put the above code into the <em>parser.py</em> file that contains a lexer and a parser without an interpreter. You can download the file directly from <a href="https://github.com/rspivak/lsbasi/blob/master/part4/parser.py">GitHub</a> and play with it. It has an interactive prompt where you can enter expressions and see if they are valid: that is, if the parser built according to the grammar can recognize the&nbsp;expressions.</p> <p>Here is a sample session that I ran on my&nbsp;computer:</p> <div class="highlight"><pre><span></span>$ python parser.py calc&gt; <span class="m">3</span> calc&gt; <span class="m">3</span> * <span class="m">7</span> calc&gt; <span class="m">3</span> * <span class="m">7</span> / <span class="m">2</span> calc&gt; <span class="m">3</span> * Traceback <span class="o">(</span>most recent call last<span class="o">)</span>: File <span class="s2">&quot;parser.py&quot;</span>, line <span class="m">155</span>, in &lt;module&gt; main<span class="o">()</span> File <span class="s2">&quot;parser.py&quot;</span>, line <span class="m">151</span>, in main parser.parse<span class="o">()</span> File <span class="s2">&quot;parser.py&quot;</span>, line <span class="m">136</span>, in parse self.expr<span class="o">()</span> File <span class="s2">&quot;parser.py&quot;</span>, line <span class="m">130</span>, in expr self.factor<span class="o">()</span> File <span class="s2">&quot;parser.py&quot;</span>, line <span class="m">114</span>, in factor self.eat<span class="o">(</span>INTEGER<span class="o">)</span> File <span class="s2">&quot;parser.py&quot;</span>, line <span class="m">107</span>, in eat self.error<span class="o">()</span> File <span class="s2">&quot;parser.py&quot;</span>, line <span class="m">97</span>, in error raise Exception<span class="o">(</span><span class="s1">&#39;Invalid syntax&#39;</span><span class="o">)</span> Exception: Invalid 
syntax </pre></div> <p>Try it&nbsp;out!</p> <p>I couldn’t help but mention syntax diagrams again. This is how a syntax diagram for the same <em>expr</em> rule will&nbsp;look:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part4/lsbasi_part4_sd.png" width="640"></p> <p>It’s about time we dug into the source code of our new arithmetic expression interpreter. Below is the code of a calculator that can handle valid arithmetic expressions containing integers and any number of multiplication and division (integer division) operators. You can also see that I refactored the lexical analyzer into a separate class <em>Lexer</em> and updated the <em>Interpreter</em> class to take the <em>Lexer</em> instance as a&nbsp;parameter:</p> <div class="highlight"><pre><span></span><span class="c1"># Token types</span> <span class="c1">#</span> <span class="c1"># EOF (end-of-file) token is used to indicate that</span> <span class="c1"># there is no more input left for lexical analysis</span> <span class="n">INTEGER</span><span class="p">,</span> <span class="n">MUL</span><span class="p">,</span> <span class="n">DIV</span><span class="p">,</span> <span class="n">EOF</span> <span class="o">=</span> <span class="s1">&#39;INTEGER&#39;</span><span class="p">,</span> <span class="s1">&#39;MUL&#39;</span><span class="p">,</span> <span class="s1">&#39;DIV&#39;</span><span class="p">,</span> <span class="s1">&#39;EOF&#39;</span> <span class="k">class</span> <span class="nc">Token</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">type</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span> <span class="c1"># token type: INTEGER, MUL, DIV, or EOF</span> <span class="bp">self</span><span class="o">.</span><span class="n">type</span> <span class="o">=</span> <span 
class="nb">type</span> <span class="c1"># token value: non-negative integer value, &#39;*&#39;, &#39;/&#39;, or None</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span> <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;String representation of the class instance.</span> <span class="sd"> Examples:</span> <span class="sd"> Token(INTEGER, 3)</span> <span class="sd"> Token(MUL, &#39;*&#39;)</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="k">return</span> <span class="s1">&#39;Token({type}, {value})&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span> <span class="nb">type</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">type</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="nb">repr</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="p">)</span> <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="fm">__str__</span><span class="p">()</span> <span class="k">class</span> <span class="nc">Lexer</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span> <span class="c1"># client string input, e.g. 
&quot;3 * 5&quot;, &quot;12 / 3 * 4&quot;, etc</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">text</span> <span class="c1"># self.pos is an index into self.text</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">=</span> <span class="mi">0</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">pos</span><span class="p">]</span> <span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Invalid character&#39;</span><span class="p">)</span> <span class="k">def</span> <span class="nf">advance</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Advance the `pos` pointer and set the `current_char` variable.&quot;&quot;&quot;</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">&gt;</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">None</span> <span class="c1"># Indicates end of input</span> <span class="k">else</span><span class="p">:</span> <span 
class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">pos</span><span class="p">]</span> <span class="k">def</span> <span class="nf">skip_whitespace</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isspace</span><span class="p">():</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">def</span> <span class="nf">integer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Return a (multidigit) integer consumed from the input.&quot;&quot;&quot;</span> <span class="n">result</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span> <span class="n">result</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span 
class="nb">int</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="k">def</span> <span class="nf">get_next_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Lexical analyzer (also known as scanner or tokenizer)</span> <span class="sd"> This method is responsible for breaking a sentence</span> <span class="sd"> apart into tokens. One token at a time.</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isspace</span><span class="p">():</span> <span class="bp">self</span><span class="o">.</span><span class="n">skip_whitespace</span><span class="p">()</span> <span class="k">continue</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">integer</span><span class="p">())</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;*&#39;</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">MUL</span><span class="p">,</span> <span 
class="s1">&#39;*&#39;</span><span class="p">)</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;/&#39;</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">DIV</span><span class="p">,</span> <span class="s1">&#39;/&#39;</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">EOF</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span> <span class="k">class</span> <span class="nc">Interpreter</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">lexer</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span> <span class="o">=</span> <span class="n">lexer</span> <span class="c1"># set current token to the first token taken from the input</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Invalid syntax&#39;</span><span class="p">)</span> <span 
class="k">def</span> <span class="nf">eat</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token_type</span><span class="p">):</span> <span class="c1"># compare the current token type with the passed token</span> <span class="c1"># type and if they match then &quot;eat&quot; the current token</span> <span class="c1"># and assign the next token to the self.current_token,</span> <span class="c1"># otherwise raise an exception.</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">token_type</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="k">else</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span> <span class="k">def</span> <span class="nf">factor</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Return an INTEGER token value.</span> <span class="sd"> factor : INTEGER</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">)</span> <span class="k">return</span> <span class="n">token</span><span class="o">.</span><span class="n">value</span> <span class="k">def</span> <span class="nf">expr</span><span class="p">(</span><span class="bp">self</span><span 
class="p">):</span> <span class="sd">&quot;&quot;&quot;Arithmetic expression parser / interpreter.</span> <span class="sd"> expr : factor ((MUL | DIV) factor)*</span> <span class="sd"> factor : INTEGER</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="n">result</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">()</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="ow">in</span> <span class="p">(</span><span class="n">MUL</span><span class="p">,</span> <span class="n">DIV</span><span class="p">):</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MUL</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">MUL</span><span class="p">)</span> <span class="n">result</span> <span class="o">=</span> <span class="n">result</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">()</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">DIV</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">DIV</span><span class="p">)</span> <span class="n">result</span> <span class="o">=</span> <span class="n">result</span> <span class="o">/</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">()</span> <span 
class="k">return</span> <span class="n">result</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span> <span class="k">while</span> <span class="bp">True</span><span class="p">:</span> <span class="k">try</span><span class="p">:</span> <span class="c1"># To run under Python3 replace &#39;raw_input&#39; call</span> <span class="c1"># with &#39;input&#39;</span> <span class="n">text</span> <span class="o">=</span> <span class="nb">raw_input</span><span class="p">(</span><span class="s1">&#39;calc&gt; &#39;</span><span class="p">)</span> <span class="k">except</span> <span class="ne">EOFError</span><span class="p">:</span> <span class="k">break</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">text</span><span class="p">:</span> <span class="k">continue</span> <span class="n">lexer</span> <span class="o">=</span> <span class="n">Lexer</span><span class="p">(</span><span class="n">text</span><span class="p">)</span> <span class="n">interpreter</span> <span class="o">=</span> <span class="n">Interpreter</span><span class="p">(</span><span class="n">lexer</span><span class="p">)</span> <span class="n">result</span> <span class="o">=</span> <span class="n">interpreter</span><span class="o">.</span><span class="n">expr</span><span class="p">()</span> <span class="k">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span> <span class="n">main</span><span class="p">()</span> </pre></div> <p>Save the above code into the <em>calc4.py</em> file or download it directly from <a href="https://github.com/rspivak/lsbasi/blob/master/part4/calc4.py">GitHub</a>. 
As usual, try it out and see for yourself that it&nbsp;works.</p> <p>This is a sample session that I ran on my&nbsp;laptop:</p> <div class="highlight"><pre><span></span>$ python calc4.py calc&gt; <span class="m">7</span> * <span class="m">4</span> / <span class="m">2</span> <span class="m">14</span> calc&gt; <span class="m">7</span> * <span class="m">4</span> / <span class="m">2</span> * <span class="m">3</span> <span class="m">42</span> calc&gt; <span class="m">10</span> * <span class="m">4</span> * <span class="m">2</span> * <span class="m">3</span> / <span class="m">8</span> <span class="m">30</span> </pre></div> <p><br/> I know you couldn’t wait for this part :) Here are new exercises for&nbsp;today:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part4/lsbasi_part4_exercises.png" width="280"></p> <ul> <li>Write a grammar that describes arithmetic expressions containing any number of +, -, *, or / operators. With the grammar you should be able to derive expressions like &#8220;2 + 7 * 4&#8221;, &#8220;7 - 8 / 4&#8221;, &#8220;14 + 2 * 3 - 6 / 2&#8221;, and so&nbsp;on.</li> <li>Using the grammar, write an interpreter that can evaluate arithmetic expressions containing any number of +, -, *, or / operators. Your interpreter should be able to handle expressions like &#8220;2 + 7 * 4&#8221;, &#8220;7 - 8 / 4&#8221;, &#8220;14 + 2 * 3 - 6 / 2&#8221;, and so&nbsp;on.</li> <li>If you’ve finished the above exercises, relax and enjoy&nbsp;:)</li> </ul> <p><br/> <strong>Check your&nbsp;understanding.</strong></p> <p>Keeping in mind the grammar from today’s article, answer the following questions, referring to the picture below as&nbsp;needed:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part4/lsbasi_part4_bnf1.png" width="640"></p> <ol> <li>What is a context-free grammar&nbsp;(grammar)?</li> <li>How many rules / productions does the grammar&nbsp;have?</li> <li>What is a terminal? 
(Identify all terminals in the&nbsp;picture)</li> <li>What is a non-terminal? (Identify all non-terminals in the&nbsp;picture)</li> <li>What is a head of a rule? (Identify all heads / left-hand sides in the&nbsp;picture)</li> <li>What is a body of the rule? (Identify all bodies / right-hand sides in the&nbsp;picture)</li> <li>What is the start symbol of a&nbsp;grammar?</li> </ol> <p><br/> Hey, you read all the way to the end! This post contained quite a bit of theory, so I’m really proud of you that you finished&nbsp;it.</p> <p>I’ll be back next time with a new article - stay tuned and don’t forget to do the exercises, they will do you&nbsp;good.</p> <p><br/> Here is a list of books I recommend that will help you in your study of interpreters and&nbsp;compilers:</p> <ol> <li> <p><a href="http://www.amazon.com/gp/product/193435645X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=193435645X&linkCode=as2&tag=russblo0b-20&linkId=MP4DCXDV6DJMEJBL">Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (Pragmatic Programmers)</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=193435645X" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0470177071/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0470177071&linkCode=as2&tag=russblo0b-20&linkId=UCLGQTPIYSWYKRRM">Writing Compilers and Interpreters: A Software Engineering Approach</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0470177071" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/052182060X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=052182060X&linkCode=as2&tag=russblo0b-20&linkId=ZSKKZMV7YWR22NMW">Modern Compiler Implementation in Java</a><img 
src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=052182060X" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/1461446988/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1461446988&linkCode=as2&tag=russblo0b-20&linkId=PAXWJP5WCPZ7RKRD">Modern Compiler Design</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=1461446988" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0321486811/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321486811&linkCode=as2&tag=russblo0b-20&linkId=GOEGDQG4HIHU56FQ">Compilers: Principles, Techniques, and Tools (2nd Edition)</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0321486811" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> </ol> Let’s Build A Simple Interpreter. Part 3.2015-08-12T07:00:00-04:002015-08-12T07:00:00-04:00Ruslan Spivaktag:ruslanspivak.com,2015-08-12:/lsbasi-part3/<p>I woke up this morning and I thought to myself: &#8220;Why do we find it so difficult to learn a new&nbsp;skill?&#8221;</p> <p>I don’t think it’s just because of the hard work. I think that one of the reasons might be that we spend a lot of time and hard work acquiring knowledge by reading and watching and not enough time translating that knowledge into a skill by practicing it. Take swimming, for example. 
You can spend a lot of time reading hundreds of books about swimming, talk for hours with experienced swimmers and coaches, watch all the training videos available, and you will still sink like a rock the first time you jump in the&nbsp;pool.</p> <p>The bottom line is: it doesn’t matter how well you think you know the subject - you have to put that knowledge into practice to turn it into a skill. To help you with the practice part I put exercises into <a href="http://ruslanspivak.com/lsbasi-part1/" title="Part 1">Part 1</a> and <a href="http://ruslanspivak.com/lsbasi-part2/" title="Part 2">Part 2</a> of the series. And yes, you will see more exercises in today’s article and in future articles, I promise&nbsp;:)</p> <p>Okay, let’s get started with today’s material, shall&nbsp;we?</p> <p><br/> So far, you’ve learned how to interpret arithmetic expressions that add or subtract two integers like &#8220;7 + 3&#8221; or &#8220;12 - 9&#8221;. Today I’m going to talk about how to parse (recognize) and interpret arithmetic expressions that have any number of plus or minus operators in them, for example &#8220;7 - 3 + 2 -&nbsp;1&#8221;.</p> <p>Graphically, the arithmetic expressions in this article can be represented with the following syntax&nbsp;diagram:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part3/lsbasi_part3_syntax_diagram.png" width="640"></p> <p>What is a syntax diagram? A <strong>syntax diagram</strong> is a graphical representation of a programming language’s syntax rules. Basically, a syntax diagram visually shows you which statements are allowed in your programming language and which are&nbsp;not.</p> <p>Syntax diagrams are pretty easy to read: just follow the paths indicated by the arrows. Some paths indicate choices. 
And some paths indicate&nbsp;loops.</p> <p>You can read the above syntax diagram as follows: a term optionally followed by a plus or minus sign, followed by another term, which in turn is optionally followed by a plus or minus sign followed by another term and so on. You get the picture, literally. You might wonder what a <em>&#8220;term&#8221;</em> is. For the purpose of this article a <em>&#8220;term&#8221;</em> is just an&nbsp;integer.</p> <p>Syntax diagrams serve two main&nbsp;purposes:</p> <ul> <li>They graphically represent the specification (grammar) of a programming&nbsp;language.</li> <li>They can be used to help you write your parser - you can map a diagram to code by following simple&nbsp;rules.</li> </ul> <p>You’ve learned that the process of recognizing a phrase in the stream of tokens is called <strong>parsing</strong>. And the part of an interpreter or compiler that performs that job is called a <strong>parser</strong>. Parsing is also called <strong>syntax analysis</strong>, and the parser is also aptly called, you guessed it right, a <strong>syntax analyzer</strong>.</p> <p>According to the syntax diagram above, all of the following arithmetic expressions are&nbsp;valid:</p> <ul> <li>3</li> <li>3 +&nbsp;4</li> <li>7 - 3 + 2 -&nbsp;1</li> </ul> <p>Because syntax rules for arithmetic expressions in different programming languages are very similar we can use a Python shell to &#8220;test&#8221; our syntax diagram. 
Launch your Python shell and see for&nbsp;yourself:</p> <div class="highlight"><pre><span></span>&gt;&gt;&gt; <span class="m">3</span> <span class="m">3</span> &gt;&gt;&gt; <span class="m">3</span> + <span class="m">4</span> <span class="m">7</span> &gt;&gt;&gt; <span class="m">7</span> - <span class="m">3</span> + <span class="m">2</span> - <span class="m">1</span> <span class="m">5</span> </pre></div> <p>No surprises&nbsp;here.</p> <p>The expression &#8220;3 + &#8221; is not a valid arithmetic expression though because according to the syntax diagram the plus sign must be followed by a <em>term</em> (integer), otherwise it’s a syntax error. Again, try it with a Python shell and see for&nbsp;yourself:</p> <div class="highlight"><pre><span></span>&gt;&gt;&gt; <span class="m">3</span> + File <span class="s2">&quot;&lt;stdin&gt;&quot;</span>, line <span class="m">1</span> <span class="m">3</span> + ^ SyntaxError: invalid syntax </pre></div> <p>It’s great to be able to use a Python shell to do some testing but let’s map the above syntax diagram to code and use our own interpreter for testing, all&nbsp;right?</p> <p>You know from the previous articles (<a href="http://ruslanspivak.com/lsbasi-part1/" title="Part 1">Part 1</a> and <a href="http://ruslanspivak.com/lsbasi-part2/" title="Part 2">Part 2</a>) that the <em>expr</em> method is where both our parser and interpreter live. Again, the parser just recognizes the structure making sure that it corresponds to some specifications and the interpreter actually evaluates the expression once the parser has successfully recognized (parsed)&nbsp;it.</p> <p>The following code snippet shows the parser code corresponding to the diagram. 
The rectangular box from the syntax diagram (<em>term</em>) becomes a <em>term</em> method that parses an integer and the <em>expr</em> method just follows the syntax diagram&nbsp;flow:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">)</span> <span class="k">def</span> <span class="nf">expr</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="c1"># set current token to the first token taken from the input</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="ow">in</span> <span class="p">(</span><span class="n">PLUS</span><span class="p">,</span> <span class="n">MINUS</span><span class="p">):</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">PLUS</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span> <span 
class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MINUS</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">MINUS</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span> </pre></div> <p>You can see that <em>expr</em> first calls the <em>term</em> method. Then the <em>expr</em> method has a <em>while</em> loop which can execute zero or more times. And inside the loop the parser makes a choice based on the token (whether it’s a plus or minus sign). Spend some time proving to yourself that the code above does indeed follow the syntax diagram flow for arithmetic&nbsp;expressions.</p> <p>The parser itself does not interpret anything though: if it recognizes an expression it’s silent and if it doesn’t, it throws out a syntax error. 
Let’s modify the <em>expr</em> method and add the interpreter&nbsp;code:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Return an INTEGER token value&quot;&quot;&quot;</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">)</span> <span class="k">return</span> <span class="n">token</span><span class="o">.</span><span class="n">value</span> <span class="k">def</span> <span class="nf">expr</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Parser / Interpreter &quot;&quot;&quot;</span> <span class="c1"># set current token to the first token taken from the input</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="n">result</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="ow">in</span> <span class="p">(</span><span class="n">PLUS</span><span class="p">,</span> <span class="n">MINUS</span><span class="p">):</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span 
class="n">type</span> <span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">PLUS</span><span class="p">)</span> <span class="n">result</span> <span class="o">=</span> <span class="n">result</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MINUS</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">MINUS</span><span class="p">)</span> <span class="n">result</span> <span class="o">=</span> <span class="n">result</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span> <span class="k">return</span> <span class="n">result</span> </pre></div> <p>Because the interpreter needs to evaluate an expression the <em>term</em> method was modified to return an integer value and the <em>expr</em> method was modified to perform addition and subtraction at the appropriate places and return the result of interpretation. 
Even though the code is pretty straightforward I recommend spending some time studying&nbsp;it.</p> <p>Let’s get moving and see the complete code of the interpreter now,&nbsp;okay?</p> <p>Here is the source code for your new version of the calculator that can handle valid arithmetic expressions containing integers and any number of addition and subtraction&nbsp;operators:</p> <div class="highlight"><pre><span></span><span class="c1"># Token types</span> <span class="c1">#</span> <span class="c1"># EOF (end-of-file) token is used to indicate that</span> <span class="c1"># there is no more input left for lexical analysis</span> <span class="n">INTEGER</span><span class="p">,</span> <span class="n">PLUS</span><span class="p">,</span> <span class="n">MINUS</span><span class="p">,</span> <span class="n">EOF</span> <span class="o">=</span> <span class="s1">&#39;INTEGER&#39;</span><span class="p">,</span> <span class="s1">&#39;PLUS&#39;</span><span class="p">,</span> <span class="s1">&#39;MINUS&#39;</span><span class="p">,</span> <span class="s1">&#39;EOF&#39;</span> <span class="k">class</span> <span class="nc">Token</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">type</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span> <span class="c1"># token type: INTEGER, PLUS, MINUS, or EOF</span> <span class="bp">self</span><span class="o">.</span><span class="n">type</span> <span class="o">=</span> <span class="nb">type</span> <span class="c1"># token value: non-negative integer value, &#39;+&#39;, &#39;-&#39;, or None</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span> <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span 
class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;String representation of the class instance.</span> <span class="sd"> Examples:</span> <span class="sd"> Token(INTEGER, 3)</span> <span class="sd"> Token(PLUS, &#39;+&#39;)</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="k">return</span> <span class="s1">&#39;Token({type}, {value})&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span> <span class="nb">type</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">type</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="nb">repr</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="p">)</span> <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="fm">__str__</span><span class="p">()</span> <span class="k">class</span> <span class="nc">Interpreter</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span> <span class="c1"># client string input, e.g. 
&quot;3 + 5&quot;, &quot;12 - 5 + 3&quot;, etc</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">text</span> <span class="c1"># self.pos is an index into self.text</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">=</span> <span class="mi">0</span> <span class="c1"># current token instance</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">None</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">pos</span><span class="p">]</span> <span class="c1">##########################################################</span> <span class="c1"># Lexer code #</span> <span class="c1">##########################################################</span> <span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Invalid syntax&#39;</span><span class="p">)</span> <span class="k">def</span> <span class="nf">advance</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Advance the `pos` pointer and set the `current_char` variable.&quot;&quot;&quot;</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">&gt;</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span 
class="o">.</span><span class="n">text</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">None</span> <span class="c1"># Indicates end of input</span> <span class="k">else</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">pos</span><span class="p">]</span> <span class="k">def</span> <span class="nf">skip_whitespace</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isspace</span><span class="p">():</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">def</span> <span class="nf">integer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Return a (multidigit) integer consumed from the input.&quot;&quot;&quot;</span> <span class="n">result</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span 
class="n">current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span> <span class="n">result</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="k">def</span> <span class="nf">get_next_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Lexical analyzer (also known as scanner or tokenizer)</span> <span class="sd"> This method is responsible for breaking a sentence</span> <span class="sd"> apart into tokens. One token at a time.</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isspace</span><span class="p">():</span> <span class="bp">self</span><span class="o">.</span><span class="n">skip_whitespace</span><span class="p">()</span> <span class="k">continue</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">integer</span><span class="p">())</span> <span class="k">if</span> <span class="bp">self</span><span 
class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;+&#39;</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">PLUS</span><span class="p">,</span> <span class="s1">&#39;+&#39;</span><span class="p">)</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;-&#39;</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">MINUS</span><span class="p">,</span> <span class="s1">&#39;-&#39;</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">EOF</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span> <span class="c1">##########################################################</span> <span class="c1"># Parser / Interpreter code #</span> <span class="c1">##########################################################</span> <span class="k">def</span> <span class="nf">eat</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token_type</span><span class="p">):</span> <span class="c1"># compare the current token type with the passed token</span> <span class="c1"># type and if they match then &quot;eat&quot; the current token</span> <span class="c1"># and assign the next token to the self.current_token,</span> <span class="c1"># otherwise raise an exception.</span> <span class="k">if</span> <span 
class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">token_type</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="k">else</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span> <span class="k">def</span> <span class="nf">term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Return an INTEGER token value.&quot;&quot;&quot;</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">)</span> <span class="k">return</span> <span class="n">token</span><span class="o">.</span><span class="n">value</span> <span class="k">def</span> <span class="nf">expr</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Arithmetic expression parser / interpreter.&quot;&quot;&quot;</span> <span class="c1"># set current token to the first token taken from the input</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="n">result</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span> <span class="k">while</span> <span 
class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="ow">in</span> <span class="p">(</span><span class="n">PLUS</span><span class="p">,</span> <span class="n">MINUS</span><span class="p">):</span> <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">PLUS</span><span class="p">)</span> <span class="n">result</span> <span class="o">=</span> <span class="n">result</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span> <span class="k">elif</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MINUS</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">MINUS</span><span class="p">)</span> <span class="n">result</span> <span class="o">=</span> <span class="n">result</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">term</span><span class="p">()</span> <span class="k">return</span> <span class="n">result</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span> <span class="k">while</span> <span class="bp">True</span><span class="p">:</span> <span class="k">try</span><span class="p">:</span> <span class="c1"># To run under Python3 replace &#39;raw_input&#39; call</span> <span class="c1"># with &#39;input&#39;</span> <span class="n">text</span> <span class="o">=</span> 
<span class="nb">raw_input</span><span class="p">(</span><span class="s1">&#39;calc&gt; &#39;</span><span class="p">)</span> <span class="k">except</span> <span class="ne">EOFError</span><span class="p">:</span> <span class="k">break</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">text</span><span class="p">:</span> <span class="k">continue</span> <span class="n">interpreter</span> <span class="o">=</span> <span class="n">Interpreter</span><span class="p">(</span><span class="n">text</span><span class="p">)</span> <span class="n">result</span> <span class="o">=</span> <span class="n">interpreter</span><span class="o">.</span><span class="n">expr</span><span class="p">()</span> <span class="k">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span> <span class="n">main</span><span class="p">()</span> </pre></div> <p>Save the above code into the <em>calc3.py</em> file or download it directly from <a href="https://github.com/rspivak/lsbasi/blob/master/part3/calc3.py">GitHub</a>. Try it out. 
See for yourself that it can handle arithmetic expressions that you can derive from the syntax diagram I showed you&nbsp;earlier.</p> <p>Here is a sample session that I ran on my&nbsp;laptop:</p> <div class="highlight"><pre><span></span>$ python calc3.py
calc&gt; <span class="m">3</span>
<span class="m">3</span>
calc&gt; <span class="m">7</span> - <span class="m">4</span>
<span class="m">3</span>
calc&gt; <span class="m">10</span> + <span class="m">5</span>
<span class="m">15</span>
calc&gt; <span class="m">7</span> - <span class="m">3</span> + <span class="m">2</span> - <span class="m">1</span>
<span class="m">5</span>
calc&gt; <span class="m">10</span> + <span class="m">1</span> + <span class="m">2</span> - <span class="m">3</span> + <span class="m">4</span> + <span class="m">6</span> - <span class="m">15</span>
<span class="m">5</span>
calc&gt; <span class="m">3</span> +
Traceback <span class="o">(</span>most recent call last<span class="o">)</span>:
  File <span class="s2">&quot;calc3.py&quot;</span>, line <span class="m">147</span>, in &lt;module&gt;
    main<span class="o">()</span>
  File <span class="s2">&quot;calc3.py&quot;</span>, line <span class="m">142</span>, in main
    <span class="nv">result</span> <span class="o">=</span> interpreter.expr<span class="o">()</span>
  File <span class="s2">&quot;calc3.py&quot;</span>, line <span class="m">123</span>, in expr
    <span class="nv">result</span> <span class="o">=</span> result + self.term<span class="o">()</span>
  File <span class="s2">&quot;calc3.py&quot;</span>, line <span class="m">110</span>, in term
    self.eat<span class="o">(</span>INTEGER<span class="o">)</span>
  File <span class="s2">&quot;calc3.py&quot;</span>, line <span class="m">105</span>, in eat
    self.error<span class="o">()</span>
  File <span class="s2">&quot;calc3.py&quot;</span>, line <span class="m">45</span>, in error
    raise Exception<span class="o">(</span><span class="s1">&#39;Invalid syntax&#39;</span><span class="o">)</span>
Exception: Invalid syntax
</pre></div> <p><br/> Remember those exercises I mentioned at the beginning of the article: here they are, as promised&nbsp;:)</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part3/lsbasi_part3_exercises.png" width="280"></p> <ul> <li>Draw a syntax diagram for arithmetic expressions that contain only multiplication and division, for example &#8220;7 * 4 / 2 * 3&#8221;. Seriously, just grab a pen or a pencil and try to draw&nbsp;one.</li> <li>Modify the source code of the calculator to interpret arithmetic expressions that contain only multiplication and division, for example &#8220;7 * 4 / 2 *&nbsp;3&#8221;.</li> <li>Write an interpreter that handles arithmetic expressions like &#8220;7 - 3 + 2 - 1&#8221; from scratch. Use any programming language you’re comfortable with and write it off the top of your head without looking at the examples. When you do that, think about components involved: a <em>lexer</em> that takes an input and converts it into a stream of tokens, a <em>parser</em> that feeds off the stream of the tokens provided by the <em>lexer</em> and tries to recognize a structure in that stream, and an <em>interpreter</em> that generates results after the <em>parser</em> has successfully parsed (recognized) a valid arithmetic expression. String those pieces together. Spend some time translating the knowledge you’ve acquired into a working interpreter for arithmetic&nbsp;expressions.</li> </ul> <p><strong>Check your&nbsp;understanding.</strong></p> <ol> <li>What is a syntax&nbsp;diagram?</li> <li>What is syntax&nbsp;analysis?</li> <li>What is a syntax&nbsp;analyzer?</li> </ol> <p><br/> Hey, look! You read all the way to the end. Thanks for hanging out here today and don’t forget to do the exercises. 
:) I’ll be back next time with a new article - stay&nbsp;tuned.</p> <p>Here is a list of books I recommend that will help you in your study of interpreters and&nbsp;compilers:</p> <ol> <li> <p><a href="http://www.amazon.com/gp/product/193435645X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=193435645X&linkCode=as2&tag=russblo0b-20&linkId=MP4DCXDV6DJMEJBL">Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (Pragmatic Programmers)</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=193435645X" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0470177071/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0470177071&linkCode=as2&tag=russblo0b-20&linkId=UCLGQTPIYSWYKRRM">Writing Compilers and Interpreters: A Software Engineering Approach</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0470177071" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/052182060X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=052182060X&linkCode=as2&tag=russblo0b-20&linkId=ZSKKZMV7YWR22NMW">Modern Compiler Implementation in Java</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=052182060X" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/1461446988/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1461446988&linkCode=as2&tag=russblo0b-20&linkId=PAXWJP5WCPZ7RKRD">Modern Compiler Design</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=1461446988" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a 
href="http://www.amazon.com/gp/product/0321486811/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321486811&linkCode=as2&tag=russblo0b-20&linkId=GOEGDQG4HIHU56FQ">Compilers: Principles, Techniques, and Tools (2nd Edition)</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0321486811" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> </ol> <p>Let’s Build A Simple Interpreter. Part 2. 2015-07-03T07:00:00-04:00 Ruslan Spivak</p> <p>In their amazing book &#8220;The 5 Elements of Effective Thinking&#8221; the authors Burger and Starbird share a story about how they observed Tony Plog, an internationally acclaimed trumpet virtuoso, conduct a master class for accomplished trumpet players. The students first played complex music phrases, which they played perfectly well. But then they were asked to play very basic, simple notes. When they played the notes, the notes sounded childish compared to the previously played complex phrases. After they finished playing, the master teacher also played the same notes, but when he played them, they did not sound childish. The difference was stunning. Tony explained that mastering the performance of simple notes allows one to play complex pieces with greater control. 
The lesson was clear - to build true virtuosity one must focus on mastering simple, basic ideas.<sup id="fnref-1"><a class="footnote-ref" href="#fn-1">1</a></sup></p> <p>The lesson in the story clearly applies not only to music but also to software development. The story is a good reminder to all of us to not lose sight of the importance of deep work on simple, basic ideas even if it sometimes feels like a step back. While it is important to be proficient with a tool or framework you use, it is also extremely important to know the principles behind them. As Ralph Waldo Emerson&nbsp;said:</p> <blockquote> <p><em><span class="dquo">&#8220;</span>If you learn only methods, you’ll be tied to your methods. But if you learn principles, you can devise your own&nbsp;methods.&#8221;</em></p> </blockquote> <p>On that note, let’s dive into interpreters and compilers&nbsp;again.</p> <p>Today I will show you a new version of the calculator from <a href="http://ruslanspivak.com/lsbasi-part1/" title="Part 1">Part 1</a> that will be able&nbsp;to:</p> <ol> <li>Handle whitespace characters anywhere in the input&nbsp;string</li> <li>Consume multi-digit integers from the&nbsp;input</li> <li>Subtract two integers (currently it can only add&nbsp;integers)</li> </ol> <p>Here is the source code for your new version of the calculator that can do all of the&nbsp;above:</p> <div class="highlight"><pre><span></span><span class="c1"># Token types</span> <span class="c1"># EOF (end-of-file) token is used to indicate that</span> <span class="c1"># there is no more input left for lexical analysis</span> <span class="n">INTEGER</span><span class="p">,</span> <span class="n">PLUS</span><span class="p">,</span> <span class="n">MINUS</span><span class="p">,</span> <span class="n">EOF</span> <span class="o">=</span> <span class="s1">&#39;INTEGER&#39;</span><span class="p">,</span> <span class="s1">&#39;PLUS&#39;</span><span class="p">,</span> <span class="s1">&#39;MINUS&#39;</span><span 
class="p">,</span> <span class="s1">&#39;EOF&#39;</span> <span class="k">class</span> <span class="nc">Token</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">type</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span> <span class="c1"># token type: INTEGER, PLUS, MINUS, or EOF</span> <span class="bp">self</span><span class="o">.</span><span class="n">type</span> <span class="o">=</span> <span class="nb">type</span> <span class="c1"># token value: non-negative integer value, &#39;+&#39;, &#39;-&#39;, or None</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span> <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;String representation of the class instance.</span> <span class="sd"> Examples:</span> <span class="sd"> Token(INTEGER, 3)</span> <span class="sd"> Token(PLUS &#39;+&#39;)</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="k">return</span> <span class="s1">&#39;Token({type}, {value})&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span> <span class="nb">type</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">type</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="nb">repr</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="p">)</span> <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span 
class="bp">self</span><span class="o">.</span><span class="fm">__str__</span><span class="p">()</span> <span class="k">class</span> <span class="nc">Interpreter</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span> <span class="c1"># client string input, e.g. &quot;3 + 5&quot;, &quot;12 - 5&quot;, etc</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">text</span> <span class="c1"># self.pos is an index into self.text</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">=</span> <span class="mi">0</span> <span class="c1"># current token instance</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">None</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">pos</span><span class="p">]</span> <span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Error parsing input&#39;</span><span class="p">)</span> <span class="k">def</span> <span class="nf">advance</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Advance the &#39;pos&#39; pointer and set the &#39;current_char&#39; variable.&quot;&quot;&quot;</span> <span class="bp">self</span><span class="o">.</span><span 
class="n">pos</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">&gt;</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">None</span> <span class="c1"># Indicates end of input</span> <span class="k">else</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">pos</span><span class="p">]</span> <span class="k">def</span> <span class="nf">skip_whitespace</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isspace</span><span class="p">():</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">def</span> <span class="nf">integer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Return a (multidigit) integer consumed from the input.&quot;&quot;&quot;</span> <span class="n">result</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span> <span 
class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span> <span class="n">result</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="k">def</span> <span class="nf">get_next_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Lexical analyzer (also known as scanner or tokenizer)</span> <span class="sd"> This method is responsible for breaking a sentence</span> <span class="sd"> apart into tokens.</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isspace</span><span class="p">():</span> <span class="bp">self</span><span class="o">.</span><span class="n">skip_whitespace</span><span class="p">()</span> <span class="k">continue</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span> <span class="k">return</span> <span 
class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">integer</span><span class="p">())</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;+&#39;</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">PLUS</span><span class="p">,</span> <span class="s1">&#39;+&#39;</span><span class="p">)</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;-&#39;</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">MINUS</span><span class="p">,</span> <span class="s1">&#39;-&#39;</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">EOF</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span> <span class="k">def</span> <span class="nf">eat</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token_type</span><span class="p">):</span> <span class="c1"># compare the current token type with the passed token</span> <span class="c1"># type and if they match then &quot;eat&quot; the current token</span> <span class="c1"># and assign the next token to the self.current_token,</span> <span class="c1"># otherwise raise an exception.</span> 
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">token_type</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="k">else</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span> <span class="k">def</span> <span class="nf">expr</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Parser / Interpreter</span> <span class="sd"> expr -&gt; INTEGER PLUS INTEGER</span> <span class="sd"> expr -&gt; INTEGER MINUS INTEGER</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="c1"># set current token to the first token taken from the input</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="c1"># we expect the current token to be an integer</span> <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">)</span> <span class="c1"># we expect the current token to be either a &#39;+&#39; or &#39;-&#39;</span> <span class="n">op</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="k">if</span> <span class="n">op</span><span class="o">.</span><span 
class="n">type</span> <span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">PLUS</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">MINUS</span><span class="p">)</span> <span class="c1"># we expect the current token to be an integer</span> <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">)</span> <span class="c1"># after the above call the self.current_token is set to</span> <span class="c1"># EOF token</span> <span class="c1"># at this point either the INTEGER PLUS INTEGER or</span> <span class="c1"># the INTEGER MINUS INTEGER sequence of tokens</span> <span class="c1"># has been successfully found and the method can just</span> <span class="c1"># return the result of adding or subtracting two integers,</span> <span class="c1"># thus effectively interpreting client input</span> <span class="k">if</span> <span class="n">op</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span> <span class="n">result</span> <span class="o">=</span> <span class="n">left</span><span class="o">.</span><span class="n">value</span> <span class="o">+</span> <span class="n">right</span><span class="o">.</span><span class="n">value</span> <span class="k">else</span><span class="p">:</span> <span class="n">result</span> <span class="o">=</span> <span class="n">left</span><span class="o">.</span><span class="n">value</span> <span class="o">-</span> <span class="n">right</span><span 
class="o">.</span><span class="n">value</span> <span class="k">return</span> <span class="n">result</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span> <span class="k">while</span> <span class="bp">True</span><span class="p">:</span> <span class="k">try</span><span class="p">:</span> <span class="c1"># To run under Python3 replace &#39;raw_input&#39; call</span> <span class="c1"># with &#39;input&#39;</span> <span class="n">text</span> <span class="o">=</span> <span class="nb">raw_input</span><span class="p">(</span><span class="s1">&#39;calc&gt; &#39;</span><span class="p">)</span> <span class="k">except</span> <span class="ne">EOFError</span><span class="p">:</span> <span class="k">break</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">text</span><span class="p">:</span> <span class="k">continue</span> <span class="n">interpreter</span> <span class="o">=</span> <span class="n">Interpreter</span><span class="p">(</span><span class="n">text</span><span class="p">)</span> <span class="n">result</span> <span class="o">=</span> <span class="n">interpreter</span><span class="o">.</span><span class="n">expr</span><span class="p">()</span> <span class="k">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span> <span class="n">main</span><span class="p">()</span> </pre></div> <p>Save the above code into the <em>calc2.py</em> file or download it directly from <a href="https://github.com/rspivak/lsbasi/blob/master/part2/calc2.py">GitHub</a>. Try it out. 
See for yourself that it works as expected: it can handle whitespace characters anywhere in the input, accept multi-digit integers, and subtract two integers as well as add&nbsp;them.</p> <p>Here is a sample session that I ran on my&nbsp;laptop:</p> <div class="highlight"><pre><span></span>$ python calc2.py
calc&gt; <span class="m">27</span> + <span class="m">3</span>
<span class="m">30</span>
calc&gt; <span class="m">27</span> - <span class="m">7</span>
<span class="m">20</span>
calc&gt;
</pre></div> <p>The major code changes compared with the version from <a href="http://ruslanspivak.com/lsbasi-part1/" title="Part 1">Part 1</a>&nbsp;are:</p> <ol> <li>The <em>get_next_token</em> method was refactored a bit. The logic to increment the <em>pos</em> pointer was factored into a separate method, <em>advance</em>.</li> <li>Two more methods were added: <em>skip_whitespace</em> to ignore whitespace characters and <em>integer</em> to handle multi-digit integers in the&nbsp;input.</li> <li>The <em>expr</em> method was modified to recognize the <span class="caps">INTEGER</span> -&gt; <span class="caps">MINUS</span> -&gt; <span class="caps">INTEGER</span> phrase in addition to the <span class="caps">INTEGER</span> -&gt; <span class="caps">PLUS</span> -&gt; <span class="caps">INTEGER</span> phrase. The method now also interprets both addition and subtraction after having successfully recognized the corresponding&nbsp;phrase.</li> </ol> <p>In <a href="http://ruslanspivak.com/lsbasi-part1/" title="Part 1">Part 1</a> you learned two important concepts, namely that of a <strong>token</strong> and a <strong>lexical analyzer</strong>. Today I would like to talk a little bit about <strong>lexemes</strong>, <strong>parsing</strong>, and <strong>parsers</strong>.</p> <p>You already know about tokens. But in order for me to round out the discussion of tokens I need to mention lexemes. What is a lexeme? 
A <strong>lexeme</strong> is a sequence of characters that forms a token. In the following picture you can see some examples of tokens and sample lexemes and hopefully it will make the relationship between them&nbsp;clear:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part2/lsbasi_part2_lexemes.png"></p> <p>Now, remember our friend, the <em>expr</em> method? I said before that that’s where the interpretation of an arithmetic expression actually happens. But before you can interpret an expression you first need to recognize what kind of phrase it is, whether it is addition or subtraction, for example. That’s what the <em>expr</em> method essentially does: it finds the structure in the stream of tokens it gets from the <em>get_next_token</em> method and then it interprets the phrase that it has recognized, generating the result of the arithmetic&nbsp;expression.</p> <p>The process of finding the structure in the stream of tokens, or put differently, the process of recognizing a phrase in the stream of tokens is called <strong>parsing</strong>. 
The part of an interpreter or compiler that performs that job is called a <strong>parser</strong>.</p> <p>So now you know that the <em>expr</em> method is the part of your interpreter where both <strong>parsing</strong> and <strong>interpreting</strong> happen: the <em>expr</em> method first tries to recognize (<strong>parse</strong>) the <span class="caps">INTEGER</span> -&gt; <span class="caps">PLUS</span> -&gt; <span class="caps">INTEGER</span> or the <span class="caps">INTEGER</span> -&gt; <span class="caps">MINUS</span> -&gt; <span class="caps">INTEGER</span> phrase in the stream of tokens, and after it has successfully recognized (<strong>parsed</strong>) one of those phrases, the method interprets it and returns the result of either addition or subtraction of two integers to the&nbsp;caller.</p> <p>And now it’s time for exercises&nbsp;again.</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part2/lsbasi_part2_exercises.png" width="280"></p> <ol> <li>Extend the calculator to handle multiplication of two&nbsp;integers</li> <li>Extend the calculator to handle division of two&nbsp;integers</li> <li>Modify the code to interpret expressions containing an arbitrary number of additions and subtractions, for example &#8220;9 - 5 + 3 +&nbsp;11&#8221;</li> </ol> <p><strong>Check your&nbsp;understanding.</strong></p> <ol> <li>What is a&nbsp;lexeme?</li> <li>What is the name of the process that finds the structure in the stream of tokens, or put differently, what is the name of the process that recognizes a certain phrase in that stream of&nbsp;tokens?</li> <li>What is the name of the part of the interpreter (compiler) that does&nbsp;parsing?</li> </ol> <p><br/> I hope you liked today’s material. In the next article of the series you will extend your calculator to handle more complex arithmetic expressions. 
Stay&nbsp;tuned.</p> <p>And here is a list of books I recommend that will help you in your study of interpreters and&nbsp;compilers:</p> <ol> <li> <p><a href="http://www.amazon.com/gp/product/193435645X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=193435645X&linkCode=as2&tag=russblo0b-20&linkId=MP4DCXDV6DJMEJBL">Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (Pragmatic Programmers)</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=193435645X" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0470177071/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0470177071&linkCode=as2&tag=russblo0b-20&linkId=UCLGQTPIYSWYKRRM">Writing Compilers and Interpreters: A Software Engineering Approach</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0470177071" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/052182060X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=052182060X&linkCode=as2&tag=russblo0b-20&linkId=ZSKKZMV7YWR22NMW">Modern Compiler Implementation in Java</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=052182060X" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/1461446988/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1461446988&linkCode=as2&tag=russblo0b-20&linkId=PAXWJP5WCPZ7RKRD">Modern Compiler Design</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=1461446988" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a 
href="http://www.amazon.com/gp/product/0321486811/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321486811&linkCode=as2&tag=russblo0b-20&linkId=GOEGDQG4HIHU56FQ">Compilers: Principles, Techniques, and Tools (2nd Edition)</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0321486811" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> </ol> <p></p> <p></p> <div class="footnote"> <hr> <ol> <li id="fn-1"> <p><a href="http://www.amazon.com/gp/product/0691156662/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0691156662&linkCode=as2&tag=russblo0b-20&linkId=B7GSVLONUPCIBIVY">The 5 Elements of Effective Thinking</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0691156662" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />&#160;<a class="footnote-backref" href="#fnref-1" title="Jump back to footnote 1 in the text">&#8617;</a></p> </li> </ol> </div>Let’s Build A Simple Interpreter. Part 1.2015-06-15T06:00:00-04:002015-06-15T06:00:00-04:00Ruslan Spivaktag:ruslanspivak.com,2015-06-15:/lsbasi-part1/<p><br/></p> <blockquote> <p><em><strong><span class="dquo">&#8220;</span>If you don&#8217;t know how compilers work, then you don&#8217;t know how computers work. If you&#8217;re not 100% sure whether you know how compilers work, then you don&#8217;t know how they work.&#8221;</strong> &#8212; Steve&nbsp;Yegge</em></p> </blockquote> <p>There you have it. Think about it. It doesn’t really matter …</p><p><br/></p> <blockquote> <p><em><strong><span class="dquo">&#8220;</span>If you don&#8217;t know how compilers work, then you don&#8217;t know how computers work. If you&#8217;re not 100% sure whether you know how compilers work, then you don&#8217;t know how they work.&#8221;</strong> &#8212; Steve&nbsp;Yegge</em></p> </blockquote> <p>There you have it. Think about it. 
It doesn’t really matter whether you’re a newbie or a seasoned software developer: if you don’t know how compilers and interpreters work, then you don’t know how computers work. It’s that&nbsp;simple.</p> <p>So, do you know how compilers and interpreters work? And I mean, are you 100% sure that you know how they work? If you&nbsp;don’t.</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_i_dont_know.png" width="480"></p> <p>Or if you don’t and you’re really agitated about&nbsp;it.</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_omg.png" width="480"></p> <p>Do not worry. If you stick around and work through the series and build an interpreter and a compiler with me you will know how they work in the end. And you will become a confident happy camper too. At least I hope&nbsp;so.</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_i_know.png" width="480"></p> <p>Why would you study interpreters and compilers? I will give you three&nbsp;reasons.</p> <ol> <li>To write an interpreter or a compiler you have to have a lot of technical skills that you need to use together. Writing an interpreter or a compiler will help you improve those skills and become a better software developer. As well, the skills you will learn are useful in writing any software, not just interpreters or&nbsp;compilers.</li> <li>You really want to know how computers work. Often interpreters and compilers look like magic. And you shouldn’t be comfortable with that magic. You want to demystify the process of building an interpreter and a compiler, understand how they work, and get in control of&nbsp;things.</li> <li>You want to create your own programming language or domain specific language. If you create one, you will also need to create either an interpreter or a compiler for it. Recently, there has been a resurgence of interest in new programming languages. 
And you can see a new programming language pop up almost every day: Elixir, Go, Rust just to name a&nbsp;few.</li> </ol> <p><br/> Okay, but what are interpreters and&nbsp;compilers?</p> <p>The goal of an <strong>interpreter</strong> or a <strong>compiler</strong> is to translate a source program in some high-level language into some other form. Pretty vague, isn’t it? Just bear with me; later in the series you will learn exactly what the source program is translated&nbsp;into.</p> <p>At this point you may also wonder what the difference is between an interpreter and a compiler. For the purpose of this series, let&#8217;s agree that if a translator translates a source program into machine language, it is a <strong>compiler</strong>. If a translator processes and executes the source program without translating it into machine language first, it is an <strong>interpreter</strong>. Visually it looks something like&nbsp;this:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_compiler_interpreter.png" width="700"></p> <p>I hope that by now you’re convinced that you really want to study and build an interpreter and a compiler. What can you expect from this series on&nbsp;interpreters?</p> <p>Here is the deal. You and I are going to create a simple interpreter for a large subset of the <a href="https://en.wikipedia.org/wiki/Pascal_%28programming_language%29">Pascal</a> language. At the end of this series you will have a working Pascal interpreter and a source-level debugger like Python’s <a href="https://docs.python.org/2/library/pdb.html">pdb</a>.</p> <p>You might ask, why Pascal? For one thing, it’s not a made-up language that I came up with just for this series: it’s a real programming language that has many important language constructs. 
And some old, but useful, <span class="caps">CS</span> books use Pascal programming language in their examples (I understand that that’s not a particularly compelling reason to choose a language to build an interpreter for, but I thought it would be nice for a change to learn a non-mainstream language&nbsp;:)</p> <p>Here is an example of a factorial function in Pascal that you will be able to interpret with your own interpreter and debug with the interactive source-level debugger that you will create along the&nbsp;way:</p> <div class="highlight"><pre><span></span><span class="k">program</span> <span class="n">factorial</span><span class="o">;</span> <span class="k">function</span> <span class="nf">factorial</span><span class="p">(</span><span class="n">n</span><span class="o">:</span> <span class="kt">integer</span><span class="p">)</span><span class="o">:</span> <span class="kt">longint</span><span class="o">;</span> <span class="k">begin</span> <span class="k">if</span> <span class="n">n</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">then</span> <span class="n">factorial</span> <span class="o">:=</span> <span class="mi">1</span> <span class="k">else</span> <span class="n">factorial</span> <span class="o">:=</span> <span class="n">n</span> <span class="o">*</span> <span class="n">factorial</span><span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span><span class="o">;</span> <span class="k">end</span><span class="o">;</span> <span class="k">var</span> <span class="n">n</span><span class="o">:</span> <span class="kt">integer</span><span class="o">;</span> <span class="k">begin</span> <span class="k">for</span> <span class="n">n</span> <span class="o">:=</span> <span class="mi">0</span> <span class="k">to</span> <span class="mi">16</span> <span class="k">do</span> <span class="nb">writeln</span><span class="p">(</span><span class="n">n</span><span class="o">,</span> <span 
class="s">&#39;! = &#39;</span><span class="o">,</span> <span class="n">factorial</span><span class="p">(</span><span class="n">n</span><span class="p">))</span><span class="o">;</span> <span class="k">end</span><span class="o">.</span> </pre></div> <p>The implementation language of the Pascal interpreter will be Python, but you can use any language you want because the ideas presented don’t depend on any particular implementation language. Okay, let’s get down to business. Ready, set,&nbsp;go!</p> <p>You will start your first foray into interpreters and compilers by writing a simple interpreter of arithmetic expressions, also known as a calculator. Today the goal is pretty minimalistic: to make your calculator handle the addition of two single digit integers like <strong>3+5</strong>. Here is the source code for your calculator, sorry,&nbsp;interpreter:</p> <div class="highlight"><pre><span></span><span class="c1"># Token types</span> <span class="c1">#</span> <span class="c1"># EOF (end-of-file) token is used to indicate that</span> <span class="c1"># there is no more input left for lexical analysis</span> <span class="n">INTEGER</span><span class="p">,</span> <span class="n">PLUS</span><span class="p">,</span> <span class="n">EOF</span> <span class="o">=</span> <span class="s1">&#39;INTEGER&#39;</span><span class="p">,</span> <span class="s1">&#39;PLUS&#39;</span><span class="p">,</span> <span class="s1">&#39;EOF&#39;</span> <span class="k">class</span> <span class="nc">Token</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">type</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span> <span class="c1"># token type: INTEGER, PLUS, or EOF</span> <span class="bp">self</span><span class="o">.</span><span class="n">type</span> <span class="o">=</span> 
<span class="nb">type</span> <span class="c1"># token value: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, &#39;+&#39;, or None</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span> <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;String representation of the class instance.</span> <span class="sd"> Examples:</span> <span class="sd"> Token(INTEGER, 3)</span> <span class="sd"> Token(PLUS, &#39;+&#39;)</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="k">return</span> <span class="s1">&#39;Token({type}, {value})&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span> <span class="nb">type</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">type</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="nb">repr</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="p">)</span> <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="fm">__str__</span><span class="p">()</span> <span class="k">class</span> <span class="nc">Interpreter</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span> <span class="c1"># client string input, e.g. 
&quot;3+5&quot;</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">text</span> <span class="c1"># self.pos is an index into self.text</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">=</span> <span class="mi">0</span> <span class="c1"># current token instance</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">None</span> <span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Error parsing input&#39;</span><span class="p">)</span> <span class="k">def</span> <span class="nf">get_next_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;Lexical analyzer (also known as scanner or tokenizer)</span> <span class="sd"> This method is responsible for breaking a sentence</span> <span class="sd"> apart into tokens. 
One token at a time.</span> <span class="sd"> &quot;&quot;&quot;</span> <span class="n">text</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span> <span class="c1"># is self.pos index past the end of the self.text ?</span> <span class="c1"># if so, then return EOF token because there is no more</span> <span class="c1"># input left to convert into tokens</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">&gt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">text</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:</span> <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">EOF</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span> <span class="c1"># get a character at the position self.pos and decide</span> <span class="c1"># what token to create based on the single character</span> <span class="n">current_char</span> <span class="o">=</span> <span class="n">text</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">pos</span><span class="p">]</span> <span class="c1"># if the character is a digit then convert it to</span> <span class="c1"># integer, create an INTEGER token, increment self.pos</span> <span class="c1"># index to point to the next character after the digit,</span> <span class="c1"># and return the INTEGER token</span> <span class="k">if</span> <span class="n">current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span> <span class="n">token</span> <span class="o">=</span> <span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="nb">int</span><span class="p">(</span><span class="n">current_char</span><span class="p">))</span> <span 
class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">return</span> <span class="n">token</span> <span class="k">if</span> <span class="n">current_char</span> <span class="o">==</span> <span class="s1">&#39;+&#39;</span><span class="p">:</span> <span class="n">token</span> <span class="o">=</span> <span class="n">Token</span><span class="p">(</span><span class="n">PLUS</span><span class="p">,</span> <span class="n">current_char</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">return</span> <span class="n">token</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span> <span class="k">def</span> <span class="nf">eat</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token_type</span><span class="p">):</span> <span class="c1"># compare the current token type with the passed token</span> <span class="c1"># type and if they match then &quot;eat&quot; the current token</span> <span class="c1"># and assign the next token to the self.current_token,</span> <span class="c1"># otherwise raise an exception.</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">token_type</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="k">else</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span> <span 
class="k">def</span> <span class="nf">expr</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;expr -&gt; INTEGER PLUS INTEGER&quot;&quot;&quot;</span> <span class="c1"># set current token to the first token taken from the input</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="c1"># we expect the current token to be a single-digit integer</span> <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">)</span> <span class="c1"># we expect the current token to be a &#39;+&#39; token</span> <span class="n">op</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">PLUS</span><span class="p">)</span> <span class="c1"># we expect the current token to be a single-digit integer</span> <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span> <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">)</span> <span class="c1"># after the above call the self.current_token is set to</span> <span class="c1"># EOF token</span> <span class="c1"># at this point INTEGER PLUS INTEGER sequence of tokens</span> <span class="c1"># has been successfully found and the method can just</span> <span class="c1"># return the result of adding two 
integers, thus</span> <span class="c1"># effectively interpreting client input</span> <span class="n">result</span> <span class="o">=</span> <span class="n">left</span><span class="o">.</span><span class="n">value</span> <span class="o">+</span> <span class="n">right</span><span class="o">.</span><span class="n">value</span> <span class="k">return</span> <span class="n">result</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span> <span class="k">while</span> <span class="bp">True</span><span class="p">:</span> <span class="k">try</span><span class="p">:</span> <span class="c1"># To run under Python3 replace &#39;raw_input&#39; call</span> <span class="c1"># with &#39;input&#39;</span> <span class="n">text</span> <span class="o">=</span> <span class="nb">raw_input</span><span class="p">(</span><span class="s1">&#39;calc&gt; &#39;</span><span class="p">)</span> <span class="k">except</span> <span class="ne">EOFError</span><span class="p">:</span> <span class="k">break</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">text</span><span class="p">:</span> <span class="k">continue</span> <span class="n">interpreter</span> <span class="o">=</span> <span class="n">Interpreter</span><span class="p">(</span><span class="n">text</span><span class="p">)</span> <span class="n">result</span> <span class="o">=</span> <span class="n">interpreter</span><span class="o">.</span><span class="n">expr</span><span class="p">()</span> <span class="k">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span> <span class="n">main</span><span class="p">()</span> </pre></div> <p><br/> Save the above code into <em>calc1.py</em> file or download it directly from <a href="https://github.com/rspivak/lsbasi/blob/master/part1/calc1.py">GitHub</a>. 
Before you start digging deeper into the code, run the calculator on the command line and see it in action. Play with it! Here is a sample session on my laptop (if you want to run the calculator under Python3 you will need to replace <em>raw_input</em> with <em>input</em>):</p> <div class="highlight"><pre><span></span>$ python calc1.py calc&gt; <span class="m">3</span>+4 <span class="m">7</span> calc&gt; <span class="m">3</span>+5 <span class="m">8</span> calc&gt; <span class="m">3</span>+9 <span class="m">12</span> calc&gt; </pre></div> <p>For your simple calculator to work properly without throwing an exception, your input needs to follow certain&nbsp;rules:</p> <ul> <li>Only single digit integers are allowed in the&nbsp;input</li> <li>The only arithmetic operation supported at the moment is&nbsp;addition</li> <li>No whitespace characters are allowed anywhere in the&nbsp;input</li> </ul> <p>Those restrictions are necessary to make the calculator simple. Don’t worry, you’ll make it pretty complex pretty&nbsp;soon.</p> <p>Okay, now let’s dive in and see how your interpreter works and how it evaluates arithmetic&nbsp;expressions.</p> <p>When you enter an expression <em>3+5</em> on the command line your interpreter gets a string <em>&#8220;3+5&#8221;</em>. In order for the interpreter to actually understand what to do with that string it first needs to break the input <em>&#8220;3+5&#8221;</em> into components called <strong>tokens</strong>. A <strong>token</strong> is an object that has a type and a value. For example, for the string <em>&#8220;3&#8221;</em> the type of the token will be <span class="caps">INTEGER</span> and the corresponding value will be integer <em>3</em>.</p> <p>The process of breaking the input string into tokens is called <strong>lexical analysis</strong>. So, the first step your interpreter needs to do is read the input of characters and convert it into a stream of tokens. 
The part of the interpreter that does it is called a <strong>lexical analyzer</strong>, or <strong>lexer</strong> for short. You might also encounter other names for the same component, like <strong>scanner</strong> or <strong>tokenizer</strong>. They all mean the same: the part of your interpreter or compiler that turns the input of characters into a stream of&nbsp;tokens.</p> <p>The method <em>get_next_token</em> of the <em>Interpreter</em> class is your lexical analyzer. Every time you call it, you get the next token created from the input of characters passed to the interpreter. Let’s take a closer look at the method itself and see how it actually does its job of converting characters into tokens. The input is stored in the variable <em>text</em> that holds the input string and <em>pos</em> is an index into that string (think of the string as an array of characters). <em>pos</em> is initially set to 0 and points to the character <em>&#8216;3&#8217;</em>. The method first checks whether the character is a digit and if so, it increments <em>pos</em> and returns a token instance with the type <span class="caps">INTEGER</span> and the value set to the integer value of the string <em>&#8216;3&#8217;</em>, which is an integer <em>3</em>:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_lexer1.png" width="640"></p> <p>The <em>pos</em> now points to the <em>&#8216;+&#8217;</em> character in the <em>text</em>. The next time you call the method, it tests if a character at the position <em>pos</em> is a digit and then it tests if the character is a plus sign, which it is. As a result the method increments <em>pos</em> and returns a newly created token with the type <span class="caps">PLUS</span> and value <em>&#8216;+&#8217;</em>:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_lexer2.png" width="640"></p> <p>The <em>pos</em> now points to character <em>&#8216;5&#8217;</em>. 
When you call the <em>get_next_token</em> method again the method checks if it’s a digit, which it is, so it increments <em>pos</em> and returns a new <span class="caps">INTEGER</span> token with the value of the token set to integer <em>5</em>:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_lexer3.png" width="640"></p> <p>Because the <em>pos</em> index is now past the end of the string <em>&#8220;3+5&#8221;</em> the <em>get_next_token</em> method returns the <span class="caps">EOF</span> token every time you call&nbsp;it:</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_lexer4.png" width="640"></p> <p>Try it out and see for yourself how the lexer component of your calculator&nbsp;works:</p> <div class="highlight"><pre><span></span><span class="o">&gt;&gt;&gt;</span> <span class="kn">from</span> <span class="nn">calc1</span> <span class="kn">import</span> <span class="n">Interpreter</span> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">interpreter</span> <span class="o">=</span> <span class="n">Interpreter</span><span class="p">(</span><span class="s1">&#39;3+5&#39;</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">interpreter</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">interpreter</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="n">Token</span><span class="p">(</span><span class="n">PLUS</span><span class="p">,</span> <span class="s1">&#39;+&#39;</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">interpreter</span><span 
class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> <span class="o">&gt;&gt;&gt;</span> <span class="n">interpreter</span><span class="o">.</span><span class="n">get_next_token</span><span class="p">()</span> <span class="n">Token</span><span class="p">(</span><span class="n">EOF</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span> <span class="o">&gt;&gt;&gt;</span> </pre></div> <p>So now that your interpreter has access to the stream of tokens made from the input characters, the interpreter needs to do something with it: it needs to find the structure in the flat stream of tokens it gets from the lexer <em>get_next_token</em>. Your interpreter expects to find the following structure in that stream: <span class="caps">INTEGER</span> -&gt; <span class="caps">PLUS</span> -&gt; <span class="caps">INTEGER</span>. That is, it tries to find a sequence of tokens: integer followed by a plus sign followed by an&nbsp;integer.</p> <p>The method responsible for finding and interpreting that structure is <em>expr</em>. This method verifies that the sequence of tokens does indeed correspond to the expected sequence of tokens, i.e <span class="caps">INTEGER</span> -&gt; <span class="caps">PLUS</span> -&gt; <span class="caps">INTEGER</span>. After it’s successfully confirmed the structure, it generates the result by adding the value of the token on the left side of the <span class="caps">PLUS</span> and the right side of the <span class="caps">PLUS</span>, thus successfully interpreting the arithmetic expression you passed to the&nbsp;interpreter.</p> <p>The <em>expr</em> method itself uses the helper method <em>eat</em> to verify that the token type passed to the <em>eat</em> method matches the current token type. 
After matching the passed token type the <em>eat</em> method gets the next token and assigns it to the <em>current_token</em> variable, thus effectively &#8220;eating&#8221; the currently matched token and advancing the imaginary pointer in the stream of tokens. If the structure in the stream of tokens doesn’t correspond to the expected <span class="caps">INTEGER</span> <span class="caps">PLUS</span> <span class="caps">INTEGER</span> sequence of tokens the <em>eat</em> method throws an&nbsp;exception.</p> <p>Let’s recap what your interpreter does to evaluate an arithmetic&nbsp;expression:</p> <ul> <li>The interpreter accepts an input string, let’s say&nbsp;“3+5”</li> <li>The interpreter calls the <em>expr</em> method to find a structure in the stream of tokens returned by the lexical analyzer <em>get_next_token</em>. The structure it tries to find is of the form <span class="caps">INTEGER</span> <span class="caps">PLUS</span> <span class="caps">INTEGER</span>. After it’s confirmed the structure, it interprets the input by adding the values of two <span class="caps">INTEGER</span> tokens because it’s clear to the interpreter at that point that what it needs to do is add two integers, 3 and&nbsp;5.</li> </ul> <p>Congratulate yourself. You’ve just learned how to build your very first&nbsp;interpreter!</p> <p>Now it’s time for&nbsp;exercises.</p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part1/lsbasi_exercises2.png" width="320"></p> <p>You didn’t think you would just read this article and that would be enough, did you? 
Okay, get your hands dirty and do the following&nbsp;exercises:</p> <ol> <li>Modify the code to allow multiple-digit integers in the input, for example&nbsp;&#8220;12+3&#8221;</li> <li>Add a method that skips whitespace characters so that your calculator can handle inputs with whitespace characters like &#8220; 12 +&nbsp;3&#8221;</li> <li>Modify the code and instead of &#8216;+&#8217; handle &#8216;-&#8217; to evaluate subtractions like&nbsp;&#8220;7-5&#8221;</li> </ol> <p><strong>Check your&nbsp;understanding</strong></p> <ol> <li>What is an&nbsp;interpreter?</li> <li>What is a&nbsp;compiler?</li> <li>What’s the difference between an interpreter and a&nbsp;compiler?</li> <li>What is a&nbsp;token?</li> <li>What is the name of the process that breaks input apart into&nbsp;tokens?</li> <li>What is the part of the interpreter that does lexical analysis&nbsp;called?</li> <li>What are the other common names for that part of an interpreter or a&nbsp;compiler?</li> </ol> <p>Before I finish this article, I really want you to commit to studying interpreters and compilers. And I want you to do it right now. Don’t put it on the back burner. Don’t wait. If you’ve skimmed the article, start over. If you’ve read it carefully but haven’t done the exercises, do them now. If you’ve done only some of them, finish the rest. You get the idea. And you know what? Sign the commitment pledge to start learning about interpreters and compilers today! 
<br/> <br/></p> <p><i>I, ________, being of sound mind and body, do hereby pledge to commit to studying interpreters and compilers starting today and get to a point where I know 100% how they&nbsp;work!</i></p> <p><i>Signature:</i></p> <p><i>Date:</i></p> <p><img alt="" src="https://ruslanspivak.com/lsbasi-part1/lsbasi_part1_commitment_pledge.png" width="480"></p> <p>Sign it, date it, and put it somewhere where you can see it every day to make sure that you stick to your commitment. And keep in mind the definition of&nbsp;commitment:</p> <blockquote> <p><span class="dquo">&#8220;</span>Commitment is doing the thing you said you were going to do long after the mood you said it in has left you.&#8221; &#8212; Darren&nbsp;Hardy</p> </blockquote> <p>Okay, that’s it for today. In the next article of the mini series you will extend your calculator to handle more arithmetic expressions. Stay&nbsp;tuned.</p> <p>If you can’t wait for the second article and are chomping at the bit to start digging deeper into interpreters and compilers, here is a list of books I recommend that will help you along the&nbsp;way:</p> <ol> <li> <p><a href="http://www.amazon.com/gp/product/193435645X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=193435645X&linkCode=as2&tag=russblo0b-20&linkId=MP4DCXDV6DJMEJBL">Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (Pragmatic Programmers)</a></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0470177071/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0470177071&linkCode=as2&tag=russblo0b-20&linkId=UCLGQTPIYSWYKRRM">Writing Compilers and Interpreters: A Software Engineering Approach</a></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/052182060X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=052182060X&linkCode=as2&tag=russblo0b-20&linkId=ZSKKZMV7YWR22NMW">Modern Compiler Implementation in Java</a></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/1461446988/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1461446988&linkCode=as2&tag=russblo0b-20&linkId=PAXWJP5WCPZ7RKRD">Modern Compiler Design</a></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0321486811/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321486811&linkCode=as2&tag=russblo0b-20&linkId=GOEGDQG4HIHU56FQ">Compilers: Principles, Techniques, and Tools (2nd Edition)</a></p> </li> </ol>
Let’s Build A Web Server. Part 3.2015-05-20T06:00:00-04:002015-05-20T06:00:00-04:00Ruslan Spivaktag:ruslanspivak.com,2015-05-20:/lsbaws-part3/<blockquote> <p><em><span class="dquo">&#8220;</span>We learn most when we have to invent&#8221;&nbsp;&#8212;Piaget</em></p> </blockquote> <p>In <a href="http://ruslanspivak.com/lsbaws-part2/" title="Part 2">Part 2</a> you created a minimalistic <span class="caps">WSGI</span> server that could handle basic <span class="caps">HTTP</span> <span class="caps">GET</span> requests. And I asked you a question, &#8220;How can you make your server handle more than one request at a time?&#8221; In this article you will find the answer. So, buckle up and shift into high gear. You’re about to have a really fast ride. Have your Linux, Mac <span class="caps">OS</span> X (or any *nix system) and Python ready. All source code from the article is available on <a href="https://github.com/rspivak/lsbaws/blob/master/part3/">GitHub</a>.</p> <p>First let’s remember what a very basic Web server looks like and what the server needs to do to service client requests. The server you created in <a href="http://ruslanspivak.com/lsbaws-part1/" title="Part 1">Part 1</a> and <a href="http://ruslanspivak.com/lsbaws-part2/" title="Part 2">Part 2</a> is an iterative server that handles one client request at a time. It cannot accept a new connection until after it has finished processing the current client request. 
Some clients might be unhappy with it because they will have to wait in line, and for busy servers the line might be too&nbsp;long.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_it1.png" width="640"></p> <p>Here is the code of the iterative server <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3a.py">webserver3a.py</a>:</p> <div class="highlight"><pre><span></span><span class="c1">#####################################################################</span>
<span class="c1"># Iterative server - webserver3a.py                                 #</span>
<span class="c1">#                                                                   #</span>
<span class="c1"># Tested with Python 2.7.9 &amp; Python 3.4 on Ubuntu 14.04 &amp; Mac OS X  #</span>
<span class="c1">#####################################################################</span>
<span class="kn">import</span> <span class="nn">socket</span>

<span class="n">SERVER_ADDRESS</span> <span class="o">=</span> <span class="p">(</span><span class="n">HOST</span><span class="p">,</span> <span class="n">PORT</span><span class="p">)</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span><span class="p">,</span> <span class="mi">8888</span>
<span class="n">REQUEST_QUEUE_SIZE</span> <span class="o">=</span> <span class="mi">5</span>


<span class="k">def</span> <span class="nf">handle_request</span><span class="p">(</span><span class="n">client_connection</span><span class="p">):</span>
    <span class="n">request</span> <span class="o">=</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">decode</span><span class="p">())</span>
    <span class="n">http_response</span> <span class="o">=</span> <span class="sa">b</span><span class="s2">&quot;&quot;&quot;</span><span class="se">\</span>
<span class="s2">HTTP/1.1 200 OK</span>

<span class="s2">Hello, World!</span>
<span class="s2">&quot;&quot;&quot;</span>
    <span class="n">client_connection</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">http_response</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">serve_forever</span><span class="p">():</span>
    <span class="n">listen_socket</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
    <span class="n">listen_socket</span><span class="o">.</span><span class="n">setsockopt</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
    <span class="n">listen_socket</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">SERVER_ADDRESS</span><span class="p">)</span>
    <span class="n">listen_socket</span><span class="o">.</span><span class="n">listen</span><span class="p">(</span><span class="n">REQUEST_QUEUE_SIZE</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Serving HTTP on port {port} ...&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">port</span><span class="o">=</span><span class="n">PORT</span><span class="p">))</span>

    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="n">client_connection</span><span class="p">,</span> <span class="n">client_address</span> <span class="o">=</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span>
        <span class="n">handle_request</span><span class="p">(</span><span class="n">client_connection</span><span class="p">)</span>
        <span class="n">client_connection</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span>
    <span class="n">serve_forever</span><span class="p">()</span>
</pre></div> <p>To observe your server handling only one client request at a time, modify the server a little bit and add a 60-second delay after sending a response to a client. The change is a single <em>time.sleep(60)</em> call (plus the import it needs) that tells the server process to sleep for 60&nbsp;seconds.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_it2.png" width="640"></p> <p>And here is the code of the sleeping server <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3b.py">webserver3b.py</a>:</p> <div class="highlight"><pre><span></span><span class="c1">#########################################################################</span>
<span class="c1"># Iterative server - webserver3b.py                                     #</span>
<span class="c1">#                                                                       #</span>
<span class="c1"># Tested with Python 2.7.9 &amp; Python 3.4 on Ubuntu 14.04 &amp; Mac OS X      #</span>
<span class="c1">#                                                                       #</span>
<span class="c1"># - Server sleeps for 60 seconds after sending a response to a client   #</span>
<span class="c1">#########################################################################</span>
<span class="kn">import</span> <span class="nn">socket</span>
<span class="kn">import</span> <span class="nn">time</span>

<span class="n">SERVER_ADDRESS</span> <span class="o">=</span> <span class="p">(</span><span class="n">HOST</span><span class="p">,</span> <span class="n">PORT</span><span class="p">)</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span><span class="p">,</span> <span class="mi">8888</span>
<span class="n">REQUEST_QUEUE_SIZE</span> <span class="o">=</span> <span class="mi">5</span>


<span class="k">def</span> <span class="nf">handle_request</span><span class="p">(</span><span class="n">client_connection</span><span class="p">):</span>
    <span class="n">request</span> <span class="o">=</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">decode</span><span class="p">())</span>
    <span class="n">http_response</span> <span class="o">=</span> <span class="sa">b</span><span class="s2">&quot;&quot;&quot;</span><span class="se">\</span>
<span class="s2">HTTP/1.1 200 OK</span>

<span class="s2">Hello, World!</span>
<span class="s2">&quot;&quot;&quot;</span>
    <span class="n">client_connection</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">http_response</span><span class="p">)</span>
    <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">60</span><span class="p">)</span>  <span class="c1"># sleep and block the process for 60 seconds</span>


<span class="k">def</span> <span class="nf">serve_forever</span><span class="p">():</span>
    <span class="n">listen_socket</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
    <span class="n">listen_socket</span><span class="o">.</span><span class="n">setsockopt</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
    <span class="n">listen_socket</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">SERVER_ADDRESS</span><span class="p">)</span>
    <span class="n">listen_socket</span><span class="o">.</span><span class="n">listen</span><span class="p">(</span><span class="n">REQUEST_QUEUE_SIZE</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Serving HTTP on port {port} ...&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">port</span><span class="o">=</span><span class="n">PORT</span><span class="p">))</span>

    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="n">client_connection</span><span class="p">,</span> <span class="n">client_address</span> <span class="o">=</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span>
        <span class="n">handle_request</span><span class="p">(</span><span class="n">client_connection</span><span class="p">)</span>
        <span class="n">client_connection</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span>
    <span class="n">serve_forever</span><span class="p">()</span>
</pre></div> <p>Start the server&nbsp;with:</p> <div class="highlight"><pre><span></span>$ python webserver3b.py
</pre></div> <p>Now open up a new terminal window and run the <em>curl</em> command. 
You should instantly see the <em>&#8220;Hello, World!&#8221;</em> string printed on the&nbsp;screen:</p> <div class="highlight"><pre><span></span>$ curl http://localhost:8888/hello
Hello, World!
</pre></div> <p>And without delay open up a second terminal window and run the same <em>curl</em>&nbsp;command:</p> <div class="highlight"><pre><span></span>$ curl http://localhost:8888/hello
</pre></div> <p>If you’ve done that within 60 seconds then the second <em>curl</em> should not produce any output right away and should just hang there. The server shouldn’t print a new request on its standard output either. Here is what it looks like on my Mac (the window at the bottom right corner highlighted in yellow shows the second <em>curl</em> command hanging, waiting for the connection to be accepted by the&nbsp;server):</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_it3.png"></p> <p>After you’ve waited long enough (more than 60 seconds) you should see the first <em>curl</em> terminate and the second <em>curl</em> print <em>&#8220;Hello, World!&#8221;</em> on the screen, then hang for 60 seconds, and then&nbsp;terminate:</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_it4.png"></p> <p>The way it works is that the server finishes servicing the first <em>curl</em> client request and then it starts handling the second request only after it sleeps for 60 seconds. It all happens sequentially, or iteratively, one step, or in our case one client request, at a&nbsp;time.</p> <p>Let’s talk about the communication between clients and servers for a bit. In order for two programs to communicate with each other over a network, they have to use sockets. And you saw sockets both in <a href="http://ruslanspivak.com/lsbaws-part1/" title="Part 1">Part 1</a> and <a href="http://ruslanspivak.com/lsbaws-part2/" title="Part 2">Part 2</a>. 
But what is a&nbsp;socket?</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_it_socket.png" width="480"></p> <p>A <em>socket</em> is an abstraction of a communication endpoint and it allows your program to communicate with another program using file descriptors. In this article I’ll be talking specifically about <span class="caps">TCP</span>/<span class="caps">IP</span> sockets on Linux/Mac <span class="caps">OS</span> X. An important notion to understand is the <span class="caps">TCP</span> socket&nbsp;pair.</p> <blockquote> <p>The <em>socket pair</em> for a <span class="caps">TCP</span> connection is a 4-tuple that identifies two endpoints of the <span class="caps">TCP</span> connection: the local <span class="caps">IP</span> address, local port, foreign <span class="caps">IP</span> address, and foreign port. A socket pair uniquely identifies every <span class="caps">TCP</span> connection on a network. The two values that identify each endpoint, an <span class="caps">IP</span> address and a port number, are often called a <em>socket</em>.<sup id="fnref2-1"><a class="footnote-ref" href="#fn-1">1</a></sup></p> </blockquote> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_it_socketpair.png"></p> <p>So, the tuple {10.10.10.2:49152, 12.12.12.3:8888} is a socket pair that uniquely identifies two endpoints of the <span class="caps">TCP</span> connection on the client and the tuple {12.12.12.3:8888, 10.10.10.2:49152} is a socket pair that uniquely identifies the same two endpoints of the <span class="caps">TCP</span> connection on the server. 
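</p> <p>To see a socket pair from both sides of one connection, here is a minimal sketch that is not part of the original server code. It assumes a throwaway listener on the loopback interface, and the helper name <em>socket_pairs</em> is made up for illustration. It uses the standard <em>getsockname()</em> and <em>getpeername()</em> socket methods to read the local and foreign endpoints:</p>

```python
import socket


# Hypothetical helper (not from the article): open one loopback TCP
# connection and return the socket pair as seen by each side.
def socket_pairs():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server_side, _ = listener.accept()

    # (local endpoint, foreign endpoint) from each side's point of view
    client_pair = (client.getsockname(), client.getpeername())
    server_pair = (server_side.getsockname(), server_side.getpeername())

    for sock in (client, server_side, listener):
        sock.close()
    return client_pair, server_pair


client_pair, server_pair = socket_pairs()
print('client:', client_pair)
print('server:', server_pair)
```

<p>The two printed tuples should contain the same two endpoints in opposite order, which is the mirrored relationship described above. The port numbers will differ from run to run.</p> <p>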
The two values that identify the server endpoint of the <span class="caps">TCP</span> connection, the <span class="caps">IP</span> address 12.12.12.3 and the port 8888, are referred to as a socket in this case (the same applies to the client&nbsp;endpoint).</p> <p>The standard sequence a server usually goes through to create a socket and start accepting client connections is the&nbsp;following:</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_it_server_socket_sequence.png" width="420"></p> <ol> <li> <p>The server creates a <span class="caps">TCP</span>/<span class="caps">IP</span> socket. This is done with the following statement in&nbsp;Python:</p> <div class="highlight"><pre><span></span>listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM) </pre></div> </li> <li> <p>The server might set some socket options (this is optional, but you can see that the server code above does just that to be able to re-use the same address over and over again if you decide to kill and re-start the server right&nbsp;away).</p> <div class="highlight"><pre><span></span>listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) </pre></div> </li> <li> <p>Then, the server binds the address. The <em>bind</em> function assigns a local protocol address to the socket. With <span class="caps">TCP</span>, calling <em>bind</em> lets you specify a port number, an <span class="caps">IP</span> address, both, or neither.<sup id="fnref-1"><a class="footnote-ref" href="#fn-1">1</a></sup></p> <div class="highlight"><pre><span></span>listen_socket.bind(SERVER_ADDRESS) </pre></div> </li> <li> <p>Then, the server makes the socket a listening&nbsp;socket</p> <div class="highlight"><pre><span></span>listen_socket.listen(REQUEST_QUEUE_SIZE) </pre></div> </li> </ol> <p>The <em>listen</em> method is only called by <em>servers</em>. 
It tells the kernel that it should accept incoming connection requests for this&nbsp;socket.</p> <p>After that’s done, the server starts accepting client connections one connection at a time in a loop. When there is a connection available the <em>accept</em> call returns the connected client socket. Then, the server reads the request data from the connected client socket, prints the data on its standard output and sends a message back to the client. Then, the server closes the client connection and it is ready again to accept a new client&nbsp;connection.</p> <p>Here is what a client needs to do to communicate with the server over <span class="caps">TCP</span>/<span class="caps">IP</span>:</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_it_client_socket_sequence.png" width="460"></p> <p>Here is the sample code for a client to connect to your server, send a request and print the&nbsp;response:</p> <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">socket</span>

<span class="c1"># create a socket and connect to a server</span>
<span class="n">sock</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
<span class="n">sock</span><span class="o">.</span><span class="n">connect</span><span class="p">((</span><span class="s1">&#39;localhost&#39;</span><span class="p">,</span> <span class="mi">8888</span><span class="p">))</span>

<span class="c1"># send and receive some data</span>
<span class="n">sock</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="sa">b</span><span class="s1">&#39;test&#39;</span><span class="p">)</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">sock</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">decode</span><span class="p">())</span>
</pre></div> <p>After creating the socket, the client needs to connect to the server. This is done with the <em>connect</em>&nbsp;call:</p> <div class="highlight"><pre><span></span><span class="n">sock</span><span class="o">.</span><span class="n">connect</span><span class="p">((</span><span class="s1">&#39;localhost&#39;</span><span class="p">,</span> <span class="mi">8888</span><span class="p">))</span>
</pre></div> <p>The client only needs to provide the remote <span class="caps">IP</span> address or host name and the remote port number of a server to connect&nbsp;to.</p> <p>You’ve probably noticed that the client doesn’t call <em>bind</em> and <em>accept</em>. The client doesn’t need to call <em>bind</em> because the client doesn&#8217;t care about the local <span class="caps">IP</span> address and the local port number. The <span class="caps">TCP</span>/<span class="caps">IP</span> stack within the kernel automatically assigns the local <span class="caps">IP</span> address and the local port when the client calls <em>connect</em>. The local port is called an <em>ephemeral port</em>, i.e. a short-lived&nbsp;port.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_it_ephemeral_port.png"></p> <p>A port on a server that identifies a well-known service that a client connects to is called a <em>well-known</em> port (for example, 80 for <span class="caps">HTTP</span> and 22 for <span class="caps">SSH</span>). 
Fire up your Python shell and make a client connection to the server you run on localhost and see what ephemeral port the kernel assigns to the socket you’ve created (start the server <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3a.py">webserver3a.py</a> or <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3b.py">webserver3b.py</a> before trying the following&nbsp;example):</p> <div class="highlight"><pre><span></span><span class="o">&gt;&gt;&gt;</span> <span class="kn">import</span> <span class="nn">socket</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">sock</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">sock</span><span class="o">.</span><span class="n">connect</span><span class="p">((</span><span class="s1">&#39;localhost&#39;</span><span class="p">,</span> <span class="mi">8888</span><span class="p">))</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">host</span><span class="p">,</span> <span class="n">port</span> <span class="o">=</span> <span class="n">sock</span><span class="o">.</span><span class="n">getsockname</span><span class="p">()[:</span><span class="mi">2</span><span class="p">]</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">host</span><span class="p">,</span> <span class="n">port</span>
<span class="p">(</span><span class="s1">&#39;127.0.0.1&#39;</span><span class="p">,</span> <span class="mi">60589</span><span class="p">)</span>
</pre></div> <p>In the case above the kernel assigned the <em>ephemeral port</em> 60589 to the&nbsp;socket.</p> <p>There are some other important concepts that I need to cover quickly before 
I get to answer the question from <a href="http://ruslanspivak.com/lsbaws-part2/" title="Part 2">Part 2</a>. You will see shortly why this is important. The two concepts are that of a <em>process</em> and a <em>file descriptor</em>.</p> <p>What is a process? A <em>process</em> is just an instance of an executing program. When the server code is executed, for example, it’s loaded into memory and an instance of that executing program is called a process. The kernel records a bunch of information about the process (its process <span class="caps">ID</span> is one example) to keep track of it. When you run your iterative server <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3a.py">webserver3a.py</a> or <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3b.py">webserver3b.py</a> you run just one&nbsp;process.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_it_server_process.png"></p> <p>Start the server <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3b.py">webserver3b.py</a> in a terminal&nbsp;window:</p> <div class="highlight"><pre><span></span>$ python webserver3b.py
</pre></div> <p>And in a different terminal window use the <em>ps</em> command to get the information about that&nbsp;process:</p> <div class="highlight"><pre><span></span>$ ps <span class="p">|</span> grep webserver3b <span class="p">|</span> grep -v grep
<span class="m">7182</span> ttys003    <span class="m">0</span>:00.04 python webserver3b.py
</pre></div> <p>The <em>ps</em> command shows you that you have indeed run just one Python process <em>webserver3b</em>. When a process gets created the kernel assigns it a process <span class="caps">ID</span>, or <span class="caps">PID</span>. 
In <span class="caps">UNIX</span>, every user process also has a parent that, in turn, has its own process <span class="caps">ID</span> called parent process <span class="caps">ID</span>, or <span class="caps">PPID</span> for short. I assume that you run a <span class="caps">BASH</span> shell by default and when you start the server, a new process gets created with a <span class="caps">PID</span> and its parent <span class="caps">PID</span> is set to the <span class="caps">PID</span> of the <span class="caps">BASH</span>&nbsp;shell.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_it_ppid_pid.png"></p> <p>Try it out and see for yourself how it all works. Fire up your Python shell again, which will create a new process, and then get the <span class="caps">PID</span> of the Python shell process and the parent <span class="caps">PID</span> (the <span class="caps">PID</span> of your <span class="caps">BASH</span> shell) using <a href="https://docs.python.org/2.7/library/os.html#os.getpid">os.getpid()</a> and <a href="https://docs.python.org/2.7/library/os.html#os.getppid">os.getppid()</a> system calls. Then, in another terminal window run <em>ps</em> command and grep for the <span class="caps">PPID</span> (parent process <span class="caps">ID</span>, which in my case is 3148). In the screenshot below you can see an example of a parent-child relationship between my child Python shell process and the parent <span class="caps">BASH</span> shell process on my Mac <span class="caps">OS</span>&nbsp;X:</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_it_pid_ppid_screenshot.png"></p> <p>Another important concept to know is that of a <em>file descriptor</em>. So what is a file descriptor? A <em>file descriptor</em> is a non-negative integer that the kernel returns to a process when it opens an existing file, creates a new file or when it creates a new socket. 
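</p> <p>As a small illustration of that definition, here is a sketch that is not part of the original article. The helper name <em>open_descriptors</em> and the file name are made up for the example. It creates a new file with <em>os.open</em> and a new socket with <em>socket.socket</em>, and returns the integers the kernel handed back for each:</p>

```python
import os
import socket
import tempfile


# Hypothetical helper (not from the article): create a file and a socket
# and return the integer descriptors the kernel assigned to them.
def open_descriptors():
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, 'example.txt')
        # os.open returns the raw file descriptor, not a file object
        file_fd = os.open(path, os.O_CREAT | os.O_WRONLY)
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # fileno() exposes the descriptor behind the socket object
        sock_fd = sock.fileno()
        os.close(file_fd)
        sock.close()
    return file_fd, sock_fd


file_fd, sock_fd = open_descriptors()
print('file descriptor:', file_fd)
print('socket descriptor:', sock_fd)
```

<p>Both values are plain non-negative integers, and they differ because the file and the socket were open at the same time. The exact numbers depend on what the process already has open.</p> <p>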
You’ve probably heard that in <span class="caps">UNIX</span> everything is a file. The kernel refers to the open files of a process by a file descriptor. When you need to read or write a file you identify it with the file descriptor. Python gives you high-level objects to deal with files (and sockets) and you don’t have to use file descriptors directly to identify a file but, under the hood, that’s how files and sockets are identified in <span class="caps">UNIX</span>: by their integer file&nbsp;descriptors.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_it_process_descriptors.png"></p> <p>By default, <span class="caps">UNIX</span> shells assign file descriptor 0 to the standard input of a process, file descriptor 1 to the standard output of the process and file descriptor 2 to the standard&nbsp;error.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_it_default_descriptors.png" width="640"></p> <p>As I mentioned before, even though Python gives you a high-level file or file-like object to work with, you can always use the <em>fileno()</em> method on the object to get the file descriptor associated with the file. 
Back to your Python shell to see how you can do&nbsp;that:</p> <div class="highlight"><pre><span></span><span class="o">&gt;&gt;&gt;</span> <span class="kn">import</span> <span class="nn">sys</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">sys</span><span class="o">.</span><span class="n">stdin</span>
<span class="o">&lt;</span><span class="nb">open</span> <span class="nb">file</span> <span class="s1">&#39;&lt;stdin&gt;&#39;</span><span class="p">,</span> <span class="n">mode</span> <span class="s1">&#39;r&#39;</span> <span class="n">at</span> <span class="mh">0x102beb0c0</span><span class="o">&gt;</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">sys</span><span class="o">.</span><span class="n">stdin</span><span class="o">.</span><span class="n">fileno</span><span class="p">()</span>
<span class="mi">0</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">fileno</span><span class="p">()</span>
<span class="mi">1</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="o">.</span><span class="n">fileno</span><span class="p">()</span>
<span class="mi">2</span>
</pre></div> <p>And while working with files and sockets in Python, you’ll usually be using a high-level file/socket object, but there may be times where you need to use a file descriptor directly. 
Here is an example of how you can write a string of bytes to the standard output using a <a href="https://docs.python.org/2.7/library/os.html#os.write">write</a> system call that takes a file descriptor integer as a&nbsp;parameter:</p> <div class="highlight"><pre><span></span><span class="o">&gt;&gt;&gt;</span> <span class="kn">import</span> <span class="nn">sys</span>
<span class="o">&gt;&gt;&gt;</span> <span class="kn">import</span> <span class="nn">os</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">res</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">fileno</span><span class="p">(),</span> <span class="sa">b</span><span class="s1">&#39;hello</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">)</span>
<span class="n">hello</span>
</pre></div> <p>And here is the interesting part - which should not be surprising to you anymore because you already know that everything is a file in Unix - your socket also has a file descriptor associated with it. 
Again, when you create a socket in Python you get back an object and not a non-negative integer, but you can always get direct access to the integer file descriptor of the socket with the <em>fileno()</em> method that I mentioned&nbsp;earlier.</p> <div class="highlight"><pre><span></span><span class="o">&gt;&gt;&gt;</span> <span class="kn">import</span> <span class="nn">socket</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">sock</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">sock</span><span class="o">.</span><span class="n">fileno</span><span class="p">()</span>
<span class="mi">3</span>
</pre></div> <p>One more thing I wanted to mention: have you noticed that in the second example of the iterative server <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3b.py">webserver3b.py</a>, when the server process was sleeping for 60 seconds you could still connect to the server with the second <em>curl</em> command? Sure, <em>curl</em> didn’t output anything right away and just hung there, but the server was not <em>accept</em>ing connections at the time, so how come the client was not rejected right away and was instead able to connect to the server? The answer to that is the <em>listen</em> method of a socket object and its <span class="caps">BACKLOG</span> argument, which I called REQUEST_QUEUE_SIZE in the code. The <span class="caps">BACKLOG</span> argument determines the size of a queue within the kernel for incoming connection requests. 
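</p> <p>You can observe that queue in isolation with a small experiment. The following is only a sketch under a few assumptions: the function name <em>connect_without_accept</em> is made up, and it binds to 127.0.0.1 on a port chosen by the kernel so that it does not clash with the server on port 8888. The listening socket never calls <em>accept</em>, yet the client should still connect:</p>
<div class="highlight"><pre>
```python
import socket


def connect_without_accept(backlog=5):
    """Connect to a listening socket that never calls accept()."""
    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_socket.bind(('127.0.0.1', 0))   # port 0: let the kernel pick a free port
    listen_socket.listen(backlog)          # BACKLOG, as in REQUEST_QUEUE_SIZE
    address = listen_socket.getsockname()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    try:
        # The server never calls accept(), yet the kernel completes the
        # connection and parks it in the queue of the listening socket.
        client.connect(address)
        connected = True
    except socket.error:
        connected = False
    finally:
        client.close()
        listen_socket.close()
    return connected


if __name__ == '__main__':
    print('connected without accept: {0}'.format(connect_without_accept()))
```
</pre></div>
<p>How many connections the kernel is willing to queue beyond the requested backlog differs between operating systems, so the sketch makes only a single connection.</p> <p>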
When the server <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3b.py">webserver3b.py</a> was sleeping, the second <em>curl</em> command that you ran was able to connect to the server because the kernel had enough space available in the incoming connection request queue for the server&nbsp;socket.</p> <p>While increasing the <span class="caps">BACKLOG</span> argument does not magically turn your server into a server that can handle multiple client requests at a time, it is important to have a fairly large backlog parameter for busy servers so that the <em>accept</em> call would not have to wait for a new connection to be established but could grab the new connection off the queue right away and start processing a client request without&nbsp;delay.</p> <p>Whoo-hoo! You’ve covered a lot of ground. Let’s quickly recap what you’ve learned (or refreshed if it’s all basics to you) so&nbsp;far.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_checkpoint.png"></p> <blockquote> <ul> <li>Iterative&nbsp;server</li> <li>Server socket creation sequence (socket, bind, listen,&nbsp;accept)</li> <li>Client connection creation sequence (socket,&nbsp;connect)</li> <li>Socket&nbsp;pair</li> <li>Socket</li> <li>Ephemeral port and well-known&nbsp;port</li> <li>Process</li> <li>Process <span class="caps">ID</span> (<span class="caps">PID</span>), parent process <span class="caps">ID</span> (<span class="caps">PPID</span>), and the parent-child&nbsp;relationship.</li> <li>File&nbsp;descriptors</li> <li>The meaning of the <span class="caps">BACKLOG</span> argument of the <em>listen</em> socket&nbsp;method</li> </ul> </blockquote> <p><br/> Now I am ready to answer the question from <a href="http://ruslanspivak.com/lsbaws-part2/" title="Part 2">Part 2</a>: “How can you make your server handle more than one request at a time?” Or put another way, “How do you write a concurrent&nbsp;server?”</p> <p><img alt="" 
src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc2_service_clients.png"></p> <p>The simplest way to write a concurrent server under Unix is to use a <a href="https://docs.python.org/2.7/library/os.html#os.fork">fork()</a> system&nbsp;call.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_fork.png"></p> <p>Here is the code of your new shiny concurrent server <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3c.py">webserver3c.py</a> that can handle multiple client requests at the same time (as in our iterative server example <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3b.py">webserver3b.py</a>, every child process sleeps for 60&nbsp;secs):</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_it2.png" width="640"></p> <div class="highlight"><pre><span></span><span class="c1">###########################################################################</span>
<span class="c1"># Concurrent server - webserver3c.py                                      #</span>
<span class="c1">#                                                                         #</span>
<span class="c1"># Tested with Python 2.7.9 &amp; Python 3.4 on Ubuntu 14.04 &amp; Mac OS X        #</span>
<span class="c1">#                                                                         #</span>
<span class="c1"># - Child process sleeps for 60 seconds after handling a client&#39;s request #</span>
<span class="c1"># - Parent and child processes close duplicate descriptors                #</span>
<span class="c1">#                                                                         #</span>
<span class="c1">###########################################################################</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">socket</span>
<span class="kn">import</span> <span class="nn">time</span>

<span class="n">SERVER_ADDRESS</span> <span class="o">=</span> <span class="p">(</span><span class="n">HOST</span><span class="p">,</span> <span class="n">PORT</span><span class="p">)</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span><span class="p">,</span> <span class="mi">8888</span>
<span class="n">REQUEST_QUEUE_SIZE</span> <span class="o">=</span> <span class="mi">5</span>


<span class="k">def</span> <span class="nf">handle_request</span><span class="p">(</span><span class="n">client_connection</span><span class="p">):</span>
    <span class="n">request</span> <span class="o">=</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span>
        <span class="s1">&#39;Child PID: {pid}. Parent PID {ppid}&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
            <span class="n">pid</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">getpid</span><span class="p">(),</span>
            <span class="n">ppid</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">getppid</span><span class="p">(),</span>
        <span class="p">)</span>
    <span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">decode</span><span class="p">())</span>
    <span class="n">http_response</span> <span class="o">=</span> <span class="sa">b</span><span class="s2">&quot;&quot;&quot;</span><span class="se">\</span>
<span class="s2">HTTP/1.1 200 OK</span>

<span class="s2">Hello, World!</span>
<span class="s2">&quot;&quot;&quot;</span>
    <span class="n">client_connection</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">http_response</span><span class="p">)</span>
    <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">60</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">serve_forever</span><span class="p">():</span>
    <span class="n">listen_socket</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
    <span class="n">listen_socket</span><span class="o">.</span><span class="n">setsockopt</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
    <span class="n">listen_socket</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">SERVER_ADDRESS</span><span class="p">)</span>
    <span class="n">listen_socket</span><span class="o">.</span><span class="n">listen</span><span class="p">(</span><span class="n">REQUEST_QUEUE_SIZE</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Serving HTTP on port {port} ...&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">port</span><span class="o">=</span><span class="n">PORT</span><span class="p">))</span>
    <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Parent PID (PPID): {pid}</span><span class="se">\n</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">pid</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">getpid</span><span class="p">()))</span>

    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="n">client_connection</span><span class="p">,</span> <span class="n">client_address</span> <span class="o">=</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span>
        <span class="n">pid</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">fork</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">pid</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>  <span class="c1"># child</span>
            <span class="n">listen_socket</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>  <span class="c1"># close child copy</span>
            <span class="n">handle_request</span><span class="p">(</span><span class="n">client_connection</span><span class="p">)</span>
            <span class="n">client_connection</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
            <span class="n">os</span><span class="o">.</span><span class="n">_exit</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>  <span class="c1"># child exits here</span>
        <span class="k">else</span><span class="p">:</span>  <span class="c1"># parent</span>
            <span class="n">client_connection</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>  <span class="c1"># close parent copy and loop over</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span>
    <span class="n">serve_forever</span><span class="p">()</span>
</pre></div> <p>Before diving in and discussing how <em>fork</em> works, try it, and see for yourself that the server can indeed handle multiple client requests at the same time, unlike its iterative counterparts <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3a.py">webserver3a.py</a> and <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3b.py">webserver3b.py</a>. 
Start the server on the command line&nbsp;with:</p> <div class="highlight"><pre><span></span>$ python webserver3c.py </pre></div> <p>And try the same two <em>curl</em> commands you’ve tried before with the iterative server and see for yourself that, now, even though the server child process sleeps for 60 seconds after serving a client request, it doesn’t affect other clients because they are served by different and completely independent processes. You should see your <em>curl</em> commands output <em>&#8220;Hello, World!&#8221;</em> instantly and then hang for 60 secs. You can keep on running as many <em>curl</em> commands as you want (well, almost as many as you want :) and all of them will output the server’s response <em>&#8220;Hello, World&#8221;</em> immediately and without any noticeable delay. Try&nbsp;it.</p> <p>The most important point to understand about <a href="https://docs.python.org/2.7/library/os.html#os.fork">fork()</a> is that you call <em>fork</em> once but it returns twice: once in the parent process and once in the child process. When you fork a new process the process <span class="caps">ID</span> returned to the child process is 0. When the <em>fork</em> returns in the parent process it returns the child’s <span class="caps">PID</span>.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc2_how_fork_works.png" width="580"></p> <p>I still remember how fascinated I was by <em>fork</em> when I first read about it and tried it. It looked like magic to me. Here I was reading a sequential code and then “boom!”: the code cloned itself and now there were two instances of the same code running concurrently. 
I thought it was nothing short of magic,&nbsp;seriously.</p> <p>When a parent forks a new child, the child process gets a copy of the parent’s file&nbsp;descriptors:</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc2_shared_descriptors.png" width="580"></p> <p>You’ve probably noticed that the parent process in the code above closed the client&nbsp;connection:</p> <div class="highlight"><pre><span></span><span class="k">else</span><span class="p">:</span>  <span class="c1"># parent</span>
    <span class="n">client_connection</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>  <span class="c1"># close parent copy and loop over</span>
</pre></div> <p>So how come a child process is still able to read the data from a client socket if its parent closed the very same socket? The answer is in the picture above. The kernel uses descriptor reference counts to decide whether to close a socket or not. It closes the socket only when its descriptor reference count becomes 0. When your server creates a child process, the child gets a copy of the parent’s file descriptors and the kernel increments the reference counts for those descriptors. In the case of one parent and one child, the descriptor reference count for the client socket is 2. When the parent process in the code above closes the client connection socket, it merely decrements that reference count to 1, which is not small enough to cause the kernel to close the socket. 
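</p> <p>Here is a minimal sketch of that reference counting at work, separate from the server code. It assumes a <em>socket.socketpair()</em> as a stand-in for an accepted client connection, and the function name <em>parent_closes_child_still_talks</em> is made up for illustration. The parent closes its copy of the socket first, and the child then still sends data through its own copy:</p>
<div class="highlight"><pre>
```python
import os
import socket


def parent_closes_child_still_talks():
    """Show that a forked child can use a socket its parent has closed."""
    # socketpair() stands in for an accepted client connection:
    # `connection` plays the server side, `peer` plays the client side.
    connection, peer = socket.socketpair()
    ready_r, ready_w = os.pipe()           # lets the parent say "I have closed"

    pid = os.fork()
    if pid == 0:  # child
        peer.close()
        os.close(ready_w)
        os.read(ready_r, 1)                # wait until the parent closed its copy
        connection.sendall(b'hello')       # the child copy is still usable
        connection.close()                 # last reference: socket really closes
        os._exit(0)

    # parent
    os.close(ready_r)
    connection.close()                     # reference count drops from 2 to 1
    os.write(ready_w, b'x')
    os.close(ready_w)
    chunks = []
    while True:
        chunk = peer.recv(1024)
        if not chunk:                      # empty read: the other side is closed
            break
        chunks.append(chunk)
    peer.close()
    os.waitpid(pid, 0)
    return b''.join(chunks)


if __name__ == '__main__':
    print(parent_closes_child_still_talks())
```
</pre></div>
<p>The client side sees the end of the stream only after the child closes its copy too, which is the moment the reference count reaches 0.</p> <p>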
The child process also closes the duplicate copy of the parent’s <em>listen_socket</em> because the child doesn’t care about accepting new client connections, it cares only about processing requests from the established client&nbsp;connection:</p> <div class="highlight"><pre><span></span><span class="n">listen_socket</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="c1"># close child copy</span> </pre></div> <p>I’ll talk about what happens if you do not close duplicate descriptors later in the&nbsp;article.</p> <p>As you can see from the source code of your concurrent server, the sole role of the server parent process now is to accept a new client connection, fork a new child process to handle that client request, and loop over to accept another client connection, and nothing more. The server parent process does not process client requests - its children&nbsp;do.</p> <p>A little aside. What does it mean when we say that two events are&nbsp;concurrent?</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc2_concurrent_events.png" width="580"></p> <p>When we say that two events are concurrent we usually mean that they happen at the same time. 
As a shorthand that definition is fine, but you should remember the strict&nbsp;definition:</p> <blockquote> <p>Two events are <em>concurrent</em> if you cannot tell by looking at the program which will happen first.<sup id="fnref-2"><a class="footnote-ref" href="#fn-2">2</a></sup></p> </blockquote> <p>Again, it’s time to recap the main ideas and concepts you’ve covered so&nbsp;far.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_checkpoint.png"></p> <blockquote> <ul> <li>The simplest way to write a concurrent server in Unix is to use the <a href="https://docs.python.org/2.7/library/os.html#os.fork">fork()</a> system&nbsp;call</li> <li>When a process forks a new process it becomes a parent process to that newly forked child&nbsp;process.</li> <li>Parent and child share the same file descriptors after the call to <em>fork</em>.</li> <li>The kernel uses descriptor reference counts to decide whether to close the file/socket or&nbsp;not</li> <li>The role of a server parent process: all it does now is accept a new connection from a client, fork a child to handle the client request, and loop over to accept a new client&nbsp;connection.</li> </ul> </blockquote> <p><br/> Let’s see what is going to happen if you don’t close duplicate socket descriptors in the parent and child processes. 
Here is a modified version of the concurrent server where the server does not close duplicate descriptors, <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3d.py">webserver3d.py</a>:</p> <div class="highlight"><pre><span></span><span class="c1">###########################################################################</span>
<span class="c1"># Concurrent server - webserver3d.py                                      #</span>
<span class="c1">#                                                                         #</span>
<span class="c1"># Tested with Python 2.7.9 &amp; Python 3.4 on Ubuntu 14.04 &amp; Mac OS X        #</span>
<span class="c1">###########################################################################</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">socket</span>

<span class="n">SERVER_ADDRESS</span> <span class="o">=</span> <span class="p">(</span><span class="n">HOST</span><span class="p">,</span> <span class="n">PORT</span><span class="p">)</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span><span class="p">,</span> <span class="mi">8888</span>
<span class="n">REQUEST_QUEUE_SIZE</span> <span class="o">=</span> <span class="mi">5</span>


<span class="k">def</span> <span class="nf">handle_request</span><span class="p">(</span><span class="n">client_connection</span><span class="p">):</span>
    <span class="n">request</span> <span class="o">=</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span>
    <span class="n">http_response</span> <span class="o">=</span> <span class="sa">b</span><span class="s2">&quot;&quot;&quot;</span><span class="se">\</span>
<span class="s2">HTTP/1.1 200 OK</span>

<span class="s2">Hello, World!</span>
<span class="s2">&quot;&quot;&quot;</span>
    <span class="n">client_connection</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">http_response</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">serve_forever</span><span class="p">():</span>
    <span class="n">listen_socket</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
    <span class="n">listen_socket</span><span class="o">.</span><span class="n">setsockopt</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
    <span class="n">listen_socket</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">SERVER_ADDRESS</span><span class="p">)</span>
    <span class="n">listen_socket</span><span class="o">.</span><span class="n">listen</span><span class="p">(</span><span class="n">REQUEST_QUEUE_SIZE</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Serving HTTP on port {port} ...&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">port</span><span class="o">=</span><span class="n">PORT</span><span class="p">))</span>

    <span class="n">clients</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="n">client_connection</span><span class="p">,</span> <span class="n">client_address</span> <span class="o">=</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span>
        <span class="c1"># store the reference otherwise it&#39;s garbage collected</span>
        <span class="c1"># on the next loop run</span>
        <span class="n">clients</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">client_connection</span><span class="p">)</span>
        <span class="n">pid</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">fork</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">pid</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>  <span class="c1"># child</span>
            <span class="n">listen_socket</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>  <span class="c1"># close child copy</span>
            <span class="n">handle_request</span><span class="p">(</span><span class="n">client_connection</span><span class="p">)</span>
            <span class="n">client_connection</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
            <span class="n">os</span><span class="o">.</span><span class="n">_exit</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>  <span class="c1"># child exits here</span>
        <span class="k">else</span><span class="p">:</span>  <span class="c1"># parent</span>
            <span class="c1"># client_connection.close()</span>
            <span class="k">print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">clients</span><span class="p">))</span>

<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span>
    <span class="n">serve_forever</span><span class="p">()</span>
</pre></div> <p>Start the server&nbsp;with:</p> <div class="highlight"><pre><span></span>$ python webserver3d.py
</pre></div> <p>Use <em>curl</em> to connect to the&nbsp;server:</p> <div class="highlight"><pre><span></span>$ curl http://localhost:8888/hello
Hello, World!
</pre></div> <p>Okay, the <em>curl</em> printed the response from the concurrent server but it did not terminate and kept hanging. What is happening here? The server no longer sleeps for 60 seconds: its child process actively handles a client request, closes the client connection and exits, but the client <em>curl</em> still does not&nbsp;terminate.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc3_child_is_active.png" width="640"></p> <p>So why does the <em>curl</em> not terminate? The reason is the duplicate file descriptors. When the child process closed the client connection, the kernel decremented the reference count of that client socket and the count became 1. The server child process exited, but the client socket was not closed by the kernel because the reference count for that socket descriptor was not 0, and, as a result, the termination packet (called <span class="caps">FIN</span> in <span class="caps">TCP</span>/<span class="caps">IP</span> parlance) was not sent to the client and the client stayed on the line, so to speak. There is also another problem. 
If your long-running server doesn’t close duplicate file descriptors, it will eventually run out of available file&nbsp;descriptors:</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc3_out_of_descriptors.png"></p> <p>Stop your server <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3d.py">webserver3d.py</a> with <em>Control-C</em> and check out the default resources available to your server process set up by your shell with the shell built-in command <em>ulimit</em>:</p> <div class="highlight"><pre><span></span>$ <span class="nb">ulimit</span> -a
core file size          <span class="o">(</span>blocks, -c<span class="o">)</span> <span class="m">0</span>
data seg size           <span class="o">(</span>kbytes, -d<span class="o">)</span> unlimited
scheduling priority             <span class="o">(</span>-e<span class="o">)</span> <span class="m">0</span>
file size               <span class="o">(</span>blocks, -f<span class="o">)</span> unlimited
pending signals                 <span class="o">(</span>-i<span class="o">)</span> <span class="m">3842</span>
max locked memory       <span class="o">(</span>kbytes, -l<span class="o">)</span> <span class="m">64</span>
max memory size         <span class="o">(</span>kbytes, -m<span class="o">)</span> unlimited
open files                      <span class="o">(</span>-n<span class="o">)</span> <span class="m">1024</span>
pipe size            <span class="o">(</span><span class="m">512</span> bytes, -p<span class="o">)</span> <span class="m">8</span>
POSIX message queues     <span class="o">(</span>bytes, -q<span class="o">)</span> <span class="m">819200</span>
real-time priority              <span class="o">(</span>-r<span class="o">)</span> <span class="m">0</span>
stack size              <span class="o">(</span>kbytes, -s<span class="o">)</span> <span class="m">8192</span>
cpu <span class="nb">time</span>               <span class="o">(</span>seconds, -t<span class="o">)</span> unlimited
max user processes              <span class="o">(</span>-u<span class="o">)</span> <span class="m">3842</span>
virtual memory          <span class="o">(</span>kbytes, -v<span class="o">)</span> unlimited
file locks                      <span class="o">(</span>-x<span class="o">)</span> unlimited
</pre></div> <p>As you can see above, the maximum number of open file descriptors (<em>open files</em>) available to the server process on my Ubuntu box is&nbsp;1024.</p> <p>Now let’s see how your server can run out of available file descriptors if it doesn’t close duplicate descriptors. In an existing or new terminal window, set the maximum number of open file descriptors for your server to be&nbsp;256:</p> <div class="highlight"><pre><span></span>$ <span class="nb">ulimit</span> -n <span class="m">256</span>
</pre></div> <p>Start the server <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3d.py">webserver3d.py</a> in the same terminal where you’ve just run the <em>$ ulimit -n 256</em>&nbsp;command:</p> <div class="highlight"><pre><span></span>$ python webserver3d.py
</pre></div> <p>and use the following client <a href="https://github.com/rspivak/lsbaws/blob/master/part3/client3.py">client3.py</a> to test the&nbsp;server.</p> <div class="highlight"><pre><span></span><span class="c1">#####################################################################</span>
<span class="c1"># Test client - client3.py                                          #</span>
<span class="c1">#                                                                   #</span>
<span class="c1"># Tested with Python 2.7.9 &amp; Python 3.4 on Ubuntu 14.04 &amp; Mac OS X  #</span>
<span class="c1">#####################################################################</span>
<span class="kn">import</span> <span class="nn">argparse</span>
<span class="kn">import</span> <span class="nn">errno</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">socket</span>


<span class="n">SERVER_ADDRESS</span> <span class="o">=</span> <span class="s1">&#39;localhost&#39;</span><span class="p">,</span> <span class="mi">8888</span>
<span class="n">REQUEST</span> <span class="o">=</span> <span class="sa">b</span><span
class="s2">&quot;&quot;&quot;</span><span class="se">\</span> <span class="s2">GET /hello HTTP/1.1</span> <span class="s2">Host: localhost:8888</span> <span class="s2">&quot;&quot;&quot;</span> <span class="k">def</span> <span class="nf">main</span><span class="p">(</span><span class="n">max_clients</span><span class="p">,</span> <span class="n">max_conns</span><span class="p">):</span> <span class="n">socks</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">for</span> <span class="n">client_num</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">max_clients</span><span class="p">):</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">fork</span><span class="p">()</span> <span class="k">if</span> <span class="n">pid</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span> <span class="k">for</span> <span class="n">connection_num</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">max_conns</span><span class="p">):</span> <span class="n">sock</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span> <span class="n">sock</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">SERVER_ADDRESS</span><span class="p">)</span> <span class="n">sock</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">REQUEST</span><span class="p">)</span> <span class="n">socks</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span 
class="n">sock</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="n">connection_num</span><span class="p">)</span> <span class="n">os</span><span class="o">.</span><span class="n">_exit</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span> <span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="o">.</span><span class="n">ArgumentParser</span><span class="p">(</span> <span class="n">description</span><span class="o">=</span><span class="s1">&#39;Test client for LSBAWS.&#39;</span><span class="p">,</span> <span class="n">formatter_class</span><span class="o">=</span><span class="n">argparse</span><span class="o">.</span><span class="n">ArgumentDefaultsHelpFormatter</span><span class="p">,</span> <span class="p">)</span> <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span> <span class="s1">&#39;--max-conns&#39;</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">int</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="mi">1024</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s1">&#39;Maximum number of connections per client.&#39;</span> <span class="p">)</span> <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span> <span class="s1">&#39;--max-clients&#39;</span><span class="p">,</span> <span class="nb">type</span><span class="o">=</span><span class="nb">int</span><span class="p">,</span> <span class="n">default</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span 
class="s1">&#39;Maximum number of clients.&#39;</span> <span class="p">)</span> <span class="n">args</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse_args</span><span class="p">()</span> <span class="n">main</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">max_clients</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">max_conns</span><span class="p">)</span> </pre></div> <p>In a new terminal window, start the <a href="https://github.com/rspivak/lsbaws/blob/master/part3/client3.py">client3.py</a> and tell it to create 300 simultaneous connections to the&nbsp;server:</p> <div class="highlight"><pre><span></span>$ python client3.py --max-clients<span class="o">=</span><span class="m">300</span> </pre></div> <p>Soon enough your server will explode. Here is a screenshot of the exception on my&nbsp;box:</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc3_too_many_fds_exc.png"></p> <p>The lesson is clear - your server should close duplicate descriptors. But even if you close duplicate descriptors, you are not out of the woods yet because there is another problem with your server, and that problem is&nbsp;zombies!</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc3_zombies.png" width="640"></p> <p>Yes, your server code actually creates zombies. Let’s see how. Start up your server&nbsp;again:</p> <div class="highlight"><pre><span></span>$ python webserver3d.py </pre></div> <p>Run the following <em>curl</em> command in another terminal&nbsp;window:</p> <div class="highlight"><pre><span></span>$ curl http://localhost:8888/hello </pre></div> <p>And now run the <em>ps</em> command to show running Python processes. 
This is an example of <em>ps</em> output on my Ubuntu&nbsp;box:</p> <div class="highlight"><pre><span></span>$ ps auxw <span class="p">|</span> grep -i python <span class="p">|</span> grep -v grep
vagrant <span class="m">9099</span> <span class="m">0</span>.0 <span class="m">1</span>.2 <span class="m">31804</span> <span class="m">6256</span> pts/0 S+ <span class="m">16</span>:33 <span class="m">0</span>:00 python webserver3d.py
vagrant <span class="m">9102</span> <span class="m">0</span>.0 <span class="m">0</span>.0 <span class="m">0</span> <span class="m">0</span> pts/0 Z+ <span class="m">16</span>:33 <span class="m">0</span>:00 <span class="o">[</span>python<span class="o">]</span> &lt;defunct&gt;
</pre></div> <p>Do you see the second line of the output above, where it says the status of the process with <span class="caps">PID</span> 9102 is <strong>Z+</strong> and the name of the process is <strong>&lt;defunct&gt;</strong>? That’s our zombie there. The problem with zombies is that you can’t kill&nbsp;them.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc3_kill_zombie.png" width="640"></p> <p>Even if you try to kill zombies with <em>$ kill -9 &lt;PID&gt;</em>, they will survive. Try it and see for&nbsp;yourself.</p> <p>What is a zombie anyway and why does our server create them? A <em>zombie</em> is a process that has terminated, but its parent has not <em>waited</em> for it and has not received its termination status yet. When a child process exits before its parent, the kernel turns the child process into a zombie and stores some information about the process for its parent process to retrieve later. The information stored is usually the process <span class="caps">ID</span>, the process termination status, and the resource usage by the process. Okay, so zombies serve a purpose, but if your server doesn’t take care of these zombies, your system will get clogged up. Let’s see how that happens. 
First, stop your running server and, in a new terminal window, use the <em>ulimit</em> command to set <em>max user processes</em> to 400 (make sure to set <em>open files</em> to a high number too, let’s say&nbsp;500):</p> <div class="highlight"><pre><span></span>$ <span class="nb">ulimit</span> -u <span class="m">400</span>
$ <span class="nb">ulimit</span> -n <span class="m">500</span>
</pre></div> <p>Start the server <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3d.py">webserver3d.py</a> in the same terminal where you’ve just run the <em>$ ulimit -u 400</em>&nbsp;command:</p> <div class="highlight"><pre><span></span>$ python webserver3d.py </pre></div> <p>In a new terminal window, start the <a href="https://github.com/rspivak/lsbaws/blob/master/part3/client3.py">client3.py</a> and tell it to create 500 simultaneous connections to the&nbsp;server:</p> <div class="highlight"><pre><span></span>$ python client3.py --max-clients<span class="o">=</span><span class="m">500</span> </pre></div> <p>And, again, soon enough your server will blow up with an <strong>OSError: Resource temporarily unavailable</strong> exception when it tries to create a new child process. It can’t, because the zombies still count toward the <em>max user processes</em> limit and that limit has been reached. Here is a screenshot of the exception on my&nbsp;box:</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc3_resource_unavailable.png"></p> <p>As you can see, zombies create problems for your long-running server if it doesn’t take care of them. 
I will discuss shortly how the server should deal with that zombie&nbsp;problem.</p> <p>Let’s recap the main points you’ve covered so&nbsp;far:</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_checkpoint.png"></p> <blockquote> <ul> <li>If you don’t close duplicate descriptors, the clients won’t terminate because the client connections won’t get&nbsp;closed.</li> <li>If you don’t close duplicate descriptors, your long-running server will eventually run out of available file descriptors (<em>max open files</em>).</li> <li>When you fork a child process and it exits and the parent process doesn’t <em>wait</em> for it and doesn’t collect its termination status, it becomes a <em>zombie</em>.</li> <li>Zombies need to eat something and, in our case, it’s memory. Your server will eventually run out of available processes (<em>max user processes</em>) if it doesn’t take care of&nbsp;zombies.</li> <li>You can’t <em>kill</em> a zombie, you need to <em>wait</em> for&nbsp;it.</li> </ul> </blockquote> <p><br/> So what do you need to do to take care of zombies? You need to modify your server code to <em>wait</em> for zombies to get their termination status. You can do that by modifying your server to call a <a href="https://docs.python.org/2.7/library/os.html#os.wait">wait</a> system call. Unfortunately, that’s far from ideal because if you call <em>wait</em> and there is no terminated child process the call to <em>wait</em> will block your server, effectively preventing your server from handling new client connection requests. Are there any other options? Yes, there are, and one of them is the combination of a <em>signal handler</em> with the <em>wait</em> system&nbsp;call.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc4_signaling.png" width="240"></p> <p>Here is how it works. When a child process exits, the kernel sends a <em><span class="caps">SIGCHLD</span></em> signal. 
The parent process can set up a signal handler to be asynchronously notified of that <em><span class="caps">SIGCHLD</span></em> event and then it can <em>wait</em> for the child to collect its termination status, thus preventing the zombie process from being left&nbsp;around.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part_conc4_sigchld_async.png" width="640"></p> <p>By the way, an asynchronous event means that the parent process doesn’t know ahead of time that the event is going to&nbsp;happen.</p> <p>Modify your server code to set up a <em><span class="caps">SIGCHLD</span></em> event handler and <em>wait</em> for a terminated child in the event handler. The code is available in <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3e.py">webserver3e.py</a>&nbsp;file:</p> <div class="highlight"><pre><span></span><span class="c1">###########################################################################</span> <span class="c1"># Concurrent server - webserver3e.py #</span> <span class="c1"># #</span> <span class="c1"># Tested with Python 2.7.9 &amp; Python 3.4 on Ubuntu 14.04 &amp; Mac OS X #</span> <span class="c1">###########################################################################</span> <span class="kn">import</span> <span class="nn">os</span> <span class="kn">import</span> <span class="nn">signal</span> <span class="kn">import</span> <span class="nn">socket</span> <span class="kn">import</span> <span class="nn">time</span> <span class="n">SERVER_ADDRESS</span> <span class="o">=</span> <span class="p">(</span><span class="n">HOST</span><span class="p">,</span> <span class="n">PORT</span><span class="p">)</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span><span class="p">,</span> <span class="mi">8888</span> <span class="n">REQUEST_QUEUE_SIZE</span> <span class="o">=</span> <span class="mi">5</span> <span class="k">def</span> <span class="nf">grim_reaper</span><span class="p">(</span><span 
class="n">signum</span><span class="p">,</span> <span class="n">frame</span><span class="p">):</span> <span class="n">pid</span><span class="p">,</span> <span class="n">status</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span> <span class="k">print</span><span class="p">(</span> <span class="s1">&#39;Child {pid} terminated with status {status}&#39;</span> <span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">pid</span><span class="o">=</span><span class="n">pid</span><span class="p">,</span> <span class="n">status</span><span class="o">=</span><span class="n">status</span><span class="p">)</span> <span class="p">)</span> <span class="k">def</span> <span class="nf">handle_request</span><span class="p">(</span><span class="n">client_connection</span><span class="p">):</span> <span class="n">request</span> <span class="o">=</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">decode</span><span class="p">())</span> <span class="n">http_response</span> <span class="o">=</span> <span class="sa">b</span><span class="s2">&quot;&quot;&quot;</span><span class="se">\</span> <span class="s2">HTTP/1.1 200 OK</span> <span class="s2">Hello, World!</span> <span class="s2">&quot;&quot;&quot;</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">http_response</span><span class="p">)</span> <span class="c1"># sleep to allow the parent to loop over to &#39;accept&#39; and block there</span> <span class="n">time</span><span class="o">.</span><span 
class="n">sleep</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span> <span class="k">def</span> <span class="nf">serve_forever</span><span class="p">():</span> <span class="n">listen_socket</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">setsockopt</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">SERVER_ADDRESS</span><span class="p">)</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">listen</span><span class="p">(</span><span class="n">REQUEST_QUEUE_SIZE</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Serving HTTP on port {port} ...&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">port</span><span class="o">=</span><span class="n">PORT</span><span class="p">))</span> <span class="n">signal</span><span class="o">.</span><span class="n">signal</span><span class="p">(</span><span class="n">signal</span><span class="o">.</span><span class="n">SIGCHLD</span><span class="p">,</span> <span class="n">grim_reaper</span><span class="p">)</span> <span class="k">while</span> <span class="bp">True</span><span class="p">:</span> <span 
class="n">client_connection</span><span class="p">,</span> <span class="n">client_address</span> <span class="o">=</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">fork</span><span class="p">()</span> <span class="k">if</span> <span class="n">pid</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span> <span class="c1"># child</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="c1"># close child copy</span> <span class="n">handle_request</span><span class="p">(</span><span class="n">client_connection</span><span class="p">)</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="n">os</span><span class="o">.</span><span class="n">_exit</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="c1"># parent</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span> <span class="n">serve_forever</span><span class="p">()</span> </pre></div> <p>Start the&nbsp;server:</p> <div class="highlight"><pre><span></span>$ python webserver3e.py </pre></div> <p>Use your old friend <em>curl</em> to send a request to the modified concurrent&nbsp;server:</p> <div class="highlight"><pre><span></span>$ curl http://localhost:8888/hello </pre></div> <p>Look at the&nbsp;server:</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc4_eintr.png"></p> <p>What just happened? 
The call to <em>accept</em> failed with the error <em><span class="caps">EINTR</span></em>.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc4_eintr_error.png" width="640"></p> <p>The parent process was blocked in the <em>accept</em> call when the child process exited. The child’s exit caused a <em><span class="caps">SIGCHLD</span></em> event, which in turn activated the signal handler, and when the signal handler finished, the <em>accept</em> system call got&nbsp;interrupted:</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc4_eintr_accept.png" width="640"></p> <p>Don’t worry, it’s a pretty simple problem to solve. All you need to do is restart the <em>accept</em> system call. Here is the modified version of the server <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3f.py">webserver3f.py</a> that handles that&nbsp;problem:</p> <div class="highlight"><pre><span></span><span class="c1">###########################################################################</span> <span class="c1"># Concurrent server - webserver3f.py #</span> <span class="c1"># #</span> <span class="c1"># Tested with Python 2.7.9 &amp; Python 3.4 on Ubuntu 14.04 &amp; Mac OS X #</span> <span class="c1">###########################################################################</span> <span class="kn">import</span> <span class="nn">errno</span> <span class="kn">import</span> <span class="nn">os</span> <span class="kn">import</span> <span class="nn">signal</span> <span class="kn">import</span> <span class="nn">socket</span> <span class="n">SERVER_ADDRESS</span> <span class="o">=</span> <span class="p">(</span><span class="n">HOST</span><span class="p">,</span> <span class="n">PORT</span><span class="p">)</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span><span class="p">,</span> <span class="mi">8888</span> <span class="n">REQUEST_QUEUE_SIZE</span> <span class="o">=</span> <span class="mi">1024</span> <span 
class="k">def</span> <span class="nf">grim_reaper</span><span class="p">(</span><span class="n">signum</span><span class="p">,</span> <span class="n">frame</span><span class="p">):</span> <span class="n">pid</span><span class="p">,</span> <span class="n">status</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span> <span class="k">def</span> <span class="nf">handle_request</span><span class="p">(</span><span class="n">client_connection</span><span class="p">):</span> <span class="n">request</span> <span class="o">=</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">decode</span><span class="p">())</span> <span class="n">http_response</span> <span class="o">=</span> <span class="sa">b</span><span class="s2">&quot;&quot;&quot;</span><span class="se">\</span> <span class="s2">HTTP/1.1 200 OK</span> <span class="s2">Hello, World!</span> <span class="s2">&quot;&quot;&quot;</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">http_response</span><span class="p">)</span> <span class="k">def</span> <span class="nf">serve_forever</span><span class="p">():</span> <span class="n">listen_socket</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">setsockopt</span><span class="p">(</span><span 
class="n">socket</span><span class="o">.</span><span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">SERVER_ADDRESS</span><span class="p">)</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">listen</span><span class="p">(</span><span class="n">REQUEST_QUEUE_SIZE</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Serving HTTP on port {port} ...&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">port</span><span class="o">=</span><span class="n">PORT</span><span class="p">))</span> <span class="n">signal</span><span class="o">.</span><span class="n">signal</span><span class="p">(</span><span class="n">signal</span><span class="o">.</span><span class="n">SIGCHLD</span><span class="p">,</span> <span class="n">grim_reaper</span><span class="p">)</span> <span class="k">while</span> <span class="bp">True</span><span class="p">:</span> <span class="k">try</span><span class="p">:</span> <span class="n">client_connection</span><span class="p">,</span> <span class="n">client_address</span> <span class="o">=</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span> <span class="k">except</span> <span class="ne">IOError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span> <span class="n">code</span><span class="p">,</span> <span class="n">msg</span> <span class="o">=</span> <span class="n">e</span><span class="o">.</span><span class="n">args</span> <span class="c1"># restart &#39;accept&#39; if it was interrupted</span> <span class="k">if</span> <span 
class="n">code</span> <span class="o">==</span> <span class="n">errno</span><span class="o">.</span><span class="n">EINTR</span><span class="p">:</span> <span class="k">continue</span> <span class="k">else</span><span class="p">:</span> <span class="k">raise</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">fork</span><span class="p">()</span> <span class="k">if</span> <span class="n">pid</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span> <span class="c1"># child</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="c1"># close child copy</span> <span class="n">handle_request</span><span class="p">(</span><span class="n">client_connection</span><span class="p">)</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="n">os</span><span class="o">.</span><span class="n">_exit</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="c1"># parent</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="c1"># close parent copy and loop over</span> <span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span> <span class="n">serve_forever</span><span class="p">()</span> </pre></div> <p>Start the updated server <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3f.py">webserver3f.py</a>:</p> <div class="highlight"><pre><span></span>$ python webserver3f.py </pre></div> <p>Use <em>curl</em> to send a request to the modified concurrent&nbsp;server:</p> <div class="highlight"><pre><span></span>$ curl http://localhost:8888/hello </pre></div> 
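<p>The restart logic in the server above boils down to a retry loop around a blocking call. Here is a minimal sketch of that pattern in isolation; <em>retry_on_eintr</em> and <em>flaky_accept</em> are hypothetical names used for illustration only (they are not part of webserver3f.py), and <em>flaky_accept</em> merely simulates an <em>accept</em> call that gets interrupted twice before it&nbsp;succeeds:</p>

```python
import errno


def retry_on_eintr(func, *args):
    """Call func(*args), restarting it whenever it fails with EINTR.

    Hypothetical helper for illustration; not part of webserver3f.py.
    """
    while True:
        try:
            return func(*args)
        except (IOError, OSError) as e:
            # Anything other than "interrupted system call" is a real error.
            if e.errno != errno.EINTR:
                raise
            # EINTR: fall through and call func again.


# Stand-in for listen_socket.accept: interrupted twice, then succeeds.
calls = []


def flaky_accept():
    calls.append(1)
    if len(calls) != 3:
        raise OSError(errno.EINTR, 'Interrupted system call')
    return 'client_connection', ('127.0.0.1', 50000)


result = retry_on_eintr(flaky_accept)
print(result, 'after', len(calls), 'calls')
```

<p>In a real server you would pass <em>listen_socket.accept</em> instead of the stand-in. Note that on Python 3.5 and later the interpreter retries interrupted system calls by itself when the signal handler returns normally (<span class="caps">PEP</span> 475), so the explicit restart matters mostly for the Python 2.7 and 3.4 versions this series was tested&nbsp;with.</p>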
<p>See? No <em><span class="caps">EINTR</span></em> exceptions anymore. Now, verify that there are no more zombies either and that your <em><span class="caps">SIGCHLD</span></em> event handler with the <em>wait</em> call took care of terminated children. To do that, just run the <em>ps</em> command and see for yourself that there are no more Python processes with <strong>Z+</strong> status (no more <strong>&lt;defunct&gt;</strong> processes). Great! It feels safe without zombies running&nbsp;around.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_checkpoint.png"></p> <blockquote> <ul> <li>If you <em>fork</em> a child and don’t wait for it, it becomes a <em>zombie</em>.</li> <li>Use the <em><span class="caps">SIGCHLD</span></em> event handler to asynchronously <em>wait</em> for a terminated child to get its termination&nbsp;status.</li> <li>When using an event handler, keep in mind that system calls might get interrupted, and be prepared for that&nbsp;scenario.</li> </ul> </blockquote> <p><br/> Okay, so far so good. No problems, right? Well, almost. Try your <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3f.py">webserver3f.py</a> again, but instead of making one request with <em>curl</em>, use <a href="https://github.com/rspivak/lsbaws/blob/master/part3/client3.py">client3.py</a> to create 128 simultaneous&nbsp;connections:</p> <div class="highlight"><pre><span></span>$ python client3.py --max-clients <span class="m">128</span> </pre></div> <p>Now run the <em>ps</em> command&nbsp;again</p> <div class="highlight"><pre><span></span>$ ps auxw <span class="p">|</span> grep -i python <span class="p">|</span> grep -v grep </pre></div> <p>and see that, oh boy, zombies are back&nbsp;again!</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc5_zombies_again.png" width="640"></p> <p>What went wrong this time? 
When you ran 128 simultaneous clients and established 128 connections, the child processes on the server handled the requests and exited at almost the same time, causing a flood of <em><span class="caps">SIGCHLD</span></em> signals to be sent to the parent process. The problem is that the signals are not queued: signals that arrive while the same signal is already pending are merged into one. Your server process therefore missed several signals, which left several zombies running around&nbsp;unattended:</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc5_signals_not_queued.png" width="640"></p> <p>The solution to the problem is to set up a <em><span class="caps">SIGCHLD</span></em> event handler but, instead of <em>wait</em>, use a <a href="https://docs.python.org/2.7/library/os.html#os.waitpid">waitpid</a> system call with the <em><span class="caps">WNOHANG</span></em> option in a loop to make sure that all terminated child processes are taken care of. Here is the modified server code, <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3g.py">webserver3g.py</a>:</p> <div class="highlight"><pre><span></span><span class="c1">###########################################################################</span> <span class="c1"># Concurrent server - webserver3g.py #</span> <span class="c1"># #</span> <span class="c1"># Tested with Python 2.7.9 &amp; Python 3.4 on Ubuntu 14.04 &amp; Mac OS X #</span> <span class="c1">###########################################################################</span> <span class="kn">import</span> <span class="nn">errno</span> <span class="kn">import</span> <span class="nn">os</span> <span class="kn">import</span> <span class="nn">signal</span> <span class="kn">import</span> <span class="nn">socket</span> <span class="n">SERVER_ADDRESS</span> <span class="o">=</span> <span class="p">(</span><span class="n">HOST</span><span class="p">,</span> <span class="n">PORT</span><span class="p">)</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span><span class="p">,</span> <span 
class="mi">8888</span> <span class="n">REQUEST_QUEUE_SIZE</span> <span class="o">=</span> <span class="mi">1024</span> <span class="k">def</span> <span class="nf">grim_reaper</span><span class="p">(</span><span class="n">signum</span><span class="p">,</span> <span class="n">frame</span><span class="p">):</span> <span class="k">while</span> <span class="bp">True</span><span class="p">:</span> <span class="k">try</span><span class="p">:</span> <span class="n">pid</span><span class="p">,</span> <span class="n">status</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">waitpid</span><span class="p">(</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="c1"># Wait for any child process</span> <span class="n">os</span><span class="o">.</span><span class="n">WNOHANG</span> <span class="c1"># Do not block; returns (0, 0) if no child has exited yet</span> <span class="p">)</span> <span class="k">except</span> <span class="ne">OSError</span><span class="p">:</span> <span class="k">return</span> <span class="k">if</span> <span class="n">pid</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span> <span class="c1"># no more zombies</span> <span class="k">return</span> <span class="k">def</span> <span class="nf">handle_request</span><span class="p">(</span><span class="n">client_connection</span><span class="p">):</span> <span class="n">request</span> <span class="o">=</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">decode</span><span class="p">())</span> <span class="n">http_response</span> <span class="o">=</span> <span class="sa">b</span><span class="s2">&quot;&quot;&quot;</span><span class="se">\</span> <span class="s2">HTTP/1.1 200 
OK</span> <span class="s2">Hello, World!</span> <span class="s2">&quot;&quot;&quot;</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">http_response</span><span class="p">)</span> <span class="k">def</span> <span class="nf">serve_forever</span><span class="p">():</span> <span class="n">listen_socket</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">setsockopt</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">SERVER_ADDRESS</span><span class="p">)</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">listen</span><span class="p">(</span><span class="n">REQUEST_QUEUE_SIZE</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;Serving HTTP on port {port} ...&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">port</span><span class="o">=</span><span class="n">PORT</span><span class="p">))</span> <span class="n">signal</span><span class="o">.</span><span class="n">signal</span><span class="p">(</span><span class="n">signal</span><span class="o">.</span><span class="n">SIGCHLD</span><span 
class="p">,</span> <span class="n">grim_reaper</span><span class="p">)</span> <span class="k">while</span> <span class="bp">True</span><span class="p">:</span> <span class="k">try</span><span class="p">:</span> <span class="n">client_connection</span><span class="p">,</span> <span class="n">client_address</span> <span class="o">=</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span> <span class="k">except</span> <span class="ne">IOError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span> <span class="n">code</span><span class="p">,</span> <span class="n">msg</span> <span class="o">=</span> <span class="n">e</span><span class="o">.</span><span class="n">args</span> <span class="c1"># restart &#39;accept&#39; if it was interrupted</span> <span class="k">if</span> <span class="n">code</span> <span class="o">==</span> <span class="n">errno</span><span class="o">.</span><span class="n">EINTR</span><span class="p">:</span> <span class="k">continue</span> <span class="k">else</span><span class="p">:</span> <span class="k">raise</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">fork</span><span class="p">()</span> <span class="k">if</span> <span class="n">pid</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span> <span class="c1"># child</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="c1"># close child copy</span> <span class="n">handle_request</span><span class="p">(</span><span class="n">client_connection</span><span class="p">)</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="n">os</span><span class="o">.</span><span class="n">_exit</span><span class="p">(</span><span class="mi">0</span><span 
class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="c1"># parent</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="c1"># close parent copy and loop over</span> <span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span> <span class="n">serve_forever</span><span class="p">()</span> </pre></div> <p>Start the&nbsp;server:</p> <div class="highlight"><pre><span></span>$ python webserver3g.py </pre></div> <p>Use the test client <a href="https://github.com/rspivak/lsbaws/blob/master/part3/client3.py">client3.py</a>:</p> <div class="highlight"><pre><span></span>$ python client3.py --max-clients <span class="m">128</span> </pre></div> <p>And now verify that there are no more zombies. Yay! Life is good without zombies&nbsp;:)</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_conc5_no_zombies.png" width="640"></p> <p>Congratulations! It&#8217;s been a pretty long journey but I hope you liked it. Now you have your own simple concurrent server and the code can serve as a foundation for your further work towards a production grade Web&nbsp;server.</p> <p>I’ll leave it as an exercise for you to update the <span class="caps">WSGI</span> server from <a href="http://ruslanspivak.com/lsbaws-part2/" title="Part 2">Part 2</a> and make it concurrent. You can find the modified version <a href="https://github.com/rspivak/lsbaws/blob/master/part3/webserver3h.py">here</a>. But look at my code only after you’ve implemented your own version. You have all the necessary information to do that. So go and just do it&nbsp;:)</p> <p>What’s next? As Josh Billings&nbsp;said,</p> <blockquote> <p><em><span class="dquo">&#8220;</span>Be like a postage stamp — stick to one thing until you get&nbsp;there.&#8221;</em></p> </blockquote> <p>Start mastering the basics. 
Question what you already know. And always dig&nbsp;deeper.</p> <p><img alt="" src="https://ruslanspivak.com/lsbaws-part3/lsbaws_part3_dig_deeper.png" width="640"></p> <blockquote> <p><em><span class="dquo">&#8220;</span>If you learn only methods, you’ll be tied to your methods. But if you learn principles, you can devise your own methods.&#8221; &#8212;Ralph Waldo&nbsp;Emerson</em></p> </blockquote> <p>Below is a list of books that I’ve drawn on for most of the material in this article. They will help you broaden and deepen your knowledge about the topics I’ve covered. I highly recommend that you get those books somehow: borrow them from your friends, check them out from your local library, or just buy them on Amazon. They are keepers (links are affiliate&nbsp;links):</p> <ol> <li> <p><a href="http://www.amazon.com/gp/product/0131411551/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0131411551&linkCode=as2&tag=russblo0b-20&linkId=2F4NYRBND566JJQL">Unix Network Programming, Volume 1: The Sockets Networking <span class="caps">API</span> (3rd Edition)</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0131411551" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0321637739/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321637739&linkCode=as2&tag=russblo0b-20&linkId=3ZYAKB537G6TM22J">Advanced Programming in the <span class="caps">UNIX</span> Environment, 3rd Edition</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0321637739" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/1593272200/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1593272200&linkCode=as2&tag=russblo0b-20&linkId=CHFOMNYXN35I2MON">The Linux Programming Interface: A Linux and <span 
class="caps">UNIX</span> System Programming Handbook</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=1593272200" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0321336313/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321336313&linkCode=as2&tag=russblo0b-20&linkId=K467DRFYMXJ5RWAY"><span class="caps">TCP</span>/<span class="caps">IP</span> Illustrated, Volume 1: The Protocols (2nd Edition) (Addison-Wesley Professional Computing Series)</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0321336313" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/1441418687/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1441418687&linkCode=as2&tag=russblo0b-20&linkId=QFOAWARN62OWTWUG">The Little Book of <span class="caps">SEMAPHORES</span> (2nd Edition): The Ins and Outs of Concurrency Control and Common Mistakes</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=1441418687" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />. Also available for free on the author’s site <a href="http://greenteapress.com/semaphores/">here</a>.</p> </li> </ol> <p><br/></p> <p></p> <p><br/> <strong>All articles in this&nbsp;series:</strong></p> <ul> <li><a href="https://ruslanspivak.com/lsbaws-part1/">Let&#8217;s Build A Web Server. Part&nbsp;1.</a></li> <li><a href="https://ruslanspivak.com/lsbaws-part2/">Let&#8217;s Build A Web Server. Part&nbsp;2.</a></li> <li><a href="https://ruslanspivak.com/lsbaws-part3/">Let&#8217;s Build A Web Server. 
Part&nbsp;3.</a></li> </ul> <div class="footnote"> <hr> <ol> <li id="fn-1"> <p><a href="http://www.amazon.com/gp/product/0131411551/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0131411551&linkCode=as2&tag=russblo0b-20&linkId=2F4NYRBND566JJQL">Unix Network Programming, Volume 1: The Sockets Networking <span class="caps">API</span> (3rd Edition)</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=0131411551" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />&#160;<a class="footnote-backref" href="#fnref-1" title="Jump back to footnote 1 in the text">&#8617;</a><a class="footnote-backref" href="#fnref2-1" title="Jump back to footnote 1 in the text">&#8617;</a></p> </li> <li id="fn-2"> <p><a href="http://www.amazon.com/gp/product/1441418687/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1441418687&linkCode=as2&tag=russblo0b-20&linkId=QFOAWARN62OWTWUG">The Little Book of <span class="caps">SEMAPHORES</span> (2nd Edition): The Ins and Outs of Concurrency Control and Common Mistakes</a><img src="http://ir-na.amazon-adsystem.com/e/ir?t=russblo0b-20&l=as2&o=1&a=1441418687" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />.&#160;<a class="footnote-backref" href="#fnref-2" title="Jump back to footnote 2 in the text">&#8617;</a></p> </li> </ol> </div>Let’s Build A Web Server. 
Part 2.2015-04-06T07:00:00-04:002015-04-06T07:00:00-04:00Ruslan Spivaktag:ruslanspivak.com,2015-04-06:/lsbaws-part2/<p>Remember, in <a href="http://ruslanspivak.com/lsbaws-part1/" title="Part 1">Part 1</a> I asked you a question: &#8220;How do you run a Django application, Flask application, and Pyramid application under your freshly minted Web server without making a single change to the server to accommodate all those different Web frameworks?&#8221; Read on to find out the&nbsp;answer.</p> <p>In the past, your choice of a Python Web framework would limit your choice of usable Web servers, and vice versa. If the framework and the server were designed to work together, then you were&nbsp;okay:</p> <p><img alt="Server Framework Fit" src="https://ruslanspivak.com/lsbaws-part2/lsbaws_part2_before_wsgi.png" width="640"></p> <p>But you could have been faced (and maybe you were) with the following problem when trying to combine a server and a framework that weren’t designed to work&nbsp;together:</p> <p><img alt="Server Framework Clash" src="https://ruslanspivak.com/lsbaws-part2/lsbaws_part2_after_wsgi.png" width="640"></p> <p>Basically you had to use what worked together and not what you might have wanted to&nbsp;use.</p> <p>So, how do you then make sure that you can run your Web server with multiple Web frameworks without making code changes either to the Web server or to the Web frameworks? 
And the answer to that problem became the <strong>Python Web Server Gateway Interface</strong> (or <a href="https://www.python.org/dev/peps/pep-0333/" title="WSGI"><span class="caps">WSGI</span></a> for short, pronounced <em>&#8220;wizgy&#8221;</em>).</p> <p><img alt="WSGI Interface" src="https://ruslanspivak.com/lsbaws-part2/lsbaws_part2_wsgi_idea.png" width="480"></p> <p><a href="https://www.python.org/dev/peps/pep-0333/" title="WSGI"><span class="caps">WSGI</span></a> allowed developers to separate choice of a Web framework from choice of a Web server. Now you can actually mix and match Web servers and Web frameworks and choose a pairing that suits your needs. You can run <a href="https://www.djangoproject.com/" title="Django">Django</a>, <a href="http://flask.pocoo.org/" title="Flask">Flask</a>, or <a href="http://trypyramid.com/" title="Pyramid">Pyramid</a>, for example, with <a href="http://gunicorn.org/" title="Gunicorn">Gunicorn</a> or <a href="http://uwsgi-docs.readthedocs.org" title="uWSGI">Nginx/uWSGI</a> or <a href="http://waitress.readthedocs.org" title="Waitress">Waitress</a>. Real mix and match, thanks to the <span class="caps">WSGI</span> support in both servers and&nbsp;frameworks:</p> <p><img alt="Mix &amp; Match" src="https://ruslanspivak.com/lsbaws-part2/lsbaws_part2_wsgi_interop.png" width="640"></p> <p>So, <a href="https://www.python.org/dev/peps/pep-0333/" title="WSGI"><span class="caps">WSGI</span></a> is the answer to the question I asked you in <a href="http://ruslanspivak.com/lsbaws-part1/" title="Part 1">Part 1</a> and repeated at the beginning of this article. 
Your Web server must implement the server portion of a <span class="caps">WSGI</span> interface and all modern Python Web Frameworks already implement the framework side of the <span class="caps">WSGI</span> interface, which allows you to use them with your Web server without ever modifying your server&#8217;s code to accommodate a particular Web&nbsp;framework.</p> <p>Now you know that <span class="caps">WSGI</span> support by Web servers and Web frameworks allows you to choose a pairing that suits you, but it is also beneficial to server and framework developers because they can focus on their preferred area of specialization and not step on each other’s toes. Other languages have similar interfaces too: Java, for example, has <a href="http://en.wikipedia.org/wiki/Java_servlet" title="Servlet API">Servlet <span class="caps">API</span></a> and Ruby has <a href="http://en.wikipedia.org/wiki/Rack_%28web_server_interface%29" title="Rack">Rack</a>.</p> <p>It’s all good, but I bet you are saying: &#8220;Show me the code!&#8221; Okay, take a look at this pretty minimalistic <span class="caps">WSGI</span> server&nbsp;implementation:</p> <div class="highlight"><pre><span></span><span class="c1"># Tested with Python 3.7+ (Mac OS X)</span> <span class="kn">import</span> <span class="nn">io</span> <span class="kn">import</span> <span class="nn">socket</span> <span class="kn">import</span> <span class="nn">sys</span> <span class="k">class</span> <span class="nc">WSGIServer</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="n">address_family</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span> <span class="n">socket_type</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span> <span class="n">request_queue_size</span> <span class="o">=</span> <span class="mi">1</span> <span class="k">def</span> <span 
class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">server_address</span><span class="p">):</span> <span class="c1"># Create a listening socket</span> <span class="bp">self</span><span class="o">.</span><span class="n">listen_socket</span> <span class="o">=</span> <span class="n">listen_socket</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span> <span class="bp">self</span><span class="o">.</span><span class="n">address_family</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">socket_type</span> <span class="p">)</span> <span class="c1"># Allow to reuse the same address</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">setsockopt</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="c1"># Bind</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">server_address</span><span class="p">)</span> <span class="c1"># Activate</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">listen</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">request_queue_size</span><span class="p">)</span> <span class="c1"># Get server host name and port</span> <span class="n">host</span><span class="p">,</span> <span class="n">port</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">listen_socket</span><span class="o">.</span><span class="n">getsockname</span><span class="p">()[:</span><span 
class="mi">2</span><span class="p">]</span> <span class="bp">self</span><span class="o">.</span><span class="n">server_name</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">getfqdn</span><span class="p">(</span><span class="n">host</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">server_port</span> <span class="o">=</span> <span class="n">port</span> <span class="c1"># Return headers set by Web framework/Web application</span> <span class="bp">self</span><span class="o">.</span><span class="n">headers_set</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">def</span> <span class="nf">set_app</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">application</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">application</span> <span class="o">=</span> <span class="n">application</span> <span class="k">def</span> <span class="nf">serve_forever</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="n">listen_socket</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">listen_socket</span> <span class="k">while</span> <span class="bp">True</span><span class="p">:</span> <span class="c1"># New client connection</span> <span class="bp">self</span><span class="o">.</span><span class="n">client_connection</span><span class="p">,</span> <span class="n">client_address</span> <span class="o">=</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span> <span class="c1"># Handle one request and close the client connection. 
Then</span> <span class="c1"># loop over to wait for another client connection</span> <span class="bp">self</span><span class="o">.</span><span class="n">handle_one_request</span><span class="p">()</span> <span class="k">def</span> <span class="nf">handle_one_request</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="n">request_data</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">client_connection</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span> <span class="bp">self</span><span class="o">.</span><span class="n">request_data</span> <span class="o">=</span> <span class="n">request_data</span> <span class="o">=</span> <span class="n">request_data</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">&#39;utf-8&#39;</span><span class="p">)</span> <span class="c1"># Print formatted request data a la &#39;curl -v&#39;</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span> <span class="n">f</span><span class="s1">&#39;&lt; {line}</span><span class="se">\n</span><span class="s1">&#39;</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">request_data</span><span class="o">.</span><span class="n">splitlines</span><span class="p">()</span> <span class="p">))</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_request</span><span class="p">(</span><span class="n">request_data</span><span class="p">)</span> <span class="c1"># Construct environment dictionary using request data</span> <span class="n">env</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_environ</span><span class="p">()</span> <span 
class="c1"># It&#39;s time to call our application callable and get</span> <span class="c1"># back a result that will become HTTP response body</span> <span class="n">result</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">application</span><span class="p">(</span><span class="n">env</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">start_response</span><span class="p">)</span> <span class="c1"># Construct a response and send it back to the client</span> <span class="bp">self</span><span class="o">.</span><span class="n">finish_response</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="k">def</span> <span class="nf">parse_request</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span> <span class="n">request_line</span> <span class="o">=</span> <span class="n">text</span><span class="o">.</span><span class="n">splitlines</span><span class="p">()[</span><span class="mi">0</span><span class="p">]</span> <span class="n">request_line</span> <span class="o">=</span> <span class="n">request_line</span><span class="o">.</span><span class="n">rstrip</span><span class="p">(</span><span class="s1">&#39;</span><span class="se">\r\n</span><span class="s1">&#39;</span><span class="p">)</span> <span class="c1"># Break down the request line into components</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">request_method</span><span class="p">,</span> <span class="c1"># GET</span> <span class="bp">self</span><span class="o">.</span><span class="n">path</span><span class="p">,</span> <span class="c1"># /hello</span> <span class="bp">self</span><span class="o">.</span><span class="n">request_version</span> <span class="c1"># HTTP/1.1</span> <span class="p">)</span> <span class="o">=</span> <span 
class="n">request_line</span><span class="o">.</span><span class="n">split</span><span class="p">()</span> <span class="k">def</span> <span class="nf">get_environ</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="n">env</span> <span class="o">=</span> <span class="p">{}</span> <span class="c1"># The following code snippet does not follow PEP8 conventions</span> <span class="c1"># but it&#39;s formatted the way it is for demonstration purposes</span> <span class="c1"># to emphasize the required variables and their values</span> <span class="c1">#</span> <span class="c1"># Required WSGI variables</span> <span class="n">env</span><span class="p">[</span><span class="s1">&#39;wsgi.version&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="n">env</span><span class="p">[</span><span class="s1">&#39;wsgi.url_scheme&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="s1">&#39;http&#39;</span> <span class="n">env</span><span class="p">[</span><span class="s1">&#39;wsgi.input&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">io</span><span class="o">.</span><span class="n">StringIO</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">request_data</span><span class="p">)</span> <span class="n">env</span><span class="p">[</span><span class="s1">&#39;wsgi.errors&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">stderr</span> <span class="n">env</span><span class="p">[</span><span class="s1">&#39;wsgi.multithread&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="bp">False</span> <span class="n">env</span><span class="p">[</span><span class="s1">&#39;wsgi.multiprocess&#39;</span><span 
class="p">]</span> <span class="o">=</span> <span class="bp">False</span> <span class="n">env</span><span class="p">[</span><span class="s1">&#39;wsgi.run_once&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="bp">False</span> <span class="c1"># Required CGI variables</span> <span class="n">env</span><span class="p">[</span><span class="s1">&#39;REQUEST_METHOD&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">request_method</span> <span class="c1"># GET</span> <span class="n">env</span><span class="p">[</span><span class="s1">&#39;PATH_INFO&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">path</span> <span class="c1"># /hello</span> <span class="n">env</span><span class="p">[</span><span class="s1">&#39;SERVER_NAME&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">server_name</span> <span class="c1"># localhost</span> <span class="n">env</span><span class="p">[</span><span class="s1">&#39;SERVER_PORT&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">server_port</span><span class="p">)</span> <span class="c1"># 8888</span> <span class="k">return</span> <span class="n">env</span> <span class="k">def</span> <span class="nf">start_response</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">status</span><span class="p">,</span> <span class="n">response_headers</span><span class="p">,</span> <span class="n">exc_info</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span> <span class="c1"># Add necessary server headers</span> <span class="n">server_headers</span> <span class="o">=</span> <span class="p">[</span> 
<span class="p">(</span><span class="s1">&#39;Date&#39;</span><span class="p">,</span> <span class="s1">&#39;Mon, 15 Jul 2019 5:54:48 GMT&#39;</span><span class="p">),</span> <span class="p">(</span><span class="s1">&#39;Server&#39;</span><span class="p">,</span> <span class="s1">&#39;WSGIServer 0.2&#39;</span><span class="p">),</span> <span class="p">]</span> <span class="bp">self</span><span class="o">.</span><span class="n">headers_set</span> <span class="o">=</span> <span class="p">[</span><span class="n">status</span><span class="p">,</span> <span class="n">response_headers</span> <span class="o">+</span> <span class="n">server_headers</span><span class="p">]</span> <span class="c1"># To adhere to WSGI specification the start_response must return</span> <span class="c1"># a &#39;write&#39; callable. We simplicity&#39;s sake we&#39;ll ignore that detail</span> <span class="c1"># for now.</span> <span class="c1"># return self.finish_response</span> <span class="k">def</span> <span class="nf">finish_response</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">result</span><span class="p">):</span> <span class="k">try</span><span class="p">:</span> <span class="n">status</span><span class="p">,</span> <span class="n">response_headers</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">headers_set</span> <span class="n">response</span> <span class="o">=</span> <span class="n">f</span><span class="s1">&#39;HTTP/1.1 {status}</span><span class="se">\r\n</span><span class="s1">&#39;</span> <span class="k">for</span> <span class="n">header</span> <span class="ow">in</span> <span class="n">response_headers</span><span class="p">:</span> <span class="n">response</span> <span class="o">+=</span> <span class="s1">&#39;{0}: {1}</span><span class="se">\r\n</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span 
class="o">*</span><span class="n">header</span><span class="p">)</span> <span class="n">response</span> <span class="o">+=</span> <span class="s1">&#39;</span><span class="se">\r\n</span><span class="s1">&#39;</span> <span class="k">for</span> <span class="n">data</span> <span class="ow">in</span> <span class="n">result</span><span class="p">:</span> <span class="n">response</span> <span class="o">+=</span> <span class="n">data</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">&#39;utf-8&#39;</span><span class="p">)</span> <span class="c1"># Print formatted response data a la &#39;curl -v&#39;</span> <span class="k">print</span><span class="p">(</span><span class="s1">&#39;&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span> <span class="n">f</span><span class="s1">&#39;&gt; {line}</span><span class="se">\n</span><span class="s1">&#39;</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">response</span><span class="o">.</span><span class="n">splitlines</span><span class="p">()</span> <span class="p">))</span> <span class="n">response_bytes</span> <span class="o">=</span> <span class="n">response</span><span class="o">.</span><span class="n">encode</span><span class="p">()</span> <span class="bp">self</span><span class="o">.</span><span class="n">client_connection</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">response_bytes</span><span class="p">)</span> <span class="k">finally</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">client_connection</span><span class="o">.</span><span class="n">close</span><span class="p">()</span> <span class="n">SERVER_ADDRESS</span> <span class="o">=</span> <span class="p">(</span><span class="n">HOST</span><span class="p">,</span> <span class="n">PORT</span><span class="p">)</span> <span 
class="o">=</span> <span class="s1">&#39;&#39;</span><span class="p">,</span> <span class="mi">8888</span> <span class="k">def</span> <span class="nf">make_server</span><span class="p">(</span><span class="n">server_address</span><span class="p">,</span> <span class="n">application</span><span class="p">):</span> <span class="n">server</span> <span class="o">=</span> <span class="n">WSGIServer</span><span class="p">(</span><span class="n">server_address</span><span class="p">)</span> <span class="n">server</span><span class="o">.</span><span class="n">set_app</span><span class="p">(</span><span class="n">application</span><span class="p">)</span> <span class="k">return</span> <span class="n">server</span> <span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span> <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">2</span><span class="p">:</span> <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="s1">&#39;Provide a WSGI application object as module:callable&#39;</span><span class="p">)</span> <span class="n">app_path</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="n">module</span><span class="p">,</span> <span class="n">application</span> <span class="o">=</span> <span class="n">app_path</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;:&#39;</span><span class="p">)</span> <span class="n">module</span> <span class="o">=</span> <span class="nb">__import__</span><span class="p">(</span><span class="n">module</span><span class="p">)</span> <span 
class="n">application</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">module</span><span class="p">,</span> <span class="n">application</span><span class="p">)</span> <span class="n">httpd</span> <span class="o">=</span> <span class="n">make_server</span><span class="p">(</span><span class="n">SERVER_ADDRESS</span><span class="p">,</span> <span class="n">application</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="n">f</span><span class="s1">&#39;WSGIServer: Serving HTTP on port {PORT} ...</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">)</span> <span class="n">httpd</span><span class="o">.</span><span class="n">serve_forever</span><span class="p">()</span> </pre></div> <p>It’s definitely bigger than the server code in <a href="http://ruslanspivak.com/lsbaws-part1/" title="Part 1">Part 1</a>, but it’s also small enough (just under 150 lines) for you to understand without getting bogged down in details. The above server also does more - it can run your basic Web application written with your beloved Web framework, be it Pyramid, Flask, Django, or some other Python <span class="caps">WSGI</span>&nbsp;framework.</p> <p>Don’t believe me? Try it and see for yourself. Save the above code as <em>webserver2.py</em> or download it directly from <a href="https://github.com/rspivak/lsbaws/blob/master/part2/webserver2.py">GitHub</a>. If you try to run it without any parameters it’s going to complain and&nbsp;exit.</p> <div class="highlight"><pre><span></span>$ python webserver2.py Provide a WSGI application object as module:callable </pre></div> <p>It really wants to serve your Web application and that’s where the fun begins. To run the server the only thing you need installed is Python (Python 3.7+, to be exact). But to run applications written with Pyramid, Flask, and Django you need to install those frameworks first. 
Let’s install all three of them. My preferred method is to use <a href="https://packaging.python.org/tutorials/installing-packages/#creating-virtual-environments" target="_blank">venv</a> (it is available by default in Python 3.3 and later). Just follow the steps below to create and activate a virtual environment and then install all three Web&nbsp;frameworks.</p> <div class="highlight"><pre><span></span>$ python3 -m venv lsbaws
$ ls lsbaws
bin include lib pyvenv.cfg
$ <span class="nb">source</span> lsbaws/bin/activate
<span class="o">(</span>lsbaws<span class="o">)</span> $ pip install -U pip
<span class="o">(</span>lsbaws<span class="o">)</span> $ pip install pyramid
<span class="o">(</span>lsbaws<span class="o">)</span> $ pip install flask
<span class="o">(</span>lsbaws<span class="o">)</span> $ pip install django
</pre></div> <p>At this point you need to create a Web application. Let’s start with <a href="http://trypyramid.com/" title="Pyramid">Pyramid</a> first. Save the following code as <em>pyramidapp.py</em> in the same directory where you saved <em>webserver2.py</em> or download the file directly from <a href="https://github.com/rspivak/lsbaws/blob/master/part2/pyramidapp.py">GitHub</a>:</p> <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyramid.config</span> <span class="kn">import</span> <span class="n">Configurator</span>
<span class="kn">from</span> <span class="nn">pyramid.response</span> <span class="kn">import</span> <span class="n">Response</span>


<span class="k">def</span> <span class="nf">hello_world</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">Response</span><span class="p">(</span>
        <span class="s1">&#39;Hello world from Pyramid!</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">,</span>
        <span class="n">content_type</span><span class="o">=</span><span class="s1">&#39;text/plain&#39;</span><span class="p">,</span>
    <span class="p">)</span>

<span class="n">config</span> <span class="o">=</span> <span class="n">Configurator</span><span class="p">()</span>
<span class="n">config</span><span class="o">.</span><span class="n">add_route</span><span class="p">(</span><span class="s1">&#39;hello&#39;</span><span class="p">,</span> <span class="s1">&#39;/hello&#39;</span><span class="p">)</span>
<span class="n">config</span><span class="o">.</span><span class="n">add_view</span><span class="p">(</span><span class="n">hello_world</span><span class="p">,</span> <span class="n">route_name</span><span class="o">=</span><span class="s1">&#39;hello&#39;</span><span class="p">)</span>
<span class="n">app</span> <span class="o">=</span> <span class="n">config</span><span class="o">.</span><span class="n">make_wsgi_app</span><span class="p">()</span>
</pre></div> <p>Now you’re ready to serve your Pyramid application with your very own Web&nbsp;server:</p> <div class="highlight"><pre><span></span><span class="o">(</span>lsbaws<span class="o">)</span> $ python webserver2.py pyramidapp:app
WSGIServer: Serving HTTP on port <span class="m">8888</span> ...
</pre></div> <p>You just told your server to load the <em>&#8216;app&#8217;</em> callable from the Python module <em>&#8216;pyramidapp&#8217;</em>. Your server is now ready to take requests and forward them to your Pyramid application. The application only handles one route now: the <em>/hello</em> route. Type <a href="http://localhost:8888/hello">http://localhost:8888/hello</a> into your browser&#8217;s address bar, press Enter, and observe the&nbsp;result:</p> <p><img alt="Pyramid" src="https://ruslanspivak.com/lsbaws-part2/lsbaws_part2_browser_pyramid.png"></p> <p>You can also test the server on the command line using the <em>&#8216;curl&#8217;</em>&nbsp;utility:</p> <div class="highlight"><pre><span></span>$ curl -v http://localhost:8888/hello
...
</pre></div> <p>Check what the server and <em>curl</em> print to standard&nbsp;output.</p> <p>Now onto <a href="http://flask.pocoo.org/" title="Flask">Flask</a>. Let’s follow the same&nbsp;steps.</p> <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">flask</span> <span class="kn">import</span> <span class="n">Flask</span>
<span class="kn">from</span> <span class="nn">flask</span> <span class="kn">import</span> <span class="n">Response</span>
<span class="n">flask_app</span> <span class="o">=</span> <span class="n">Flask</span><span class="p">(</span><span class="s1">&#39;flaskapp&#39;</span><span class="p">)</span>


<span class="nd">@flask_app.route</span><span class="p">(</span><span class="s1">&#39;/hello&#39;</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">hello_world</span><span class="p">():</span>
    <span class="k">return</span> <span class="n">Response</span><span class="p">(</span>
        <span class="s1">&#39;Hello world from Flask!</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">,</span>
        <span class="n">mimetype</span><span class="o">=</span><span class="s1">&#39;text/plain&#39;</span>
    <span class="p">)</span>

<span class="n">app</span> <span class="o">=</span> <span class="n">flask_app</span><span class="o">.</span><span class="n">wsgi_app</span>
</pre></div> <p>Save the above code as <em>flaskapp.py</em> or download it from <a href="https://github.com/rspivak/lsbaws/blob/master/part2/flaskapp.py">GitHub</a> and run the server&nbsp;as:</p> <div class="highlight"><pre><span></span><span class="o">(</span>lsbaws<span class="o">)</span> $ python webserver2.py flaskapp:app
WSGIServer: Serving HTTP on port <span class="m">8888</span> ...
</pre></div> <p>Now type <a href="http://localhost:8888/hello">http://localhost:8888/hello</a> into your browser and press&nbsp;Enter:</p> <p><img alt="Flask" src="https://ruslanspivak.com/lsbaws-part2/lsbaws_part2_browser_flask.png"></p> <p>Again, try <em>&#8216;curl&#8217;</em> and see for yourself that the server returns a message generated by the Flask&nbsp;application:</p> <div class="highlight"><pre><span></span>$ curl -v http://localhost:8888/hello
...
</pre></div> <p>Can the server also handle a <a href="https://www.djangoproject.com/" title="Django">Django</a> application? Try it out! It’s a little bit more involved, though, and I would recommend cloning the whole repo and using <a href="https://github.com/rspivak/lsbaws/blob/master/part2/djangoapp.py">djangoapp.py</a>, which is part of the <a href="https://github.com/rspivak/lsbaws/">GitHub repository</a>. Here is the source code, which basically adds the Django <em>&#8216;helloworld&#8217;</em> project (pre-created using Django&#8217;s <em>django-admin.py startproject</em> command) to the current Python path and then imports the project&#8217;s <span class="caps">WSGI</span>&nbsp;application.</p> <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">sys</span>
<span class="n">sys</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="s1">&#39;./helloworld&#39;</span><span class="p">)</span>
<span class="kn">from</span> <span class="nn">helloworld</span> <span class="kn">import</span> <span class="n">wsgi</span>


<span class="n">app</span> <span class="o">=</span> <span class="n">wsgi</span><span class="o">.</span><span class="n">application</span>
</pre></div> <p>Save the above code as <em>djangoapp.py</em> and run the Django application with your Web&nbsp;server:</p> <div class="highlight"><pre><span></span><span 
class="o">(</span>lsbaws<span class="o">)</span> $ python webserver2.py djangoapp:app
WSGIServer: Serving HTTP on port <span class="m">8888</span> ...
</pre></div> <p>Type <a href="http://localhost:8888/hello">http://localhost:8888/hello</a> into your browser and press&nbsp;Enter:</p> <p><img alt="Django" src="https://ruslanspivak.com/lsbaws-part2/lsbaws_part2_browser_django.png"></p> <p>And as you&#8217;ve already done a couple of times before, you can test it on the command line, too, and confirm that it&#8217;s the Django application that handles your requests this time&nbsp;around:</p> <div class="highlight"><pre><span></span>$ curl -v http://localhost:8888/hello
...
</pre></div> <p>Did you try it? Did you make sure the server works with those three frameworks? If not, then please do so. Reading is important, but this series is about rebuilding and that means you need to get your hands dirty. Go and try it. I will wait for you, don’t worry. No seriously, you must try it and, better yet, retype everything yourself and make sure that it works as&nbsp;expected.</p> <p>Okay, you’ve experienced the power of <span class="caps">WSGI</span>: it allows you to mix and match your Web servers and Web frameworks. <span class="caps">WSGI</span> provides a minimal interface between Python Web servers and Python Web frameworks. It’s very simple and it’s easy to implement on both the server and the framework side. 
The following code snippet shows the server and the framework side of the&nbsp;interface:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">run_application</span><span class="p">(</span><span class="n">application</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;Server code.&quot;&quot;&quot;</span>
    <span class="c1"># This is where an application/framework stores</span>
    <span class="c1"># an HTTP status and HTTP response headers for the server</span>
    <span class="c1"># to transmit to the client</span>
    <span class="n">headers_set</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="c1"># Environment dictionary with WSGI/CGI variables</span>
    <span class="n">environ</span> <span class="o">=</span> <span class="p">{}</span>

    <span class="k">def</span> <span class="nf">start_response</span><span class="p">(</span><span class="n">status</span><span class="p">,</span> <span class="n">response_headers</span><span class="p">,</span> <span class="n">exc_info</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="n">headers_set</span><span class="p">[:]</span> <span class="o">=</span> <span class="p">[</span><span class="n">status</span><span class="p">,</span> <span class="n">response_headers</span><span class="p">]</span>

    <span class="c1"># Server invokes the &#39;application&#39; callable and gets back the</span>
    <span class="c1"># response body</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">application</span><span class="p">(</span><span class="n">environ</span><span class="p">,</span> <span class="n">start_response</span><span class="p">)</span>
    <span class="c1"># Server builds an HTTP response and transmits it to the client</span>
    <span class="o">...</span>

<span class="k">def</span> <span class="nf">app</span><span class="p">(</span><span class="n">environ</span><span class="p">,</span> <span class="n">start_response</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;A barebones WSGI app.&quot;&quot;&quot;</span>
    <span class="n">start_response</span><span class="p">(</span><span class="s1">&#39;200 OK&#39;</span><span class="p">,</span> <span class="p">[(</span><span class="s1">&#39;Content-Type&#39;</span><span class="p">,</span> <span class="s1">&#39;text/plain&#39;</span><span class="p">)])</span>
    <span class="k">return</span> <span class="p">[</span><span class="sa">b</span><span class="s1">&#39;Hello world!&#39;</span><span class="p">]</span>

<span class="n">run_application</span><span class="p">(</span><span class="n">app</span><span class="p">)</span>
</pre></div> <p>Here is how it&nbsp;works:</p> <ol> <li>The framework provides an <em>&#8216;application&#8217;</em> callable (The <span class="caps">WSGI</span> specification doesn&#8217;t prescribe how that should be&nbsp;implemented)</li> <li>The server invokes the <em>&#8216;application&#8217;</em> callable for each request it receives from an <span class="caps">HTTP</span> client. It passes a dictionary <em>&#8216;environ&#8217;</em> containing <span class="caps">WSGI</span>/<span class="caps">CGI</span> variables and a <em>&#8216;start_response&#8217;</em> callable as arguments to the <em>&#8216;application&#8217;</em>&nbsp;callable.</li> <li>The framework/application generates an <span class="caps">HTTP</span> status and <span class="caps">HTTP</span> response headers and passes them to the <em>&#8216;start_response&#8217;</em> callable for the server to store them. 
The framework/application also returns a response&nbsp;body.</li> <li>The server combines the status, the response headers, and the response body into an <span class="caps">HTTP</span> response and transmits it to the client (This step is not part of the specification but it&#8217;s the next logical step in the flow and I added it for&nbsp;clarity)</li> </ol> <p>And here is a visual representation of the&nbsp;interface:</p> <p><img alt="WSGI Interface" src="https://ruslanspivak.com/lsbaws-part2/lsbaws_part2_wsgi_interface.png"></p> <p>So far, you&#8217;ve seen the Pyramid, Flask, and Django Web applications and you&#8217;ve seen the server code that implements the server side of the <span class="caps">WSGI</span> specification. You&#8217;ve even seen the barebones <span class="caps">WSGI</span> application code snippet that doesn&#8217;t use any&nbsp;framework.</p> <p>The thing is that when you write a Web application using one of those frameworks you work at a higher level and don&#8217;t work with <span class="caps">WSGI</span> directly, but I know you&#8217;re curious about the framework side of the <span class="caps">WSGI</span> interface, too, because you’re reading this article. 
So, let&#8217;s create a minimalistic <span class="caps">WSGI</span> Web application/Web framework without using Pyramid, Flask, or Django and run it with your&nbsp;server:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">app</span><span class="p">(</span><span class="n">environ</span><span class="p">,</span> <span class="n">start_response</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;A barebones WSGI application.</span>

<span class="sd">    This is a starting point for your own Web framework :)</span>
<span class="sd">    &quot;&quot;&quot;</span>
    <span class="n">status</span> <span class="o">=</span> <span class="s1">&#39;200 OK&#39;</span>
    <span class="n">response_headers</span> <span class="o">=</span> <span class="p">[(</span><span class="s1">&#39;Content-Type&#39;</span><span class="p">,</span> <span class="s1">&#39;text/plain&#39;</span><span class="p">)]</span>
    <span class="n">start_response</span><span class="p">(</span><span class="n">status</span><span class="p">,</span> <span class="n">response_headers</span><span class="p">)</span>
    <span class="k">return</span> <span class="p">[</span><span class="sa">b</span><span class="s1">&#39;Hello world from a simple WSGI application!</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">]</span>
</pre></div> <p>Again, save the above code as <em>wsgiapp.py</em> or download it from <a href="https://github.com/rspivak/lsbaws/blob/master/part2/wsgiapp.py">GitHub</a> directly and run the application under your Web server&nbsp;as:</p> <div class="highlight"><pre><span></span><span class="o">(</span>lsbaws<span class="o">)</span> $ python webserver2.py wsgiapp:app
WSGIServer: Serving HTTP on port <span class="m">8888</span> ...
</pre></div> <p>Type <a href="http://localhost:8888/hello">http://localhost:8888/hello</a> into your browser and press Enter. 
This is the result you should&nbsp;see:</p> <p><img alt="Simple WSGI Application" src="https://ruslanspivak.com/lsbaws-part2/lsbaws_part2_browser_simple_wsgi_app.png"></p> <p>You just wrote your very own minimalistic <span class="caps">WSGI</span> Web framework while learning about how to create a Web server!&nbsp;Outrageous.</p> <p>Now, let&#8217;s get back to what the server transmits to the client. Here is the <span class="caps">HTTP</span> response the server generates when you call your Pyramid application using an <span class="caps">HTTP</span>&nbsp;client:</p> <p><img alt="HTTP Response Part 1" src="https://ruslanspivak.com/lsbaws-part2/lsbaws_part2_http_response.png" width="640"></p> <p>The response has some familiar parts that you saw in <a href="http://ruslanspivak.com/lsbaws-part1/" title="Part 1">Part 1</a> but it also has something new. It has, for example, four <a href="http://en.wikipedia.org/wiki/List_of_HTTP_header_fields" title="HTTP header fields"><span class="caps">HTTP</span> headers</a> that you haven&#8217;t seen before: <em>Content-Type</em>, <em>Content-Length</em>, <em>Date</em>, and <em>Server</em>. Those are the headers that a response from a Web server generally should have. None of them are strictly required, though. 
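</p> <p>To make the role of those headers concrete, here is a minimal sketch of how a response like the one above could be assembled from a status, a list of headers, and a body. The <em>build_response</em> helper is hypothetical and is not part of <em>webserver2.py</em>; in particular, it computes <em>Content-Length</em> itself and uses the current time for <em>Date</em>, which are assumptions made purely for illustration:</p>

```python
from email.utils import formatdate


def build_response(status, headers, body):
    """Assemble raw HTTP response bytes (hypothetical helper for illustration)."""
    all_headers = list(headers) + [
        ('Content-Length', str(len(body))),   # size of the body in bytes
        ('Date', formatdate(usegmt=True)),    # current time in HTTP date format
        ('Server', 'WSGIServer 0.2'),         # same value the article's server sends
    ]
    lines = [f'HTTP/1.1 {status}']
    lines += [f'{name}: {value}' for name, value in all_headers]
    # A blank line separates the headers from the body
    head = '\r\n'.join(lines) + '\r\n\r\n'
    return head.encode('iso-8859-1') + body


raw = build_response(
    '200 OK',
    [('Content-Type', 'text/plain')],
    b'Hello world from Pyramid!\n',
)
print(raw.decode('utf-8'))
```

<p>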
The purpose of the headers is to transmit additional information about the <span class="caps">HTTP</span>&nbsp;request/response.</p> <p>Now that you know more about the <span class="caps">WSGI</span> interface, here is the same <span class="caps">HTTP</span> response with some more information about what parts produced&nbsp;it:</p> <p><img alt="HTTP Response Part 2" src="https://ruslanspivak.com/lsbaws-part2/lsbaws_part2_http_response_explanation.png"></p> <p>I haven’t said anything about the <strong>&#8216;environ&#8217;</strong> dictionary yet, but basically it&#8217;s a Python dictionary that must contain certain <span class="caps">WSGI</span> and <span class="caps">CGI</span> variables prescribed by the <span class="caps">WSGI</span> specification. The server takes the values for the dictionary from the <span class="caps">HTTP</span> request after parsing the request. This is what the contents of the dictionary look&nbsp;like:</p> <p><img alt="Environ Python Dictionary" src="https://ruslanspivak.com/lsbaws-part2/lsbaws_part2_environ.png" width="720"></p> <p>A Web framework uses the information from that dictionary to decide which view to use based on the specified route, request method etc., where to read the request body from and where to write errors, if&nbsp;any.</p> <p>By now you&#8217;ve created your own <span class="caps">WSGI</span> Web server and you&#8217;ve made Web applications written with different Web frameworks. And, you&#8217;ve also created your barebones Web application/Web framework along the way. It&#8217;s been a heck of a journey. 
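</p> <p>Before the recap, a small aside on the <em>&#8216;environ&#8217;</em> dictionary described above. The sketch below shows, under simplifying assumptions, how a framework might pick a view from the request method and path. <em>REQUEST_METHOD</em> and <em>PATH_INFO</em> are standard <span class="caps">WSGI</span> keys; the <em>ROUTES</em> table, the view functions, and <em>fake_start_response</em> are hypothetical names invented for this example:</p>

```python
def hello(environ):
    return '200 OK', b'Hello!\n'


def not_found(environ):
    return '404 Not Found', b'Not found\n'


# Hypothetical routing table keyed by (request method, path)
ROUTES = {('GET', '/hello'): hello}


def app(environ, start_response):
    """Pick a view based on values the server put into 'environ'."""
    key = (environ.get('REQUEST_METHOD', 'GET'), environ.get('PATH_INFO', '/'))
    view = ROUTES.get(key, not_found)
    status, body = view(environ)
    start_response(status, [('Content-Type', 'text/plain'),
                            ('Content-Length', str(len(body)))])
    return [body]


# Exercise the application without a real server
captured = []


def fake_start_response(status, headers, exc_info=None):
    captured[:] = [status, headers]


body = app({'REQUEST_METHOD': 'GET', 'PATH_INFO': '/hello'}, fake_start_response)
print(captured[0], body)
```

<p>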
Let&#8217;s recap what your <span class="caps">WSGI</span> Web server has to do to serve requests aimed at a <span class="caps">WSGI</span>&nbsp;application:</p> <ul> <li>First, the server starts and loads an <em>&#8216;application&#8217;</em> callable provided by your Web&nbsp;framework/application</li> <li>Then, the server reads a&nbsp;request</li> <li>Then, the server parses&nbsp;it</li> <li>Then, it builds an <em>&#8216;environ&#8217;</em> dictionary using the request&nbsp;data</li> <li>Then, it calls the <em>&#8216;application&#8217;</em> callable with the <em>&#8216;environ&#8217;</em> dictionary and a <em>&#8216;start_response&#8217;</em> callable as parameters and gets back a response&nbsp;body.</li> <li>Then, the server constructs an <span class="caps">HTTP</span> response using the data returned by the call to the <em>&#8216;application&#8217;</em> object and the status and response headers set by the <em>&#8216;start_response&#8217;</em>&nbsp;callable.</li> <li>And finally, the server transmits the <span class="caps">HTTP</span> response back to the&nbsp;client</li> </ul> <p><img alt="Server Summary" src="https://ruslanspivak.com/lsbaws-part2/lsbaws_part2_server_summary.png" width="700"></p> <p>That&#8217;s about all there is to it. You now have a working <span class="caps">WSGI</span> server that can serve basic Web applications written with <span class="caps">WSGI</span> compliant Web frameworks like <a href="https://www.djangoproject.com/" title="Django">Django</a>, <a href="http://flask.pocoo.org/" title="Flask">Flask</a>, <a href="http://trypyramid.com/" title="Pyramid">Pyramid</a>, or your very own <span class="caps">WSGI</span> framework. The best part is that the server can be used with multiple Web frameworks without any changes to the server code base. 
Not bad at&nbsp;all.</p> <p>Before you go, here is another question for you to think about, <em>&#8220;How do you make your server handle more than one request at a&nbsp;time?&#8221;</em></p> <p>Stay tuned and I will show you a way to do that in <a href="https://ruslanspivak.com/lsbaws-part3/">Part 3</a>.&nbsp;Cheers!</p> <p><br/> <em>Resources used in preparation for this article (some links are affiliate&nbsp;links):</em></p> <ol> <li> <p><a href="http://www.amazon.com/gp/product/0131411551/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0131411551&linkCode=as2&tag=russblo0b-20&linkId=2F4NYRBND566JJQL">Unix Network Programming, Volume 1: The Sockets Networking <span class="caps">API</span> (3rd Edition)</a></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0321637739/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321637739&linkCode=as2&tag=russblo0b-20&linkId=3ZYAKB537G6TM22J">Advanced Programming in the <span class="caps">UNIX</span> Environment, 3rd Edition</a></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/1593272200/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1593272200&linkCode=as2&tag=russblo0b-20&linkId=CHFOMNYXN35I2MON">The Linux Programming Interface: A Linux and <span class="caps">UNIX</span> System Programming Handbook</a></p> </li> <li> <p><a href="https://www.python.org/dev/peps/pep-0333/" target="_blank"><span class="caps">PEP</span> 333 &#8212; Python 
Web Server Gateway&nbsp;Interface</a></p> </li> </ol> <p><br/></p> <blockquote> <p><em><strong><span class="caps">UPDATE</span>: Mon, July 15,&nbsp;2019</strong></em></p> <ul> <li> <p>Updated the server code to run under Python&nbsp;3.7+</p> </li> <li> <p>Added resources used in preparation for the&nbsp;article</p> </li> </ul> </blockquote> <p></p> <p><br/> <strong>All articles in this&nbsp;series:</strong></p> <ul> <li><a href="https://ruslanspivak.com/lsbaws-part1/">Let&#8217;s Build A Web Server. Part&nbsp;1.</a></li> <li><a href="https://ruslanspivak.com/lsbaws-part2/">Let&#8217;s Build A Web Server. Part&nbsp;2.</a></li> <li><a href="https://ruslanspivak.com/lsbaws-part3/">Let&#8217;s Build A Web Server. Part&nbsp;3.</a></li> </ul>Let’s Build A Web Server. Part 1.2015-03-09T08:00:00-04:002015-03-09T08:00:00-04:00Ruslan Spivaktag:ruslanspivak.com,2015-03-09:/lsbaws-part1/<p>Out for a walk one day, a woman came across a construction site and saw three men working. She asked the first man, “What are you doing?” Annoyed by the question, the first man barked, “Can’t you see that I’m laying bricks?” Not satisfied with the answer, she asked the second man what he was doing. The second man answered, “I’m building a brick wall.” Then, turning his attention to the first man, he said, “Hey, you just passed the end of the wall. You need to take off that last brick.” Again not satisfied with the answer, she asked the third man what he was doing. And the man said to her while looking up in the sky, “I am building the biggest cathedral this world has ever known.” While he was standing there and looking up in the sky the other two men started arguing about the errant brick. 
The man turned to the first two men and said, “Hey guys, don’t worry about that brick. It’s an inside wall, it will get plastered over and no one will ever see that brick. Just move on to another&nbsp;layer.”</p> <p>The moral of the story is that when you know the whole system and understand how different pieces fit together (bricks, walls, cathedral), you can identify and fix problems faster (errant&nbsp;brick).</p> <p>What does it have to do with creating your own Web server from&nbsp;scratch?</p> <p><strong>I believe to become a better developer you <span class="caps">MUST</span> get a better understanding of the underlying software systems you use on a daily basis and that includes programming languages, compilers and interpreters, databases and operating systems, web servers and web frameworks. And, to get a better and deeper understanding of those systems you <span class="caps">MUST</span> re-build them from scratch, brick by brick, wall by&nbsp;wall.</strong></p> <p>Confucius put it this&nbsp;way:</p> <blockquote> <p><em><span class="dquo">&#8220;</span>I hear and I&nbsp;forget.&#8221;</em></p> </blockquote> <p><img alt="Hear" src="https://ruslanspivak.com/lsbasi-part4/LSBAWS_confucius_hear.png" width="640"></p> <blockquote> <p><em><span class="dquo">&#8220;</span>I see and I&nbsp;remember.&#8221;</em></p> </blockquote> <p><img alt="See" src="https://ruslanspivak.com/lsbasi-part4/LSBAWS_confucius_see.png" width="640"></p> <blockquote> <p><em><span class="dquo">&#8220;</span>I do and I&nbsp;understand.&#8221;</em></p> </blockquote> <p><img alt="Do" src="https://ruslanspivak.com/lsbasi-part4/LSBAWS_confucius_do.png" width="640"></p> <p>I hope at this point you’re convinced that it’s a good idea to start re-building different software systems to learn how they&nbsp;work.</p> <p>In this three-part series I will show you how to build your own basic Web server. 
Let’s get&nbsp;started.</p> <p>First things first, what is a Web&nbsp;server?</p> <p><img alt="HTTP Request/Response" src="https://ruslanspivak.com/lsbaws-part1/LSBAWS_HTTP_request_response.png"></p> <p>In a nutshell it’s a networking server that sits on a physical server (oops, a server on a server) and waits for a client to send a request. When it receives a request, it generates a response and sends it back to the client. The communication between a client and a server happens using the <span class="caps">HTTP</span> protocol. A client can be your browser or any other software that speaks <span class="caps">HTTP</span>.</p> <p>What would a very simple implementation of a Web server look like? Here is my take on it. The example is in Python (tested on Python 3.7+) but even if you don’t know Python (it’s a very easy language to pick up, try it!) you should still be able to understand the concepts from the code and explanations&nbsp;below:</p> <div class="highlight"><pre><span></span><span class="c1"># Python3.7+</span>
<span class="kn">import</span> <span class="nn">socket</span>

<span class="n">HOST</span><span class="p">,</span> <span class="n">PORT</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span><span class="p">,</span> <span class="mi">8888</span>

<span class="n">listen_socket</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
<span class="n">listen_socket</span><span class="o">.</span><span class="n">setsockopt</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">SOL_SOCKET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SO_REUSEADDR</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">listen_socket</span><span class="o">.</span><span class="n">bind</span><span class="p">((</span><span class="n">HOST</span><span class="p">,</span> <span class="n">PORT</span><span class="p">))</span>
<span class="n">listen_socket</span><span class="o">.</span><span class="n">listen</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">f</span><span class="s1">&#39;Serving HTTP on port {PORT} ...&#39;</span><span class="p">)</span>
<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
    <span class="n">client_connection</span><span class="p">,</span> <span class="n">client_address</span> <span class="o">=</span> <span class="n">listen_socket</span><span class="o">.</span><span class="n">accept</span><span class="p">()</span>
    <span class="n">request_data</span> <span class="o">=</span> <span class="n">client_connection</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="mi">1024</span><span class="p">)</span>
    <span class="k">print</span><span class="p">(</span><span class="n">request_data</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">&#39;utf-8&#39;</span><span class="p">))</span>

    <span class="n">http_response</span> <span class="o">=</span> <span class="sa">b</span><span class="s2">&quot;&quot;&quot;</span><span class="se">\</span>
<span class="s2">HTTP/1.1 200 OK</span>

<span class="s2">Hello, World!</span>
<span class="s2">&quot;&quot;&quot;</span>
    <span class="n">client_connection</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">http_response</span><span class="p">)</span>
    <span class="n">client_connection</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</pre></div> <p>Save the 
above code as <em>webserver1.py</em> or download it directly from <a href="https://github.com/rspivak/lsbaws/blob/master/part1/webserver1.py" title="GitHub">GitHub</a> and run it on the command line like&nbsp;this:</p> <div class="highlight"><pre><span></span>$ python webserver1.py
Serving HTTP on port <span class="m">8888</span> ...
</pre></div> <p>Now type in the following <span class="caps">URL</span> in your Web browser’s address bar <a href="http://localhost:8888/hello" title="Hello">http://localhost:8888/hello</a>, hit Enter, and see magic in action. You should see <em>&#8220;Hello, World!&#8221;</em> displayed in your browser like&nbsp;this:</p> <p><img alt="Browser &quot;Hello, World!&quot;" src="https://ruslanspivak.com/lsbaws-part1/browser_hello_world.png"></p> <p>Just do it, seriously. I will wait for you while you’re testing&nbsp;it.</p> <p>Done? Great. Now let’s discuss how it all actually&nbsp;works.</p> <p>First let’s start with the Web address you’ve entered. It’s called a <a href="http://en.wikipedia.org/wiki/Uniform_resource_locator"><span class="caps">URL</span></a> and here is its basic&nbsp;structure:</p> <p><img alt="URL Structure" src="https://ruslanspivak.com/lsbaws-part1/LSBAWS_URL_Web_address.png" width="480"></p> <p>This is how you tell your browser the address of the Web server it needs to find and connect to and the page (path) on the server to fetch for you. Before your browser can send an <span class="caps">HTTP</span> request, though, it first needs to establish a <span class="caps">TCP</span> connection with the Web server. Then it sends an <span class="caps">HTTP</span> request over the <span class="caps">TCP</span> connection to the server and waits for the server to send an <span class="caps">HTTP</span> response back. 
When your browser receives the response, it displays it; in this case it displays “Hello,&nbsp;World!”</p> <p>Let’s explore in more detail how the client and the server establish a <span class="caps">TCP</span> connection before sending <span class="caps">HTTP</span> requests and responses. To do that they both use so-called <em>sockets</em>. Instead of using a browser directly you are going to simulate your browser manually by using <em>telnet</em> on the command&nbsp;line.</p> <p>On the same computer where you’re running the Web server, fire up a telnet session on the command line, specifying <em>localhost</em> as the host to connect to and <em>8888</em> as the port, and then press&nbsp;Enter:</p> <div class="highlight"><pre><span></span>$ telnet localhost <span class="m">8888</span>
Trying <span class="m">127</span>.0.0.1 ...
Connected to localhost.
</pre></div> <p>At this point you’ve established a <span class="caps">TCP</span> connection with the server running on your local host and you’re ready to send and receive <span class="caps">HTTP</span> messages. In the picture below you can see the standard procedure a server has to go through to be able to accept new <span class="caps">TCP</span> connections. <img alt="Socket accept" src="https://ruslanspivak.com/lsbaws-part1/LSBAWS_socket.png" width="780"></p> <p>In the same telnet session type <strong><em><span class="caps">GET</span> /hello <span class="caps">HTTP</span>/1.1</em></strong> and hit&nbsp;Enter:</p> <div class="highlight"><pre><span></span>$ telnet localhost <span class="m">8888</span>
Trying <span class="m">127</span>.0.0.1 ...
Connected to localhost.
GET /hello HTTP/1.1
HTTP/1.1 <span class="m">200</span> OK

Hello, World!
</pre></div> <p>You’ve just manually simulated your browser! You sent an <span class="caps">HTTP</span> request and got an <span class="caps">HTTP</span> response back. 
This is the basic structure of an <span class="caps">HTTP</span>&nbsp;request:</p> <p><img alt="HTTP Request Anatomy" src="https://ruslanspivak.com/lsbaws-part1/LSBAWS_HTTP_request_anatomy.png" width="560"></p> <p>The <span class="caps">HTTP</span> request consists of a request line that indicates the <span class="caps">HTTP</span> method (<strong><em><span class="caps">GET</span></em></strong>, because we are asking our server to return something to us), the path <em>/hello</em> that indicates the <em>“page”</em> on the server we want, and the protocol&nbsp;version.</p> <p>For simplicity’s sake our Web server at this point completely ignores the above request line. You could just as well type in any garbage instead of <em>&#8220;<span class="caps">GET</span> /hello <span class="caps">HTTP</span>/1.1&#8221;</em> and you would still get back a <em>&#8220;Hello, World!&#8221;</em>&nbsp;response.</p> <p>Once you’ve typed the request line and hit Enter, the client sends the request to the server; the server reads the request line, prints it, and returns the proper <span class="caps">HTTP</span>&nbsp;response.</p> <p>Here is the <span class="caps">HTTP</span> response that the server sends back to your client (<em>telnet</em> in this case): <img alt="HTTP Response Anatomy" src="https://ruslanspivak.com/lsbaws-part1/LSBAWS_HTTP_response_anatomy.png" width="560"></p> <p>Let’s dissect it. The response consists of a status line <em><span class="caps">HTTP</span>/1.1 200 <span class="caps">OK</span></em>, followed by a required empty line, and then the <span class="caps">HTTP</span> response&nbsp;body.</p> <p>The response status line <em><span class="caps">HTTP</span>/1.1 200 <span class="caps">OK</span></em> consists of the <em><span class="caps">HTTP</span> Version</em>, the <em><span class="caps">HTTP</span> status code</em> and the <em><span class="caps">HTTP</span> status code reason</em> phrase <em><span class="caps">OK</span></em>. 
When the browser gets the response, it displays the body of the response and that’s why you see <em>&#8220;Hello, World!&#8221;</em> in your&nbsp;browser.</p> <p>And that’s the basic model of how a Web server works. To sum it up: The Web server creates a listening socket and starts accepting new connections in a loop. The client initiates a <span class="caps">TCP</span> connection and, after successfully establishing it, sends an <span class="caps">HTTP</span> request to the server. The server responds with an <span class="caps">HTTP</span> response that gets displayed to the user. To establish a <span class="caps">TCP</span> connection both clients and servers use <em>sockets</em>.</p> <p>Now you have a very basic working Web server that you can test with your browser or some other <span class="caps">HTTP</span> client. As you’ve seen and hopefully tried, you can also be a human <span class="caps">HTTP</span> client, by using <em>telnet</em> and typing <span class="caps">HTTP</span> requests&nbsp;manually.</p> <p>Here’s a question for you: “How do you run a Django application, Flask application, and Pyramid application under your freshly minted Web server without making a single change to the server to accommodate all those different Web&nbsp;frameworks?”</p> <p>I will show you exactly how in <a href="https://ruslanspivak.com/lsbaws-part2/">Part 2</a> of the series. 
Stay&nbsp;tuned.</p> <p><br/> <em>Resources used in preparation for this article (links are affiliate&nbsp;links):</em></p> <ol> <li> <p><a href="http://www.amazon.com/gp/product/0131411551/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0131411551&linkCode=as2&tag=russblo0b-20&linkId=2F4NYRBND566JJQL">Unix Network Programming, Volume 1: The Sockets Networking <span class="caps">API</span> (3rd Edition)</a></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0321637739/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321637739&linkCode=as2&tag=russblo0b-20&linkId=3ZYAKB537G6TM22J">Advanced Programming in the <span class="caps">UNIX</span> Environment, 3rd Edition</a></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/1593272200/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1593272200&linkCode=as2&tag=russblo0b-20&linkId=CHFOMNYXN35I2MON">The Linux Programming Interface: A Linux and <span class="caps">UNIX</span> System Programming Handbook</a></p> </li> <li> <p><a href="http://www.amazon.com/gp/product/0814420303/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0814420303&linkCode=as2&tag=russblo0b-20&linkId=HY2LNXTSGPPFZ2EV">Lead with a Story</a></p> </li> </ol> <p><br/></p> <blockquote> 
<p><em><strong><span class="caps">UPDATE</span>: Sat, July 13,&nbsp;2019</strong></em></p> <ul> <li> <p>Updated the server code to run under Python&nbsp;3.7+</p> </li> <li> <p>Added resources used in preparation for the&nbsp;article</p> </li> </ul> </blockquote> <p></p> <p><br/> <strong>All articles in this&nbsp;series:</strong></p> <ul> <li><a href="https://ruslanspivak.com/lsbaws-part1/">Let&#8217;s Build A Web Server. Part&nbsp;1.</a></li> <li><a href="https://ruslanspivak.com/lsbaws-part2/">Let&#8217;s Build A Web Server. Part&nbsp;2.</a></li> <li><a href="https://ruslanspivak.com/lsbaws-part3/">Let&#8217;s Build A Web Server. Part&nbsp;3.</a></li> </ul>
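<p>If you would rather script the telnet experiment from this article than type it by hand, here is a minimal sketch of an <span class="caps">HTTP</span> client built on the same <em>socket</em> module. It assumes the server from this article is listening on <em>localhost</em> port <em>8888</em>. The function names <em>build_request</em>, <em>parse_response</em> and <em>fetch</em> are made up for illustration, and the <em>Host</em> header is my addition: <span class="caps">HTTP</span>/1.1 expects it, although the server above ignores whatever you send.</p>

```python
import socket


def build_request(path, host="localhost"):
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    # Request line, one Host header (an addition, not in the article),
    # then the empty line that ends the request headers.
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_response(raw):
    """Split a raw HTTP response into (version, status_code, reason, body)."""
    # Accept CRLF as well as the bare LF line endings the article's server sends.
    text = raw.decode("utf-8").replace("\r\n", "\n")
    # The first empty line separates the status line and headers from the body.
    head, _, body = text.partition("\n\n")
    status_line = head.split("\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason, body


def fetch(path, host="localhost", port=8888):
    """Open a TCP connection, send one GET request, read until the server closes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_request(path, host))
        chunks = []
        while True:
            chunk = sock.recv(1024)
            if not chunk:  # empty bytes means the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

<p>With <em>webserver1.py</em> running in another terminal, calling <em>parse_response(fetch('/hello'))</em> should give you the protocol version, the status code, the reason phrase and the response body as separate values. Reading until the connection closes works here only because the server closes the connection after every response.</p>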