<!DOCTYPE html>
<html lang="en"
xmlns:og="http://ogp.me/ns#"
xmlns:fb="https://www.facebook.com/2008/fbml">
<head>
<title>Let’s Build A Simple Interpreter. Part 9. - Ruslan's Blog</title>
<!-- Using the latest rendering mode for IE -->
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="canonical" href="index.html">
<meta name="author" content="Ruslan Spivak" />
<meta name="description" content="I remember when I was in university (a long time ago) and learning systems programming, I believed that the only “real” languages were Assembly and C. And Pascal was - how to put it nicely - a very high-level language used by application developers who didn’t want to know what was …" />
<meta property="og:site_name" content="Ruslan's Blog" />
<meta property="og:type" content="article"/>
<meta property="og:title" content="Let’s Build A Simple Interpreter. Part 9."/>
<meta property="og:url" content="https://ruslanspivak.com/lsbasi-part9/"/>
<meta property="og:description" content="I remember when I was in university (a long time ago) and learning systems programming, I believed that the only “real” languages were Assembly and C. And Pascal was - how to put it nicely - a very high-level language used by application developers who didn’t want to know what was …"/>
<meta property="article:published_time" content="2016-05-01" />
<meta property="article:section" content="blog" />
<meta property="article:author" content="Ruslan Spivak" />
<meta name="twitter:card" content="summary">
<meta name="twitter:domain" content="https://ruslanspivak.com">
<!-- Bootstrap -->
<link rel="stylesheet" href="../theme/css/bootstrap.min.css" type="text/css"/>
<link href="../theme/css/font-awesome.min.css" rel="stylesheet">
<link href="../theme/css/pygments/tango.css" rel="stylesheet">
<link href="../theme/css/typogrify.css" rel="stylesheet">
<link rel="stylesheet" href="../theme/css/style.css" type="text/css"/>
<link href="../static/custom.css" rel="stylesheet">
<link href="../feeds/all.atom.xml" type="application/atom+xml" rel="alternate"
title="Ruslan's Blog ATOM Feed"/>
</head>
<body>
<div class="navbar navbar-default navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-ex1-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="../index.html" class="navbar-brand">
Ruslan's Blog </a>
</div>
<div class="collapse navbar-collapse navbar-ex1-collapse">
<ul class="nav navbar-nav">
</ul>
<ul class="nav navbar-nav navbar-right">
<li><a href="../pages/about.html"><i class="fa fa-question"></i><span class="icon-label">About</span></a></li>
<li><a href="../archives.html"><i class="fa fa-th-list"></i><span class="icon-label">Archives</span></a></li>
</ul>
</div>
<!-- /.navbar-collapse -->
</div>
</div> <!-- /.navbar -->
<!-- Banner -->
<!-- End Banner -->
<div class="container">
<div class="row">
<div class="col-sm-9">
<section id="content">
<article>
<header class="page-header">
<h1>
<a href="index.html"
rel="bookmark"
title="Permalink to Let’s Build A Simple Interpreter. Part 9.">
Let’s Build A Simple Interpreter. Part 9.
</a>
</h1>
</header>
<div class="entry-content">
<div class="panel">
<div class="panel-body">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2016-05-01T06:10:00-04:00"> Sun, May 01, 2016</time>
</span>
</footer><!-- /.post-info --> </div>
</div>
<p>I remember that when I was in university (a long time ago) learning systems programming,
I believed that the only “real” languages were Assembly and C. And Pascal was, how to put it nicely, a
very high-level language used by application developers who didn’t want to know what was going on under the hood.</p>
<p>Little did I know back then that I would be writing almost everything in Python (and loving every bit of it)
to pay my bills, and that I would also be writing an interpreter and compiler for Pascal for the reasons
I stated in <a href="../lsbasi-part1/index.html">the very first article of the series</a>.</p>
<p>These days, I consider myself a programming languages enthusiast, and I’m fascinated by all
languages and their unique features. Having said that, I have to note that I enjoy using certain
languages way more than others. I am biased, and I’ll be the first one to admit that. :)</p>
<p>This is me before:</p>
<p><img alt="" src="lsbasi_part9_story_before.png" width="720"></p>
<p>And now:</p>
<p><img alt="" src="lsbasi_part9_story_now.png" width="720"></p>
<p>Okay, let’s get down to business. Here is what you’re going to learn today:</p>
<ol>
<li>How to parse and interpret a Pascal program definition.</li>
<li>How to parse and interpret compound statements.</li>
<li>How to parse and interpret assignment statements, including variables.</li>
<li>A bit about symbol tables and how to store and look up variables.</li>
</ol>
<p>I’ll use the following sample Pascal-like program to introduce new concepts:</p>
<div class="highlight"><pre><span></span><span class="k">BEGIN</span>
    <span class="k">BEGIN</span>
        <span class="n">number</span> <span class="o">:=</span> <span class="mi">2</span><span class="o">;</span>
        <span class="n">a</span> <span class="o">:=</span> <span class="n">number</span><span class="o">;</span>
        <span class="n">b</span> <span class="o">:=</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">a</span> <span class="o">+</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">number</span> <span class="o">/</span> <span class="mi">4</span><span class="o">;</span>
        <span class="n">c</span> <span class="o">:=</span> <span class="n">a</span> <span class="o">-</span> <span class="o">-</span> <span class="n">b</span>
    <span class="k">END</span><span class="o">;</span>
    <span class="n">x</span> <span class="o">:=</span> <span class="mi">11</span><span class="o">;</span>
<span class="k">END</span><span class="o">.</span>
</pre></div>
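<p>To make the meaning of the sample concrete, here is a hand translation of it into plain Python statements. It is only a sketch for checking expectations and is not part of the interpreter. It assumes that ‘/’ performs integer division (Python’s <em>//</em>); with true division, <em>b</em> would come out as 25.0. The <em>run_sample</em> function and the <em>env</em> dictionary are hypothetical names used only for illustration.</p>
<div class="highlight"><pre><span></span>def run_sample():
    """Hand-translated trace of the sample program (illustrative sketch)."""
    env = {}  # maps variable names to their current values

    # Inner BEGIN ... END block
    env['number'] = 2
    env['a'] = env['number']
    # '*' and '/' bind tighter than '+' and apply left to right, so
    # 10 * number / 4 is (10 * number) / 4; '/' is assumed to be integer division.
    env['b'] = 10 * env['a'] + (10 * env['number']) // 4
    # 'a - - b' is a binary minus followed by a unary minus: a - (-b)
    env['c'] = env['a'] - (-env['b'])

    # The outer block continues after the inner 'END;'
    env['x'] = 11
    return env


print(run_sample())
</pre></div>
<p>Running it prints <em>{'number': 2, 'a': 2, 'b': 25, 'c': 27, 'x': 11}</em>. The nested block does not introduce a new scope here: every assignment writes to the same table of variables.</p>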
<p>You could say that this is quite a jump from the command-line interpreter you have written
so far by following the previous articles in the series, but it’s a jump that I hope will bring excitement.
It’s not “just” a calculator anymore; we’re getting serious here, Pascal serious. :)</p>
<p>Let’s dive in and look at syntax diagrams for new language constructs and their corresponding grammar rules.</p>
<p>On your marks: Ready. Set. Go!</p>
<p><img alt="" src="lsbasi_part9_syntax_diagram_01.png" width="720">
<img alt="" src="lsbasi_part9_syntax_diagram_02.png" width="720">
<img alt="" src="lsbasi_part9_syntax_diagram_03.png" width="720"></p>
<ol>
<li>
<p>I’ll start by describing what a Pascal <em>program</em> is. A Pascal <em><strong>program</strong></em> consists of
a <em>compound statement</em> that ends with a dot. Here is an example of a program:</p>
<div class="highlight"><pre><span></span>“BEGIN END.”
</pre></div>
<p>I have to note that this is not a complete program definition, and we’ll extend it later in the series.</p>
</li>
<li>
<p>What is a <em>compound statement</em>? A <em><strong>compound statement</strong></em> is a block marked with <span class="caps">BEGIN</span> and <span class="caps">END</span>
that can contain a list (possibly empty) of statements, including other compound statements.
Every statement inside the compound statement, except for the last one, must be terminated by a semicolon.
The last statement in the block may or may not have a terminating semicolon. Here are some examples
of valid compound statements:</p>
<div class="highlight"><pre><span></span>“BEGIN END”
“BEGIN a := 5; x := 11 END”
“BEGIN a := 5; x := 11; END”
“BEGIN BEGIN a := 5 END; x := 11 END”
</pre></div>
</li>
<li>
<p>A <em><strong>statement list</strong></em> is a list of zero or more statements inside a compound statement. See above for some examples.</p>
</li>
<li>
<p>A <em><strong>statement</strong></em> can be a <em>compound statement</em>, an <em>assignment statement</em>, or an <em>empty</em> statement.</p>
</li>
<li>
<p>An <em><strong>assignment statement</strong></em> is a variable followed by an <span class="caps">ASSIGN</span> token (two characters, ‘:’ and ‘=’) followed by an expression.</p>
<div class="highlight"><pre><span></span>“a := 11”
“b := a + 9 - 5 * 2”
</pre></div>
</li>
<li>
<p>A <em><strong>variable</strong></em> is an identifier. We’ll use the <span class="caps">ID</span> token for variables. The value of the token will
be a variable’s name like ‘a’, ‘number’, and so on. In the following code block, ‘a’ and ‘b’ are variables:</p>
<div class="highlight"><pre><span></span>“BEGIN a := 11; b := a + 9 - 5 * 2 END”
</pre></div>
</li>
<li>
<p>An <em><strong>empty</strong></em> statement represents a grammar rule with no further productions.
We use the <em>empty</em> grammar rule to indicate the end of the <em>statement_list</em>
in the parser and also to allow for empty compound statements as in ‘<span class="caps">BEGIN</span> <span class="caps">END</span>’.</p>
</li>
<li>
<p>The <em><strong>factor</strong></em> rule is updated to handle variables.</p>
</li>
</ol>
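<p>The rules above are enough to decide whether a sequence of tokens forms a valid program. Here is a minimal sketch of a recognizer that only accepts or rejects input and builds no AST. It works on a list of token type names instead of raw text. The <em>Recognizer</em> class and the <em>accepts</em> helper are hypothetical names. To keep the focus on the statement-level rules, <em>expr</em> is deliberately simplified to a single <span class="caps">INTEGER</span> or <span class="caps">ID</span>, so this is an illustration, not the parser from this article. The examples are the compound statements listed above, with a trailing <span class="caps">DOT</span> added to make them programs.</p>
<div class="highlight"><pre><span></span>class Recognizer:
    """Accept or reject a list of token type names (hypothetical helper)."""

    def __init__(self, token_types):
        self.tokens = list(token_types) + ['EOF']
        self.pos = 0

    def current(self):
        return self.tokens[self.pos]

    def eat(self, token_type):
        if self.current() != token_type:
            raise SyntaxError(f'expected {token_type}, got {self.current()}')
        self.pos += 1

    def program(self):
        # program : compound_statement DOT  (and nothing may follow it)
        self.compound_statement()
        self.eat('DOT')
        self.eat('EOF')

    def compound_statement(self):
        # compound_statement : BEGIN statement_list END
        self.eat('BEGIN')
        self.statement_list()
        self.eat('END')

    def statement_list(self):
        # statement_list : statement | statement SEMI statement_list
        self.statement()
        if self.current() == 'SEMI':
            self.eat('SEMI')
            self.statement_list()

    def statement(self):
        # statement : compound_statement | assignment_statement | empty
        if self.current() == 'BEGIN':
            self.compound_statement()
        elif self.current() == 'ID':
            self.assignment_statement()
        # Otherwise this is the empty production: consume nothing.

    def assignment_statement(self):
        # assignment_statement : variable ASSIGN expr
        # Simplification: expr is a single INTEGER or ID in this sketch.
        self.eat('ID')
        self.eat('ASSIGN')
        if self.current() == 'INTEGER':
            self.eat('INTEGER')
        else:
            self.eat('ID')


def accepts(token_types):
    try:
        Recognizer(token_types).program()
        return True
    except SyntaxError:
        return False


# "BEGIN a := 5; x := 11; END." (note the semicolon right before END)
print(accepts(['BEGIN', 'ID', 'ASSIGN', 'INTEGER', 'SEMI',
               'ID', 'ASSIGN', 'INTEGER', 'SEMI', 'END', 'DOT']))
</pre></div>
<p>The final call prints True. The trailing semicolon is accepted only because <em>statement</em> falls back to the empty production when it sees <span class="caps">END</span> after the last <span class="caps">SEMI</span>. Dropping the semicolon between two assignments has the opposite effect: <em>eat('END')</em> then fails on the second <span class="caps">ID</span>.</p>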
<p><br/>
Now let’s take a look at our complete grammar:</p>
<div class="highlight"><pre><span></span>    program : compound_statement DOT

    compound_statement : BEGIN statement_list END

    statement_list : statement
                   | statement SEMI statement_list

    statement : compound_statement
              | assignment_statement
              | empty

    assignment_statement : variable ASSIGN expr

    empty :

    expr: term ((PLUS | MINUS) term)*

    term: factor ((MUL | DIV) factor)*

    factor : PLUS factor
           | MINUS factor
           | INTEGER
           | LPAREN expr RPAREN
           | variable

    variable: ID
</pre></div>
<p>You probably noticed that I didn’t use the star <strong>‘*’</strong> symbol in the <em>compound_statement</em>
rule to represent zero or more repetitions, but instead explicitly specified the <em>statement_list</em> rule.
This recursive rule, combined with the <em>empty</em> statement, is another way to represent the ‘zero or more’ operation, and it will come in handy when we look
at parser generators like <a href="http://www.dabeaz.com/ply/"><span class="caps">PLY</span></a> later in the series. I also split the “(<span class="caps">PLUS</span> | <span class="caps">MINUS</span>) factor” sub-rule
into two separate rules.</p>
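<p>Here is a small sketch showing that the recursive <em>statement_list</em> rule and a loop-based reading of it, <em>statement (SEMI statement)*</em>, produce the same lists. It uses a simplified token list in which every element other than ‘;’ stands for one complete statement. An empty string plays the role of the empty statement. All function names here are hypothetical.</p>
<div class="highlight"><pre><span></span>SEMI = ';'


def parse_statement(tokens, pos):
    """Return (statement, next_pos); '' stands for the empty statement."""
    if pos == len(tokens) or tokens[pos] == SEMI:
        return '', pos  # empty production: consume nothing
    return tokens[pos], pos + 1


def statement_list_recursive(tokens, pos=0):
    """statement_list : statement | statement SEMI statement_list"""
    statement, pos = parse_statement(tokens, pos)
    if pos != len(tokens) and tokens[pos] == SEMI:
        rest, pos = statement_list_recursive(tokens, pos + 1)
        return [statement] + rest, pos
    return [statement], pos


def statement_list_loop(tokens, pos=0):
    """The same rule read as: statement (SEMI statement)*"""
    statement, pos = parse_statement(tokens, pos)
    results = [statement]
    while pos != len(tokens) and tokens[pos] == SEMI:
        statement, pos = parse_statement(tokens, pos + 1)
        results.append(statement)
    return results, pos


tokens = ['a := 5', SEMI, 'x := 11', SEMI]
print(statement_list_recursive(tokens))
print(statement_list_loop(tokens))
</pre></div>
<p>Both calls print <em>(['a := 5', 'x := 11', ''], 4)</em>. A trailing ‘;’ simply adds one more empty statement at the end, which is how the grammar allows an optional final semicolon without a special case.</p>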
<p><br/>
In order to support the updated grammar, we need to make a number of changes to our lexer, parser, and interpreter.
Let’s go over those changes one by one.</p>
<p>Here is the summary of the changes in our lexer:
<img alt="" src="lsbasi_part9_lexer.png" width="720"></p>
<ol>
<li>
<p>To support a Pascal program’s definition, compound statements, assignment statements, and variables, our lexer needs to return new tokens:</p>
<ul>
<li><span class="caps">BEGIN</span> (to mark the beginning of a compound statement)</li>
<li><span class="caps">END</span> (to mark the end of a compound statement)</li>
<li><span class="caps">DOT</span> (a token for the dot character ‘.’ required by a Pascal program’s definition)</li>
<li><span class="caps">ASSIGN</span> (a token for the two-character sequence ‘:=’). In Pascal, the assignment operator differs from the one in many other languages, such as C, Python, Java, Rust, or Go, where you would use the single character ‘=’ to indicate assignment</li>
<li><span class="caps">SEMI</span> (a token for the semicolon character ‘;’ that is used to mark the end of a statement inside a compound statement)</li>
<li><span class="caps">ID</span> (a token for a valid identifier. Identifiers start with an alphabetic character followed by any number of alphanumeric characters)</li>
</ul>
</li>
<li>
<p>Sometimes, in order to differentiate between tokens that start
with the same character (‘:’ vs ‘:=’, or ‘==’ vs ‘=>’), we need to peek into the input buffer
without actually consuming the next character. For this particular purpose, I introduced a <em>peek</em>
method that will help us tokenize assignment statements. The method is not strictly required,
but I thought I would introduce it earlier in the series, and it will also make the <em>get_next_token</em>
method a bit cleaner. All it does is return the next character from the text buffer without
incrementing the <em>self.pos</em> variable. Here is the method itself:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">peek</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="n">peek_pos</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">pos</span> <span class="o">+</span> <span class="mi">1</span>
    <span class="k">if</span> <span class="n">peek_pos</span> <span class="o">></span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:</span>
        <span class="k">return</span> <span class="bp">None</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">text</span><span class="p">[</span><span class="n">peek_pos</span><span class="p">]</span>
</pre></div>
</li>
<li>
<p>Because Pascal variables and reserved keywords are both identifiers, we will combine
their handling into one method called <em>_id</em>. The way it works is that the lexer consumes a sequence
of alphanumeric characters and then checks if the character sequence is a reserved word.
If it is, it returns a pre-constructed token for that reserved keyword. And if it’s not a reserved keyword,
it returns a new <span class="caps">ID</span> token whose value is the character string (lexeme).
I bet at this point you’re thinking, “Gosh, just show me the code.” :) Here it is:</p>
<div class="highlight"><pre><span></span><span class="n">RESERVED_KEYWORDS</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s1">'BEGIN'</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">'BEGIN'</span><span class="p">,</span> <span class="s1">'BEGIN'</span><span class="p">),</span>
    <span class="s1">'END'</span><span class="p">:</span> <span class="n">Token</span><span class="p">(</span><span class="s1">'END'</span><span class="p">,</span> <span class="s1">'END'</span><span class="p">),</span>
<span class="p">}</span>

<span class="k">def</span> <span class="nf">_id</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="sd">"""Handle identifiers and reserved keywords"""</span>
    <span class="n">result</span> <span class="o">=</span> <span class="s1">''</span>
    <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isalnum</span><span class="p">():</span>
        <span class="n">result</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>

    <span class="n">token</span> <span class="o">=</span> <span class="n">RESERVED_KEYWORDS</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="n">Token</span><span class="p">(</span><span class="n">ID</span><span class="p">,</span> <span class="n">result</span><span class="p">))</span>
    <span class="k">return</span> <span class="n">token</span>
</pre></div>
</li>
<li>
<p>And now let’s take a look at the changes in the main lexer method <em>get_next_token</em>:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">get_next_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
        <span class="o">...</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span><span class="o">.</span><span class="n">isalpha</span><span class="p">():</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_id</span><span class="p">()</span>

        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">':'</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">peek</span><span class="p">()</span> <span class="o">==</span> <span class="s1">'='</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">ASSIGN</span><span class="p">,</span> <span class="s1">':='</span><span class="p">)</span>

        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">';'</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">SEMI</span><span class="p">,</span> <span class="s1">';'</span><span class="p">)</span>

        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_char</span> <span class="o">==</span> <span class="s1">'.'</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">advance</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">Token</span><span class="p">(</span><span class="n">DOT</span><span class="p">,</span> <span class="s1">'.'</span><span class="p">)</span>
        <span class="o">...</span>
</pre></div>
</li>
</ol>
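<p>The snippets above leave out the parts of the lexer that did not change. If you want to experiment without the full source, here is a condensed, self-contained sketch that combines them with integer and operator handling in the style of earlier parts. It is an approximation, not the actual <em>spi.py</em>. Two details are assumptions of this sketch: the <em>SINGLE_CHAR_TOKENS</em> lookup table, which replaces a chain of <em>if</em> statements, and the wording of the error message.</p>
<div class="highlight"><pre><span></span># Token type names, as used in the article
INTEGER, PLUS, MINUS, MUL, DIV, LPAREN, RPAREN = (
    'INTEGER', 'PLUS', 'MINUS', 'MUL', 'DIV', 'LPAREN', 'RPAREN')
ID, ASSIGN, SEMI, DOT, BEGIN, END, EOF = (
    'ID', 'ASSIGN', 'SEMI', 'DOT', 'BEGIN', 'END', 'EOF')


class Token:
    def __init__(self, type, value):
        self.type = type
        self.value = value

    def __repr__(self):
        return f'Token({self.type}, {self.value!r})'


RESERVED_KEYWORDS = {
    'BEGIN': Token(BEGIN, 'BEGIN'),
    'END': Token(END, 'END'),
}

# Assumption of this sketch: a lookup table instead of one if-statement per character
SINGLE_CHAR_TOKENS = {
    '+': PLUS, '-': MINUS, '*': MUL, '/': DIV,
    '(': LPAREN, ')': RPAREN, ';': SEMI, '.': DOT,
}


class Lexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.current_char = text[0] if text else None

    def error(self):
        raise Exception(f'Invalid character {self.current_char!r} at position {self.pos}')

    def advance(self):
        """Move to the next character; None marks the end of the input."""
        self.pos += 1
        if self.pos > len(self.text) - 1:
            self.current_char = None
        else:
            self.current_char = self.text[self.pos]

    def peek(self):
        """Return the next character without consuming it."""
        peek_pos = self.pos + 1
        if peek_pos > len(self.text) - 1:
            return None
        return self.text[peek_pos]

    def skip_whitespace(self):
        while self.current_char is not None and self.current_char.isspace():
            self.advance()

    def integer(self):
        result = ''
        while self.current_char is not None and self.current_char.isdigit():
            result += self.current_char
            self.advance()
        return int(result)

    def _id(self):
        """Handle identifiers and reserved keywords."""
        result = ''
        while self.current_char is not None and self.current_char.isalnum():
            result += self.current_char
            self.advance()
        return RESERVED_KEYWORDS.get(result, Token(ID, result))

    def get_next_token(self):
        while self.current_char is not None:
            if self.current_char.isspace():
                self.skip_whitespace()
                continue
            if self.current_char.isalpha():
                return self._id()
            if self.current_char.isdigit():
                return Token(INTEGER, self.integer())
            # A lone ':' is not a token here, so peek ahead to recognize ':='
            if self.current_char == ':' and self.peek() == '=':
                self.advance()
                self.advance()
                return Token(ASSIGN, ':=')
            if self.current_char in SINGLE_CHAR_TOKENS:
                char = self.current_char
                self.advance()
                return Token(SINGLE_CHAR_TOKENS[char], char)
            self.error()
        return Token(EOF, None)
</pre></div>
<p>Feeding it <em>'BEGIN a := 2; END.'</em> and calling <em>get_next_token</em> repeatedly yields <span class="caps">BEGIN</span>, <span class="caps">ID</span>, <span class="caps">ASSIGN</span>, <span class="caps">INTEGER</span>, <span class="caps">SEMI</span>, <span class="caps">END</span>, and <span class="caps">DOT</span> tokens, followed by <span class="caps">EOF</span>. A lone ‘:’ falls through to <em>error</em>, because the language at this stage has no token for it.</p>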
<p>It’s time to see our shiny new lexer in all its glory and action.
Download the source code from <a href="https://github.com/rspivak/lsbasi/blob/master/part9/python">GitHub</a> and launch your Python shell from the same directory where you saved the <a href="https://github.com/rspivak/lsbasi/blob/master/part9/python/spi.py">spi.py</a> file:</p>
<div class="highlight"><pre><span></span>>>> from spi import Lexer
>>> <span class="nv">lexer</span> <span class="o">=</span> Lexer<span class="o">(</span><span class="s1">'BEGIN a := 2; END.'</span><span class="o">)</span>
>>> lexer.get_next_token<span class="o">()</span>
Token<span class="o">(</span>BEGIN, <span class="s1">'BEGIN'</span><span class="o">)</span>
>>> lexer.get_next_token<span class="o">()</span>
Token<span class="o">(</span>ID, <span class="s1">'a'</span><span class="o">)</span>
>>> lexer.get_next_token<span class="o">()</span>
Token<span class="o">(</span>ASSIGN, <span class="s1">':='</span><span class="o">)</span>
>>> lexer.get_next_token<span class="o">()</span>
Token<span class="o">(</span>INTEGER, <span class="m">2</span><span class="o">)</span>
>>> lexer.get_next_token<span class="o">()</span>
Token<span class="o">(</span>SEMI, <span class="s1">';'</span><span class="o">)</span>
>>> lexer.get_next_token<span class="o">()</span>
Token<span class="o">(</span>END, <span class="s1">'END'</span><span class="o">)</span>
>>> lexer.get_next_token<span class="o">()</span>
Token<span class="o">(</span>DOT, <span class="s1">'.'</span><span class="o">)</span>
>>> lexer.get_next_token<span class="o">()</span>
Token<span class="o">(</span>EOF, None<span class="o">)</span>
>>>
</pre></div>
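<p>The <em>_id</em> method as written has two consequences: keyword lookup is an exact, case-sensitive match, and identifiers are purely alphanumeric. The sketch below isolates that scanning and lookup logic in a hypothetical <em>classify_word</em> function that works on plain strings instead of <em>Token</em> objects, so you can quickly try a few words.</p>
<div class="highlight"><pre><span></span># Same idea as RESERVED_KEYWORDS, but mapping to token type names
# instead of Token objects to keep the sketch small.
RESERVED_KEYWORDS = {'BEGIN': 'BEGIN', 'END': 'END'}


def classify_word(text, pos=0):
    """Scan an alphanumeric word starting at pos, as _id does.

    Returns (token_type, lexeme, next_pos). The caller is assumed to have
    checked that text[pos] is alphabetic, just as get_next_token does.
    """
    result = ''
    while pos != len(text) and text[pos].isalnum():
        result += text[pos]
        pos += 1
    # Exact-match lookup: 'BEGIN' is a keyword, 'begin' is an ordinary ID
    return RESERVED_KEYWORDS.get(result, 'ID'), result, pos


for word in ['BEGIN', 'begin', 'number2:=', 'my_var', 'END.']:
    print(word, '->', classify_word(word))
</pre></div>
<p>The <em>'my_var'</em> and <em>'END.'</em> cases show that the scan stops at the first non-alphanumeric character. An underscore therefore ends the identifier in this version, and the ‘.’ is left for <em>get_next_token</em> to turn into a <span class="caps">DOT</span> token. Standard Pascal treats keywords and identifiers case-insensitively, so the exact-match lookup is a simplification worth keeping in mind.</p>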
<p><br/>
Moving on to parser changes.</p>
<p>Here is the summary of changes in our parser:
<img alt="" src="lsbasi_part9_parser.png" width="720"></p>
<ol>
<li>
<p>Let’s start with the new <span class="caps">AST</span> nodes:</p>
<ul>
<li>
<p>The <em>Compound</em> <span class="caps">AST</span> node represents a compound statement. It contains a list of statement nodes in its <em>children</em> attribute.</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Compound</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span>
    <span class="sd">"""Represents a 'BEGIN ... END' block"""</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">children</span> <span class="o">=</span> <span class="p">[]</span>
</pre></div>
</li>
<li>
<p>The <em>Assign</em> <span class="caps">AST</span> node represents an assignment statement. Its <em>left</em> attribute stores a <em>Var</em> node, and its <em>right</em> attribute stores the node returned by the <em>expr</em> parser method:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Assign</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">left</span><span class="p">,</span> <span class="n">op</span><span class="p">,</span> <span class="n">right</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">left</span> <span class="o">=</span> <span class="n">left</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">op</span> <span class="o">=</span> <span class="n">op</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">right</span> <span class="o">=</span> <span class="n">right</span>
</pre></div>
</li>
<li>
<p>The <em>Var</em> <span class="caps">AST</span> node (you guessed it) represents a variable. Its <em>self.value</em> attribute holds the variable’s name.</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Var</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span>
    <span class="sd">"""The Var node is constructed out of ID token."""</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">token</span> <span class="o">=</span> <span class="n">token</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">value</span>
</pre></div>
</li>
<li>
<p>The <em>NoOp</em> node is used to represent an <em>empty</em> statement. For example, ‘<span class="caps">BEGIN</span> <span class="caps">END</span>’ is a valid compound statement that has no statements.</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">NoOp</span><span class="p">(</span><span class="n">AST</span><span class="p">):</span>
    <span class="k">pass</span>
</pre></div>
</li>
</ul>
</li>
<li>
<p>As you remember, each rule from the grammar has a corresponding method in our recursive-descent parser. This time we’re adding seven new methods. These methods are responsible for parsing new language constructs and constructing new <span class="caps">AST</span> nodes. They are pretty straightforward:</p>
|
||
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">program</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||
<span class="sd">"""program : compound_statement DOT"""</span>
|
||
<span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">compound_statement</span><span class="p">()</span>
|
||
<span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">DOT</span><span class="p">)</span>
|
||
<span class="k">return</span> <span class="n">node</span>
|
||
|
||
<span class="k">def</span> <span class="nf">compound_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||
<span class="sd">"""</span>
|
||
<span class="sd"> compound_statement: BEGIN statement_list END</span>
|
||
<span class="sd"> """</span>
|
||
<span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">BEGIN</span><span class="p">)</span>
|
||
<span class="n">nodes</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">statement_list</span><span class="p">()</span>
|
||
<span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">END</span><span class="p">)</span>
|
||
|
||
<span class="n">root</span> <span class="o">=</span> <span class="n">Compound</span><span class="p">()</span>
|
||
<span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">nodes</span><span class="p">:</span>
|
||
<span class="n">root</span><span class="o">.</span><span class="n">children</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
|
||
|
||
<span class="k">return</span> <span class="n">root</span>
|
||
|
||
<span class="k">def</span> <span class="nf">statement_list</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||
<span class="sd">"""</span>
|
||
<span class="sd"> statement_list : statement</span>
|
||
<span class="sd"> | statement SEMI statement_list</span>
|
||
<span class="sd"> """</span>
|
||
<span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">statement</span><span class="p">()</span>
|
||
|
||
<span class="n">results</span> <span class="o">=</span> <span class="p">[</span><span class="n">node</span><span class="p">]</span>
|
||
|
||
<span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">SEMI</span><span class="p">:</span>
|
||
<span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">SEMI</span><span class="p">)</span>
|
||
<span class="n">results</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">statement</span><span class="p">())</span>
|
||
|
||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">ID</span><span class="p">:</span>
|
||
<span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span>
|
||
|
||
<span class="k">return</span> <span class="n">results</span>
|
||
|
||
<span class="k">def</span> <span class="nf">statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||
<span class="sd">"""</span>
|
||
<span class="sd"> statement : compound_statement</span>
|
||
<span class="sd"> | assignment_statement</span>
|
||
<span class="sd"> | empty</span>
|
||
<span class="sd"> """</span>
|
||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">BEGIN</span><span class="p">:</span>
|
||
<span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">compound_statement</span><span class="p">()</span>
|
||
<span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">ID</span><span class="p">:</span>
|
||
<span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">assignment_statement</span><span class="p">()</span>
|
||
<span class="k">else</span><span class="p">:</span>
|
||
<span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">empty</span><span class="p">()</span>
|
||
<span class="k">return</span> <span class="n">node</span>
|
||
|
||
<span class="k">def</span> <span class="nf">assignment_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||
<span class="sd">"""</span>
|
||
<span class="sd"> assignment_statement : variable ASSIGN expr</span>
|
||
<span class="sd"> """</span>
|
||
<span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">variable</span><span class="p">()</span>
|
||
<span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span>
|
||
<span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">ASSIGN</span><span class="p">)</span>
|
||
<span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">expr</span><span class="p">()</span>
|
||
<span class="n">node</span> <span class="o">=</span> <span class="n">Assign</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">token</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
|
||
<span class="k">return</span> <span class="n">node</span>
|
||
|
||
<span class="k">def</span> <span class="nf">variable</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||
<span class="sd">"""</span>
|
||
<span class="sd"> variable : ID</span>
|
||
<span class="sd"> """</span>
|
||
<span class="n">node</span> <span class="o">=</span> <span class="n">Var</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="p">)</span>
|
||
<span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">ID</span><span class="p">)</span>
|
||
<span class="k">return</span> <span class="n">node</span>
|
||
|
||
<span class="k">def</span> <span class="nf">empty</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||
<span class="sd">"""An empty production"""</span>
|
||
<span class="k">return</span> <span class="n">NoOp</span><span class="p">()</span>
|
||
</pre></div>
|
||
|
||
|
||
</li>
|
||
<li>
<p>We also need to update the existing <em>factor</em> method to parse variables:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">factor</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="sd">"""factor : PLUS factor</span>
<span class="sd">              | MINUS factor</span>
<span class="sd">              | INTEGER</span>
<span class="sd">              | LPAREN expr RPAREN</span>
<span class="sd">              | variable</span>
<span class="sd">    """</span>
    <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span>
    <span class="k">if</span> <span class="n">token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">PLUS</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">eat</span><span class="p">(</span><span class="n">PLUS</span><span class="p">)</span>
        <span class="n">node</span> <span class="o">=</span> <span class="n">UnaryOp</span><span class="p">(</span><span class="n">token</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">factor</span><span class="p">())</span>
        <span class="k">return</span> <span class="n">node</span>
    <span class="o">...</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">variable</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">node</span>
</pre></div>
</li>
<li>
<p>The parser’s <em>parse</em> method is updated to start the parsing process by parsing a program definition:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="n">node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">program</span><span class="p">()</span>
    <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_token</span><span class="o">.</span><span class="n">type</span> <span class="o">!=</span> <span class="n">EOF</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">error</span><span class="p">()</span>

    <span class="k">return</span> <span class="n">node</span>
</pre></div>
</li>
</ol>
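<p>To see how the statement rules fit together without the rest of the interpreter, here is a minimal, self-contained sketch of the same recursive-descent structure, following the grammar rules from this article. It is not the article’s code: it assumes a pre-tokenized list of <em>(type, value)</em> tuples instead of the <em>Lexer</em>, uses hypothetical stand-ins (<em>MiniParser</em> and small dataclass versions of <em>Compound</em>, <em>Assign</em>, <em>Var</em>, <em>Num</em>, and <em>NoOp</em>), and simplifies the right-hand side of an assignment to a single integer or variable instead of a full <em>expr</em>.</p>

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical token-type constants standing in for the article's lexer
BEGIN, END, ID, ASSIGN, INTEGER, SEMI, DOT, EOF = (
    'BEGIN', 'END', 'ID', 'ASSIGN', 'INTEGER', 'SEMI', 'DOT', 'EOF')


@dataclass
class Var:
    name: str


@dataclass
class Num:
    value: int


@dataclass
class Assign:
    left: Var
    right: object


@dataclass
class NoOp:
    pass


@dataclass
class Compound:
    children: List[object] = field(default_factory=list)


class MiniParser:
    def __init__(self, tokens: List[Tuple[str, object]]):
        self.tokens = tokens + [(EOF, None)]
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def eat(self, token_type):
        # Consume the current token only if it has the expected type.
        if self.current[0] != token_type:
            raise SyntaxError(f'expected {token_type}, got {self.current[0]}')
        self.pos += 1

    def program(self):  # program : compound_statement DOT
        node = self.compound_statement()
        self.eat(DOT)
        return node

    def compound_statement(self):  # BEGIN statement_list END
        self.eat(BEGIN)
        node = Compound(self.statement_list())
        self.eat(END)
        return node

    def statement_list(self):  # statement | statement SEMI statement_list
        nodes = [self.statement()]
        while self.current[0] == SEMI:
            self.eat(SEMI)
            nodes.append(self.statement())
        return nodes

    def statement(self):  # compound_statement | assignment_statement | empty
        if self.current[0] == BEGIN:
            return self.compound_statement()
        if self.current[0] == ID:
            return self.assignment_statement()
        return NoOp()  # empty production: consumes no tokens

    def assignment_statement(self):  # variable ASSIGN factor (simplified)
        left = self.variable()
        self.eat(ASSIGN)
        return Assign(left, self.factor())

    def variable(self):  # variable : ID
        node = Var(self.current[1])
        self.eat(ID)
        return node

    def factor(self):  # simplified: INTEGER | variable
        if self.current[0] == INTEGER:
            node = Num(self.current[1])
            self.eat(INTEGER)
            return node
        return self.variable()


# BEGIN x := 11; y := x; END.
tokens = [(BEGIN, 'BEGIN'), (ID, 'x'), (ASSIGN, ':='), (INTEGER, 11), (SEMI, ';'),
          (ID, 'y'), (ASSIGN, ':='), (ID, 'x'), (SEMI, ';'),
          (END, 'END'), (DOT, '.')]
tree = MiniParser(tokens).program()
print(tree.children)
```

<p>Note the semicolon right before <em>END</em>: it makes <em>statement_list</em> ask for one more statement, and since the next token is <em>END</em>, the empty production kicks in and the compound node ends up with a third child, a <em>NoOp</em>. That is exactly the situation the <em>empty</em> rule exists for.</p>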
<p>Here is our sample program again:</p>
<div class="highlight"><pre><span></span><span class="k">BEGIN</span>
    <span class="k">BEGIN</span>
        <span class="n">number</span> <span class="o">:=</span> <span class="mi">2</span><span class="o">;</span>
        <span class="n">a</span> <span class="o">:=</span> <span class="n">number</span><span class="o">;</span>
        <span class="n">b</span> <span class="o">:=</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">a</span> <span class="o">+</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">number</span> <span class="o">/</span> <span class="mi">4</span><span class="o">;</span>
        <span class="n">c</span> <span class="o">:=</span> <span class="n">a</span> <span class="o">-</span> <span class="o">-</span> <span class="n">b</span>
    <span class="k">END</span><span class="o">;</span>
    <span class="n">x</span> <span class="o">:=</span> <span class="mi">11</span><span class="o">;</span>
<span class="k">END</span><span class="o">.</span>
</pre></div>
<p>Let’s visualize it with <a href="https://github.com/rspivak/lsbasi/blob/master/part9/python/genastdot.py">genastdot.py</a>. (For brevity, the diagram labels a <em>Var</em> node with just its variable name and an <em>Assign</em> node with ‘:=’ instead of the text ‘Assign’.)</p>
<div class="highlight"><pre><span></span>$ python genastdot.py assignments.txt > ast.dot <span class="o">&&</span> dot -Tpng -o ast.png ast.dot
</pre></div>
<p><img alt="" src="lsbasi_part9_full_ast.png" width="640"></p>
<p><br/>
And finally, here are the required interpreter changes:
<img alt="" src="lsbasi_part9_interpreter.png" width="720"></p>
<p>To interpret the new <span class="caps">AST</span> nodes, we need to add corresponding visitor methods to the interpreter. There are four new visitor methods:</p>
<ul>
<li>visit_Compound</li>
<li>visit_Assign</li>
<li>visit_Var</li>
<li>visit_NoOp</li>
</ul>
<p>The <em>Compound</em> and <em>NoOp</em> visitor methods are pretty straightforward: <em>visit_Compound</em> iterates over its children and visits each one in turn, and <em>visit_NoOp</em> does nothing.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Compound</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">child</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">children</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">child</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">visit_NoOp</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
    <span class="k">pass</span>
</pre></div>
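<p>These two methods rely on the generic dispatch in <em>NodeVisitor.visit</em> from earlier parts, which looks up a method named <em>visit_</em> followed by the node’s class name. The sketch below reproduces that style of dispatch in a self-contained form so you can watch <em>visit_Compound</em> walk nested blocks in order while <em>visit_NoOp</em> silently skips empty statements. The <em>TracingInterpreter</em> class, its <em>seen</em> list, and the simplified node classes are hypothetical, added only for illustration.</p>

```python
class Num:
    def __init__(self, value):
        self.value = value


class NoOp:
    pass


class Compound:
    def __init__(self, children):
        self.children = children


class NodeVisitor:
    def visit(self, node):
        # Dispatch on the class name: Compound -> visit_Compound, and so on.
        method_name = 'visit_' + type(node).__name__
        visitor = getattr(self, method_name, self.generic_visit)
        return visitor(node)

    def generic_visit(self, node):
        raise Exception(f'No visit_{type(node).__name__} method')


class TracingInterpreter(NodeVisitor):
    def __init__(self):
        self.seen = []  # values of Num nodes, in visiting order

    def visit_Compound(self, node):
        for child in node.children:
            self.visit(child)

    def visit_NoOp(self, node):
        pass

    def visit_Num(self, node):
        self.seen.append(node.value)


# Nested blocks with empty statements mixed in
tree = Compound([Num(1), Compound([Num(2), NoOp()]), NoOp(), Num(3)])
tracer = TracingInterpreter()
tracer.visit(tree)
print(tracer.seen)  # [1, 2, 3]
```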
<p><br/>
The <em>Assign</em> and <em>Var</em> visitor methods deserve a closer look.</p>
<p>When we assign a value to a variable, we need to store that value somewhere so that we can retrieve it later, and that’s exactly what the <em>visit_Assign</em> method does:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Assign</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
    <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">left</span><span class="o">.</span><span class="n">value</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">GLOBAL_SCOPE</span><span class="p">[</span><span class="n">var_name</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">right</span><span class="p">)</span>
</pre></div>
<p>The method stores a key-value pair (a variable name and the value associated with it) in the <em>symbol table</em> GLOBAL_SCOPE.
What is a <em>symbol table</em>? A <em><strong>symbol table</strong></em> is an abstract data type (<strong><span class="caps">ADT</span></strong>) for tracking various symbols in source code.
The only symbol category we have right now is variables, and we use a Python dictionary to implement the symbol table <span class="caps">ADT</span>.
For now I’ll just say that the way the symbol table is used in this article is pretty “hacky”: it’s not a separate class with its own methods but a simple Python dictionary, and it also does double duty as a memory space.
In future articles, I will be talking about symbol tables in much greater detail, and together we’ll also remove all the hacks.</p>
<p>Let’s take a look at an <span class="caps">AST</span> for the statement “a := 3;” and the symbol table before and after the <em>visit_Assign</em> method does its job:</p>
<p><img alt="" src="lsbasi_part9_ast_st01.png" width="720"></p>
<p>Now let’s take a look at an <span class="caps">AST</span> for the statement “b := a + 7;”:</p>
<p><img alt="" src="lsbasi_part9_ast_only_st02.png" width="280"></p>
<p>As you can see, the right-hand side of the assignment statement, “a + 7”, references the variable ‘a’, so before we can evaluate the expression “a + 7” we need to find out the value of ‘a’. That’s the responsibility of the <em>visit_Var</em> method:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">visit_Var</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
    <span class="n">var_name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">value</span>
    <span class="n">val</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">GLOBAL_SCOPE</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">var_name</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">val</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
        <span class="k">raise</span> <span class="ne">NameError</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">var_name</span><span class="p">))</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">val</span>
</pre></div>
<p>When the method visits a <em>Var</em> node like the one in the <span class="caps">AST</span> above, it first gets the variable’s name and then uses that name as a key into the <em>GLOBAL_SCOPE</em> dictionary to look up the variable’s value. If it finds the value, it returns it; if not, it raises a <em>NameError</em> exception.
Here are the contents of the symbol table before evaluating the assignment statement “b := a + 7;”:</p>
<p><img alt="" src="../lsbasi-part11/lsbasi_part9_ast_st02.png" width="720"></p>
<p>These are all the changes we need today to make our interpreter tick. At the end of the main program, we simply print the contents of the symbol table GLOBAL_SCOPE to standard output.</p>
<p>Let’s take our updated interpreter for a drive, both from a Python interactive shell and from the command line. Before testing, make sure that you have downloaded both the source code for the interpreter and the <a href="https://github.com/rspivak/lsbasi/blob/master/part9/python/assignments.txt">assignments.txt</a> file.</p>
<p>Launch your Python shell:</p>
<div class="highlight"><pre><span></span>$ python
>>> from spi import Lexer, Parser, Interpreter
>>> <span class="nv">text</span> <span class="o">=</span> <span class="s2">"""\</span>
<span class="s2">... BEGIN</span>
<span class="s2">...</span>
<span class="s2">...     BEGIN</span>
<span class="s2">...         number := 2;</span>
<span class="s2">...         a := number;</span>
<span class="s2">...         b := 10 * a + 10 * number / 4;</span>
<span class="s2">...         c := a - - b</span>
<span class="s2">...     END;</span>
<span class="s2">...</span>
<span class="s2">...     x := 11;</span>
<span class="s2">... END.</span>
<span class="s2">... """</span>
>>> <span class="nv">lexer</span> <span class="o">=</span> Lexer<span class="o">(</span>text<span class="o">)</span>
>>> <span class="nv">parser</span> <span class="o">=</span> Parser<span class="o">(</span>lexer<span class="o">)</span>
>>> <span class="nv">interpreter</span> <span class="o">=</span> Interpreter<span class="o">(</span>parser<span class="o">)</span>
>>> interpreter.interpret<span class="o">()</span>
>>> print<span class="o">(</span>interpreter.GLOBAL_SCOPE<span class="o">)</span>
<span class="o">{</span><span class="s1">'a'</span>: <span class="m">2</span>, <span class="s1">'x'</span>: <span class="m">11</span>, <span class="s1">'c'</span>: <span class="m">27</span>, <span class="s1">'b'</span>: <span class="m">25</span>, <span class="s1">'number'</span>: <span class="m">2</span><span class="o">}</span>
</pre></div>
<p>And from the command line, using a source file as input to our interpreter:</p>
<div class="highlight"><pre><span></span>$ python spi.py assignments.txt
<span class="o">{</span><span class="s1">'a'</span>: <span class="m">2</span>, <span class="s1">'x'</span>: <span class="m">11</span>, <span class="s1">'c'</span>: <span class="m">27</span>, <span class="s1">'b'</span>: <span class="m">25</span>, <span class="s1">'number'</span>: <span class="m">2</span><span class="o">}</span>
</pre></div>
<p>If you haven’t tried it yet, try it now and see for yourself that the interpreter is doing its job properly.</p>
<p><br/>
Let’s sum up what you had to do to extend the Pascal interpreter in this article:</p>
<ol>
<li>Add new rules to the grammar</li>
<li>Add new tokens and supporting methods to the lexer and update the <em>get_next_token</em> method</li>
<li>Add new <span class="caps">AST</span> nodes to the parser for the new language constructs</li>
<li>Add new methods corresponding to the new grammar rules to our recursive-descent parser and update any existing methods, if necessary (<em>factor</em> method, I’m looking at you :)</li>
<li>Add new visitor methods to the interpreter</li>
<li>Add a dictionary for storing variables and looking them up</li>
</ol>
<p><br/>
In this part I had to introduce a number of “hacks” that we’ll remove as we move forward with the series:</p>
<p><img alt="" src="lsbasi_part9_hacks.png" width="720"></p>
<ol>
<li>The <em>program</em> grammar rule is incomplete. We’ll extend it later with additional elements.</li>
<li>Pascal is a statically typed language, and you must declare a variable and its type before using it. But, as you saw, that was not the case in this article.</li>
<li>No type checking so far. It’s not a big deal at this point, but I just wanted to mention it explicitly. Once we add more types to our interpreter, we’ll need to report an error when you try to add a string and an integer, for example.</li>
<li>The symbol table in this part is a simple Python dictionary that does double duty as a memory space. Worry not: symbol tables are such an important topic that I’ll have several articles dedicated just to them. And memory space (runtime management) is a topic of its own.</li>
<li>In our simple calculator from previous articles, we used the forward slash character ‘/’ to denote integer division. In Pascal, though, you have to use the keyword <em>div</em> to specify integer division (see Exercise 2).</li>
<li>There is also one hack that I introduced on purpose so that you could fix it in Exercise 1: in Pascal all reserved keywords and identifiers are case insensitive, but the interpreter in this article treats them as case sensitive.</li>
</ol>
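<p>Before you tackle the <em>div</em> exercise below, one detail is worth knowing: Pascal’s <em>div</em> truncates the quotient toward zero, whereas Python’s <em>//</em> operator rounds toward negative infinity, so the two disagree whenever exactly one operand is negative. The helper below is a hypothetical sketch (the name <em>pascal_div</em> is not from the article’s code) of a truncating integer division that an interpreter could call instead of using <em>//</em> directly.</p>

```python
def pascal_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero, like Pascal's div."""
    if b == 0:
        raise ZeroDivisionError('integer division by zero')
    quotient = abs(a) // abs(b)
    # The result is negative exactly when the operands have opposite signs.
    return quotient if (a < 0) == (b < 0) else -quotient


print(10 * 2 // 4)         # 5, as in "10 * number / 4" from the sample program
print(-7 // 2)             # -4: Python floors toward negative infinity
print(pascal_div(-7, 2))   # -3: Pascal's div truncates toward zero
print(pascal_div(7, -2))   # -3
print(pascal_div(-7, -2))  # 3
```

<p>For non-negative operands, which is all the sample program uses, <em>pascal_div</em> and <em>//</em> give the same result.</p>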
<p><br/>
To keep you fit, here are new exercises for you:</p>
<p><img alt="" src="lsbasi_part9_exercises.png" width="320"></p>
<ol>
<li>
<p>Pascal variables and reserved keywords are case insensitive, unlike in many other programming languages, so <em><span class="caps">BEGIN</span></em>, <em>begin</em>, and <em>BeGin</em> all refer to the same reserved keyword. Update the interpreter so that variables and reserved keywords are case insensitive. Use the following program to test it:</p>
<div class="highlight"><pre><span></span><span class="k">BEGIN</span>

    <span class="k">BEGIN</span>
        <span class="n">number</span> <span class="o">:=</span> <span class="mi">2</span><span class="o">;</span>
        <span class="n">a</span> <span class="o">:=</span> <span class="n">NumBer</span><span class="o">;</span>
        <span class="n">B</span> <span class="o">:=</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">a</span> <span class="o">+</span> <span class="mi">10</span> <span class="o">*</span> <span class="n">NUMBER</span> <span class="o">/</span> <span class="mi">4</span><span class="o">;</span>
        <span class="n">c</span> <span class="o">:=</span> <span class="n">a</span> <span class="o">-</span> <span class="o">-</span> <span class="n">b</span>
    <span class="k">end</span><span class="o">;</span>

    <span class="n">x</span> <span class="o">:=</span> <span class="mi">11</span><span class="o">;</span>
<span class="k">END</span><span class="o">.</span>
</pre></div>
</li>
<li>
<p>As mentioned in the “hacks” section, our interpreter uses the forward slash character ‘/’ to denote integer division, but it should use Pascal’s reserved keyword <em>div</em> instead. Update the interpreter to use the <em>div</em> keyword for integer division, thus eliminating one of the hacks.</p>
</li>
<li>
<p>Update the interpreter so that variables can also start with an underscore, as in ‘_num := 5’.</p>
</li>
</ol>
<p><br/>
That’s all for today. Stay tuned and see you soon.</p>
<p><br/>
Here is a list of books I recommend that will help you in your study of interpreters and compilers:</p>
<ol>
<li>
<p><a href="http://www.amazon.com/gp/product/193435645X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=193435645X&linkCode=as2&tag=russblo0b-20&linkId=MP4DCXDV6DJMEJBL">Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (Pragmatic Programmers)</a></p>
</li>
<li>
<p><a href="http://www.amazon.com/gp/product/0321486811/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321486811&linkCode=as2&tag=russblo0b-20&linkId=GOEGDQG4HIHU56FQ">Compilers: Principles, Techniques, and Tools (2nd Edition)</a></p>
</li>
</ol>
<p><br/>
<strong>All articles in this series:</strong>
<ul>
<li><a href="../lsbasi-part1/index.html">Let's Build A Simple Interpreter. Part 1.</a></li>
<li><a href="../lsbasi-part2/index.html">Let's Build A Simple Interpreter. Part 2.</a></li>
<li><a href="../lsbasi-part3/index.html">Let's Build A Simple Interpreter. Part 3.</a></li>
<li><a href="../lsbasi-part4/index.html">Let's Build A Simple Interpreter. Part 4.</a></li>
<li><a href="../lsbasi-part5/index.html">Let's Build A Simple Interpreter. Part 5.</a></li>
<li><a href="../lsbasi-part6/index.html">Let's Build A Simple Interpreter. Part 6.</a></li>
<li><a href="../lsbasi-part7/index.html">Let's Build A Simple Interpreter. Part 7: Abstract Syntax Trees</a></li>
<li><a href="../lsbasi-part8/index.html">Let's Build A Simple Interpreter. Part 8.</a></li>
<li><a href="index.html">Let's Build A Simple Interpreter. Part 9.</a></li>
<li><a href="../lsbasi-part10/index.html">Let's Build A Simple Interpreter. Part 10.</a></li>
<li><a href="../lsbasi-part11/index.html">Let's Build A Simple Interpreter. Part 11.</a></li>
<li><a href="../lsbasi-part12/index.html">Let's Build A Simple Interpreter. Part 12.</a></li>
<li><a href="../lsbasi-part13.html">Let's Build A Simple Interpreter. Part 13: Semantic Analysis</a></li>
<li><a href="../lsbasi-part14/index.html">Let's Build A Simple Interpreter. Part 14: Nested Scopes and a Source-to-Source Compiler</a></li>
<li><a href="../lsbasi-part15/index.html">Let's Build A Simple Interpreter. Part 15.</a></li>
<li><a href="../lsbasi-part16/index.html">Let's Build A Simple Interpreter. Part 16: Recognizing Procedure Calls</a></li>
<li><a href="../lsbasi-part17.html">Let's Build A Simple Interpreter. Part 17: Call Stack and Activation Records</a></li>
<li><a href="../lsbasi-part18/index.html">Let's Build A Simple Interpreter. Part 18: Executing Procedure Calls</a></li>
<li><a href="../lsbasi-part19/index.html">Let's Build A Simple Interpreter. Part 19: Nested Procedure Calls</a></li>
</ul>
</p>
</div>
<!-- /.entry-content -->
<hr/>
<section class="comments" id="comments">
|
||
<h2>Comments</h2>
|
||
|
||
<div id="disqus_thread"></div>
|
||
<script type="text/javascript">
|
||
/* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
|
||
var disqus_shortname = 'ruslanspivak'; // required: replace example with your forum shortname
|
||
|
||
var disqus_identifier = 'lets-build-a-simple-interpreter-part-9';
|
||
var disqus_url = 'https://ruslanspivak.com/lsbasi-part9/';
|
||
|
||
var disqus_config = function () {
|
||
this.language = "en";
|
||
};
|
||
|
||
/* * * DON'T EDIT BELOW THIS LINE * * */
|
||
(function () {
|
||
var dsq = document.createElement('script');
|
||
dsq.type = 'text/javascript';
|
||
dsq.async = true;
|
||
dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
|
||
(document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
|
||
})();
|
||
</script>
|
||
<noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by
|
||
Disqus.</a></noscript>
|
||
<a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a>
|
||
|
||
</section>
|
||
</article>
</section>
</div>
<div class="col-sm-3" id="sidebar">
|
||
<aside>
|
||
|
||
<section class="well well-sm">
|
||
<ul class="list-group list-group-flush">
|
||
<li class="list-group-item"><h4><i class="fa fa-home fa-lg"></i><span class="icon-label">Social</span></h4>
|
||
<ul class="list-group" id="social">
|
||
<li class="list-group-item"><a href="https://github.com/rspivak/"><i class="fa fa-github-square fa-lg"></i> github</a></li>
|
||
<li class="list-group-item"><a href="https://twitter.com/rspivak"><i class="fa fa-twitter-square fa-lg"></i> twitter</a></li>
|
||
<li class="list-group-item"><a href="https://linkedin.com/in/ruslanspivak/"><i class="fa fa-linkedin-square fa-lg"></i> linkedin</a></li>
|
||
</ul>
|
||
</li>
|
||
|
||
<li class="list-group-item"><h4><i class="fa fa-home fa-lg"></i><span class="icon-label">Popular posts</span></h4>
|
||
<ul class="list-group" id="popularposts">
|
||
<li class="list-group-item"
|
||
style="font-size: 15px; word-break: normal;">
|
||
<a href="../lsbaws-part1/index.html">
|
||
Let's Build A Web Server. Part 1.
|
||
</a>
|
||
</li>
|
||
<li class="list-group-item"
|
||
style="font-size: 15px; word-break: normal;">
|
||
<a href="../lsbasi-part1/index.html">
|
||
Let's Build A Simple Interpreter. Part 1.
|
||
</a>
|
||
</li>
|
||
<li class="list-group-item"
|
||
style="font-size: 15px; word-break: normal;">
|
||
<a href="../lsbaws-part2/index.html">
|
||
Let's Build A Web Server. Part 2.
|
||
</a>
|
||
</li>
|
||
<li class="list-group-item"
|
||
style="font-size: 15px; word-break: normal;">
|
||
<a href="../lsbaws-part3/index.html">
|
||
Let's Build A Web Server. Part 3.
|
||
</a>
|
||
</li>
|
||
<li class="list-group-item"
|
||
style="font-size: 15px; word-break: normal;">
|
||
<a href="../lsbasi-part2/index.html">
|
||
Let's Build A Simple Interpreter. Part 2.
|
||
</a>
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
|
||
<li class="list-group-item">
|
||
<h4>
|
||
<span>Disclaimer</span>
|
||
</h4>
|
||
<p id="disclaimer-text"> Some of the links on this site
|
||
have my Amazon referral id, which provides me with a small
|
||
commission for each sale. Thank you for your support.
|
||
</p>
|
||
</li>
|
||
|
||
|
||
|
||
</ul>
|
||
</section>
|
||
</aside>
|
||
</div>
|
||
</div>
</div>
<footer>
<div class="container">
<hr>
<div class="row">
<div class="col-xs-10">© 2020 Ruslan Spivak
<!-- · Powered by <a href="https://github.com/DandyDev/pelican-bootstrap3" target="_blank">pelican-bootstrap3</a>, -->
<!-- <a href="http://docs.getpelican.com/" target="_blank">Pelican</a>, -->
<!-- <a href="http://getbootstrap.com" target="_blank">Bootstrap</a> -->
<!-- -->
</div>
<div class="col-xs-2"><p class="pull-right"><i class="fa fa-arrow-up"></i> <a href="index.html#">Back to top</a></p></div>
</div>
</div>
</footer>
<script src="../theme/js/jquery.min.js"></script>
|
||
|
||
<!-- Include all compiled plugins (below), or include individual files as needed -->
|
||
<script src="../theme/js/bootstrap.min.js"></script>
|
||
|
||
<!-- Enable responsive features in IE8 with Respond.js (https://github.com/scottjehl/Respond) -->
|
||
<script src="../theme/js/respond.min.js"></script>
|
||
|
||
<!-- Disqus -->
|
||
<script type="text/javascript">
|
||
/* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */
|
||
var disqus_shortname = 'ruslanspivak'; // required: replace example with your forum shortname
|
||
|
||
/* * * DON'T EDIT BELOW THIS LINE * * */
|
||
(function () {
|
||
var s = document.createElement('script');
|
||
s.async = true;
|
||
s.type = 'text/javascript';
|
||
s.src = '//' + disqus_shortname + '.disqus.com/count.js';
|
||
(document.getElementsByTagName('HEAD')[0] || document.getElementsByTagName('BODY')[0]).appendChild(s);
|
||
}());
|
||
</script>
|
||
<!-- End Disqus Code -->
|
||
<!-- Google Analytics Universal -->
|
||
<script type="text/javascript">
|
||
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
|
||
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
|
||
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
|
||
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
|
||
|
||
ga('create', 'UA-2572871-3', 'auto');
|
||
ga('send', 'pageview');
|
||
</script>
|
||
<!-- End Google Analytics Universal Code -->
|
||
|
||
</body>
|
||
</html> |