<!DOCTYPE html>
<html>
<!-- Created by GNU Texinfo 7.1, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This manual documents Guile version 3.0.10.

Copyright (C) 1996-1997, 2000-2005, 2009-2023 Free Software Foundation,
Inc.

Copyright (C) 2021 Maxime Devos

Copyright (C) 2024 Tomas Volf


Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
copy of the license is included in the section entitled "GNU Free
Documentation License." -->
<title>PEG Tutorial (Guile Reference Manual)</title>

<meta name="description" content="PEG Tutorial (Guile Reference Manual)">
<meta name="keywords" content="PEG Tutorial (Guile Reference Manual)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content=".texi2any-real">
<meta name="viewport" content="width=device-width,initial-scale=1">

<link href="index.html" rel="start" title="Top">
<link href="Concept-Index.html" rel="index" title="Concept Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="PEG-Parsing.html" rel="up" title="PEG Parsing">
<link href="PEG-Internals.html" rel="next" title="PEG Internals">
<link href="PEG-API-Reference.html" rel="prev" title="PEG API Reference">
<style type="text/css">
<!--
a.copiable-link {visibility: hidden; text-decoration: none; line-height: 0em}
div.example {margin-left: 3.2em}
span:hover a.copiable-link {visibility: visible}
-->
</style>
<link rel="stylesheet" type="text/css" href="https://www.gnu.org/software/gnulib/manual.css">


</head>

<body lang="en">
<div class="subsection-level-extent" id="PEG-Tutorial">
<div class="nav-panel">
<p>
Next: <a href="PEG-Internals.html" accesskey="n" rel="next">PEG Internals</a>, Previous: <a href="PEG-API-Reference.html" accesskey="p" rel="prev">PEG API Reference</a>, Up: <a href="PEG-Parsing.html" accesskey="u" rel="up">PEG Parsing</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<h4 class="subsection" id="PEG-Tutorial-1"><span>6.15.3 PEG Tutorial<a class="copiable-link" href="#PEG-Tutorial-1"> ¶</a></span></h4>

<h4 class="subsubheading" id="Parsing-_002fetc_002fpasswd"><span>Parsing /etc/passwd<a class="copiable-link" href="#Parsing-_002fetc_002fpasswd"> ¶</a></span></h4>
<p>This example shows how to parse /etc/passwd using PEGs.
</p>
<p>First we define an example /etc/passwd file:
</p>
<div class="example lisp">
<pre class="lisp-preformatted">(define *etc-passwd*
  "root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
nobody:x:65534:65534:nobody:/nonexistent:/bin/sh
messagebus:x:103:107::/var/run/dbus:/bin/false
")
</pre></div>

<p>As a first pass, we might want to collect all the entries in
/etc/passwd into a list.
</p>
<p>Doing this with string-based PEG syntax would look like this:
</p><div class="example lisp">
<pre class="lisp-preformatted">(define-peg-string-patterns
  "passwd &lt;- entry* !.
entry &lt;-- (! NL .)* NL*
NL &lt; '\n'")
</pre></div>

<p>A <code class="code">passwd</code> file is 0 or more entries (<code class="code">entry*</code>) until the end
of the file (<code class="code">!.</code>: since <code class="code">.</code> matches any character, <code class="code">!.</code> succeeds
only when no characters are left). We want to capture the data in the nonterminal
<code class="code">passwd</code>, but not tag it with the name, so we use <code class="code">&lt;-</code>.
</p>
<p>An entry is a series of 0 or more characters that aren’t newlines
(<code class="code">(! NL .)*</code>) followed by 0 or more newlines (<code class="code">NL*</code>). We want
to tag all the entries with <code class="code">entry</code>, so we use <code class="code">&lt;--</code>.
</p>
<p>A newline is just a literal newline (<code class="code">'\n'</code>). We don’t want a
bunch of newlines cluttering up the output, so we use <code class="code">&lt;</code> to throw
away the captured data.
</p>

<p>Here is the same PEG defined using S-expressions:
</p><div class="example lisp">
<pre class="lisp-preformatted">(define-peg-pattern passwd body (and (* entry) (not-followed-by peg-any)))
(define-peg-pattern entry all (and (* (and (not-followed-by NL) peg-any))
                                   (* NL)))
(define-peg-pattern NL none "\n")
</pre></div>
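
<p>These definitions need the <code class="code">(ice-9 peg)</code> module. The following sketch, assuming a stock Guile 3.0 installation, loads the module, defines the S-expression grammar above (with <code class="code">NL</code> moved first), and matches a made-up one-line input. <code class="code">match-pattern</code> returns a match record, or <code class="code">#f</code> if nothing matches; <code class="code">peg:tree</code> and <code class="code">peg:end</code> extract the parse tree and the position just past the match.
</p>
<div class="example lisp">
<pre class="lisp-preformatted">(use-modules (ice-9 peg))

;; The S-expression grammar from above, with NL defined first.
(define-peg-pattern NL none "\n")
(define-peg-pattern entry all (and (* (and (not-followed-by NL) peg-any))
                                   (* NL)))
(define-peg-pattern passwd body (and (* entry) (not-followed-by peg-any)))

;; A one-line input, invented for this illustration.
(define m (match-pattern passwd "root:x:0:0:root:/root:/bin/bash\n"))

(peg:tree m)  ; ⇒ (entry "root:x:0:0:root:/root:/bin/bash")
(peg:end m)   ; ⇒ 32, the whole input was consumed
</pre></div>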

<p>Obviously this is much more verbose. On the other hand, it’s more
explicit, and thus easier to build automatically. There are also
some tricks that make S-expressions easier to use in some cases. One is
the <code class="code">ignore</code> keyword: the string syntax has no way to say “throw
away this text” except breaking it out into a separate nonterminal.
For instance, to throw away the newlines we had to define <code class="code">NL</code>. In
the S-expression syntax, we could have simply written <code class="code">(ignore
"\n")</code>. Also, for the cases where string syntax is really much cleaner,
the <code class="code">peg</code> keyword can be used to embed string syntax in
S-expression syntax. For instance, we could have written:
</p>
<div class="example lisp">
<pre class="lisp-preformatted">(define-peg-pattern passwd body (peg "entry* !."))
</pre></div>
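
<p>To make the <code class="code">ignore</code> alternative concrete, here is a sketch of the same grammar with the separate <code class="code">NL</code> nonterminal dropped and newlines discarded inline. The two-line input is invented for illustration.
</p>
<div class="example lisp">
<pre class="lisp-preformatted">(use-modules (ice-9 peg))

;; Newlines are matched and thrown away in place with ignore,
;; instead of through a separate NL nonterminal.
(define-peg-pattern entry all
  (and (* (and (not-followed-by "\n") peg-any))
       (* (ignore "\n"))))
(define-peg-pattern passwd body
  (and (* entry) (not-followed-by peg-any)))

(peg:tree (match-pattern passwd "alice:x:1000\nbob:x:1001\n"))
;; ⇒ ((entry "alice:x:1000") (entry "bob:x:1001"))
</pre></div>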

<p>However we define it, parsing <code class="code">*etc-passwd*</code> with the <code class="code">passwd</code>
nonterminal yields the same results:
</p>
<div class="example lisp">
<pre class="lisp-preformatted">(peg:tree (match-pattern passwd *etc-passwd*)) ⇒
((entry "root:x:0:0:root:/root:/bin/bash")
 (entry "daemon:x:1:1:daemon:/usr/sbin:/bin/sh")
 (entry "bin:x:2:2:bin:/bin:/bin/sh")
 (entry "sys:x:3:3:sys:/dev:/bin/sh")
 (entry "nobody:x:65534:65534:nobody:/nonexistent:/bin/sh")
 (entry "messagebus:x:103:107::/var/run/dbus:/bin/false"))
</pre></div>

<p>However, here is something to be wary of:
</p>
<div class="example lisp">
<pre class="lisp-preformatted">(peg:tree (match-pattern passwd "one entry")) ⇒
(entry "one entry")
</pre></div>

<p>By default, the parse trees generated by PEGs are compressed as much as
possible without losing information. This may not look like what
you want at first, but uncompressed parse trees are an enormous headache
(there’s no easy way to predict how deep particular lists will nest,
there are empty lists littered everywhere, and so on). One side effect
of this, however, is that sometimes the compressor is too aggressive.
No information is discarded when <code class="code">((entry "one entry"))</code> is
compressed to <code class="code">(entry "one entry")</code>, but in this particular case it
probably isn’t what we want: code expecting a list of entries now receives
a single entry instead.
</p>
<p>There are two functions for easily dealing with this:
<code class="code">keyword-flatten</code> and <code class="code">context-flatten</code>. The
<code class="code">keyword-flatten</code> function takes a list of keywords and a list to
flatten, then tries to coerce the list such that the first element of
all sublists is one of the keywords. The <code class="code">context-flatten</code>
function is similar, but instead of a list of keywords it takes a
predicate that should indicate whether a given sublist is good enough
(refer to the API reference for more details).
</p>

<p>What we want here is <code class="code">keyword-flatten</code>.
</p><div class="example lisp">
<pre class="lisp-preformatted">(keyword-flatten '(entry) (peg:tree (match-pattern passwd *etc-passwd*))) ⇒
((entry "root:x:0:0:root:/root:/bin/bash")
 (entry "daemon:x:1:1:daemon:/usr/sbin:/bin/sh")
 (entry "bin:x:2:2:bin:/bin:/bin/sh")
 (entry "sys:x:3:3:sys:/dev:/bin/sh")
 (entry "nobody:x:65534:65534:nobody:/nonexistent:/bin/sh")
 (entry "messagebus:x:103:107::/var/run/dbus:/bin/false"))
(keyword-flatten '(entry) (peg:tree (match-pattern passwd "one entry"))) ⇒
((entry "one entry"))
</pre></div>
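
<p>Flattening matters as soon as other code consumes the tree. In the sketch below, the helper <code class="code">passwd-lines</code> is hypothetical (it is not part of <code class="code">(ice-9 peg)</code>); it returns the text of each entry. Without the call to <code class="code">keyword-flatten</code>, the one-entry case would hand <code class="code">map</code> the list <code class="code">(entry "one entry")</code> itself, and <code class="code">cadr</code> would then be applied to the symbol <code class="code">entry</code>, which is an error.
</p>
<div class="example lisp">
<pre class="lisp-preformatted">(use-modules (ice-9 peg))

(define-peg-pattern NL none "\n")
(define-peg-pattern entry all (and (* (and (not-followed-by NL) peg-any))
                                   (* NL)))
(define-peg-pattern passwd body (and (* entry) (not-followed-by peg-any)))

;; Hypothetical helper: the text of every entry as a list of strings,
;; whether the input held one line or many.
(define (passwd-lines str)
  (let ((m (match-pattern passwd str)))
    (if m
        ;; keyword-flatten coerces the tree into a list of (entry "...") forms.
        (map cadr (keyword-flatten '(entry) (peg:tree m)))
        '())))

(passwd-lines "one entry")       ; ⇒ ("one entry")
(passwd-lines "a:x:1\nb:x:2\n")  ; ⇒ ("a:x:1" "b:x:2")
</pre></div>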

<p>Of course, this is a somewhat contrived example. In practice we would
probably just tag the <code class="code">passwd</code> nonterminal to remove the ambiguity
(using either the <code class="code">all</code> keyword for S-expressions or the <code class="code">&lt;--</code>
symbol for strings).
</p>
<div class="example lisp">
<pre class="lisp-preformatted">(define-peg-pattern tag-passwd all (peg "entry* !."))
(peg:tree (match-pattern tag-passwd *etc-passwd*)) ⇒
(tag-passwd
  (entry "root:x:0:0:root:/root:/bin/bash")
  (entry "daemon:x:1:1:daemon:/usr/sbin:/bin/sh")
  (entry "bin:x:2:2:bin:/bin:/bin/sh")
  (entry "sys:x:3:3:sys:/dev:/bin/sh")
  (entry "nobody:x:65534:65534:nobody:/nonexistent:/bin/sh")
  (entry "messagebus:x:103:107::/var/run/dbus:/bin/false"))
(peg:tree (match-pattern tag-passwd "one entry")) ⇒
(tag-passwd
  (entry "one entry"))
</pre></div>

<p>If you’re ever uncertain about the potential results of parsing
something, remember the two absolute rules:
</p><ol class="enumerate">
<li> No parsing information will ever be discarded.
</li><li> There will never be any lists with fewer than 2 elements.
</li></ol>

<p>For the purposes of (1), “parsing information” means things tagged with
the <code class="code">all</code> keyword or the <code class="code">&lt;--</code> symbol. Plain strings will be
concatenated.
</p>
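
<p>The capture types can be compared directly on a tiny pattern. In this sketch the nonterminal names are made up; the expected trees follow the rules above: plain strings are concatenated and tagged results are kept.
</p>
<div class="example lisp">
<pre class="lisp-preformatted">(use-modules (ice-9 peg))

;; The same pattern, "zero or more a's", with each capture type.
(define-peg-pattern as-all  all  (* "a"))   ; keep the text, tagged
(define-peg-pattern as-body body (* "a"))   ; keep the text, untagged
(define-peg-pattern as-none none (* "a"))   ; match, but keep nothing

;; match-pattern matches a prefix, so the trailing "bbcc" is left over.
(peg:tree (match-pattern as-all  "aabbcc"))  ; ⇒ (as-all "aa")
(peg:tree (match-pattern as-body "aabbcc"))  ; ⇒ "aa"
(peg:tree (match-pattern as-none "aabbcc"))
</pre></div>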

<p>Let’s extend this example a bit more and actually pull some useful
information out of the passwd file:
</p>
<div class="example lisp">
<pre class="lisp-preformatted">(define-peg-string-patterns
  "passwd &lt;-- entry* !.
entry &lt;-- login C pass C uid C gid C nameORcomment C homedir C shell NL*
login &lt;-- text
pass &lt;-- text
uid &lt;-- [0-9]*
gid &lt;-- [0-9]*
nameORcomment &lt;-- text
homedir &lt;-- path
shell &lt;-- path
path &lt;-- (SLASH pathELEMENT)*
pathELEMENT &lt;-- (!NL !C !'/' .)*
text &lt;- (!NL !C .)*
C &lt; ':'
NL &lt; '\n'
SLASH &lt; '/'")
</pre></div>

<p>Parsing <code class="code">*etc-passwd*</code> with this grammar produces rather pretty parse trees:
</p><div class="example lisp">
<pre class="lisp-preformatted">(peg:tree (match-pattern passwd *etc-passwd*)) ⇒
(passwd
 (entry (login "root")
        (pass "x")
        (uid "0")
        (gid "0")
        (nameORcomment "root")
        (homedir (path (pathELEMENT "root")))
        (shell (path (pathELEMENT "bin") (pathELEMENT "bash"))))
 (entry (login "daemon")
        (pass "x")
        (uid "1")
        (gid "1")
        (nameORcomment "daemon")
        (homedir
         (path (pathELEMENT "usr") (pathELEMENT "sbin")))
        (shell (path (pathELEMENT "bin") (pathELEMENT "sh"))))
 (entry (login "bin")
        (pass "x")
        (uid "2")
        (gid "2")
        (nameORcomment "bin")
        (homedir (path (pathELEMENT "bin")))
        (shell (path (pathELEMENT "bin") (pathELEMENT "sh"))))
 (entry (login "sys")
        (pass "x")
        (uid "3")
        (gid "3")
        (nameORcomment "sys")
        (homedir (path (pathELEMENT "dev")))
        (shell (path (pathELEMENT "bin") (pathELEMENT "sh"))))
 (entry (login "nobody")
        (pass "x")
        (uid "65534")
        (gid "65534")
        (nameORcomment "nobody")
        (homedir (path (pathELEMENT "nonexistent")))
        (shell (path (pathELEMENT "bin") (pathELEMENT "sh"))))
 (entry (login "messagebus")
        (pass "x")
        (uid "103")
        (gid "107")
        nameORcomment
        (homedir
         (path (pathELEMENT "var")
               (pathELEMENT "run")
               (pathELEMENT "dbus")))
        (shell (path (pathELEMENT "bin") (pathELEMENT "false")))))
</pre></div>

<p>Notice that when a field is empty (e.g. <code class="code">nameORcomment</code>
for messagebus) the bare symbol is inserted. This is the “don’t throw away
any information” rule: we successfully matched a <code class="code">nameORcomment</code>
of 0 characters (since <code class="code">text</code> is defined with <code class="code">*</code>). This is
usually what you want, because it allows you to e.g. use <code class="code">list-ref</code>
to pull out elements (since they all have known offsets).
</p>
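
<p>For instance, fields can be read by position. The sketch below works on the <code class="code">messagebus</code> entry copied from the tree above, so it runs without the PEG module; the helpers <code class="code">field-text</code> and <code class="code">field-path</code> are hypothetical, and <code class="code">field-path</code> assumes the path has at least one element. Index 0 holds the tag <code class="code">entry</code>, so the fields start at index 1.
</p>
<div class="example lisp">
<pre class="lisp-preformatted">;; The messagebus entry from the tree above, copied as a literal.
(define messagebus
  '(entry (login "messagebus")
          (pass "x")
          (uid "103")
          (gid "107")
          nameORcomment
          (homedir (path (pathELEMENT "var")
                         (pathELEMENT "run")
                         (pathELEMENT "dbus")))
          (shell (path (pathELEMENT "bin") (pathELEMENT "false")))))

;; Hypothetical helper: the text of a simple field, or "" when the
;; field matched zero characters and only its bare symbol is left.
(define (field-text entry index)
  (let ((field (list-ref entry index)))
    (if (pair? field) (cadr field) "")))

;; Hypothetical helper: rebuild a path such as "/var/run/dbus" from a
;; (homedir (path (pathELEMENT ...) ...)) field.
(define (field-path entry index)
  (let ((path (cadr (list-ref entry index))))
    (string-append "/" (string-join (map cadr (cdr path)) "/"))))

(field-text messagebus 1)                   ; ⇒ "messagebus"
(string->number (field-text messagebus 3))  ; ⇒ 103
(field-text messagebus 5)                   ; ⇒ ""
(field-path messagebus 6)                   ; ⇒ "/var/run/dbus"
</pre></div>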

<p>If you’d prefer not to have symbols for empty matches, you can give
<code class="code">nameORcomment</code> its own definition that uses <code class="code">+</code> instead of
<code class="code">*</code> (e.g. <code class="code">nameORcomment &lt;-- (!NL !C .)+</code>, rather than going
through <code class="code">text</code>) and add a <code class="code">?</code> after <code class="code">nameORcomment</code> in
<code class="code">entry</code>. Then, on an empty field, the parser tries to match 1 or more
characters and fails (inserting nothing into the parse tree), but continues
anyway, because the <code class="code">?</code> makes <code class="code">nameORcomment</code> optional.
</p>
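
<p>The two choices can be compared side by side on a made-up two-field format. In this sketch, written in S-expression syntax, all nonterminal names are hypothetical: <code class="code">star-field</code> behaves like the current <code class="code">nameORcomment</code>, and <code class="code">plus-field</code> combined with <code class="code">?</code> follows the alternative just described. The expected trees follow from the compression rules given earlier.
</p>
<div class="example lisp">
<pre class="lisp-preformatted">(use-modules (ice-9 peg))

;; Field separator: matched but never captured.
(define-peg-pattern C none ":")

;; Zero or more characters: always succeeds, so an empty field still
;; shows up in the tree as the bare symbol star-field.
(define-peg-pattern star-field all (* (and (not-followed-by C) peg-any)))

;; One or more characters: fails on an empty field.
(define-peg-pattern plus-field all (+ (and (not-followed-by C) peg-any)))

;; Two-field rows; in plus-row the first field is optional.
(define-peg-pattern star-row all (and star-field C star-field))
(define-peg-pattern plus-row all (and (? plus-field) C plus-field))

(peg:tree (match-pattern star-row ":b"))  ; ⇒ (star-row star-field (star-field "b"))
(peg:tree (match-pattern plus-row ":b"))  ; ⇒ (plus-row (plus-field "b"))
</pre></div>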

<h4 class="subsubheading" id="Embedding-Arithmetic-Expressions"><span>Embedding Arithmetic Expressions<a class="copiable-link" href="#Embedding-Arithmetic-Expressions"> ¶</a></span></h4>

<p>We can parse simple mathematical expressions with the following PEG:
</p>
<div class="example lisp">
<pre class="lisp-preformatted">(define-peg-string-patterns
  "expr &lt;- sum
sum &lt;-- (product ('+' / '-') sum) / product
product &lt;-- (value ('*' / '/') product) / value
value &lt;-- number / '(' expr ')'
number &lt;-- [0-9]+")
</pre></div>

<p>Then:
</p><div class="example lisp">
<pre class="lisp-preformatted">(peg:tree (match-pattern expr "1+1/2*3+(1+1)/2")) ⇒
(sum (product (value (number "1")))
     "+"
     (sum (product
           (value (number "1"))
           "/"
           (product
            (value (number "2"))
            "*"
            (product (value (number "3")))))
          "+"
          (sum (product
                (value "("
                       (sum (product (value (number "1")))
                            "+"
                            (sum (product (value (number "1")))))
                       ")")
                "/"
                (product (value (number "2")))))))
</pre></div>

<p>There is very little wasted effort in this PEG. The <code class="code">number</code>
nonterminal has to be tagged because otherwise the numbers might run
together with the arithmetic operators during the string concatenation
stage of parse-tree compression (the compressor would see “1” followed by
“/” and merge them into “1/”). When in doubt, tag.
</p>

<p>It is very easy to turn these parse trees into Lisp expressions:
</p>
<div class="example lisp">
<pre class="lisp-preformatted">;; (sum (product ...)) or (sum (product ...) "+" (sum ...))
(define (parse-sum sum left . rest)
  (if (null? rest)
      (apply parse-product left)
      (list (string->symbol (car rest))
            (apply parse-product left)
            (apply parse-sum (cadr rest)))))

;; (product (value ...)) or (product (value ...) "*" (product ...))
(define (parse-product product left . rest)
  (if (null? rest)
      (apply parse-value left)
      (list (string->symbol (car rest))
            (apply parse-value left)
            (apply parse-product (cadr rest)))))

;; (value (number "1")) or (value "(" (sum ...) ")")
(define (parse-value value first . rest)
  (if (null? rest)
      (string->number (cadr first))
      (apply parse-sum (car rest))))

(define parse-expr parse-sum)
</pre></div>

<p>(Notice that all these functions look very similar; for a more complicated
PEG, it would be worth abstracting.)
</p>
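
<p>One possible abstraction is sketched below. The helper <code class="code">make-right-parser</code> is hypothetical; it captures the shape shared by <code class="code">parse-sum</code> and <code class="code">parse-product</code>, so each grammar level needs only one line. The input tree is written out by hand in the shape the grammar above produces for <code class="code">"2*3+4"</code>, so the sketch runs without the PEG module.
</p>
<div class="example lisp">
<pre class="lisp-preformatted">;; Hypothetical helper: build a right-associative parser for one
;; level of the grammar.  NEXT-FUNC parses that level's operands.
(define (make-right-parser next-func)
  (define (parse tag left . rest)
    (if (null? rest)
        (apply next-func left)               ; (tag operand)
        (list (string->symbol (car rest))    ; (tag operand "op" (tag ...))
              (apply next-func left)
              (apply parse (cadr rest)))))
  parse)

(define (parse-value value first . rest)
  (if (null? rest)
      (string->number (cadr first))          ; (value (number "2"))
      (apply parse-sum (car rest))))         ; (value "(" (sum ...) ")")

(define parse-product (make-right-parser parse-value))
(define parse-sum (make-right-parser parse-product))
(define parse-expr parse-sum)

;; Hand-written tree for "2*3+4", in the shape the grammar above produces.
(define tree
  '(sum (product (value (number "2")) "*" (product (value (number "3"))))
        "+"
        (sum (product (value (number "4"))))))

(apply parse-expr tree)  ; ⇒ (+ (* 2 3) 4)
</pre></div>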

<p>Then:
</p><div class="example lisp">
<pre class="lisp-preformatted">(apply parse-expr (peg:tree (match-pattern expr "1+1/2*3+(1+1)/2"))) ⇒
(+ 1 (+ (/ 1 (* 2 3)) (/ (+ 1 1) 2)))
</pre></div>
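
<p>But wait! The associativity is wrong! Because <code class="code">sum</code> and
<code class="code">product</code> recurse on their right-hand side, every operator ends up
right-associative. Where it says <code class="code">(/ 1 (* 2
3))</code>, it should say <code class="code">(* (/ 1 2) 3)</code>.
</p>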
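
<p>The difference is not cosmetic. As a quick check, evaluating both expressions with Guile’s exact rational arithmetic gives different results; the conventional reading of <code class="code">1+1/2*3+(1+1)/2</code> is 1 + 3/2 + 1 = 7/2.
</p>
<div class="example lisp">
<pre class="lisp-preformatted">;; The expression produced by the right-associative parser above ...
(define as-parsed '(+ 1 (+ (/ 1 (* 2 3)) (/ (+ 1 1) 2))))
;; ... and the left-associative one that was intended.
(define as-intended '(+ (+ 1 (* (/ 1 2) 3)) (/ (+ 1 1) 2)))

(eval as-parsed (interaction-environment))    ; ⇒ 13/6
(eval as-intended (interaction-environment))  ; ⇒ 7/2
</pre></div>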

<p>It’s tempting to try replacing e.g. <code class="code">"sum &lt;-- (product ('+' / '-')
sum) / product"</code> with <code class="code">"sum &lt;-- (sum ('+' / '-') product) /
product"</code>, but this is a Bad Idea. PEGs don’t support left recursion.
To see why, imagine what the parser will do here. When it tries to
parse <code class="code">sum</code>, it first has to try to parse <code class="code">sum</code>. But to do
that, it first has to try to parse <code class="code">sum</code>. This continues, without
consuming any input, until the stack overflows.
</p>
<p>So how does one parse left-associative binary operators with PEGs?
Honestly, this is one of their major shortcomings. There’s no
general-purpose way of doing this, but here the repetition operators are
a good choice:
</p>

<div class="example lisp">
<pre class="lisp-preformatted">(use-modules (ice-9 peg) (srfi srfi-1))

(define-peg-string-patterns
  "expr &lt;- sum
sum &lt;-- (product ('+' / '-'))* product
product &lt;-- (value ('*' / '/'))* value
value &lt;-- number / '(' expr ')'
number &lt;-- [0-9]+")

;; take a deep breath...
(define (make-left-parser next-func)
  (lambda (sum first . rest)  ;; general form, comments below assume
                              ;; that we're dealing with a sum expression
    (if (null? rest)          ;; form (sum (product ...))
        (apply next-func first)
        (if (string? (cadr first)) ;; form (sum ((product ...) "+") (product ...))
            (list (string->symbol (cadr first))
                  (apply next-func (car first))
                  (apply next-func (car rest)))
            ;; form (sum (((product ...) "+") ((product ...) "+")) (product ...))
            (car
             (reduce ;; walk through the list and build a left-associative tree
              (lambda (l r)
                (list (list (cadr r) (car r) (apply next-func (car l)))
                      (string->symbol (cadr l))))
              'ignore
              (append ;; make a list of all the products
               ;; the first one should be pre-parsed
               (list (list (apply next-func (caar first))
                           (string->symbol (cadar first))))
               (cdr first)
               ;; the last one has to be added in
               (list (append rest '("done"))))))))))

(define (parse-value value first . rest)
  (if (null? rest)
      (string->number (cadr first))
      (apply parse-sum (car rest))))
(define parse-product (make-left-parser parse-value))
(define parse-sum (make-left-parser parse-product))
(define parse-expr parse-sum)
</pre></div>

<p>Then:
</p><div class="example lisp">
<pre class="lisp-preformatted">(apply parse-expr (peg:tree (match-pattern expr "1+1/2*3+(1+1)/2"))) ⇒
(+ (+ 1 (* (/ 1 2) 3)) (/ (+ 1 1) 2))
</pre></div>

<p>As you can see, this is much uglier (it could be made prettier by using
<code class="code">context-flatten</code>, but the way it’s written above makes it clear
how we deal with the three ways the zero-or-more <code class="code">*</code> expression can
parse: zero, one, or several repetitions). Fortunately, most of the time
we can get away with only using right-associativity.
</p>
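
<p>The idea behind <code class="code">make-left-parser</code> is a left fold over alternating operands and operators. Stripped of the parse-tree bookkeeping, it can be sketched on a flat list; the name <code class="code">fold-left-assoc</code> and the flat input format are assumptions made for illustration, not something the PEG module produces directly.
</p>
<div class="example lisp">
<pre class="lisp-preformatted">;; Hypothetical helper: ITEMS alternates operands and operator strings,
;; e.g. (1 "/" 2 "*" 3).  Each operator wraps everything built so far,
;; which is exactly what left associativity means.
(define (fold-left-assoc items)
  (let loop ((acc (car items))
             (rest (cdr items)))
    (if (null? rest)
        acc
        (loop (list (string->symbol (car rest)) acc (cadr rest))
              (cddr rest)))))

(fold-left-assoc '(1 "/" 2 "*" 3))  ; ⇒ (* (/ 1 2) 3)
(fold-left-assoc '(1 "-" 2 "-" 3))  ; ⇒ (- (- 1 2) 3)
(fold-left-assoc '(5))              ; ⇒ 5
</pre></div>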

<h4 class="subsubheading" id="Simplified-Functions"><span>Simplified Functions<a class="copiable-link" href="#Simplified-Functions"> ¶</a></span></h4>

<p>For a more tantalizing example, consider the following grammar that
parses (highly) simplified C functions:
</p>
<div class="example lisp">
<pre class="lisp-preformatted">(define-peg-string-patterns
  "cfunc &lt;-- cSP ctype cSP cname cSP cargs cLB cSP cbody cRB
ctype &lt;-- cidentifier
cname &lt;-- cidentifier
cargs &lt;-- cLP (! (cSP cRP) carg cSP (cCOMMA / cRP) cSP)* cSP
carg &lt;-- cSP ctype cSP cname
cbody &lt;-- cstatement *
cidentifier &lt;- [a-zA-Z][a-zA-Z0-9_]*
cstatement &lt;-- (!';'.)*cSC cSP
cSC &lt; ';'
cCOMMA &lt; ','
cLP &lt; '('
cRP &lt; ')'
cLB &lt; '{'
cRB &lt; '}'
cSP &lt; [ \t\n]*")
</pre></div>

<p>Then:
</p><div class="example lisp">
<pre class="lisp-preformatted">(peg:tree (match-pattern cfunc "int square(int a) { return a*a;}")) ⇒
(cfunc (ctype "int")
       (cname "square")
       (cargs (carg (ctype "int") (cname "a")))
       (cbody (cstatement "return a*a")))
</pre></div>

<p>And:
</p><div class="example lisp">
<pre class="lisp-preformatted">(peg:tree (match-pattern cfunc "int mod(int a, int b) { int c = a/b;return a-b*c; }")) ⇒
(cfunc (ctype "int")
       (cname "mod")
       (cargs (carg (ctype "int") (cname "a"))
              (carg (ctype "int") (cname "b")))
       (cbody (cstatement "int c = a/b")
              (cstatement "return a-b*c")))
</pre></div>

<p>By wrapping all the <code class="code">carg</code> nonterminals in a <code class="code">cargs</code>
nonterminal, we were able to remove any ambiguity in the parsing
structure and avoid having to call <code class="code">context-flatten</code> on the output
of <code class="code">match-pattern</code>. We used the same trick with the <code class="code">cstatement</code>
nonterminals, wrapping them in a <code class="code">cbody</code> nonterminal.
</p>
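
<p>Because the shape of the tree is fixed, pulling data out of a <code class="code">cfunc</code> tree needs no flattening at all. In this sketch the helpers <code class="code">cfunc-arg-names</code> and <code class="code">cfunc-statements</code> are hypothetical, and the tree is the <code class="code">mod</code> result from above, copied as a literal so the example runs on its own.
</p>
<div class="example lisp">
<pre class="lisp-preformatted">;; The tree for the mod example above, copied as a literal.
(define mod-tree
  '(cfunc (ctype "int")
          (cname "mod")
          (cargs (carg (ctype "int") (cname "a"))
                 (carg (ctype "int") (cname "b")))
          (cbody (cstatement "int c = a/b")
                 (cstatement "return a-b*c"))))

;; Hypothetical helpers.  Every carg sits inside cargs and every
;; cstatement inside cbody, so each cdr below is a list of forms,
;; as long as there is at least one argument or statement.
(define (cfunc-arg-names tree)
  (map (lambda (carg) (cadr (assq 'cname (cdr carg))))
       (cdr (assq 'cargs (cdr tree)))))

(define (cfunc-statements tree)
  (map cadr (cdr (assq 'cbody (cdr tree)))))

(cfunc-arg-names mod-tree)   ; ⇒ ("a" "b")
(cfunc-statements mod-tree)  ; ⇒ ("int c = a/b" "return a-b*c")
</pre></div>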

<p>The whitespace nonterminal <code class="code">cSP</code> used here is a (very) useful
instantiation of a common pattern for matching syntactically irrelevant
information. Since it’s defined with <code class="code">&lt;</code> and ends with <code class="code">*</code>, it
won’t clutter up the parse trees (all the empty lists will be discarded
during the compression step) and it will never cause parsing to fail.
</p>
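
<p>The same idiom carries over to S-expression syntax, with the <code class="code">none</code> capture type playing the role of <code class="code">&lt;</code>. In this sketch the names <code class="code">sp</code> and <code class="code">word</code> are made up.
</p>
<div class="example lisp">
<pre class="lisp-preformatted">(use-modules (ice-9 peg))

;; Whitespace: discarded (none) and zero-or-more (*), so it never fails.
(define-peg-pattern sp none (* (or " " "\t" "\n")))

;; A lowercase word with optional whitespace on either side.
(define-peg-pattern word all (and sp (+ (range #\a #\z)) sp))

(peg:tree (match-pattern word "  hello\n"))  ; ⇒ (word "hello")
(peg:tree (match-pattern word "hello"))      ; ⇒ (word "hello")
</pre></div>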

</div>


</body>
</html>