<HTML><HEAD><TITLE>Practical: An HTML Generation Library, the Interpreter</TITLE><LINK REL="stylesheet" TYPE="text/css" HREF="style.css"/></HEAD><BODY><DIV CLASS="copyright">Copyright © 2003-2005, Peter Seibel</DIV><H1>30. Practical: An HTML Generation Library, the Interpreter</H1><P>In this chapter and the next you'll take a look under the hood of the
FOO HTML generator that you've been using in the past few chapters.
FOO is an example of a kind of programming that's quite common in
Common Lisp and relatively uncommon in non-Lisp languages, namely,
<I>language-oriented</I> programming. Rather than provide an API built
primarily out of functions, classes, and macros, FOO provides language
processors for a domain-specific language that you can embed in your
Common Lisp programs.</P><P>FOO provides two language processors for the same s-expression
language. One is an interpreter that takes a FOO "program" as data
and interprets it to generate HTML. The other is a compiler that
compiles FOO expressions, possibly with embedded Common Lisp code,
into Common Lisp that generates HTML and runs the embedded code. The
interpreter is exposed as the function <CODE>emit-html</CODE> and the
compiler as the macro <CODE>html</CODE>, which you used in previous
chapters.</P><P>In this chapter you'll look at some of the infrastructure shared
between the interpreter and the compiler and then at the
implementation of the interpreter. In the next chapter, I'll show you
how the compiler works.</P><A NAME="designing-a-domain-specific-language"><H2>Designing a Domain-Specific Language</H2></A><P>Designing an embedded language requires two steps: first, design the
language that'll allow you to express the things you want to express,
and second, implement a processor, or processors, that accepts a
"program" in that language and either performs the actions indicated
by the program or translates the program into Common Lisp code
that'll perform equivalent behaviors.</P><P>So, step one is to design the HTML-generating language. The key to
designing a good domain-specific language is to strike the right
balance between expressiveness and concision. For instance, a highly
expressive but not very concise "language" for generating HTML is the
language of literal HTML strings. The legal "forms" of this language
are strings containing literal HTML. Language processors for this
"language" could process such forms by simply emitting them as-is.</P><PRE>(defvar *html-output* *standard-output*)

(defun emit-html (html)
  "An interpreter for the literal HTML language."
  (write-sequence html *html-output*))
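
;; Usage sketch (an illustration added here, not part of the library): because
;; *HTML-OUTPUT* is a special variable, WITH-OUTPUT-TO-STRING can rebind it to
;; capture what the interpreter writes. DEMO-LITERAL-HTML is a hypothetical name.
(defun demo-literal-html ()
  (with-output-to-string (*html-output*)
    (emit-html "<p>Foo</p>")))          ; returns "<p>Foo</p>"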

(defmacro html (html)
  "A compiler for the literal HTML language."
  `(write-sequence ,html *html-output*))</PRE><P>This "language" is highly expressive since it can express <I>any</I>
HTML you could possibly want to generate.<SUP>1</SUP> On the other hand, this language doesn't win a lot of
points for its concision because it gives you zero compression--its
input <I>is</I> its output.</P><P>To design a language that gives you some useful compression without
sacrificing too much expressiveness, you need to identify the details
of the output that are either redundant or uninteresting. You can
then make those aspects of the output implicit in the semantics of
the language.</P><P>For instance, because of the structure of HTML, every opening tag is
paired with a matching closing tag.<SUP>2</SUP> When you write
HTML by hand, you have to write those closing tags, but you can
improve the concision of your HTML-generating language by making the
closing tags implicit.</P><P>Another way you can gain concision at a slight cost in expressiveness
is to make the language processors responsible for adding appropriate
whitespace between elements--blank lines and indentation. When you're
generating HTML programmatically, you typically don't care much about
which elements have line breaks before or after them or about whether
different elements are indented relative to their parent elements.
Letting the language processor insert whitespace according to some
rule means you don't have to worry about it. As it turns out, FOO
actually supports two modes--one that uses the minimum amount of
whitespace, which allows it to generate extremely efficient code and
compact HTML, and another that generates nicely formatted HTML with
different elements indented and separated from other elements
according to their role.</P><P>Another detail that's best moved into the language processor is the
escaping of certain characters that have a special meaning in HTML,
such as <CODE><</CODE>, <CODE>></CODE>, and <CODE>&</CODE>. Obviously, if you generate
HTML by just printing strings to a stream, then it's up to you to
replace any occurrences of those characters in the string with the
appropriate escape sequences, <CODE>&lt;</CODE>, <CODE>&gt;</CODE>, and
<CODE>&amp;</CODE>. But if the language processor knows which strings
are to be emitted as element data, then it can take care of
automatically escaping those characters for you.</P><A NAME="the-foo-language"><H2>The FOO Language</H2></A><P>So, enough theory. I'll give you a quick overview of the language
implemented by FOO, and then you'll look at the implementation of the
two FOO language processors--the interpreter, in this chapter, and
the compiler, in the next.</P><P>Like Lisp itself, the basic syntax of the FOO language is defined in
terms of forms made up of Lisp objects. The language defines how each
legal FOO form is translated into HTML.</P><P>The simplest FOO forms are self-evaluating Lisp objects such as
strings, numbers, and keyword symbols.<SUP>3</SUP> You'll need a function <CODE>self-evaluating-p</CODE> that
tests whether a given object is self-evaluating for FOO's purposes.</P><PRE>(defun self-evaluating-p (form)
  (and (atom form) (if (symbolp form) (keywordp form) t)))</PRE><P>Objects that satisfy this predicate will be emitted by converting
them to strings with <CODE><B>PRINC-TO-STRING</B></CODE> and then escaping any
reserved characters, such as <CODE><</CODE>, <CODE>></CODE>, or <CODE>&</CODE>. When
the value is being emitted as an attribute, the characters <CODE>"</CODE>
and <CODE>'</CODE> are also escaped. Thus, you can invoke the <CODE>html</CODE>
macro on a self-evaluating object to emit it to <CODE>*html-output*</CODE>
(which is initially bound to <CODE><B>*STANDARD-OUTPUT*</B></CODE>). Table 30-1
shows how a few different self-evaluating values will be output.</P><P><DIV CLASS="table-caption">Table 30-1. FOO Output for Self-Evaluating Objects</DIV></P><TABLE CLASS="book-table"><TR><TD>FOO Form</TD><TD>Generated HTML</TD></TR><TR><TD><CODE>"foo"</CODE></TD><TD><CODE>foo</CODE></TD></TR><TR><TD><CODE>10</CODE></TD><TD><CODE>10</CODE></TD></TR><TR><TD><CODE>:foo</CODE></TD><TD><CODE>FOO</CODE></TD></TR><TR><TD><CODE>"foo & bar"</CODE></TD><TD><CODE>foo &amp; bar</CODE></TD></TR></TABLE><P>Of course, most HTML consists of tagged elements. The three pieces of
information that describe each element are the tag, a set of
attributes, and a body containing text and/or more HTML elements.
Thus, you need a way to represent these three pieces of information
as Lisp objects, preferably ones that the Lisp reader already knows
how to read.<SUP>4</SUP> If you forget about attributes for a moment, there's an
obvious mapping between Lisp lists and HTML elements: any HTML
element can be represented by a list whose <CODE><B>FIRST</B></CODE> is a symbol
whose name is the name of the element's tag and whose <CODE><B>REST</B></CODE>
is a list of self-evaluating objects or lists representing other HTML
elements. Thus:</P><PRE><p>Foo</p> <==> (:p "Foo")

<p><i>Now</i> is the time</p> <==> (:p (:i "Now") " is the time")</PRE><P>Now the only problem is where to squeeze in the attributes. Since
most elements have no attributes, it'd be nice if you could use the
preceding syntax for elements without attributes. FOO provides two
ways to notate elements with attributes. The first is to simply
include the attributes in the list immediately following the symbol,
alternating keyword symbols naming the attributes and objects
representing the attribute value forms. The body of the element
starts with the first item in the list that's in a position to be an
attribute name and isn't a keyword symbol. Thus:</P><PRE>HTML> (html (:p "foo"))
<p>foo</p>
NIL
HTML> (html (:p "foo " (:i "bar") " baz"))
<p>foo <i>bar</i> baz</p>
NIL
HTML> (html (:p :style "foo" "Foo"))
<p style='foo'>Foo</p>
NIL
HTML> (html (:p :id "x" :style "foo" "Foo"))
<p id='x' style='foo'>Foo</p>
NIL</PRE><P>For folks who prefer a bit more obvious delineation between the
element's attributes and its body, FOO supports an alternative
syntax: if the first element of a list is itself a list with a
keyword as <I>its</I> first element, then the outer list represents an
HTML element with that keyword indicating the tag, with the <CODE><B>REST</B></CODE>
of the nested list as the attributes, and with the <CODE><B>REST</B></CODE> of the
outer list as the body. Thus, you could write the previous two
expressions like this:</P><PRE>HTML> (html ((:p :style "foo") "Foo"))
<p style='foo'>Foo</p>
NIL
HTML> (html ((:p :id "x" :style "foo") "Foo"))
<p id='x' style='foo'>Foo</p>
NIL</PRE><P>The following function tests whether a given object matches either of
these syntaxes:</P><PRE>(defun cons-form-p (form &optional (test #'keywordp))
  (and (consp form)
       (or (funcall test (car form))
           (and (consp (car form)) (funcall test (caar form))))))</PRE><P>You should parameterize the <CODE>test</CODE> function because later you'll
need to test the same two syntaxes with a slightly different predicate
on the name.</P><P>To completely abstract the differences between the two syntax
variants, you can define a function, <CODE>parse-cons-form</CODE>, that
takes a form and parses it into three elements, the tag, the
attributes plist, and the body list, returning them as multiple
values. The code that actually evaluates cons forms will use this
function and not have to worry about which syntax was used.</P><PRE>(defun parse-cons-form (sexp)
  (if (consp (first sexp))
      (parse-explicit-attributes-sexp sexp)
      (parse-implicit-attributes-sexp sexp)))

(defun parse-explicit-attributes-sexp (sexp)
  (destructuring-bind ((tag &rest attributes) &body body) sexp
    (values tag attributes body)))
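
;; Inspection aid (hypothetical, not part of FOO): PARSE-TO-LIST gathers the
;; three values returned by PARSE-CONS-FORM into one list so that the two
;; syntaxes can be compared at the REPL once the whole listing is loaded.
;; Both of these calls should return (:P (:ID "x") ("Foo")):
;;   (parse-to-list '(:p :id "x" "Foo"))
;;   (parse-to-list '((:p :id "x") "Foo"))
(defun parse-to-list (sexp)
  (multiple-value-list (parse-cons-form sexp)))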

(defun parse-implicit-attributes-sexp (sexp)
  (loop with tag = (first sexp)
     for rest on (rest sexp) by #'cddr
     while (and (keywordp (first rest)) (second rest))
     when (second rest)
       collect (first rest) into attributes and
       collect (second rest) into attributes
     end
     finally (return (values tag attributes rest))))</PRE><P>Now that you have the basic language specified, you can think about
how you're actually going to implement the language processors. How
do you get from a series of FOO forms to the desired HTML? As I
mentioned previously, you'll be implementing two language processors
for FOO: an interpreter that walks a tree of FOO forms and emits the
corresponding HTML directly and a compiler that walks a tree and
translates it into Common Lisp code that'll emit the same HTML. Both
the interpreter and compiler will be built on top of a common
foundation of code, which provides support for things such as
escaping reserved characters and generating nicely indented output,
so it makes sense to start there.</P><A NAME="character-escaping"><H2>Character Escaping</H2></A><P>The first bit of the foundation you'll need to lay is the code that
knows how to escape characters with a special meaning in HTML. There
are three such characters, and they must not appear in the text of an
element or in an attribute value; they are <CODE><</CODE>, <CODE>></CODE>, and
<CODE>&</CODE>. In element text or attribute values, these characters must
be replaced with the <I>character reference entities</I> <CODE>&lt;</CODE>,
<CODE>&gt;</CODE>, and <CODE>&amp;</CODE>. Similarly, in attribute values, the
quotation marks used to delimit the value must be escaped, <CODE>'</CODE>
with <CODE>&apos;</CODE> and <CODE>"</CODE> with <CODE>&quot;</CODE>. Additionally, any
character can be represented by a numeric character reference entity
consisting of an ampersand, followed by a sharp sign, followed by the
numeric code as a base 10 integer, and followed by a semicolon. These
numeric escapes are sometimes used to embed non-ASCII characters in
HTML.</P><DIV CLASS="sidebarhead">The Package</DIV><DIV CLASS="sidebar"><P>Since FOO is a low-level library, the package you develop it
in doesn't rely on much external code--just the usual dependency on
names from the <CODE>COMMON-LISP</CODE> package and, almost as usual, on
the names of the macro-writing macros from
<CODE>COM.GIGAMONKEYS.MACRO-UTILITIES</CODE>. On the other hand, the
package needs to export all the names needed by code that uses FOO.
Here's the <CODE><B>DEFPACKAGE</B></CODE> from the source that you can download from
the book's Web site:</P><PRE>(defpackage :com.gigamonkeys.html
  (:use :common-lisp :com.gigamonkeys.macro-utilities)
  (:export :with-html-output
           :in-html-style
           :define-html-macro
           :html
           :emit-html
           :&attributes))</PRE></DIV><P>The following function accepts a single character and returns a string
containing a character reference entity for that character:</P><PRE>(defun escape-char (char)
  (case char
    (#\& "&amp;")
    (#\< "&lt;")
    (#\> "&gt;")
    (#\' "&apos;")
    (#\" "&quot;")
    (t (format nil "&#~d;" (char-code char)))))</PRE><P>You can use this function as the basis for a function, <CODE>escape</CODE>,
that takes a string and a sequence of characters and returns a copy of
the first argument with all occurrences of the characters in the
second argument replaced with the corresponding character entity
returned by <CODE>escape-char</CODE>.</P><PRE>(defun escape (in to-escape)
  (flet ((needs-escape-p (char) (find char to-escape)))
    (with-output-to-string (out)
      (loop for start = 0 then (1+ pos)
            for pos = (position-if #'needs-escape-p in :start start)
            do (write-sequence in out :start start :end pos)
            when pos do (write-sequence (escape-char (char in pos)) out)
            while pos))))</PRE><P>You can also define two parameters: <CODE>*element-escapes*</CODE>, which
contains the characters you need to escape in normal element data,
and <CODE>*attribute-escapes*</CODE>, which contains the set of characters
to be escaped in attribute values.</P><PRE>(defparameter *element-escapes* "<>&")
(defparameter *attribute-escapes* "<>&\"'")</PRE><P>Here are some examples:</P><PRE>HTML> (escape "foo & bar" *element-escapes*)
"foo &amp; bar"
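;; The next two interactions are illustrations traced by hand from the
;; definition of ESCAPE above; they aren't from the original session.
HTML> (escape "1 < 2" *element-escapes*)
"1 &lt; 2"
HTML> (escape "plain text" *element-escapes*)
"plain text"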
HTML> (escape "foo & 'bar'" *element-escapes*)
"foo &amp; 'bar'"
HTML> (escape "foo & 'bar'" *attribute-escapes*)
"foo &amp; &apos;bar&apos;"</PRE><P>Finally, you'll need a variable, <CODE>*escapes*</CODE>, that will be bound
to the set of characters that need to be escaped. It's initially set
to the value of <CODE>*element-escapes*</CODE>, but when generating
attributes, it will, as you'll see, be rebound to the value of
<CODE>*attribute-escapes*</CODE>.</P><PRE>(defvar *escapes* *element-escapes*)</PRE><A NAME="indenting-printer"><H2>Indenting Printer</H2></A><P>To handle generating nicely indented output, you can define a class
<CODE>indenting-printer</CODE>, which wraps around an output stream, and
functions that use an instance of that class to emit strings to the
stream while keeping track of when it's at the beginning of the line.
The class looks like this:</P><PRE>(defclass indenting-printer ()
  ((out                 :accessor out                 :initarg :out)
   (beginning-of-line-p :accessor beginning-of-line-p :initform t)
   (indentation         :accessor indentation         :initform 0)
   (indenting-p         :accessor indenting-p         :initform t)))</PRE><P>The main function that operates on <CODE>indenting-printer</CODE>s is
<CODE>emit</CODE>, which takes the printer and a string and emits the
string to the printer's output stream, keeping track of when it emits
a newline so it can reset the <CODE>beginning-of-line-p</CODE> slot.</P><PRE>(defun emit (ip string)
  (loop for start = 0 then (1+ pos)
        for pos = (position #\Newline string :start start)
        do (emit/no-newlines ip string :start start :end pos)
        when pos do (emit-newline ip)
        while pos))</PRE><P>To actually emit the string, it uses the function
<CODE>emit/no-newlines</CODE>, which emits any needed indentation, via the
helper <CODE>indent-if-necessary</CODE>, and then writes the string to the
stream. This function can also be called directly by other code to
emit a string that's known not to contain any newlines.</P><PRE>(defun emit/no-newlines (ip string &key (start 0) end)
  (indent-if-necessary ip)
  (write-sequence string (out ip) :start start :end end)
  (unless (zerop (- (or end (length string)) start))
    (setf (beginning-of-line-p ip) nil)))</PRE><P>The helper <CODE>indent-if-necessary</CODE> checks
<CODE>beginning-of-line-p</CODE> and <CODE>indenting-p</CODE> to determine
whether it needs to emit indentation and, if they're both true, emits
as many spaces as indicated by the value of <CODE>indentation</CODE>. Code
that uses the <CODE>indenting-printer</CODE> can control the indentation by
manipulating the <CODE>indentation</CODE> and <CODE>indenting-p</CODE> slots.
Incrementing and decrementing <CODE>indentation</CODE> changes the number
of leading spaces, while setting <CODE>indenting-p</CODE> to <CODE><B>NIL</B></CODE> can
temporarily turn off indentation.</P><PRE>(defun indent-if-necessary (ip)
  (when (and (beginning-of-line-p ip) (indenting-p ip))
    (loop repeat (indentation ip) do (write-char #\Space (out ip)))
    (setf (beginning-of-line-p ip) nil)))</PRE><P>The last two functions in the <CODE>indenting-printer</CODE> API are
<CODE>emit-newline</CODE> and <CODE>emit-freshline</CODE>, which are both used to
emit a newline character, similar to the <CODE>~%</CODE> and <CODE>~&</CODE>
<CODE><B>FORMAT</B></CODE> directives. The only difference is that
<CODE>emit-newline</CODE> always emits a newline, while
<CODE>emit-freshline</CODE> does so only if <CODE>beginning-of-line-p</CODE> is
false. Thus, multiple calls to <CODE>emit-freshline</CODE> without any
intervening <CODE>emit</CODE>s won't result in a blank line. This is handy
when one piece of code wants to generate some output that should end
with a newline while another piece of code wants to generate some
output that should start on a new line but you don't want a blank line
between the two bits of output.</P><PRE>(defun emit-newline (ip)
  (write-char #\Newline (out ip))
  (setf (beginning-of-line-p ip) t))
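
;; Usage sketch (hypothetical helper, not part of the library) showing how the
;; printer's pieces fit together; EMIT-FRESHLINE is the function defined just
;; below. The printer wraps a string stream so the result is easy to inspect.
(defun demo-indenting-printer ()
  (with-output-to-string (s)
    (let ((ip (make-instance 'indenting-printer :out s)))
      (emit ip "<ul>")
      (emit-newline ip)
      (incf (indentation ip) 2)      ; one level deeper
      (emit ip (format nil "<li>one</li>~%<li>two</li>"))
      (emit-freshline ip)            ; ends the current line
      (emit-freshline ip)            ; already at line start: emits nothing
      (decf (indentation ip) 2)
      (emit ip "</ul>"))))
;; Traced by hand, the result is "<ul>", "  <li>one</li>", "  <li>two</li>",
;; and "</ul>" on four lines, with no blank line before "</ul>".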

(defun emit-freshline (ip)
  (unless (beginning-of-line-p ip) (emit-newline ip)))</PRE><P>With those preliminaries out of the way, you're ready to get to the
guts of the FOO processor.</P><A NAME="html-processor-interface"><H2>HTML Processor Interface</H2></A><P>Now you're ready to define the interface that'll be used by the FOO
language processor to emit HTML. You can define this interface as a
set of generic functions because you'll need two implementations--one
that actually emits HTML and another that the <CODE>html</CODE> macro can
use to collect a list of actions that need to be performed, which can
then be optimized and compiled into code that emits the same output
in a more efficient way. I'll call this set of generic functions the
<I>backend interface</I>. It consists of the following eight generic
functions:</P><PRE>(defgeneric raw-string (processor string &optional newlines-p))

(defgeneric newline (processor))

(defgeneric freshline (processor))

(defgeneric indent (processor))

(defgeneric unindent (processor))

(defgeneric toggle-indenting (processor))

(defgeneric embed-value (processor value))

(defgeneric embed-code (processor code))</PRE><P>While several of these functions have an obvious correspondence to
<CODE>indenting-printer</CODE> functions, it's important to understand that
these generic functions define the abstract operations that are used by
the FOO language processors and won't always be implemented in terms
of calls to the <CODE>indenting-printer</CODE> functions.</P><P>That said, perhaps the easiest way to understand the semantics of
these abstract operations is to look at the concrete implementations
of the methods specialized on <CODE>html-pretty-printer</CODE>, the class
used to generate human-readable HTML.</P><A NAME="the-pretty-printer-backend"><H2>The Pretty Printer Backend</H2></A><P>You can start by defining a class with two slots--one to hold an
instance of <CODE>indenting-printer</CODE> and one to hold the tab
width--the number of spaces by which to increase the indentation for
each level of nesting of HTML elements.</P><PRE>(defclass html-pretty-printer ()
  ((printer   :accessor printer   :initarg :printer)
   (tab-width :accessor tab-width :initarg :tab-width :initform 2)))</PRE><P>Now you can implement methods, specialized on
<CODE>html-pretty-printer</CODE>, for the eight generic functions that make up
the backend interface.</P><P>The FOO processors use the <CODE>raw-string</CODE> function to emit strings
that don't need character escaping, either because you actually want
to emit normally reserved characters or because all reserved
characters have already been escaped. Usually <CODE>raw-string</CODE> is
invoked with strings that don't contain newlines, so the default
behavior is to use <CODE>emit/no-newlines</CODE> unless the caller
specifies a non-<CODE><B>NIL</B></CODE> <CODE>newlines-p</CODE> argument.</P><PRE>(defmethod raw-string ((pp html-pretty-printer) string &optional newlines-p)
  (if newlines-p
      (emit (printer pp) string)
      (emit/no-newlines (printer pp) string)))</PRE><P>The functions <CODE>newline</CODE>, <CODE>freshline</CODE>, <CODE>indent</CODE>,
<CODE>unindent</CODE>, and <CODE>toggle-indenting</CODE> implement fairly
straightforward manipulations of the underlying
<CODE>indenting-printer</CODE>. The only wrinkle is that the HTML pretty
printer generates pretty output only when the dynamic variable
<CODE>*pretty*</CODE> is true. When it's <CODE><B>NIL</B></CODE>, you should generate
compact HTML with no unnecessary whitespace. So, these methods, with
the exception of <CODE>newline</CODE>, all check <CODE>*pretty*</CODE> before
doing anything:<SUP>5</SUP></P><PRE>(defmethod newline ((pp html-pretty-printer))
  (emit-newline (printer pp)))

(defmethod freshline ((pp html-pretty-printer))
  (when *pretty* (emit-freshline (printer pp))))

(defmethod indent ((pp html-pretty-printer))
  (when *pretty*
    (incf (indentation (printer pp)) (tab-width pp))))

(defmethod unindent ((pp html-pretty-printer))
  (when *pretty*
    (decf (indentation (printer pp)) (tab-width pp))))

(defmethod toggle-indenting ((pp html-pretty-printer))
  (when *pretty*
    (with-slots (indenting-p) (printer pp)
      (setf indenting-p (not indenting-p)))))</PRE><P>Finally, the functions <CODE>embed-value</CODE> and <CODE>embed-code</CODE> are
used only by the FOO compiler--<CODE>embed-value</CODE> is used to
generate code that'll emit the value of a Common Lisp expression,
while <CODE>embed-code</CODE> is used to embed a bit of code to be run and
its result discarded. In the interpreter, you can't meaningfully
evaluate embedded Lisp code, so the methods on these functions always
signal an error.</P><PRE>(defmethod embed-value ((pp html-pretty-printer) value)
  (error "Can't embed values when interpreting. Value: ~s" value))

(defmethod embed-code ((pp html-pretty-printer) code)
  (error "Can't embed code when interpreting. Code: ~s" code))</PRE><DIV CLASS="sidebarhead">Using Conditions to Have Your Cake and Eat It Too</DIV><DIV CLASS="sidebar"><P>An alternative approach would be to use <CODE><B>EVAL</B></CODE> to evaluate
Lisp expressions in the interpreter. The problem with this approach
is that <CODE><B>EVAL</B></CODE> has no access to the lexical environment. Thus,
there's no way to make something like this work:</P><PRE>(let ((x 10)) (emit-html '(:p x)))</PRE><P>when <CODE>x</CODE> is a lexical variable. The symbol <CODE>x</CODE> that's passed
to <CODE>emit-html</CODE> at runtime has no particular connection to the
lexical variable named with the same symbol. The Lisp compiler
arranges for references to <CODE>x</CODE> in the code to refer to the
variable, but after the code is compiled, there's no longer
necessarily any association between the name <CODE>x</CODE> and that
variable. This is the main reason that when you think <CODE><B>EVAL</B></CODE> is
the solution to your problem, you're probably wrong.</P><P>However, if <CODE>x</CODE> were a dynamic variable, declared with
<CODE><B>DEFVAR</B></CODE> or <CODE><B>DEFPARAMETER</B></CODE> (and likely named <CODE>*x*</CODE> instead
of <CODE>x</CODE>), <CODE><B>EVAL</B></CODE> could get at its value. Thus, it might be
useful to allow the FOO interpreter to use <CODE><B>EVAL</B></CODE> in some
situations. But it's a bad idea to always use <CODE><B>EVAL</B></CODE>. You can get
the best of both worlds by combining the idea of using <CODE><B>EVAL</B></CODE> with
the condition system.</P><P>First define some error classes that you can signal when
<CODE>embed-value</CODE> and <CODE>embed-code</CODE> are called in the
interpreter.</P><PRE>(define-condition embedded-lisp-in-interpreter (error)
  ((form :initarg :form :reader form)))</PRE><PRE>(define-condition value-in-interpreter (embedded-lisp-in-interpreter) ()
  (:report
   (lambda (c s)
     (format s "Can't embed values when interpreting. Value: ~s" (form c)))))</PRE><PRE>(define-condition code-in-interpreter (embedded-lisp-in-interpreter) ()
  (:report
   (lambda (c s)
     (format s "Can't embed code when interpreting. Code: ~s" (form c)))))</PRE><P>Now you can implement <CODE>embed-value</CODE> and <CODE>embed-code</CODE> to
signal those errors <I>and</I> provide a restart that'll evaluate the
form with <CODE><B>EVAL</B></CODE>.</P><PRE>(defmethod embed-value ((pp html-pretty-printer) value)
  (restart-case (error 'value-in-interpreter :form value)
    (evaluate ()
      :report (lambda (s) (format s "EVAL ~s in null lexical environment." value))
      (raw-string pp (escape (princ-to-string (eval value)) *escapes*) t))))</PRE><PRE>(defmethod embed-code ((pp html-pretty-printer) code)
  (restart-case (error 'code-in-interpreter :form code)
    (evaluate ()
      :report (lambda (s) (format s "EVAL ~s in null lexical environment." code))
      (eval code))))</PRE><P>Now you can do something like this:</P><PRE>HTML> (defvar *x* 10)
*X*
HTML> (emit-html '(:p *x*))</PRE><P>and you'll get dropped into the debugger with this message:</P><PRE>Can't embed values when interpreting. Value: *X*
   [Condition of type VALUE-IN-INTERPRETER]</PRE><PRE>Restarts:
  0: [EVALUATE] EVAL *X* in null lexical environment.
  1: [ABORT] Abort handling SLIME request.
  2: [ABORT] Abort entirely from this process.</PRE><P>If you invoke the <CODE>evaluate</CODE> restart, <CODE>embed-value</CODE> will
<CODE><B>EVAL</B></CODE> <CODE>*x*</CODE>, get the value <CODE>10</CODE>, and generate this
HTML:</P><PRE><p>10</p></PRE><P>Then, as a convenience, you can provide restart functions--functions
that invoke the <CODE>evaluate</CODE> restart--in certain situations. The
<CODE>evaluate</CODE> restart function unconditionally invokes the restart,
while <CODE>eval-dynamic-variables</CODE> and <CODE>eval-code</CODE> invoke it
only if the form in the condition is a dynamic variable or potential
code.</P><PRE>(defun evaluate (&optional condition)
  (declare (ignore condition))
  (invoke-restart 'evaluate))</PRE><PRE>(defun eval-dynamic-variables (&optional condition)
  (when (and (symbolp (form condition)) (boundp (form condition)))
    (evaluate)))</PRE><PRE>(defun eval-code (&optional condition)
  (when (consp (form condition))
    (evaluate)))</PRE><P>Now you can use <CODE><B>HANDLER-BIND</B></CODE> to set up a handler to
automatically invoke the <CODE>evaluate</CODE> restart for you.</P><PRE>HTML> (handler-bind ((value-in-interpreter #'evaluate)) (emit-html '(:p *x*)))
<p>10</p>
T</PRE><P>Finally, you can define a macro to provide a nicer syntax for binding
handlers for the two kinds of errors.</P><PRE>(defmacro with-dynamic-evaluation ((&key values code) &body body)
  `(handler-bind (
       ,@(if values `((value-in-interpreter #'evaluate)))
       ,@(if code `((code-in-interpreter #'evaluate))))
     ,@body))</PRE><P>With this macro defined, you can write this:</P><PRE>HTML> (with-dynamic-evaluation (:values t) (emit-html '(:p *x*)))
<p>10</p>
T</PRE></DIV><A NAME="the-basic-evaluation-rule"><H2>The Basic Evaluation Rule</H2></A><P>Now to connect the FOO language to the processor interface, all you
need is a function that takes an object and processes it, invoking
the appropriate processor functions to generate HTML. For instance,
when given a simple form like this:</P><PRE>(:p "Foo")</PRE><P>this function might execute this sequence of calls on the processor:</P><PRE>(freshline processor)
(raw-string processor "<p" nil)
(raw-string processor ">" nil)
(raw-string processor "Foo" nil)
(raw-string processor "</p>" nil)
(freshline processor)</PRE><P>For now you can define a simple function that just checks whether a
form is, in fact, a legal FOO form and, if it is, hands it off to the
function <CODE>process-sexp-html</CODE> for processing. In the next
chapter, you'll add some bells and whistles to this function to allow
it to handle macros and special operators. But for now it looks like
this:</P><PRE>(defun process (processor form)
  (if (sexp-html-p form)
      (process-sexp-html processor form)
      (error "Malformed FOO form: ~s" form)))</PRE><P>The function <CODE>sexp-html-p</CODE> determines whether the given object
is a legal FOO expression, either a self-evaluating form or a
properly formatted cons.</P><PRE>(defun sexp-html-p (form)
  (or (self-evaluating-p form) (cons-form-p form)))</PRE><P>Self-evaluating forms are easily handled: just convert to a string
with <CODE><B>PRINC-TO-STRING</B></CODE> and escape the characters in the variable
<CODE>*escapes*</CODE>, which, as you'll recall, is initially bound to the
value of <CODE>*element-escapes*</CODE>. Cons forms you pass off to
<CODE>process-cons-sexp-html</CODE>.</P><PRE>(defun process-sexp-html (processor form)
  (if (self-evaluating-p form)
      (raw-string processor (escape (princ-to-string form) *escapes*) t)
(process-cons-sexp-html processor form)))</PRE><P>The function <CODE>process-cons-sexp-html</CODE> is then responsible for
|
|
emitting the opening tag, any attributes, the body, and the closing
tag. The main complication here is that to generate pretty HTML, you
need to emit fresh lines and adjust the indentation according to the
type of the element being emitted. You can assign each element
defined in HTML to one of three categories: block,
paragraph, and inline. Block elements--such as <CODE>body</CODE> and
<CODE>ul</CODE>--are emitted with fresh lines before and after both their
opening and closing tags and with their contents indented one level.
Paragraph elements--such as <CODE>p</CODE>, <CODE>li</CODE>, and
<CODE>blockquote</CODE>--are emitted with a fresh line before the opening
tag and after the closing tag. Inline elements are simply emitted in
line. The following three parameters list the elements of each type:</P><PRE>(defparameter *block-elements*
  '(:body :colgroup :dl :fieldset :form :head :html :map :noscript :object
    :ol :optgroup :pre :script :select :style :table :tbody :tfoot :thead
    :tr :ul))

(defparameter *paragraph-elements*
  '(:area :base :blockquote :br :button :caption :col :dd :div :dt :h1
    :h2 :h3 :h4 :h5 :h6 :hr :input :li :link :meta :option :p :param
    :td :textarea :th :title))

(defparameter *inline-elements*
  '(:a :abbr :acronym :address :b :bdo :big :cite :code :del :dfn :em
    :i :img :ins :kbd :label :legend :q :samp :small :span :strong :sub
    :sup :tt :var))</PRE><P>The functions <CODE>block-element-p</CODE> and <CODE>paragraph-element-p</CODE>
test whether a given tag is a member of the corresponding
list.<SUP>6</SUP></P><PRE>(defun block-element-p (tag) (find tag *block-elements*))

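;; Illustrative sketch, not part of FOO: ELEMENT-CATEGORY is a hypothetical
;; helper that names the category an element falls into, using the two
;; predicates in this listing. Each predicate returns the matching tag (a
;; true value) or NIL, since that is what FIND returns. A tag found in
;; neither list is treated as inline, so no inline predicate is needed.
(defun element-category (tag)
  (cond ((block-element-p tag) :block)
        ((paragraph-element-p tag) :paragraph)
        (t :inline)))
;; For example, (element-category :ul) returns :BLOCK, (element-category :p)
;; returns :PARAGRAPH, and (element-category :span) returns :INLINE.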
(defun paragraph-element-p (tag) (find tag *paragraph-elements*))</PRE><P>Two other categorizations with their own predicates are the elements
that are always empty, such as <CODE>br</CODE> and <CODE>hr</CODE>, and the three
elements, <CODE>pre</CODE>, <CODE>style</CODE>, and <CODE>script</CODE>, in which
whitespace is supposed to be preserved. The former are handled
specially when generating regular HTML (in other words, not XHTML)
since they're not supposed to have a closing tag. And when emitting
the three tags in which whitespace is preserved, you can temporarily
turn off indentation so the pretty printer doesn't add any spaces
that aren't part of the element's actual contents.</P><PRE>(defparameter *empty-elements*
  '(:area :base :br :col :hr :img :input :link :meta :param))

(defparameter *preserve-whitespace-elements* '(:pre :script :style))

(defun empty-element-p (tag) (find tag *empty-elements*))

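;; Consistency sketch, not part of FOO: none of the always-empty elements
;; is classified as a block element, so the fresh-line-and-indent handling
;; reserved for block elements never applies to them. NOTANY returns true
;; when the predicate returns NIL for every item in the list.
(assert (notany #'block-element-p *empty-elements*))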
(defun preserve-whitespace-p (tag) (find tag *preserve-whitespace-elements*))</PRE><P>The last piece of information you need when generating HTML is
whether you're generating XHTML since that affects how you emit empty
elements.</P><PRE>(defparameter *xhtml* nil)</PRE><P>With all that information, you're ready to process a cons FOO form.
You use <CODE>parse-cons-form</CODE> to parse the list into three parts: the
tag symbol, a possibly empty plist of attribute key/value pairs, and a
possibly empty list of body forms. You then emit the opening tag, the
body, and the closing tag with the helper functions
<CODE>emit-open-tag</CODE>, <CODE>emit-element-body</CODE>, and
<CODE>emit-close-tag</CODE>.</P><PRE>(defun process-cons-sexp-html (processor form)
  (when (string= *escapes* *attribute-escapes*)
    (error "Can't use cons forms in attributes: ~a" form))
  (multiple-value-bind (tag attributes body) (parse-cons-form form)
    (emit-open-tag processor tag body attributes)
    (emit-element-body processor tag body)
    (emit-close-tag processor tag body)))</PRE><P>In <CODE>emit-open-tag</CODE> you have to call <CODE>freshline</CODE> when
appropriate and then emit the attributes with <CODE>emit-attributes</CODE>.
You need to pass the element's body to <CODE>emit-open-tag</CODE> so when
it's emitting XHTML, it knows whether to finish the tag with
<CODE>/&gt;</CODE> or <CODE>&gt;</CODE>.</P><PRE>(defun emit-open-tag (processor tag body-p attributes)
  (when (or (paragraph-element-p tag) (block-element-p tag))
    (freshline processor))
  (raw-string processor (format nil "&lt;~(~a~)" tag))
  (emit-attributes processor attributes)
  (raw-string processor (if (and *xhtml* (not body-p)) "/&gt;" "&gt;")))</PRE><P>In <CODE>emit-attributes</CODE> the attribute names aren't evaluated since
they must be keyword symbols, but you should invoke the top-level
<CODE>process</CODE> function to evaluate the attribute values, binding
<CODE>*escapes*</CODE> to <CODE>*attribute-escapes*</CODE>. As a convenience for
specifying boolean attributes, whose value should be the name of the
attribute, if the value is <CODE><B>T</B></CODE>--not just any true value but
actually <CODE><B>T</B></CODE>--then you replace the value with the name of the
attribute.<SUP>7</SUP></P><PRE>(defun emit-attributes (processor attributes)
  (loop for (k v) on attributes by #'cddr do
       (raw-string processor (format nil " ~(~a~)='" k))
       (let ((*escapes* *attribute-escapes*))
         (process processor (if (eql v t) (string-downcase k) v)))
       (raw-string processor "'")))</PRE><P>Emitting the element's body is similar to emitting the attribute
values: you can loop through the body calling <CODE>process</CODE> to
evaluate each form. The rest of the code is dedicated to emitting
fresh lines and adjusting the indentation as appropriate for the type
of element.</P><PRE>(defun emit-element-body (processor tag body)
  (when (block-element-p tag)
    (freshline processor)
    (indent processor))
  (when (preserve-whitespace-p tag) (toggle-indenting processor))
  (dolist (item body) (process processor item))
  (when (preserve-whitespace-p tag) (toggle-indenting processor))
  (when (block-element-p tag)
    (unindent processor)
    (freshline processor)))</PRE><P>Finally, <CODE>emit-close-tag</CODE>, as you'd probably expect, emits the
closing tag (unless no closing tag is necessary, such as when the
body is empty and you're either emitting XHTML or the element is one
of the special empty elements). Regardless of whether you actually
emit a close tag, you need to emit a final fresh line for block and
paragraph elements.</P><PRE>(defun emit-close-tag (processor tag body-p)
  (unless (and (or *xhtml* (empty-element-p tag)) (not body-p))
    (raw-string processor (format nil "&lt;/~(~a~)&gt;" tag)))
  (when (or (paragraph-element-p tag) (block-element-p tag))
    (freshline processor)))</PRE><P>The function <CODE>process</CODE> is the basic FOO interpreter. To make it
a bit easier to use, you can define a function, <CODE>emit-html</CODE>,
that invokes <CODE>process</CODE>, passing it an <CODE>html-pretty-printer</CODE>
and a form to evaluate. You can define and use a helper function,
<CODE>get-pretty-printer</CODE>, to get the pretty printer. It returns
the current value of <CODE>*html-pretty-printer*</CODE> if that value is non-<CODE><B>NIL</B></CODE>;
otherwise, it makes a new instance of <CODE>html-pretty-printer</CODE> with
<CODE>*html-output*</CODE> as its output stream.</P><PRE>(defun emit-html (sexp) (process (get-pretty-printer) sexp))

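;; Usage sketch, not part of FOO: EMIT-HTML-TO-STRING is a hypothetical
;; convenience for trying the interpreter at the REPL. It assumes that
;; *html-output* and *html-pretty-printer* are special variables defined
;; elsewhere in the library. Binding *html-pretty-printer* to NIL makes
;; GET-PRETTY-PRINTER build a fresh printer on the string stream.
(defun emit-html-to-string (sexp)
  (with-output-to-string (*html-output*)
    (let ((*html-pretty-printer* nil))
      (emit-html sexp))))
;; For example: (emit-html-to-string '(:p "Foo"))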
(defun get-pretty-printer ()
  (or *html-pretty-printer*
      (make-instance
       'html-pretty-printer
       :printer (make-instance 'indenting-printer :out *html-output*))))</PRE><P>With this function, you can emit HTML to <CODE>*html-output*</CODE>. Rather
than expose the variable <CODE>*html-output*</CODE> as part of FOO's public
API, you should define a macro, <CODE>with-html-output</CODE>, that takes
care of binding the stream for you. It also lets you specify whether
you want pretty HTML output, defaulting to the value of the variable
<CODE>*pretty*</CODE>.</P><PRE>(defmacro with-html-output ((stream &amp;key (pretty *pretty*)) &amp;body body)
  `(let* ((*html-output* ,stream)
          (*pretty* ,pretty))
     ,@body))</PRE><P>So, if you wanted to use <CODE>emit-html</CODE> to generate HTML to a file,
you could write the following:</P><PRE>(with-open-file (out "foo.html" :direction :output)
  (with-html-output (out :pretty t)
    (emit-html *some-foo-expression*)))</PRE><A NAME="whats-next"><H2>What's Next?</H2></A><P>In the next chapter, you'll look at how to implement a macro that
compiles FOO expressions into Common Lisp so you can embed HTML
generation code directly into your Lisp programs. You'll also extend
the FOO language to make it a bit more expressive by adding its own
flavor of special operators and macros.
</P><HR/><DIV CLASS="notes"><P><SUP>1</SUP>In fact, it's probably
<I>too</I> expressive since it can also generate all sorts of output
that's not even vaguely legal HTML. Of course, that might be a
feature if you need to generate HTML that's not strictly correct to
compensate for buggy Web browsers. Also, it's common for language
processors to accept programs that are syntactically correct and
otherwise well formed that'll nonetheless provoke undefined behavior
when run.</P><P><SUP>2</SUP>Well, almost every tag.
Certain tags such as <CODE>IMG</CODE> and <CODE>BR</CODE> don't. You'll deal with
those in the section "The Basic Evaluation Rule."</P><P><SUP>3</SUP>In the strict language of
the Common Lisp standard, keyword symbols aren't <I>self-evaluating</I>,
though they do, in fact, evaluate to themselves. See section
3.1.2.1.3 of the language standard or HyperSpec for a brief
discussion.</P><P><SUP>4</SUP>The requirement to use objects that the Lisp reader
knows how to read isn't a hard-and-fast one. Since the Lisp reader is
itself customizable, you could also define a new reader-level syntax
for a new kind of object. But that tends to be more trouble than it's
worth.</P><P><SUP>5</SUP>Another, more purely object-oriented, approach
would be to define two classes, perhaps <CODE>html-pretty-printer</CODE>
and <CODE>html-raw-printer</CODE>, and then define no-op methods
specialized on <CODE>html-raw-printer</CODE> for the methods that should do
stuff only when <CODE>*pretty*</CODE> is true. However, in this case, after
defining all the no-op methods, you'd end up with more code, and then
you'd have the hassle of making sure you created an instance of the
right class at the right time. But in general, using polymorphism to
replace conditionals is a good strategy.</P><P><SUP>6</SUP>You don't need a predicate for <CODE>*inline-elements*</CODE>
since you only ever test for block and paragraph elements. I include
the parameter here for completeness.</P><P><SUP>7</SUP>While XHTML requires boolean attributes to be notated
with their name as the value to indicate a true value, in HTML it's
also legal to simply include the name of the attribute with no value,
for example, <CODE><B>&lt;option selected&gt;</B></CODE> rather than <CODE><B>&lt;option
selected='selected'&gt;</B></CODE>. All HTML 4.0-compatible browsers should
understand both forms, but some buggy browsers understand only the
no-value form for certain attributes. If you need to generate HTML
for such browsers, you'll need to hack <CODE>emit-attributes</CODE> to emit
those attributes a bit differently.</P></DIV></BODY></HTML>