<HTML><HEAD><TITLE>Syntax and Semantics</TITLE><LINK REL="stylesheet" TYPE="text/css" HREF="style.css"/></HEAD><BODY><DIV CLASS="copyright">Copyright &copy; 2003-2005, Peter Seibel</DIV><H1>4. Syntax and Semantics</H1><P>After that whirlwind tour, we'll settle down for a few chapters to
take a more systematic look at the features you've used so far. I'll
start with an overview of the basic elements of Lisp's syntax and
semantics, which means, of course, that I must first address that
burning question. . . </P><A NAME="whats-with-all-the-parentheses"><H2>What's with All the Parentheses?</H2></A><P>Lisp's syntax is quite a bit different from the syntax of languages
descended from Algol. The two most immediately obvious
characteristics are the extensive use of parentheses and prefix
notation. For whatever reason, a lot of folks are put off by this
syntax. Lisp's detractors tend to describe the syntax as &quot;weird&quot; and
&quot;annoying.&quot; Lisp, they say, must stand for Lots of Irritating
Superfluous Parentheses. Lisp folks, on the other hand, tend to
consider Lisp's syntax one of its great virtues. How is it that
what's so off-putting to one group is a source of delight to another?</P><P>I can't really make the complete case for Lisp's syntax until I've
explained Lisp's macros a bit more thoroughly, but I can start with
an historical tidbit that suggests it may be worth keeping an open
mind: when John McCarthy first invented Lisp, he intended to
implement a more Algol-like syntax, which he called
<I>M-expressions</I>. However, he never got around to it. He explained
why not in his article &quot;History of
Lisp.&quot;<SUP>1</SUP></P><BLOCKQUOTE>The project of defining M-expressions precisely and compiling them
or at least translating them into S-expressions was neither
finalized nor explicitly abandoned. It just receded into the
indefinite future, and a new generation of programmers appeared who
preferred [S-expressions] to any FORTRAN-like or ALGOL-like
notation that could be devised.</BLOCKQUOTE><P>In other words, the people who have actually used Lisp over the past
45 years have <I>liked</I> the syntax and have found that it makes the
language more powerful. In the next few chapters, you'll begin to see
why. </P><A NAME="breaking-open-the-black-box"><H2>Breaking Open the Black Box</H2></A><P>Before we look at the specifics of Lisp's syntax and semantics, it's
worth taking a moment to look at how they're defined and how this
differs from many other languages.</P><P>In most programming languages, the language processor--whether an
interpreter or a compiler--operates as a black box: you shove a
sequence of characters representing the text of a program into the
black box, and it--depending on whether it's an interpreter or a
compiler--either executes the behaviors indicated or produces a
compiled version of the program that will execute the behaviors when
it's run.</P><P>Inside the black box, of course, language processors are usually
divided into subsystems that are each responsible for one part of the
task of translating a program text into behavior or object code. A
typical division is to split the processor into three phases, each of
which feeds into the next: a lexical analyzer breaks up the stream of
characters into tokens and feeds them to a parser that builds a tree
representing the expressions in the program, according to the
language's grammar. This tree--called an <I>abstract syntax tree</I>--is
then fed to an evaluator that either interprets it directly or
compiles it into some other language such as machine code. Because
the language processor is a black box, the data structures used by
the processor, such as the tokens and abstract syntax trees, are of
interest only to the language implementer.</P><P>In Common Lisp things are sliced up a bit differently, with
consequences for both the implementer and for how the language is
defined. Instead of a single black box that goes from text to program
behavior in one step, Common Lisp defines <I>two</I> black boxes, one
that translates text into Lisp objects and another that implements
the semantics of the language in terms of those objects. The first
box is called the <I>reader</I>, and the second is called the
<I>evaluator</I>.<SUP>2</SUP></P><P>Each black box defines one level of syntax. The reader defines how
strings of characters can be translated into Lisp objects called
<I>s-expressions</I>.<SUP>3</SUP> Since the s-expression syntax includes syntax for lists
of arbitrary objects, including other lists, s-expressions can
represent arbitrary tree expressions, much like the abstract syntax
tree generated by the parsers for non-Lisp languages.</P><P>The evaluator then defines a syntax of Lisp <I>forms</I> that can be
built out of s-expressions. Not all s-expressions are legal Lisp
forms any more than all sequences of characters are legal
s-expressions. For instance, both <CODE>(foo 1 2)</CODE> and <CODE>(&quot;foo&quot; 1
2)</CODE> are s-expressions, but only the former can be a Lisp form since a
list that starts with a string has no meaning as a Lisp form. </P><P>This split of the black box has a couple of consequences. One is that
you can use s-expressions, as you saw in Chapter 3, as an
externalizable data format for data other than source code, using
<CODE><B>READ</B></CODE> to read it and <CODE><B>PRINT</B></CODE> to print it.<SUP>4</SUP> The other consequence is that since the semantics of the
language are defined in terms of trees of objects rather than strings
of characters, it's easier to generate code within the language than
it would be if you had to generate code as text. Generating code
completely from scratch is only marginally easier--building up lists
vs. building up strings is about the same amount of work. The real
win, however, is that you can generate code by manipulating existing
data. This is the basis for Lisp's macros, which I'll discuss in much
more detail in future chapters. For now I'll focus on the two levels
of syntax defined by Common Lisp: the syntax of s-expressions
understood by the reader and the syntax of Lisp forms understood by
the evaluator. </P><A NAME="s-expressions"><H2>S-expressions</H2></A><P>The basic elements of s-expressions are <I>lists</I> and <I>atoms</I>.
Lists are delimited by parentheses and can contain any number of
whitespace-separated elements. Atoms are everything else.<SUP>5</SUP> The elements of lists are themselves s-expressions
(in other words, atoms or nested lists). Comments--which aren't,
technically speaking, s-expressions--start with a semicolon, extend
to the end of a line, and are treated essentially like whitespace.</P><P>And that's pretty much it. Since lists are syntactically so trivial,
the only remaining syntactic rules you need to know are those
governing the form of different kinds of atoms. In this section I'll
describe the rules for the most commonly used kinds of atoms:
numbers, strings, and names. After that, I'll cover how s-expressions
composed of these elements can be evaluated as Lisp forms.</P><P>Numbers are fairly straightforward: any sequence of digits--possibly
prefaced with a sign (<CODE>+</CODE> or <CODE>-</CODE>), containing a decimal
point (<CODE>.</CODE>) or a solidus (<CODE>/</CODE>), or ending with an exponent
marker--is read as a number. For example: </P><PRE>123       ; the integer one hundred twenty-three
3/7       ; the ratio three-sevenths
1.0       ; the floating-point number one in default precision
1.0e0     ; another way to write the same floating-point number
1.0d0     ; the floating-point number one in &quot;double&quot; precision
1.0e-4    ; the floating-point equivalent to one-ten-thousandth
+42       ; the integer forty-two
-42       ; the integer negative forty-two
-1/4      ; the ratio negative one-quarter
-2/8      ; another way to write negative one-quarter
246/2     ; another way to write the integer one hundred twenty-three</PRE><P>These different forms represent different kinds of numbers: integers,
ratios, and floating-point numbers. Lisp also supports complex numbers, which
have their own notation and which I'll discuss in Chapter 10. </P><P>As some of these examples suggest, you can notate the same number in
many ways. But regardless of how you write them, all
rationals--integers and ratios--are represented internally in
&quot;simplified&quot; form. In other words, the objects that represent -2/8 or
246/2 aren't distinct from the objects that represent -1/4 and 123.
Similarly, <CODE>1.0</CODE> and <CODE>1.0e0</CODE> are just different ways of
writing the same number. On the other hand, <CODE>1.0</CODE>, <CODE>1.0d0</CODE>,
and <CODE>1</CODE> can all denote different objects because the different
floating-point representations and integers are different types.
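</P><P>You can check these claims at the REPL by handing the reader short
strings and comparing the objects it returns. What follows is only a sketch:
the helper name <CODE>read-eql-p</CODE> is made up for illustration, it relies
on the standard functions <CODE><B>READ-FROM-STRING</B></CODE> and
<CODE><B>EQL</B></CODE> (the latter is discussed at the end of this chapter),
and it binds <CODE><B>*READ-DEFAULT-FLOAT-FORMAT*</B></CODE> to its standard
initial value so that <CODE>1.0</CODE> is read as a single-precision float.</P><PRE>;; Hypothetical helper, not part of the language or of this book's code.
(defun read-eql-p (text-a text-b)
  &quot;Read one object from each string and compare the two with EQL.&quot;
  (let ((*read-default-float-format* 'single-float)) ; the standard default
    (if (eql (read-from-string text-a) (read-from-string text-b)) t nil)))

(read-eql-p &quot;-2/8&quot; &quot;-1/4&quot;)    ; T: both are read as the ratio -1/4
(read-eql-p &quot;246/2&quot; &quot;123&quot;)    ; T: both are read as the integer 123
(read-eql-p &quot;1.0&quot; &quot;1.0e0&quot;)    ; T: the same single-precision float
(read-eql-p &quot;1.0&quot; &quot;1.0d0&quot;)    ; NIL: single versus double precision
(read-eql-p &quot;1&quot; &quot;1.0&quot;)        ; NIL: an integer versus a float</PRE><P>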
We'll save the details about the characteristics of different kinds
of numbers for Chapter 10.</P><P>String literals, as you saw in the previous chapter, are enclosed in
double quotes. Within a string a backslash (<CODE>\</CODE>) escapes the
next character, causing it to be included in the string regardless of
what it is. The only two characters that <I>must</I> be escaped within a
string are double quotes and the backslash itself. All other
characters can be included in a string literal without escaping,
regardless of their meaning outside a string. Some example string
literals are as follows: </P><PRE>&quot;foo&quot;     ; the string containing the characters f, o, and o.
&quot;fo\o&quot;    ; the same string
&quot;fo\\o&quot;   ; the string containing the characters f, o, \, and o.
&quot;fo\&quot;o&quot;   ; the string containing the characters f, o, &quot;, and o.</PRE><P>Names used in Lisp programs, such as <CODE><B>FORMAT</B></CODE>,
<CODE>hello-world</CODE>, and <CODE>*db*</CODE>, are represented by objects called
<I>symbols</I>. The reader knows nothing about how a given name is going
to be used--whether it's the name of a variable, a function, or
something else. It just reads a sequence of characters and builds an
object to represent the name.<SUP>6</SUP> Almost any
character can appear in a name. Whitespace characters can't, though,
because the elements of lists are separated by whitespace. Digits can
appear in names as long as the name as a whole can't be interpreted
as a number. Similarly, names can contain periods, but the reader
can't read a name that consists only of periods. Ten characters that
serve other syntactic purposes can't appear in names: open and close
parentheses, double and single quotes, backtick, comma, colon,
semicolon, backslash, and vertical bar. Even those characters
<I>can</I> appear in a name if you escape them, either by preceding each
one with a backslash or by surrounding the part of the name that
contains them with vertical bars. </P><P>Two important characteristics of the way the reader translates names
to symbol objects have to do with how it treats the case of letters
in names and how it ensures that the same name is always read as the
same symbol. While reading names, the reader converts all unescaped
characters in a name to their uppercase equivalents. Thus, the reader
will read <CODE>foo</CODE>, <CODE>Foo</CODE>, and <CODE>FOO</CODE> as the same symbol:
<CODE>FOO</CODE>. However, <CODE>\f\o\o</CODE> and <CODE>|foo|</CODE> will both be
read as <CODE>foo</CODE>, which is a different object than the symbol
<CODE>FOO</CODE>. This is why when you define a function at the REPL and it
prints the name of the function, it's been converted to uppercase.
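</P><P>One way to watch the case conversion happen is to ask the reader for a
symbol and then look at its name with the standard function
<CODE><B>SYMBOL-NAME</B></CODE>. This is a small sketch that assumes the
standard readtable, which converts unescaped characters to uppercase.</P><PRE>;; Assumes the standard readtable, whose case mode is :UPCASE.
(symbol-name (read-from-string &quot;foo&quot;))        ; &quot;FOO&quot;
(symbol-name (read-from-string &quot;Foo&quot;))        ; &quot;FOO&quot;
(symbol-name (read-from-string &quot;|foo|&quot;))      ; &quot;foo&quot;

;; Inside a Lisp string each backslash must itself be escaped, so this
;; string holds the six characters \f\o\o.
(symbol-name (read-from-string &quot;\\f\\o\\o&quot;))  ; &quot;foo&quot;

(eql 'foo 'FOO)     ; true: both are read as the symbol FOO
(eql 'foo '|foo|)   ; false: FOO and foo are different symbols</PRE><P>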
Standard style, these days, is to write code in all lowercase and let
the reader change names to uppercase.<SUP>7</SUP></P><P>To ensure that the same textual name is always read as the same
symbol, the reader <I>interns</I> symbols--after it has read the name
and converted it to all uppercase, the reader looks in a table called
a <I>package</I> for an existing symbol with the same name. If it can't
find one, it creates a new symbol and adds it to the table.
Otherwise, it returns the symbol already in the table. Thus, anywhere
the same name appears in any s-expression, the same object will be
used to represent it.<SUP>8</SUP> </P><P>Because names can contain many more characters in Lisp than they can
in Algol-derived languages, certain naming conventions are distinct
to Lisp, such as the use of hyphenated names like <CODE>hello-world</CODE>.
Another important convention is that global variables are given names
that start and end with <CODE>*</CODE>. Similarly, constants are given
names starting and ending in <CODE>+</CODE>. And some programmers will name
particularly low-level functions with names that start with <CODE>%</CODE>
or even <CODE>%%</CODE>. The names defined in the language standard use
only the alphabetic characters (A-Z) plus <CODE>*</CODE>, <CODE>+</CODE>,
<CODE>-</CODE>, <CODE>/</CODE>, <CODE>1</CODE>, <CODE>2</CODE>, <CODE>&lt;</CODE>, <CODE>=</CODE>, <CODE>&gt;</CODE>,
and <CODE>&amp;</CODE>.</P><P>The syntax for lists, numbers, strings, and symbols can describe a
good percentage of Lisp programs. Other rules describe notations for
literal vectors, individual characters, and arrays, which I'll cover
when I talk about the associated data types in Chapters 10 and 11.
For now the key thing to understand is how you can combine numbers,
strings, and symbols with parentheses-delimited lists to build
s-expressions representing arbitrary trees of objects. Some simple
examples look like this:</P><PRE>x             ; the symbol X
()            ; the empty list
(1 2 3)       ; a list of three numbers
(&quot;foo&quot; &quot;bar&quot;) ; a list of two strings
(x y z)       ; a list of three symbols
(x 1 &quot;foo&quot;)   ; a list of a symbol, a number, and a string
(+ (* 2 3) 4) ; a list of a symbol, a list, and a number.</PRE><P>An only slightly more complex example is the following four-item list
that contains two symbols, the empty list, and another list, itself
containing two symbols and a string: </P><PRE>(defun hello-world ()
  (format t &quot;hello, world&quot;))</PRE><A NAME="s-expressions-as-lisp-forms"><H2>S-expressions As Lisp Forms</H2></A><P>After the reader has translated a bunch of text into s-expressions,
the s-expressions can then be evaluated as Lisp code. Or some of them
can--not every s-expression that the reader can read can necessarily
be evaluated as Lisp code. Common Lisp's evaluation rule defines a
second level of syntax that determines which s-expressions can be
treated as Lisp forms.<SUP>9</SUP> The syntactic rules at this level are quite
simple. Any atom--any nonlist or the empty list--is a legal Lisp form
as is any list that has a symbol as its first element.<SUP>10</SUP></P><P>Of course, the interesting thing about Lisp forms isn't their syntax
but how they're evaluated. For purposes of discussion, you can think
of the evaluator as a function that takes as an argument a
syntactically well-formed Lisp form and returns a value, which we can
call the <I>value</I> of the form. Of course, when the evaluator is a
compiler, this is a bit of a simplification--in that case, the
evaluator is given an expression and generates code that will compute
the appropriate value when it's run. But this simplification lets me
describe the semantics of Common Lisp in terms of how the different
kinds of Lisp forms are evaluated by this notional function. </P><P>The simplest Lisp forms, atoms, can be divided into two categories:
symbols and everything else. A symbol, evaluated as a form, is
considered the name of a variable and evaluates to the current value
of the variable.<SUP>11</SUP> I'll discuss in Chapter 6 how variables get
their values in the first place. You should also note that certain
&quot;variables&quot; are that old oxymoron of programming: &quot;constant
variables.&quot; For instance, the symbol <CODE><B>PI</B></CODE> names a constant
variable whose value is the best possible floating-point
approximation to the mathematical constant <I>pi</I>.</P><P>All other atoms--numbers and strings are the kinds you've seen so
far--are <I>self-evaluating</I> objects. This means when such an
expression is passed to the notional evaluation function, it's simply
returned. You saw examples of self-evaluating objects in Chapter 2
when you typed <CODE>10</CODE> and <CODE>&quot;hello, world&quot;</CODE> at the REPL.</P><P>It's also possible for symbols to be self-evaluating in the sense
that the variables they name can be assigned the value of the symbol
itself. Two important constants that are defined this way are <CODE><B>T</B></CODE>
and <CODE><B>NIL</B></CODE>, the canonical true and false values. I'll discuss their
role as booleans in the section &quot;Truth, Falsehood, and Equality.&quot;</P><P>Another class of self-evaluating symbols are the <I>keyword</I>
symbols--symbols whose names start with <CODE>:</CODE>. When the reader
interns such a name, it automatically defines a constant variable
with the name and with the symbol as the value.</P><P>Things get more interesting when we consider how lists are evaluated.
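</P><P>Before getting to the rules, it may help to see the reader and the
evaluator run as two separate steps. The sketch below uses the standard
functions <CODE><B>READ-FROM-STRING</B></CODE> and <CODE><B>EVAL</B></CODE>;
the variable name <CODE>*form*</CODE> is just an illustration.</P><PRE>;; Step one: the reader turns text into an object, here a three-element list.
(defparameter *form* (read-from-string &quot;(+ 1 2)&quot;))

(listp *form*)    ; true: so far it's only data
(length *form*)   ; 3
(first *form*)    ; the symbol +

;; Step two: the evaluator treats that list as a function call form.
(eval *form*)     ; 3

;; Atoms other than ordinary symbols evaluate to themselves.
(eval 10)                          ; 10
(eval &quot;hello, world&quot;)              ; &quot;hello, world&quot;
(eval :a-keyword)                  ; :A-KEYWORD
(eval nil)                         ; NIL</PRE><P>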
All legal list forms start with a symbol, but three kinds of list
forms are evaluated in three quite different ways. To determine what
kind of form a given list is, the evaluator must determine whether
the symbol that starts the list is the name of a function, a macro,
or a special operator. If the symbol hasn't been defined yet--as may
be the case if you're compiling code that contains references to
functions that will be defined later--it's assumed to be a function
name.<SUP>12</SUP> I'll refer to the three kinds of forms as <I>function call
forms</I>, <I>macro forms</I>, and <I>special forms</I>.</P><A NAME="function-calls"><H2>Function Calls</H2></A><P>The evaluation rule for function call forms is simple: evaluate the
remaining elements of the list as Lisp forms and pass the resulting
values to the named function. This rule obviously places some
additional syntactic constraints on a function call form: all the
elements of the list after the first must themselves be well-formed
Lisp forms. In other words, the basic syntax of a function call form
is as follows, where each of the arguments is itself a Lisp form:</P><PRE>(<I>function-name</I> <I>argument</I>*)</PRE><P>Thus, the following expression is evaluated by first evaluating
<CODE>1</CODE>, then evaluating <CODE>2</CODE>, and then passing the resulting
values to the <CODE><B>+</B></CODE> function, which returns 3:</P><PRE>(+ 1 2)</PRE><P>A more complex expression such as the following is evaluated in
similar fashion except that evaluating the arguments <CODE>(+ 1 2)</CODE>
and <CODE>(- 3 4)</CODE> entails first evaluating their arguments and
applying the appropriate functions to them:</P><PRE>(* (+ 1 2) (- 3 4))</PRE><P>Eventually, the values 3 and -1 are passed to the <CODE><B>*</B></CODE> function,
which returns -3.</P><P>As these examples show, functions are used for many of the things
that require special syntax in other languages. This helps keep
Lisp's syntax regular. </P><A NAME="special-operators"><H2>Special Operators</H2></A><P>That said, not all operations can be defined as functions. Because
all the arguments to a function are evaluated before the function is
called, there's no way to write a function that behaves like the
<CODE><B>IF</B></CODE> operator you used in Chapter 3. To see why, consider this
form:</P><PRE>(if x (format t &quot;yes&quot;) (format t &quot;no&quot;))</PRE><P>If <CODE><B>IF</B></CODE> were a function, the evaluator would evaluate the argument
expressions from left to right. The symbol <CODE>x</CODE> would be
evaluated as a variable yielding some value; then <CODE>(format t
&quot;yes&quot;)</CODE> would be evaluated as a function call, yielding <CODE><B>NIL</B></CODE>
after printing &quot;yes&quot; to standard output. Then <CODE>(format t &quot;no&quot;)</CODE>
would be evaluated, printing &quot;no&quot; and also yielding <CODE><B>NIL</B></CODE>. Only
after all three expressions were evaluated would the resulting values
be passed to <CODE><B>IF</B></CODE>, too late for it to control which of the two
<CODE><B>FORMAT</B></CODE> expressions gets evaluated.</P><P>To solve this problem, Common Lisp defines a couple dozen so-called
special operators, <CODE><B>IF</B></CODE> being one, that do things that functions
can't do. There are 25 in all, but only a small handful are used
directly in day-to-day programming.<SUP>13</SUP></P><P>When the first element of a list is a symbol naming a special
operator, the rest of the expressions are evaluated according to the
rule for that operator. </P><P>The rule for <CODE><B>IF</B></CODE> is pretty easy: evaluate the first expression.
If it evaluates to non-<CODE><B>NIL</B></CODE>, then evaluate the next expression
and return its value. Otherwise, return the value of evaluating the
third expression or <CODE><B>NIL</B></CODE> if the third expression is omitted. In
other words, the basic form of an <CODE><B>IF</B></CODE> expression is as follows:</P><PRE>(if <I>test-form</I> <I>then-form</I> [ <I>else-form</I> ])</PRE><P>The <I>test-form</I> will always be evaluated and then one or the other
of the <I>then-form</I> or <I>else-form</I>.</P><P>An even simpler special operator is <CODE><B>QUOTE</B></CODE>, which takes a single
expression as its &quot;argument&quot; and simply returns it, unevaluated. For
instance, the following evaluates to the list <CODE>(+ 1 2)</CODE>, not the
value 3:</P><PRE>(quote (+ 1 2))</PRE><P>There's nothing special about this list; you can manipulate it just
like any list you could create with the <CODE><B>LIST</B></CODE>
function.<SUP>14</SUP></P><P><CODE><B>QUOTE</B></CODE> is used commonly enough that a special syntax for it is
built into the reader. Instead of writing the following: </P><PRE>(quote (+ 1 2))</PRE><P>you can write this:</P><PRE>'(+ 1 2)</PRE><P>This syntax is a small extension of the s-expression syntax
understood by the reader. From the point of view of the evaluator,
both those expressions will look the same: a list whose first element
is the symbol <CODE><B>QUOTE</B></CODE> and whose second element is the list
<CODE>(+ 1 2)</CODE>.<SUP>15</SUP></P><P>In general, the special operators implement features of the language
that require some special processing by the evaluator. For instance,
several special operators manipulate the environment in which other
forms will be evaluated. One of these, which I'll discuss in detail
in Chapter 6, is <CODE><B>LET</B></CODE>, which is used to create new variable
bindings. The following form evaluates to 10 because the second
<CODE>x</CODE> is evaluated in an environment where it's the name of a
variable established by the <CODE><B>LET</B></CODE> with the value 10: </P><PRE>(let ((x 10)) x)</PRE><A NAME="macros"><H2>Macros</H2></A><P>While special operators extend the syntax of Common Lisp beyond what
can be expressed with just function calls, the set of special
operators is fixed by the language standard. Macros, on the other
hand, give users of the language a way to extend its syntax. As you
saw in Chapter 3, a macro is a function that takes s-expressions as
arguments and returns a Lisp form that's then evaluated in place of
the macro form. The evaluation of a macro form proceeds in two
phases: First, the elements of the macro form are passed,
unevaluated, to the macro function. Second, the form returned by the
macro function--called its <I>expansion</I>--is evaluated according to
the normal evaluation rules.</P><P>It's important to keep the two phases of evaluating a macro form
clear in your mind. It's easy to lose track when you're typing
expressions at the REPL because the two phases happen one after
another and the value of the second phase is immediately returned.
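</P><P>At the REPL you can pull the two phases apart by hand with the standard
function <CODE><B>MACROEXPAND-1</B></CODE>, which performs only the first one.
The sketch below assumes a one-line definition of <CODE>backwards</CODE>
modeled on the macro from Chapter 3; treat that definition as an assumption
rather than as a quotation.</P><PRE>;; Assumed definition, modeled on the BACKWARDS macro from Chapter 3.
(defmacro backwards (expr)
  (reverse expr))

;; Phase one only: the macro function receives the unevaluated list
;; (2 1 +) and returns its expansion as the first value.
(macroexpand-1 '(backwards (2 1 +)))   ; (+ 1 2)

;; Both phases: the expansion is then evaluated by the normal rules.
(backwards (2 1 +))                    ; 3</PRE><P>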
But when Lisp code is compiled, the two phases happen at completely
different times, so it's important to keep clear what's happening
when. For instance, when you compile a whole file of source code with
the function <CODE><B>COMPILE-FILE</B></CODE>, all the macro forms in the file are
recursively expanded until the code consists of nothing but function
call forms and special forms. This macroless code is then compiled
into a FASL file that the <CODE><B>LOAD</B></CODE> function knows how to load. The
compiled code, however, isn't executed until the file is loaded.
Because macros generate their expansion at compile time, they can do
relatively large amounts of work generating their expansion without
having to pay for it when the file is loaded or the functions defined
in the file are called.</P><P>Since the evaluator doesn't evaluate the elements of the macro form
before passing them to the macro function, they don't need to be
well-formed Lisp forms. Each macro assigns a meaning to the
s-expressions in the macro form by virtue of how it uses them to
generate its expansion. In other words, each macro defines its own
local syntax. For instance, the <CODE>backwards</CODE> macro from Chapter 3
defines a syntax in which an expression is a legal <CODE>backwards</CODE>
form if it's a list that's the reverse of a legal Lisp form.</P><P>I'll talk quite a bit more about macros throughout this book. For now
the important thing for you to realize is that macros--while
syntactically similar to function calls--serve quite a different
purpose, providing a hook into the compiler.<SUP>16</SUP></P><A NAME="truth-falsehood-and-equality"><H2>Truth, Falsehood, and Equality</H2></A><P>Two last bits of basic knowledge you need to get under your belt are
Common Lisp's notion of truth and falsehood and what it means for two
Lisp objects to be &quot;equal.&quot; Truth and falsehood are--in this
realm--straightforward: the symbol <CODE><B>NIL</B></CODE> is the only false value,
and everything else is true. The symbol <CODE><B>T</B></CODE> is the canonical true
value and can be used when you need to return a non-<CODE><B>NIL</B></CODE> value
and don't have anything else handy. The only tricky thing about
<CODE><B>NIL</B></CODE> is that it's the only object that's both an atom and a list:
in addition to falsehood, it's also used to represent the empty
list.<SUP>17</SUP> This equivalence between <CODE><B>NIL</B></CODE> and the empty list is
built into the reader: if the reader sees <CODE>()</CODE>, it reads it as
the symbol <CODE><B>NIL</B></CODE>. They're completely interchangeable. And because
<CODE><B>NIL</B></CODE>, as I mentioned previously, is the name of a constant
variable with the symbol <CODE><B>NIL</B></CODE> as its value, the expressions
<CODE>nil</CODE>, <CODE>()</CODE>, <CODE>'nil</CODE>, and <CODE>'()</CODE> all evaluate to
the same thing--the unquoted forms are evaluated as a reference to
the constant variable whose value is the symbol <CODE><B>NIL</B></CODE>, but in the
quoted forms the <CODE><B>QUOTE</B></CODE> special operator evaluates to the symbol
directly. For the same reason, both <CODE>t</CODE> and <CODE>'t</CODE> will
evaluate to the same thing: the symbol <CODE><B>T</B></CODE>.</P><P>Using phrases such as &quot;the same thing&quot; of course raises the question of
what it means for two values to be &quot;the same.&quot; As you'll see in
future chapters, Common Lisp provides a number of type-specific
equality predicates: <CODE><B>=</B></CODE> is used to compare numbers, <CODE><B>CHAR=</B></CODE> to
compare characters, and so on. In this section I'll discuss the four
&quot;generic&quot; equality predicates--functions that can be passed any two
Lisp objects and will return true if they're equivalent and false
otherwise. They are, in order of discrimination, <CODE><B>EQ</B></CODE>, <CODE><B>EQL</B></CODE>,
<CODE><B>EQUAL</B></CODE>, and <CODE><B>EQUALP</B></CODE>.</P><P><CODE><B>EQ</B></CODE> tests for &quot;object identity&quot;--two objects are <CODE><B>EQ</B></CODE> if
they're identical. Unfortunately, the object identity of numbers and
characters depends on how those data types are implemented in a
particular Lisp. Thus, <CODE><B>EQ</B></CODE> may consider two numbers or two
characters with the same value to be equivalent, or it may not.
Implementations have enough leeway that the expression <CODE>(eq 3
3)</CODE> can legally evaluate to either true or false. More to the point,
<CODE>(eq x x)</CODE> can evaluate to either true or false if the value of
<CODE>x</CODE> happens to be a number or character.</P><P>Thus, you should never use <CODE><B>EQ</B></CODE> to compare values that may be
numbers or characters. It may seem to work in a predictable way for
certain values in a particular implementation, but you have no
guarantee that it will work the same way if you switch
implementations. And switching implementations may mean simply
upgrading your implementation to a new version--if your Lisp
implementer changes how they represent numbers or characters, the
behavior of <CODE><B>EQ</B></CODE> could very well change as well. </P><P>Thus, Common Lisp defines <CODE><B>EQL</B></CODE> to behave like <CODE><B>EQ</B></CODE> except that
it also is guaranteed to consider two objects of the same class
representing the same numeric or character value to be equivalent.
Thus, <CODE>(eql 1 1)</CODE> is guaranteed to be true. And <CODE>(eql 1
1.0)</CODE> is guaranteed to be false since the integer value 1 and the
floating-point value are instances of different classes.</P><P>There are two schools of thought about when to use <CODE><B>EQ</B></CODE> and when
to use <CODE><B>EQL</B></CODE>: The &quot;use <CODE><B>EQ</B></CODE> when possible&quot; camp argues you
should use <CODE><B>EQ</B></CODE> when you know you aren't going to be comparing
numbers or characters because (a) it's a way to indicate that you
aren't going to be comparing numbers or characters and (b) it will be
marginally more efficient since <CODE><B>EQ</B></CODE> doesn't have to check whether
its arguments are numbers or characters.</P><P>The &quot;always use <CODE><B>EQL</B></CODE>&quot; camp says you should never use <CODE><B>EQ</B></CODE>
because (a) the potential gain in clarity is lost because every time
someone reading your code--including you--sees an <CODE><B>EQ</B></CODE>, they have
to stop and check whether it's being used correctly (in other words,
that it's never going to be called upon to compare numbers or
characters) and (b) the efficiency difference between <CODE><B>EQ</B></CODE>
and <CODE><B>EQL</B></CODE> is in the noise compared to real performance
bottlenecks. </P><P>The code in this book is written in the &quot;always use <CODE><B>EQL</B></CODE>&quot;
style.<SUP>18</SUP></P><P>The other two equality predicates, <CODE><B>EQUAL</B></CODE> and <CODE><B>EQUALP</B></CODE>, are
general in the sense that they can operate on all types of objects,
but they're much less fundamental than <CODE><B>EQ</B></CODE> or <CODE><B>EQL</B></CODE>. They each
define a slightly less discriminating notion of equivalence than
<CODE><B>EQL</B></CODE>, allowing different objects to be considered equivalent.
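</P><P>The differences are easiest to see side by side. The helper below is a
made-up illustration, not a standard function; it reports whether two objects
are <CODE><B>EQL</B></CODE>, <CODE><B>EQUAL</B></CODE>, and
<CODE><B>EQUALP</B></CODE>. The calls to <CODE><B>LIST</B></CODE> and
<CODE><B>COPY-SEQ</B></CODE> are there to make sure each argument is a freshly
created object rather than a literal that an implementation might share.</P><PRE>;; Hypothetical helper for illustration only.
(defun equality-profile (a b)
  &quot;Return three booleans: are A and B EQL, EQUAL, and EQUALP?&quot;
  (list (if (eql a b) t nil)
        (if (equal a b) t nil)
        (if (equalp a b) t nil)))

(equality-profile 1 1)                                  ; (T T T)
(equality-profile 1 1.0)                                ; (NIL NIL T)
(equality-profile (list 1 2) (list 1 2))                ; (NIL T T)
(equality-profile (copy-seq &quot;foo&quot;) (copy-seq &quot;foo&quot;))    ; (NIL T T)
(equality-profile (copy-seq &quot;foo&quot;) (copy-seq &quot;FOO&quot;))    ; (NIL NIL T)</PRE><P>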
There's nothing special about the particular notions of equivalence
these functions implement except that they've been found to be handy
by Lisp programmers in the past. If these predicates don't suit your
needs, you can always define your own predicate function that
compares different types of objects in the way you need.</P><P><CODE><B>EQUAL</B></CODE> loosens the discrimination of <CODE><B>EQL</B></CODE> to consider lists
equivalent if they have the same structure and contents, recursively,
according to <CODE><B>EQUAL</B></CODE>. <CODE><B>EQUAL</B></CODE> also considers strings equivalent
if they contain the same characters. It also defines a looser
definition of equivalence than <CODE><B>EQL</B></CODE> for bit vectors and
pathnames, two data types I'll discuss in future chapters. For all
other types, it falls back on <CODE><B>EQL</B></CODE>.</P><P><CODE><B>EQUALP</B></CODE> is similar to <CODE><B>EQUAL</B></CODE> except it's even less
discriminating. It considers two strings equivalent if they contain
the same characters, ignoring differences in case. It also considers
two characters equivalent if they differ only in case. Numbers are
equivalent under <CODE><B>EQUALP</B></CODE> if they represent the same mathematical
value. Thus, <CODE>(equalp 1 1.0)</CODE> is true. Lists with <CODE><B>EQUALP</B></CODE>
elements are <CODE><B>EQUALP</B></CODE>; likewise, arrays with <CODE><B>EQUALP</B></CODE> elements
are <CODE><B>EQUALP</B></CODE>. As with <CODE><B>EQUAL</B></CODE>, there are a few other data types
that I haven't covered yet for which <CODE><B>EQUALP</B></CODE> can consider two
objects equivalent that neither <CODE><B>EQL</B></CODE> nor <CODE><B>EQUAL</B></CODE> will. For all
other data types, <CODE><B>EQUALP</B></CODE> falls back on <CODE><B>EQL</B></CODE>. </P><A NAME="formatting-lisp-code"><H2>Formatting Lisp Code</H2></A><P>While code formatting is, strictly speaking, neither a syntactic nor
a semantic matter, proper formatting is important to reading and
writing code fluently and idiomatically. The key to formatting Lisp
code is to indent it properly. The indentation should reflect the
structure of the code so that you don't need to count parentheses to
see what goes with what. In general, each new level of nesting gets
indented a bit more, and, if line breaks are necessary, items at the
same level of nesting are lined up. Thus, a function call that needs
to be broken up across multiple lines might be written like this:</P><PRE>(some-function arg-with-a-long-name
               another-arg-with-an-even-longer-name)</PRE><P>Macro and special forms that implement control constructs are
typically indented a little differently: the &quot;body&quot; elements are
indented two spaces relative to the opening parenthesis of the form.
Thus: </P><PRE>(defun print-list (list)
  (dolist (i list)
    (format t &quot;item: ~a~%&quot; i)))</PRE><P>However, you don't need to worry too much about these rules because a
proper Lisp environment such as SLIME will take care of it for you.
In fact, one of the advantages of Lisp's regular syntax is that it's
fairly easy for software such as editors to know how to indent it.
Since the indentation is supposed to reflect the structure of the
code and the structure is marked by parentheses, it's easy to let the
editor indent your code for you.</P><P>In SLIME, hitting Tab at the beginning of each line will cause it to
be indented appropriately, or you can re-indent a whole expression by
positioning the cursor on the opening parenthesis and typing
<CODE>C-M-q</CODE>. Or you can re-indent the whole body of a function from
anywhere within it by typing <CODE>C-c M-q</CODE>.</P><P>Indeed, experienced Lisp programmers tend to rely on their editor
handling indenting automatically, not just to make their code look
nice but to detect typos: once you get used to how code is supposed
to be indented, a misplaced parenthesis will be instantly
recognizable by the weird indentation your editor gives you. For
example, suppose you were writing a function that was supposed to
look like this: </P><PRE>(defun foo ()
  (if (test)
    (do-one-thing)
    (do-another-thing)))</PRE><P>Now suppose you accidentally left off the closing parenthesis after
<CODE>test</CODE>. Because you don't bother counting parentheses, you quite
likely would have added an extra parenthesis at the end of the
<CODE><B>DEFUN</B></CODE> form, giving you this code:</P><PRE>(defun foo ()
  (if (test
    (do-one-thing)
    (do-another-thing))))</PRE><P>However, if you had been indenting by hitting Tab at the beginning of
each line, you wouldn't have code like that. Instead you'd have this:</P><PRE>(defun foo ()
  (if (test
       (do-one-thing)
       (do-another-thing))))</PRE><P>Seeing the then and else clauses indented way out under the condition
rather than just indented slightly relative to the <CODE><B>IF</B></CODE> shows you
immediately that something is awry.</P><P>Another important formatting rule is that closing parentheses are
always put on the same line as the last element of the list they're
closing. That is, don't write this: </P><PRE>(defun foo ()
  (dotimes (i 10)
    (format t &quot;~d. hello~%&quot; i)
  )
)</PRE><P>but instead write this:</P><PRE>(defun foo ()
  (dotimes (i 10)
    (format t &quot;~d. hello~%&quot; i)))</PRE><P>The string of <CODE>)))</CODE>s at the end may seem forbidding, but as long
as your code is properly indented the parentheses should fade away--no
need to give them undue prominence by spreading them across several
lines.</P><P>Finally, comments should be prefaced with one to four semicolons
depending on the scope of the comment as follows: </P><PRE>;;;; Four semicolons are used for a file header comment.

;;; A comment with three semicolons will usually be a paragraph
;;; comment that applies to a large section of code that follows.

(defun foo (x)
  (dotimes (i x)
    ;; Two semicolons indicate this comment applies to the code
    ;; that follows. Note that this comment is indented the same
    ;; as the code that follows.
    (some-function-call)
    (another i)              ; this comment applies to this line only
    (and-another)            ; and this is for this line
    (baz)))</PRE><P>Now you're ready to start looking in greater detail at the major
building blocks of Lisp programs: functions, variables, and macros.
Up next: functions.
</P><HR/><DIV CLASS="notes"><P><SUP>1</SUP><CODE>http://www-formal.stanford.edu/jmc/history/lisp/node3.html</CODE></P><P><SUP>2</SUP>Lisp implementers, like implementers of any
language, have many ways they can implement an evaluator, ranging
from a &quot;pure&quot; interpreter that interprets the objects given to the
evaluator directly to a compiler that translates the objects into
machine code that it then runs. In the middle are implementations
that compile the input into an intermediate form such as bytecodes
for a virtual machine and then interpret the bytecodes. Most Common
Lisp implementations these days use some form of compilation even
when evaluating code at run time.</P><P><SUP>3</SUP>Sometimes the phrase <I>s-expression</I> refers
to the textual representation and sometimes to the objects that
result from reading the textual representation. Usually either it's
clear from context which is meant or the distinction isn't that
important.</P><P><SUP>4</SUP>Not all Lisp
objects can be written out in a way that can be read back in. But
anything you can <CODE><B>READ</B></CODE> can be printed back out &quot;readably&quot; with
<CODE><B>PRINT</B></CODE>.</P><P><SUP>5</SUP>The
empty list, <CODE>()</CODE>, which can also be written <CODE><B>NIL</B></CODE>, is both an
atom and a list.</P><P><SUP>6</SUP>In fact, as you'll see later,
names aren't intrinsically tied to any one kind of thing. You can use
the same name, depending on context, to refer to both a variable and
a function, not to mention several other possibilities.</P><P><SUP>7</SUP>The case-converting
behavior of the reader can, in fact, be customized, but understanding
when and how to change it requires a much deeper discussion of the
relation between names, symbols, and other program elements than I'm
ready to get into just yet.</P><P><SUP>8</SUP>I'll discuss the relation between symbols
and packages in more detail in Chapter 21.</P><P><SUP>9</SUP>Of course, other levels of correctness
exist in Lisp, as in other languages. For instance, the s-expression
that results from reading <CODE>(foo 1 2)</CODE> is syntactically
well-formed but can be evaluated only if <CODE>foo</CODE> is the name of a
function or macro.</P><P><SUP>10</SUP>One other
rarely used kind of Lisp form is a list whose first element is a
<I>lambda form</I>. I'll discuss this kind of form in Chapter 5.</P><P><SUP>11</SUP>One other possibility exists--it's possible to
define <I>symbol macros</I> that are evaluated slightly differently. We
won't worry about them.</P><P><SUP>12</SUP>In Common Lisp a symbol can name both an
operator--function, macro, or special operator--and a variable. This
is one of the major differences between Common Lisp and Scheme. The
difference is sometimes described as Common Lisp being a Lisp-2 vs.
Scheme being a Lisp-1--a Lisp-2 has two namespaces, one for operators
and one for variables, but a Lisp-1 uses a single namespace. Both
choices have advantages, and partisans can debate endlessly which is
better.</P><P><SUP>13</SUP>The others provide useful,
but somewhat esoteric, features. I'll discuss them as the features
they support come up.</P><P><SUP>14</SUP>Well, one difference exists--literal objects such as
quoted lists, but also including double-quoted strings, literal
arrays, and vectors (whose syntax you'll see later), must not be
modified. Consequently, any lists you plan to manipulate you should
create with <CODE><B>LIST</B></CODE>.</P><P><SUP>15</SUP>This syntax is an example of a <I>reader macro</I>.
Reader macros modify the syntax the reader uses to translate text
into Lisp objects. It is, in fact, possible to define your own reader
macros, but that's a rarely used facility of the language. When most
Lispers talk about &quot;extending the syntax&quot; of the language, they're
talking about regular macros, as I'll discuss in a moment.</P><P><SUP>16</SUP>People without
experience using Lisp's macros or, worse yet, bearing the scars of C
preprocessor-inflicted wounds, tend to get nervous when they realize
that macro calls look like regular function calls. This turns out not
to be a problem in practice for several reasons. One is that macro
forms are usually formatted differently than function calls. For
instance, you write the following:</P><PRE>(dolist (x foo)
  (print x))</PRE><P>rather than this:</P><PRE>(dolist (x foo) (print x))</PRE><P>or </P><PRE>(dolist (x foo)
       (print x))</PRE><P>the way you would if <CODE><B>DOLIST</B></CODE> was a function. A good Lisp
environment will automatically format macro calls correctly, even for
user-defined macros.</P><P>And even if a <CODE><B>DOLIST</B></CODE> form was written on a single line, there are
several clues that it's a macro: For one, the expression <CODE>(x
foo)</CODE> is meaningful by itself only if <CODE>x</CODE> is the name of a
function or macro. Combine that with the later occurrence of <CODE>x</CODE>
as a variable, and it's pretty suggestive that <CODE><B>DOLIST</B></CODE> is a macro
that's creating a binding for a variable named <CODE>x</CODE>. Naming
conventions also help--looping constructs, which are invariably
macros, are frequently given names starting with <I>do</I>.</P><P><SUP>17</SUP>Using the empty list as false is a reflection of Lisp's
heritage as a list-processing language much as the use of the integer
0 as false in C is a reflection of its heritage as a bit-twiddling
language. Not all Lisps handle boolean values the same way. Another
of the many subtle differences upon which a good Common Lisp vs.
Scheme flame war can rage for days is Scheme's use of a distinct
false value <CODE>#f</CODE>, which isn't the same value as either the
symbol <CODE>nil</CODE> or the empty list, which are also distinct from
each other.</P><P><SUP>18</SUP>Even the language standard is a bit ambivalent about which
of <CODE><B>EQ</B></CODE> or <CODE><B>EQL</B></CODE> should be preferred. <I>Object identity</I> is
defined by <CODE><B>EQ</B></CODE>, but the standard defines the phrase <I>the</I>
<I>same</I> when talking about objects to mean <CODE><B>EQL</B></CODE> unless another
predicate is explicitly mentioned. Thus, if you want to be 100 percent
technically correct, you can say that <CODE>(- 3 2)</CODE> and <CODE>(- 4
3)</CODE> evaluate to &quot;the same&quot; object but not that they evaluate to
&quot;identical&quot; objects. This is, admittedly, a bit of an
<I>angels-on-pinheads</I> kind of issue.</P></DIV></BODY></HTML>