<HTML><HEAD><TITLE>They Called It LISP for a Reason: List Processing</TITLE><LINK REL="stylesheet" TYPE="text/css" HREF="style.css"/></HEAD><BODY><DIV CLASS="copyright">Copyright © 2003-2005, Peter Seibel</DIV><H1>12. They Called It LISP for a Reason: List Processing</H1><P>Lists play an important role in Lisp--for reasons both historical and
practical. Historically, lists were Lisp's original composite data
type, though it has been decades since they were its <I>only</I> such
data type. These days, a Common Lisp programmer is as likely to use a
vector, a hash table, or a user-defined class or structure as to use a
list.</P><P>Practically speaking, lists remain in the language because they're an
excellent solution to certain problems. One such problem--how to
represent code as data in order to support code-transforming and
code-generating macros--is particular to Lisp, which may explain why
other languages don't feel the lack of Lisp-style lists. More
generally, lists are an excellent data structure for representing any
kind of heterogeneous and/or hierarchical data. They're also quite
lightweight and support a functional style of programming that's
another important part of Lisp's heritage.</P><P>Thus, you need to understand lists on their own terms; as you gain a
better understanding of how lists work, you'll be in a better
position to appreciate when you should and shouldn't use them. </P><A NAME="there-is-no-list"><H2>"There Is No List"</H2></A><BLOCKQUOTE><B>Spoon Boy</B>: Do not try and bend the list. That's impossible.
Instead . . . only try to realize the truth.</BLOCKQUOTE><BLOCKQUOTE><B>Neo</B>: What truth?</BLOCKQUOTE><BLOCKQUOTE><B>Spoon Boy</B>: There is no list.</BLOCKQUOTE><BLOCKQUOTE><B>Neo</B>: There is no list?</BLOCKQUOTE><BLOCKQUOTE><B>Spoon Boy</B>: Then you'll see that it is not the list that bends;
it is only yourself.<SUP>1</SUP></BLOCKQUOTE><P>The key to understanding lists is to understand that they're largely
an illusion built on top of objects that are instances of a more
primitive data type. Those simpler objects are pairs of values called
<I>cons cells</I>, after the function <CODE><B>CONS</B></CODE> used to create them.</P><P><CODE><B>CONS</B></CODE> takes two arguments and returns a new cons cell containing
the two values.<SUP>2</SUP> These values can be references to any kind of object.
Unless the second value is <CODE><B>NIL</B></CODE> or another cons cell, a cons is
printed as the two values in parentheses separated by a dot, a
so-called dotted pair.</P><PRE>(cons 1 2) ==> (1 . 2)</PRE><P>The two values in a cons cell are called the <CODE><B>CAR</B></CODE> and the
<CODE><B>CDR</B></CODE> after the names of the functions used to access them. At the
dawn of time, these names were mnemonic, at least to the folks
implementing the first Lisp on an IBM 704. But even then they were
just lifted from the assembly mnemonics used to implement the
operations. However, it's not all bad that these names are somewhat
meaningless--when considering individual cons cells, it's best to
think of them simply as an arbitrary pair of values without any
particular semantics. Thus:</P><PRE>(car (cons 1 2)) ==> 1
(cdr (cons 1 2)) ==> 2</PRE><P>Both <CODE><B>CAR</B></CODE> and <CODE><B>CDR</B></CODE> are also <CODE><B>SETF</B></CODE>able places--given an
existing cons cell, it's possible to assign a new value to either of
its values.<SUP>3</SUP></P><PRE>(defparameter *cons* (cons 1 2))
*cons* ==> (1 . 2)
(setf (car *cons*) 10) ==> 10
*cons* ==> (10 . 2)
(setf (cdr *cons*) 20) ==> 20
*cons* ==> (10 . 20)</PRE><P>Because the values in a cons cell can be references to any kind of
object, you can build larger structures out of cons cells by linking
them together. Lists are built by linking together cons cells in a
chain. The elements of the list are held in the <CODE><B>CAR</B></CODE>s of the cons
cells while the links to subsequent cons cells are held in the
<CODE><B>CDR</B></CODE>s. The last cell in the chain has a <CODE><B>CDR</B></CODE> of <CODE><B>NIL</B></CODE>,
which--as I mentioned in Chapter 4--represents the empty list as
well as the boolean value false. </P><P>This arrangement is by no means unique to Lisp; it's called a
<I>singly linked list</I>. However, few languages outside the Lisp
family provide such extensive support for this humble data type.</P><P>So when I say a particular value is a list, what I really mean is that
it's either <CODE><B>NIL</B></CODE> or a reference to a cons cell. The <CODE><B>CAR</B></CODE> of
the cons cell is the first item of the list, and the <CODE><B>CDR</B></CODE> is a
reference to another list, that is, another cons cell or <CODE><B>NIL</B></CODE>,
containing the remaining elements. The Lisp printer understands this
convention and prints such chains of cons cells as parenthesized
lists rather than as dotted pairs. </P><PRE>(cons 1 nil) ==> (1)
(cons 1 (cons 2 nil)) ==> (1 2)
(cons 1 (cons 2 (cons 3 nil))) ==> (1 2 3)</PRE><P>When talking about structures built out of cons cells, a few diagrams
can be a big help. Box-and-arrow diagrams represent cons cells as a
pair of boxes like this:</P><P><IMG CLASS="figure" SRC="figures/one-cons-cell.png"/></P><P>The box on the left represents the <CODE><B>CAR</B></CODE>, and the box on the right
is the <CODE><B>CDR</B></CODE>. The values stored in a particular cons cell are
either drawn in the appropriate box or represented by an arrow from
the box to a representation of the referenced value.<SUP>4</SUP> For instance, the
list <CODE>(1 2 3)</CODE>, which consists of three cons cells linked
together by their <CODE><B>CDR</B></CODE>s, would be diagrammed like this:</P><P><IMG CLASS="figure" SRC="figures/list-1-2-3.png"/></P><P>However, most of the time you work with lists you won't have to deal
with individual cons cells--the functions that create and manipulate
lists take care of that for you. For example, the <CODE><B>LIST</B></CODE> function
builds cons cells under the covers for you and links them together;
the following <CODE><B>LIST</B></CODE> expressions are equivalent to the previous
<CODE><B>CONS</B></CODE> expressions:</P><PRE>(list 1) ==> (1)
(list 1 2) ==> (1 2)
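; A small check, offered as a sketch: LIST and the equivalent nested CONS
; calls build lists with the same contents (EQUAL compares contents), but
; each call allocates its own cons cells (EQ compares identity).
(equal (list 1 2 3) (cons 1 (cons 2 (cons 3 nil)))) ; ==> T
(eq (list 1 2 3) (list 1 2 3))                      ; ==> NIL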
(list 1 2 3) ==> (1 2 3)</PRE><P>Similarly, when you're thinking in terms of lists, you don't have to
use the meaningless names <CODE><B>CAR</B></CODE> and <CODE><B>CDR</B></CODE>; <CODE><B>FIRST</B></CODE> and
<CODE><B>REST</B></CODE> are synonyms for <CODE><B>CAR</B></CODE> and <CODE><B>CDR</B></CODE> that you should use
when you're dealing with cons cells as lists. </P><PRE>(defparameter *list* (list 1 2 3 4))
(first *list*) ==> 1
(rest *list*) ==> (2 3 4)
(first (rest *list*)) ==> 2</PRE><P>Because cons cells can hold any kind of values, so can lists. And a
single list can hold objects of different types.</P><PRE>(list "foo" (list 1 2) 10) ==> ("foo" (1 2) 10)</PRE><P>The structure of that list would look like this:</P><P><IMG CLASS="figure" SRC="figures/mixed-list.png"/></P><P>Because lists can have other lists as elements, you can also use them
to represent trees of arbitrary depth and complexity. As such, they
make excellent representations for any heterogeneous, hierarchical
data. Lisp-based XML processors, for instance, usually represent XML
documents internally as lists. Another obvious example of
tree-structured data is Lisp code itself. In Chapters 30 and 31
you'll write an HTML generation library that uses lists of lists to
represent the HTML to be generated. I'll talk more next chapter about
using cons cells to represent other data structures.</P><P>Common Lisp provides quite a large library of functions for
manipulating lists. In the sections "List-Manipulation Functions" and
"Mapping," you'll look at some of the more important of these
functions. However, they will be easier to understand in the context
of a few ideas borrowed from functional programming. </P><A NAME="functional-programming-and-lists"><H2>Functional Programming and Lists</H2></A><P>The essence of functional programming is that programs are built
entirely of functions with no side effects that compute their results
based solely on the values of their arguments. The advantage of the
functional style is that it makes programs easier to understand.
Eliminating side effects eliminates almost all possibilities for
action at a distance. And since the result of a function is
determined only by the values of its arguments, its behavior is
easier to understand and test. For instance, when you see an
expression such as <CODE>(+ 3 4)</CODE>, you know the result is uniquely
determined by the definition of the <CODE><B>+</B></CODE> function and the values
<CODE>3</CODE> and <CODE>4</CODE>. You don't have to worry about what may have
happened earlier in the execution of the program since there's
nothing that can change the result of evaluating that expression.</P><P>Functions that deal with numbers are naturally functional since
numbers are immutable. A list, on the other hand, can be mutated, as
you've just seen, by <CODE><B>SETF</B></CODE>ing the <CODE><B>CAR</B></CODE>s and <CODE><B>CDR</B></CODE>s of the
cons cells that make up its backbone. However, lists can be treated
as a functional data type if you consider their value to be
determined by the elements they contain. Thus, any list of the form
<CODE>(1 2 3 4)</CODE> is functionally equivalent to any other list
containing those four values, regardless of what cons cells are
actually used to represent the list. And any function that takes a
list as an argument and returns a value based solely on the contents
of the list can likewise be considered functional. For instance, the
<CODE><B>REVERSE</B></CODE> sequence function, given the list <CODE>(1 2 3 4)</CODE>,
always returns a list <CODE>(4 3 2 1)</CODE>. Different calls to
<CODE><B>REVERSE</B></CODE> with functionally equivalent lists as the argument will
return functionally equivalent result lists. Another aspect of
functional programming, which I'll discuss in the section "Mapping,"
is the use of higher-order functions: functions that treat other
functions as data, taking them as arguments or returning them as
results. </P><P>Most of Common Lisp's list-manipulation functions are written in a
functional style. I'll discuss later how to mix functional and other
coding styles, but first you should understand a few subtleties of
the functional style as applied to lists.</P><P>The reason most list functions are written functionally is that doing so allows
them to return results that share cons cells with their arguments. To
take a concrete example, the function <CODE><B>APPEND</B></CODE> takes any number of
list arguments and returns a new list containing the elements of all
its arguments. For instance:</P><PRE>(append (list 1 2) (list 3 4)) ==> (1 2 3 4)</PRE><P>From a functional point of view, <CODE><B>APPEND</B></CODE>'s job is to return the
list <CODE>(1 2 3 4)</CODE> without modifying any of the cons cells in the
lists <CODE>(1 2)</CODE> and <CODE>(3 4)</CODE>. One obvious way to achieve that
goal is to create a completely new list consisting of four new cons
cells. However, that's more work than is necessary. Instead,
<CODE><B>APPEND</B></CODE> actually makes only two new cons cells to hold the values
<CODE>1</CODE> and <CODE>2</CODE>, linking them together and pointing the
<CODE><B>CDR</B></CODE> of the second cons cell at the head of the last argument,
the list <CODE>(3 4)</CODE>. It then returns the cons cell containing the
<CODE>1</CODE>. None of the original cons cells has been modified, and the
result is indeed the list <CODE>(1 2 3 4)</CODE>. The only wrinkle is that
the list returned by <CODE><B>APPEND</B></CODE> shares some cons cells with the list
<CODE>(3 4)</CODE>. The resulting structure looks like this:</P><P><IMG CLASS="figure" SRC="figures/after-append.png"/></P><P>In general, <CODE><B>APPEND</B></CODE> must copy all but its last argument, but it
can always return a result that <I>shares structure</I> with the last
argument.</P><P>Other functions take similar advantage of lists' ability to share
structure. Some, like <CODE><B>APPEND</B></CODE>, are specified to always return
results that share structure in a particular way. Others are simply
allowed to return shared structure at the discretion of the
implementation. </P><A NAME="destructive-operations"><H2>"Destructive" Operations</H2></A><P>If Common Lisp were a purely functional language, that would be the
end of the story. However, because it's possible to modify a cons
cell after it has been created by <CODE><B>SETF</B></CODE>ing its <CODE><B>CAR</B></CODE> or
<CODE><B>CDR</B></CODE>, you need to think a bit about how side effects and
structure sharing mix.</P><P>Because of Lisp's functional heritage, operations that modify
existing objects are called <I>destructive--</I>in functional
programming, changing an object's state "destroys" it since it no
longer represents the same value. However, using the same term to
describe all state-modifying operations leads to a certain amount of
confusion since there are two very different kinds of destructive
operations, <I>for-side-effect</I> operations and <I>recycling</I>
operations.<SUP>5</SUP></P><P>For-side-effect operations are those used specifically for their side
effects. All uses of <CODE><B>SETF</B></CODE> are destructive in this sense, as are
functions that use <CODE><B>SETF</B></CODE> under the covers to change the state of
an existing object, such as <CODE><B>VECTOR-PUSH</B></CODE> or <CODE><B>VECTOR-POP</B></CODE>. But
it's a bit unfair to describe these operations as
destructive--they're not intended to be used in code written in a
functional style, so they shouldn't be described using functional
terminology. However, if you mix nonfunctional, for-side-effect
operations with functions that return structure-sharing results, then
you need to be careful not to inadvertently modify the shared
structure. For instance, consider these three definitions: </P><PRE>(defparameter *list-1* (list 1 2))
(defparameter *list-2* (list 3 4))
(defparameter *list-3* (append *list-1* *list-2*))</PRE><P>After evaluating these forms, you have three lists, but
<CODE>*list-3*</CODE> and <CODE>*list-2*</CODE> share structure just like the
lists in the previous diagram.</P><PRE>*list-1* ==> (1 2)
*list-2* ==> (3 4)
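; A hedged way to confirm the sharing, using EQ to compare cons-cell
; identity (the local names A, B, and C are illustrative only):
(let* ((a (list 1 2))
       (b (list 3 4))
       (c (append a b)))
  (list (eq (cddr c) b)   ; the tail of C is the very cell that starts B
        (eq c a)))        ; but the cells holding 1 and 2 are new copies
; ==> (T NIL)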
*list-3* ==> (1 2 3 4)</PRE><P>Now consider what happens when you modify <CODE>*list-2*</CODE>.</P><PRE>(setf (first *list-2*) 0) ==> 0
*list-2* ==> (0 4) ; as expected
*list-3* ==> (1 2 0 4) ; maybe not what you wanted</PRE><P>The change to <CODE>*list-2*</CODE> also changes <CODE>*list-3*</CODE> because of
the shared structure: the first cons cell in <CODE>*list-2*</CODE> is also
the third cons cell in <CODE>*list-3*</CODE>. <CODE><B>SETF</B></CODE>ing the <CODE><B>FIRST</B></CODE>
of <CODE>*list-2*</CODE> changes the value in the <CODE><B>CAR</B></CODE> of that cons
cell, affecting both lists.</P><P>On the other hand, the other kind of destructive operations,
recycling operations, <I>are</I> intended to be used in functional code.
They use side effects only as an optimization. In particular, they
reuse certain cons cells from their arguments when building their
result. However, unlike functions such as <CODE><B>APPEND</B></CODE> that reuse cons
cells by including them, unmodified, in the list they return,
recycling functions reuse cons cells as raw material, modifying the
<CODE><B>CAR</B></CODE> and <CODE><B>CDR</B></CODE> as necessary to build the desired result. Thus,
recycling functions can be used safely only when the original lists
aren't going to be needed after the call to the recycling function. </P><P>To see how a recycling function works, let's compare <CODE><B>REVERSE</B></CODE>,
the nondestructive function that returns a reversed version of a
sequence, to <CODE><B>NREVERSE</B></CODE>, a recycling version of the same function.
Because <CODE><B>REVERSE</B></CODE> doesn't modify its argument, it must allocate a
new cons cell for each element in the list being reversed. But
suppose you write something like this:</P><PRE>(setf *list* (reverse *list*))</PRE><P>By assigning the result of <CODE><B>REVERSE</B></CODE> back to <CODE>*list*</CODE>, you've
removed the reference to the original value of <CODE>*list*</CODE>.
Assuming the cons cells in the original list aren't referenced
anywhere else, they're now eligible to be garbage collected. However,
in many Lisp implementations it'd be more efficient to immediately
reuse the existing cons cells rather than allocating new ones and
letting the old ones become garbage.</P><P><CODE><B>NREVERSE</B></CODE> allows you to do exactly that. The <I>N</I> stands for
<I>non-consing</I>, meaning it doesn't need to allocate any new cons
cells. The exact side effects of <CODE><B>NREVERSE</B></CODE> are intentionally not
specified--it's allowed to modify any <CODE><B>CAR</B></CODE> or <CODE><B>CDR</B></CODE> of any
cons cell in the list--but a typical implementation might walk down
the list changing the <CODE><B>CDR</B></CODE> of each cons cell to point to the
previous cons cell, eventually returning the cons cell that was
previously the last cons cell in the old list and is now the head of
the reversed list. No new cons cells need to be allocated, and no
garbage is created.</P><P>Most recycling functions, like <CODE><B>NREVERSE</B></CODE>, have nondestructive
counterparts that compute the same result. In general, the recycling
functions have names that are the same as their non-destructive
counterparts except with a leading <I>N</I>. However, not all do,
including several of the more commonly used recycling functions such
as <CODE><B>NCONC</B></CODE>, the recycling version of <CODE><B>APPEND</B></CODE>, and <CODE><B>DELETE</B></CODE>,
<CODE><B>DELETE-IF</B></CODE>, <CODE><B>DELETE-IF-NOT</B></CODE>, and <CODE><B>DELETE-DUPLICATES</B></CODE>, the
recycling versions of the <CODE><B>REMOVE</B></CODE> family of sequence functions.</P><P>In general, you use recycling functions in the same way you use their
nondestructive counterparts except it's safe to use them only when
you know the arguments aren't going to be used after the function
returns. The side effects of most recycling functions aren't
specified tightly enough to be relied upon. </P><P>However, the waters are further muddied by a handful of recycling
functions with specified side effects that <I>can</I> be relied upon.
They are <CODE><B>NCONC</B></CODE>, the recycling version of <CODE><B>APPEND</B></CODE>, and
<CODE><B>NSUBSTITUTE</B></CODE> and its <CODE>-IF</CODE> and <CODE>-IF-NOT</CODE> variants, the
recycling versions of the sequence functions <CODE><B>SUBSTITUTE</B></CODE> and
friends.</P><P>Like <CODE><B>APPEND</B></CODE>, <CODE><B>NCONC</B></CODE> returns a concatenation of its list
arguments, but it builds its result in the following way: for each
nonempty list it's passed, <CODE><B>NCONC</B></CODE> sets the <CODE><B>CDR</B></CODE> of the list's
last cons cell to point to the first cons cell of the next nonempty
list. It then returns the first list, which is now the head of the
spliced-together result. Thus:</P><PRE>(defparameter *x* (list 1 2 3))
(nconc *x* (list 4 5 6)) ==> (1 2 3 4 5 6)
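; A sketch of why it is still safest to use NCONC's return value: when the
; first list is empty there is no last cons cell to modify, so the variable
; is left unchanged (the local name X is illustrative only).
(let ((x nil))
  (nconc x (list 1 2))           ; result discarded
  x)                             ; ==> NIL
(let ((x nil))
  (setf x (nconc x (list 1 2)))  ; result saved
  x)                             ; ==> (1 2)
; By contrast, the nonempty *x* defined above was spliced in place: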
*x* ==> (1 2 3 4 5 6)</PRE><P><CODE><B>NSUBSTITUTE</B></CODE> and variants can be relied on to walk down the list
structure of the list argument and to <CODE><B>SETF</B></CODE> the <CODE><B>CAR</B></CODE>s of any
cons cells holding the old value to the new value and to otherwise
leave the list intact. It then returns the original list, which now
has the same value as would've been computed by <CODE><B>SUBSTITUTE</B></CODE>.
<SUP>6</SUP></P><P>The key thing to remember about <CODE><B>NCONC</B></CODE> and <CODE><B>NSUBSTITUTE</B></CODE> is
that they're the exceptions to the rule that you can't rely on the
side effects of recycling functions. It's perfectly acceptable--and
arguably good style--to ignore the reliability of their side effects
and use them, like any other recycling function, only for the value
they return. </P><A NAME="combining-recycling-with-shared-structure"><H2>Combining Recycling with Shared Structure</H2></A><P>Although you can use recycling functions whenever the arguments to
the recycling function won't be used after the function call, it's
worth noting that each recycling function is a loaded gun pointed
footward: if you accidentally use a recycling function on an argument
that <I>is</I> used later, you're liable to lose some toes.</P><P>To make matters worse, shared structure and recycling functions tend
to work at cross-purposes. Nondestructive list functions return
lists that share structure under the assumption that cons cells are
never modified, but recycling functions work by violating that
assumption. Or, put another way, sharing structure is based on the
premise that you don't care exactly what cons cells make up a list,
while using recycling functions requires that you know exactly what
cons cells are referenced from where.</P><P>In practice, recycling functions tend to be used in a few idiomatic
ways. By far the most common recycling idiom is to build up a list to
be returned from a function by "consing" onto the front of a list,
usually by <CODE><B>PUSH</B></CODE>ing elements onto a list stored in a local
variable and then returning the result of <CODE><B>NREVERSE</B></CODE>ing
it.<SUP>7</SUP></P><P>This is an efficient way to build a list because each <CODE><B>PUSH</B></CODE> has
to create only one cons cell and modify a local variable and the
<CODE><B>NREVERSE</B></CODE> just has to zip down the list reassigning the
<CODE><B>CDR</B></CODE>s. Because the list is created entirely within the function,
there's no danger any code outside the function has a reference to
any of its cons cells. Here's a function that uses this idiom to
build a list of the first <I>n</I> numbers, starting at zero:<SUP>8</SUP> </P><PRE>(defun upto (max)
(let ((result nil))
(dotimes (i max)
(push i result))
(nreverse result)))
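
; A second sketch of the same idiom; SQUARES-OF is a hypothetical helper,
; not a function from the text. Each PUSH conses onto the front of a list
; that no other code can see, so the final NREVERSE is safe.
(defun squares-of (numbers)
  (let ((result nil))
    (dolist (n numbers)
      (push (* n n) result))
    (nreverse result)))
; (squares-of (list 1 2 3)) returns (1 4 9)
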
(upto 10) ==> (0 1 2 3 4 5 6 7 8 9)</PRE><P>The next most common recycling idiom<SUP>9</SUP> is to
immediately reassign the value returned by the recycling function
back to the place containing the potentially recycled value. For
instance, you'll often see expressions like the following, using
<CODE><B>DELETE</B></CODE>, the recycling version of <CODE><B>REMOVE</B></CODE>:</P><PRE>(setf foo (delete nil foo))</PRE><P>This sets the value of <CODE>foo</CODE> to its old value except with all
the <CODE><B>NIL</B></CODE>s removed. However, even this idiom must be used with
some care--if <CODE>foo</CODE> shares structure with lists referenced
elsewhere, using <CODE><B>DELETE</B></CODE> instead of <CODE><B>REMOVE</B></CODE> can destroy the
structure of those other lists. For example, consider the two lists
<CODE>*list-2*</CODE> and <CODE>*list-3*</CODE> from earlier that share their
last two cons cells. </P><PRE>*list-2* ==> (0 4)
*list-3* ==> (1 2 0 4)</PRE><P>You can delete <CODE>4</CODE> from <CODE>*list-3*</CODE> like this:</P><PRE>(setf *list-3* (delete 4 *list-3*)) ==> (1 2 0)</PRE><P>However, <CODE><B>DELETE</B></CODE> will likely perform the necessary deletion by
setting the <CODE><B>CDR</B></CODE> of the third cons cell to <CODE><B>NIL</B></CODE>,
disconnecting the fourth cons cell, the one holding the <CODE>4</CODE>,
from the list. Because the third cons cell of <CODE>*list-3*</CODE> is also
the first cons cell in <CODE>*list-2*</CODE>, the deletion modifies
<CODE>*list-2*</CODE> as well:</P><PRE>*list-2* ==> (0)</PRE><P>If you had used <CODE><B>REMOVE</B></CODE> instead of <CODE><B>DELETE</B></CODE>, it would've built
a list containing the values <CODE>1</CODE>, <CODE>2</CODE>, and <CODE>0</CODE>,
creating new cons cells as necessary rather than modifying any of the
cons cells in <CODE>*list-3*</CODE>. In that case, <CODE>*list-2*</CODE> wouldn't
have been affected. </P><P>The <CODE><B>PUSH</B></CODE>/<CODE><B>NREVERSE</B></CODE> and <CODE><B>SETF</B></CODE>/<CODE><B>DELETE</B></CODE> idioms probably
account for 80 percent of the uses of recycling functions. Other uses
are possible but require keeping careful track of which functions
return shared structure and which do not.</P><P>In general, when manipulating lists, it's best to write your own code
in a functional style--your functions should depend only on the
contents of their list arguments and shouldn't modify them. Following
that rule will, of course, rule out using any destructive functions,
recycling or otherwise. Once you have your code working, if profiling
shows you need to optimize, you can replace nondestructive list
operations with their recycling counterparts but only if you're
certain the argument lists aren't referenced from anywhere else.</P><P>One last gotcha to watch out for is that the sorting functions
<CODE><B>SORT</B></CODE>, <CODE><B>STABLE-SORT</B></CODE>, and <CODE><B>MERGE</B></CODE> mentioned in Chapter 11
are also recycling functions when applied to lists.<SUP>10</SUP> However, these functions don't have nondestructive
counterparts, so if you need to sort a list without destroying it,
you need to pass the sorting function a copy made with
<CODE><B>COPY-LIST</B></CODE>. In either case you need to be sure to save the result
of the sorting function because the original argument is likely to be
in tatters. For instance: </P><PRE>CL-USER> (defparameter *list* (list 4 3 2 1))
*LIST*
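; Aside, offered as a sketch: sorting a copy made with COPY-LIST leaves
; the original list alone, in contrast to the direct call further down.
CL-USER> (sort (copy-list *list*) #'<)
(1 2 3 4)
CL-USER> *list*
(4 3 2 1) ; the original is untouched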
CL-USER> (sort *list* #'<)
(1 2 3 4) ; looks good
CL-USER> *list*
(4) ; whoops!</PRE><A NAME="list-manipulation-functions"><H2>List-Manipulation Functions</H2></A><P>With that background out of the way, you're ready to look at the
library of functions Common Lisp provides for manipulating lists.</P><P>You've already seen the basic functions for getting at the elements
of a list: <CODE><B>FIRST</B></CODE> and <CODE><B>REST</B></CODE>. Although you can get at any
element of a list by combining enough calls to <CODE><B>REST</B></CODE> (to move
down the list) with a <CODE><B>FIRST</B></CODE> (to extract the element), that can
be a bit tedious. So Common Lisp provides functions named for the
other ordinals from <CODE><B>SECOND</B></CODE> to <CODE><B>TENTH</B></CODE> that return the
appropriate element. More generally, the function <CODE><B>NTH</B></CODE> takes two
arguments, an index and a list, and returns the <I>n</I>th (zero-based)
element of the list. Similarly, <CODE><B>NTHCDR</B></CODE> takes an index and a list
and returns the result of calling <CODE><B>CDR</B></CODE> <I>n</I> times. (Thus,
<CODE>(nthcdr 0 ...)</CODE> simply returns the original list, and
<CODE>(nthcdr 1 ...)</CODE> is equivalent to <CODE><B>REST</B></CODE>.) Note, however,
that none of these functions is any more efficient, in terms of work
done by the computer, than the equivalent combinations of <CODE><B>FIRST</B></CODE>s
and <CODE><B>REST</B></CODE>s--there's no way to get to the <I>n</I>th element of a
list without following <I>n</I> <CODE><B>CDR</B></CODE> references.<SUP>11</SUP></P><P>The 28 composite <CODE><B>CAR</B></CODE>/<CODE><B>CDR</B></CODE> functions are another family of
functions you may see used from time to time. Each function is named
by placing a sequence of up to four <CODE>A</CODE>s and <CODE>D</CODE>s between a
<CODE>C</CODE> and <CODE>R</CODE>, with each <CODE>A</CODE> representing a call to
<CODE><B>CAR</B></CODE> and each <CODE>D</CODE> a call to <CODE><B>CDR</B></CODE>. Thus: </P><PRE>(caar list) === (car (car list))
(cadr list) === (car (cdr list))
(cadadr list) === (car (cdr (car (cdr list))))</PRE><P>Note, however, that many of these functions make sense only when
applied to lists that contain other lists. For instance, <CODE><B>CAAR</B></CODE>
extracts the <CODE><B>CAR</B></CODE> of the <CODE><B>CAR</B></CODE> of the list it's given; thus,
the list it's passed must contain another list as its first element.
In other words, these are really functions on trees rather than
lists:</P><PRE>(caar (list 1 2 3)) ==> <I>error</I>
(caar (list (list 1 2) 3)) ==> 1
(cadr (list (list 1 2) (list 3 4))) ==> (3 4)
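; Sketch: for list positions, the composite names line up with the ordinal
; and indexed accessors, so CADR, SECOND, and (NTH 1 ...) return the same
; element, while (NTHCDR 1 ...) returns the rest of the list, like REST.
; The local name L is illustrative only.
(let ((l (list (list 1 2) (list 3 4))))
  (list (cadr l) (second l) (nth 1 l) (nthcdr 1 l)))
; ==> ((3 4) (3 4) (3 4) ((3 4)))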
(caadr (list (list 1 2) (list 3 4))) ==> 3</PRE><P>These functions aren't used as often now as in the old days. And even
the most die-hard old-school Lisp hackers tend to avoid the longer
combinations. However, they're used quite a bit in older Lisp code,
so it's worth at least understanding how they work.<SUP>12</SUP></P><P>The <CODE><B>FIRST</B></CODE> through <CODE><B>TENTH</B></CODE> functions and the <CODE><B>CAR</B></CODE>, <CODE><B>CADR</B></CODE>, and similar
functions can also be used as <CODE><B>SETF</B></CODE>able places if you're using
lists nonfunctionally.</P><P>Table 12-1 summarizes some other list functions that I won't cover in
detail. </P><P><DIV CLASS="table-caption">Table 12-1. Other List Functions</DIV></P><TABLE CLASS="book-table"><TR><TD>Function</TD><TD>Description</TD></TR><TR><TD><CODE><B>LAST</B></CODE></TD><TD>Returns the last cons cell in a list. With an integer argument,
returns the last <I>n</I> cons cells.</TD></TR><TR><TD><CODE><B>BUTLAST</B></CODE></TD><TD>Returns a copy of the list, excluding the last cons cell. With an
integer argument, excludes the last <I>n</I> cells.</TD></TR><TR><TD><CODE><B>NBUTLAST</B></CODE></TD><TD>The recycling version of <CODE><B>BUTLAST</B></CODE>; may modify and return the
argument list but has no reliable side effects.</TD></TR><TR><TD><CODE><B>LDIFF</B></CODE></TD><TD>Returns a copy of a list up to a given cons cell.</TD></TR><TR><TD><CODE><B>TAILP</B></CODE></TD><TD>Returns true if a given object is a cons cell that's part of the
structure of a list.</TD></TR><TR><TD><CODE><B>LIST*</B></CODE></TD><TD>Builds a list to hold all but the last of its arguments and then
makes the last argument the <CODE><B>CDR</B></CODE> of the last cell in the list. In
other words, a cross between <CODE><B>LIST</B></CODE> and <CODE><B>APPEND</B></CODE>.</TD></TR><TR><TD><CODE><B>MAKE-LIST</B></CODE></TD><TD>Builds an <I>n</I>-item list. The initial elements of the list are
<CODE><B>NIL</B></CODE> or the value specified with the <CODE>:initial-element</CODE>
keyword argument.</TD></TR><TR><TD><CODE><B>REVAPPEND</B></CODE></TD><TD>Combination of <CODE><B>REVERSE</B></CODE> and <CODE><B>APPEND</B></CODE>; reverses first
argument as with <CODE><B>REVERSE</B></CODE> and then appends the second argument.</TD></TR><TR><TD><CODE><B>NRECONC</B></CODE></TD><TD>Recycling version of <CODE><B>REVAPPEND</B></CODE>; reverses first argument as if
by <CODE><B>NREVERSE</B></CODE> and then appends the second argument. No reliable
side effects.</TD></TR><TR><TD><CODE><B>CONSP</B></CODE></TD><TD>Predicate to test whether an object is a cons cell.</TD></TR><TR><TD><CODE><B>ATOM</B></CODE></TD><TD>Predicate to test whether an object is <I>not</I> a cons cell.</TD></TR><TR><TD><CODE><B>LISTP</B></CODE></TD><TD>Predicate to test whether an object is either a cons cell or
<CODE><B>NIL</B></CODE>.</TD></TR><TR><TD><CODE><B>NULL</B></CODE></TD><TD>Predicate to test whether an object is <CODE><B>NIL</B></CODE>. Functionally
equivalent to <CODE><B>NOT</B></CODE> but stylistically preferable when testing for
an empty list as opposed to boolean false.</TD></TR></TABLE><A NAME="mapping"><H2>Mapping</H2></A><P>Another important aspect of the functional style is the use of
higher-order functions, functions that take other functions as
arguments or return functions as values. You saw several examples of
higher-order functions, such as <CODE><B>MAP</B></CODE>, in the previous chapter.
Although <CODE><B>MAP</B></CODE> can be used with both lists and vectors (that is,
with any kind of sequence), Common Lisp also provides six mapping
functions specifically for lists. The differences between the six
functions have to do with how they build up their result and whether
they apply the function to the elements of the list or to the cons
cells of the list structure.</P><P><CODE><B>MAPCAR</B></CODE> is the function most like <CODE><B>MAP</B></CODE>. Because it always
returns a list, it doesn't require the result-type argument <CODE><B>MAP</B></CODE>
does. Instead, its first argument is the function to apply, and
subsequent arguments are the lists whose elements will provide the
arguments to the function. Otherwise, it behaves like <CODE><B>MAP</B></CODE>: the
function is applied to successive elements of the list arguments,
taking one element from each list per application of the function.
The results of each function call are collected into a new list. For
example:</P><PRE>(mapcar #'(lambda (x) (* 2 x)) (list 1 2 3)) ==> (2 4 6)
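;; The same result via MAP, which needs the result type as its first
;; argument:
(map 'list #'(lambda (x) (* 2 x)) (list 1 2 3)) ; ==> (2 4 6)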
(mapcar #'+ (list 1 2 3) (list 10 20 30)) ==> (11 22 33)</PRE><P><CODE><B>MAPLIST</B></CODE> is just like <CODE><B>MAPCAR</B></CODE> except instead of passing the
elements of the list to the function, it passes the actual cons
cells.<SUP>13</SUP> Thus, the function has access not only to the value of
each element of the list (via the <CODE><B>CAR</B></CODE> of the cons cell) but also
to the rest of the list (via the <CODE><B>CDR</B></CODE>).</P><P><CODE><B>MAPCAN</B></CODE> and <CODE><B>MAPCON</B></CODE> work like <CODE><B>MAPCAR</B></CODE> and <CODE><B>MAPLIST</B></CODE>
except for the way they build up their result. While <CODE><B>MAPCAR</B></CODE> and
<CODE><B>MAPLIST</B></CODE> build a completely new list to hold the results of the
function calls, <CODE><B>MAPCAN</B></CODE> and <CODE><B>MAPCON</B></CODE> build their result by
splicing together the results--which must be lists--as if by
<CODE><B>NCONC</B></CODE>. Thus, each function invocation can provide any number of
elements to be included in the result.<SUP>14</SUP> <CODE><B>MAPCAN</B></CODE>, like <CODE><B>MAPCAR</B></CODE>, passes the elements of the list to
the mapped function while <CODE><B>MAPCON</B></CODE>, like <CODE><B>MAPLIST</B></CODE>, passes the
cons cells.</P><P>Finally, the functions <CODE><B>MAPC</B></CODE> and <CODE><B>MAPL</B></CODE> are control constructs
disguised as functions--they simply return their first list argument,
so they're useful only when the side effects of the mapped function
do something interesting. <CODE><B>MAPC</B></CODE> is the cousin of <CODE><B>MAPCAR</B></CODE> and
<CODE><B>MAPCAN</B></CODE> while <CODE><B>MAPL</B></CODE> is in the <CODE><B>MAPLIST</B></CODE>/<CODE><B>MAPCON</B></CODE>
family. </P><A NAME="other-structures"><H2>Other Structures</H2></A><P>While cons cells and lists are typically considered to be synonymous,
that's not quite right--as I mentioned earlier, you can use lists of
lists to represent trees. Just as the functions discussed in this
chapter allow you to treat structures built out of cons cells as
lists, other functions allow you to use cons cells to represent
trees, sets, and two kinds of key/value maps. I'll discuss some of
those functions in the next chapter.
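</P><P>As a rough sketch of the lists-of-lists-as-trees idea, using only functions from this chapter, the following collects the leaves of a nested list into one flat list. The name <CODE>flatten-tree</CODE> is made up for this illustration and isn't a standard function; the sketch also assumes the tree is built entirely from proper lists.</P><PRE>;; FLATTEN-TREE is a hypothetical helper, not part of Common Lisp.
(defun flatten-tree (tree)
  (cond ((null tree) nil)              ; empty list: contributes nothing
        ((atom tree) (list tree))      ; a leaf: wrap it in a fresh list
        (t (mapcan #'flatten-tree tree)))) ; a subtree: splice the results

(flatten-tree (list 1 (list 2 (list 3)) 4)) ; ==> (1 2 3 4)</PRE><P>Because every list handed to <CODE><B>MAPCAN</B></CODE> here is freshly built by <CODE><B>LIST</B></CODE>, splicing the results together as if by <CODE><B>NCONC</B></CODE> can't damage the original tree.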
</P><HR/><DIV CLASS="notes"><P><SUP>1</SUP>Adapted from <I>The Matrix</I>
(<CODE>http://us.imdb.com/Quotes?0133093</CODE>)</P><P><SUP>2</SUP><CODE><B>CONS</B></CODE> was originally short for the verb
<I>construct</I>.</P><P><SUP>3</SUP>When the place given to <CODE><B>SETF</B></CODE> is a <CODE><B>CAR</B></CODE> or
<CODE><B>CDR</B></CODE>, it expands into a call to the function <CODE><B>RPLACA</B></CODE> or
<CODE><B>RPLACD</B></CODE>; some old-school Lispers--the same ones who still use
<CODE><B>SETQ</B></CODE>--will still use <CODE><B>RPLACA</B></CODE> and <CODE><B>RPLACD</B></CODE> directly, but
modern style is to use <CODE><B>SETF</B></CODE> of <CODE><B>CAR</B></CODE> or <CODE><B>CDR</B></CODE>.</P><P><SUP>4</SUP>Typically,
simple objects such as numbers are drawn within the appropriate box,
and more complex objects will be drawn outside the box with an arrow
from the box indicating the reference. This actually corresponds well
with how many Common Lisp implementations work--although all objects
are conceptually stored by reference, certain simple immutable
objects can be stored directly in a cons cell.</P><P><SUP>5</SUP>The phrase <I>for-side-effect</I> is used in the
language standard, but <I>recycling</I> is my own invention; most Lisp
literature simply uses the term <I>destructive</I> for both kinds of
operations, leading to the confusion I'm trying to dispel.</P><P><SUP>6</SUP>The string functions <CODE><B>NSTRING-CAPITALIZE</B></CODE>,
<CODE><B>NSTRING-DOWNCASE</B></CODE>, and <CODE><B>NSTRING-UPCASE</B></CODE> are similar--they
return the same results as their N-less counterparts but are
specified to modify their string argument in place.</P><P><SUP>7</SUP>For example, in an examination of all uses of recycling
functions in the Common Lisp Open Code Collection (CLOCC), a diverse
set of libraries written by various authors, instances of the
<CODE><B>PUSH</B></CODE>/<CODE><B>NREVERSE</B></CODE> idiom accounted for nearly half of all uses
of recycling functions.</P><P><SUP>8</SUP>There
are, of course, other ways to do this same thing. The extended
<CODE><B>LOOP</B></CODE> macro, for instance, makes it particularly easy and likely
generates code that's even more efficient than the <CODE><B>PUSH</B></CODE>/
<CODE><B>NREVERSE</B></CODE> version.</P><P><SUP>9</SUP>This idiom accounts for 30
percent of uses of recycling in the CLOCC code base.</P><P><SUP>10</SUP><CODE><B>SORT</B></CODE>
and <CODE><B>STABLE-SORT</B></CODE> can be used as for-side-effect operations on
vectors, but since they still return the sorted vector, you should
ignore that fact and use them for return values for the sake of
consistency.</P><P><SUP>11</SUP><CODE><B>NTH</B></CODE> is
roughly equivalent to the sequence function <CODE><B>ELT</B></CODE> but works only
with lists. Also, confusingly, <CODE><B>NTH</B></CODE> takes the index as the first
argument, the opposite of <CODE><B>ELT</B></CODE>. Another difference is that
<CODE><B>ELT</B></CODE> will signal an error if you try to access an element at an
index greater than or equal to the length of the list, but <CODE><B>NTH</B></CODE>
will return <CODE><B>NIL</B></CODE>.</P><P><SUP>12</SUP>In
particular, they used to be used to extract the various parts of
expressions passed to macros before the invention of destructuring
parameter lists. For example, you could take apart the following
expression:</P><PRE>(when (> x 10) (print x))</PRE><P>Like this:</P><PRE>;; the condition
(cadr '(when (> x 10) (print x))) ==> (> X 10)</PRE><PRE>;; the body, as a list
(cddr '(when (> x 10) (print x))) ==> ((PRINT X))</PRE><P><SUP>13</SUP>Thus, <CODE><B>MAPLIST</B></CODE> is the more primitive of the two
functions--if you had only <CODE><B>MAPLIST</B></CODE>, you could build <CODE><B>MAPCAR</B></CODE>
on top of it, but you couldn't build <CODE><B>MAPLIST</B></CODE> on top of
<CODE><B>MAPCAR</B></CODE>.</P><P><SUP>14</SUP>In Lisp dialects that
didn't have filtering functions like <CODE><B>REMOVE</B></CODE>, the idiomatic way
to filter a list was with <CODE><B>MAPCAN</B></CODE>.</P><PRE>(mapcan #'(lambda (x) (if (= x 10) nil (list x))) list) === (remove 10 list)</PRE></DIV></BODY></HTML>