439 lines
35 KiB
HTML
439 lines
35 KiB
HTML
![]() |
<HTML><HEAD><TITLE>Numbers, Characters, and Strings</TITLE><LINK REL="stylesheet" TYPE="text/css" HREF="style.css"/></HEAD><BODY><DIV CLASS="copyright">Copyright © 2003-2005, Peter Seibel</DIV><H1>10. Numbers, Characters, and Strings</H1><P>While functions, variables, macros, and 25 special operators provide
|
||
|
the basic building blocks of the language itself, the building blocks
|
||
|
of your programs will be the data structures you use. As Fred Brooks
|
||
|
observed in <I>The Mythical Man-Month</I>, "Representation <I>is</I> the
|
||
|
essence of programming."<SUP>1</SUP></P><P>Common Lisp provides built-in support for most of the data types
|
||
|
typically found in modern languages: numbers (integer, floating
|
||
|
point, and complex), characters, strings, arrays (including
|
||
|
multidimensional arrays), lists, hash tables, input and output
|
||
|
streams, and an abstraction for portably representing filenames.
|
||
|
Functions are also a first-class data type in Lisp--they can be
|
||
|
stored in variables, passed as arguments, returned as return values,
|
||
|
and created at runtime.</P><P>And these built-in types are just the beginning. They're defined in
|
||
|
the language standard so programmers can count on them being
|
||
|
available and because they tend to be easier to implement efficiently
|
||
|
when tightly integrated with the rest of the implementation. But, as
|
||
|
you'll see in later chapters, Common Lisp also provides several ways
|
||
|
for you to define new data types, define operations on them, and
|
||
|
integrate them with the built-in data types.</P><P>For now, however, you can start with the built-in data types. Because
|
||
|
Lisp is a high-level language, the details of exactly how different
|
||
|
data types are implemented are largely hidden. From your point of
|
||
|
view as a user of the language, the built-in data types are defined
|
||
|
by the functions that operate on them. So to learn a data type, you
|
||
|
just have to learn about the functions you can use with it.
|
||
|
Additionally, most of the built-in data types have a special syntax
|
||
|
that the Lisp reader understands and that the Lisp printer uses.
|
||
|
That's why, for instance, you can write strings as <CODE>"foo"</CODE>;
|
||
|
numbers as <CODE>123</CODE>, <CODE>1/23</CODE>, and <CODE>1.23</CODE>; and lists as
|
||
|
<CODE>(a b c)</CODE>. I'll describe the syntax for different kinds of
|
||
|
objects when I describe the functions for manipulating them.</P><P>In this chapter, I'll cover the built-in "scalar" data types:
|
||
|
numbers, characters, and strings. Technically, strings aren't true
|
||
|
scalars--a string is a sequence of characters, and you can access
|
||
|
individual characters and manipulate strings with a function that
|
||
|
operates on sequences. But I'll discuss strings here because most of
|
||
|
the string-specific functions manipulate them as single values and
|
||
|
also because of the close relation between several of the string
|
||
|
functions and their character counterparts.</P><A NAME="numbers"><H2>Numbers</H2></A><P>Math, as Barbie says, is hard.<SUP>2</SUP> Common
|
||
|
Lisp can't make the math part any easier, but it does tend to get in
|
||
|
the way a lot less than other programming languages. That's not
|
||
|
surprising given its mathematical heritage. Lisp was originally
|
||
|
designed by a mathematician as a tool for studying mathematical
|
||
|
functions. And one of the main projects of the MAC project at MIT was
|
||
|
the Macsyma symbolic algebra system, written in Maclisp, one of
|
||
|
Common Lisp's immediate predecessors. Additionally, Lisp has been
|
||
|
used as a teaching language at places such as MIT where even the
|
||
|
computer science professors cringe at the thought of telling their
|
||
|
students that <I>10/4 = 2</I>, leading to Lisp's support for exact
|
||
|
ratios. And at various times Lisp has been called upon to compete
|
||
|
with FORTRAN in the high-performance numeric computing arena.</P><P>One of the reasons Lisp is a nice language for math is its numbers
|
||
|
behave more like true mathematical numbers than the approximations of
|
||
|
numbers that are easy to implement in finite computer hardware. For
|
||
|
instance, integers in Common Lisp can be almost arbitrarily large
|
||
|
rather than being limited by the size of a machine
|
||
|
word.<SUP>3</SUP> And dividing two integers results in an exact
|
||
|
ratio, not a truncated value. And since ratios are represented as
|
||
|
pairs of arbitrarily sized integers, ratios can represent arbitrarily
|
||
|
precise fractions.<SUP>4</SUP></P><P>On the other hand, for high-performance numeric programming, you may
|
||
|
be willing to trade the exactitude of rationals for the speed offered
|
||
|
by using the hardware's underlying floating-point operations. So,
|
||
|
Common Lisp also offers several types of floating-point numbers,
|
||
|
which are mapped by the implementation to the appropriate
|
||
|
hardware-supported floating-point representations.<SUP>5</SUP> Floats are also used
|
||
|
to represent the results of a computation whose true mathematical
|
||
|
value would be an irrational number.</P><P>Finally, Common Lisp supports complex numbers--the numbers that
|
||
|
result from doing things such as taking square roots and logarithms
|
||
|
of negative numbers. The Common Lisp standard even goes so far as to
|
||
|
specify the principal values and branch cuts for irrational and
|
||
|
transcendental functions on the complex domain. </P><A NAME="numeric-literals"><H2>Numeric Literals</H2></A><P>You can write numeric literals in a variety of ways; you saw a few
|
||
|
examples in Chapter 4. However, it's important to keep in mind the
|
||
|
division of labor between the Lisp reader and the Lisp evaluator--the
|
||
|
reader is responsible for translating text into Lisp objects, and the
|
||
|
Lisp evaluator then deals only with those objects. For a given number
|
||
|
of a given type, there can be many different textual representations,
|
||
|
all of which will be translated to the same object representation by
|
||
|
the Lisp reader. For instance, you can write the integer 10 as
|
||
|
<CODE>10</CODE>, <CODE>20/2</CODE>, <CODE>#xA</CODE>, or any of a number of other ways,
|
||
|
but the reader will translate all these to the same object. When
|
||
|
numbers are printed back out--say, at the REPL--they're printed in a
|
||
|
canonical textual syntax that may be different from the syntax used
|
||
|
to enter the number. For example:</P><PRE>CL-USER> 10
|
||
|
10
|
||
|
CL-USER> 20/2
|
||
|
10
|
||
|
CL-USER> #xa
|
||
|
10</PRE><P>The syntax for integer values is an optional sign (<CODE>+</CODE> or
|
||
|
<CODE>-</CODE>) followed by one or more digits. Ratios are written as an
|
||
|
optional sign and a sequence of digits, representing the numerator, a
|
||
|
slash (<CODE>/</CODE>), and another sequence of digits representing the
|
||
|
denominator. All rational numbers are "canonicalized" as they're
|
||
|
read--that's why <CODE>10</CODE> and <CODE>20/2</CODE> are both read as the same
|
||
|
number, as are <CODE>3/4</CODE> and <CODE>6/8</CODE>. Rationals are printed in
|
||
|
"reduced" form--integer values are printed in integer syntax and
|
||
|
ratios with the numerator and denominator reduced to lowest terms. </P><P>It's also possible to write rationals in bases other than 10. If
|
||
|
preceded by <CODE>#B</CODE> or <CODE>#b</CODE>, a rational literal is read as a
|
||
|
binary number with <CODE>0</CODE> and <CODE>1</CODE> as the only legal digits. An
|
||
|
<CODE>#O</CODE> or <CODE>#o</CODE> indicates an octal number (legal digits
|
||
|
<CODE>0</CODE>-<CODE>7</CODE>), and <CODE>#X</CODE> or <CODE>#x</CODE> indicates hexadecimal
|
||
|
(legal digits <CODE>0</CODE>-<CODE>F</CODE> or <CODE>0</CODE>-<CODE>f</CODE>). You can write
|
||
|
rationals in other bases from 2 to 36 with <CODE>#nR</CODE> where <I>n</I> is
|
||
|
the base (always written in decimal). Additional "digits" beyond 9
|
||
|
are taken from the letters <CODE>A</CODE>-<CODE>Z</CODE> or <CODE>a</CODE>-<CODE>z</CODE>.
|
||
|
Note that these radix indicators apply to the whole rational--it's
|
||
|
not possible to write a ratio with the numerator in one base and
|
||
|
denominator in another. Also, you can write integer values, but not
|
||
|
ratios, as decimal digits terminated with a decimal point.<SUP>6</SUP> Some examples of rationals,
|
||
|
with their canonical, decimal representation are as follows: </P><PRE>123 ==> 123
|
||
|
+123 ==> 123
|
||
|
-123 ==> -123
|
||
|
123. ==> 123
|
||
|
2/3 ==> 2/3
|
||
|
-2/3 ==> -2/3
|
||
|
4/6 ==> 2/3
|
||
|
6/3 ==> 2
|
||
|
#b10101 ==> 21
|
||
|
#b1010/1011 ==> 10/11
|
||
|
#o777 ==> 511
|
||
|
#xDADA ==> 56026
|
||
|
#36rABCDEFGHIJKLMNOPQRSTUVWXYZ ==> 8337503854730415241050377135811259267835</PRE><P>You can also write floating-point numbers in a variety of ways.
|
||
|
Unlike rational numbers, the syntax used to notate a floating-point
|
||
|
number can affect the actual type of number read. Common Lisp defines
|
||
|
four subtypes of floating-point number: short, single, double, and
|
||
|
long. Each subtype can use a different number of bits in its
|
||
|
representation, which means each subtype can represent values
|
||
|
spanning a different range and with different precision. More bits
|
||
|
gives a wider range and more precision.<SUP>7</SUP> </P><P>The basic format for floating-point numbers is an optional sign
|
||
|
followed by a nonempty sequence of decimal digits possibly with an
|
||
|
embedded decimal point. This sequence can be followed by an exponent
|
||
|
marker for "computerized scientific notation."<SUP>8</SUP> The exponent marker consists of a single
|
||
|
letter followed by an optional sign and a sequence of digits, which
|
||
|
are interpreted as the power of ten by which the number before the
|
||
|
exponent marker should be multiplied. The letter does double duty: it
|
||
|
marks the beginning of the exponent and indicates what floating-
|
||
|
point representation should be used for the number. The exponent
|
||
|
markers <I>s</I>, <I>f</I>, <I>d</I>, <I>l</I> (and their uppercase equivalents)
|
||
|
indicate short, single, double, and long floats, respectively. The
|
||
|
letter <I>e</I> indicates that the default representation (initially
|
||
|
single-float) should be used.</P><P>Numbers with no exponent marker are read in the default
|
||
|
representation and must contain a decimal point followed by at least
|
||
|
one digit to distinguish them from integers. The digits in a
|
||
|
floating-point number are always treated as base 10 digits--the
|
||
|
<CODE>#B</CODE>, <CODE>#X</CODE>, <CODE>#O</CODE>, and <CODE>#R</CODE> syntaxes work only
|
||
|
with rationals. The following are some example floating-point numbers
|
||
|
along with their canonical representation:</P><PRE>1.0 ==> 1.0
|
||
|
1e0 ==> 1.0
|
||
|
1d0 ==> 1.0d0
|
||
|
123.0 ==> 123.0
|
||
|
123e0 ==> 123.0
|
||
|
0.123 ==> 0.123
|
||
|
.123 ==> 0.123
|
||
|
123e-3 ==> 0.123
|
||
|
123E-3 ==> 0.123
|
||
|
0.123e20 ==> 1.23e+19
|
||
|
123d23 ==> 1.23d+25</PRE><P>Finally, complex numbers are written in their own syntax, namely,
|
||
|
<CODE>#C</CODE> or <CODE>#c</CODE> followed by a list of two real numbers
|
||
|
representing the real and imaginary part of the complex number. There
|
||
|
are actually five kinds of complex numbers because the real and
|
||
|
imaginary parts must either both be rational or both be the same kind
|
||
|
of floating-point number. </P><P>But you can write them however you want--if a complex is written with
|
||
|
one rational and one floating-point part, the rational is converted
|
||
|
to a float of the appropriate representation. Similarly, if the real
|
||
|
and imaginary parts are both floats of different representations, the
|
||
|
one in the smaller representation will be <I>upgraded</I>.</P><P>However, no complex numbers have a rational real component and a zero
|
||
|
imaginary part--since such values are, mathematically speaking,
|
||
|
rational, they're represented by the appropriate rational value. The
|
||
|
same mathematical argument could be made for complex numbers with
|
||
|
floating-point components, but for those complex types a number with
|
||
|
a zero imaginary part is always a different object than the
|
||
|
floating-point number representing the real component. Here are some
|
||
|
examples of numbers written the complex number syntax: </P><PRE>#c(2 1) ==> #c(2 1)
|
||
|
#c(2/3 3/4) ==> #c(2/3 3/4)
|
||
|
#c(2 1.0) ==> #c(2.0 1.0)
|
||
|
#c(2.0 1.0d0) ==> #c(2.0d0 1.0d0)
|
||
|
#c(1/2 1.0) ==> #c(0.5 1.0)
|
||
|
#c(3 0) ==> 3
|
||
|
#c(3.0 0.0) ==> #c(3.0 0.0)
|
||
|
#c(1/2 0) ==> 1/2
|
||
|
#c(-6/3 0) ==> -2</PRE><A NAME="basic-math"><H2>Basic Math</H2></A><P>The basic arithmetic operations--addition, subtraction,
|
||
|
multiplication, and division--are supported for all the different
|
||
|
kinds of Lisp numbers with the functions <CODE><B>+</B></CODE>, <CODE><B>-</B></CODE>, <CODE><B>*</B></CODE>, and
|
||
|
<CODE><B>/</B></CODE>. Calling any of these functions with more than two arguments is
|
||
|
equivalent to calling the same function on the first two arguments and
|
||
|
then calling it again on the resulting value and the rest of the
|
||
|
arguments. For example, <CODE>(+ 1 2 3)</CODE> is equivalent to <CODE>(+ (+
|
||
|
1 2) 3)</CODE>. With only one argument, <CODE><B>+</B></CODE> and <CODE><B>*</B></CODE> return the value;
|
||
|
<CODE><B>-</B></CODE> returns its negation and <CODE><B>/</B></CODE> its reciprocal.<SUP>9</SUP></P><PRE>(+ 1 2) ==> 3
|
||
|
(+ 1 2 3) ==> 6
|
||
|
(+ 10.0 3.0) ==> 13.0
|
||
|
(+ #c(1 2) #c(3 4)) ==> #c(4 6)
|
||
|
(- 5 4) ==> 1
|
||
|
(- 2) ==> -2
|
||
|
(- 10 3 5) ==> 2
|
||
|
(* 2 3) ==> 6
|
||
|
(* 2 3 4) ==> 24
|
||
|
(/ 10 5) ==> 2
|
||
|
(/ 10 5 2) ==> 1
|
||
|
(/ 2 3) ==> 2/3
|
||
|
(/ 4) ==> 1/4</PRE><P>If all the arguments are the same type of number (rational, floating
|
||
|
point, or complex), the result will be the same type except in the
|
||
|
case where the result of an operation on complex numbers with
|
||
|
rational components yields a number with a zero imaginary part, in
|
||
|
which case the result will be a rational. However, floating-point and
|
||
|
complex numbers are <I>contagious</I>--if all the arguments are reals
|
||
|
but one or more are floating-point numbers, the other arguments are
|
||
|
converted to the nearest floating-point value in a "largest"
|
||
|
floating-point representation of the actual floating-point arguments.
|
||
|
Floating-point numbers in a "smaller" representation are also
|
||
|
converted to the larger representation. Similarly, if any of the
|
||
|
arguments are complex, any real arguments are converted to the
|
||
|
complex equivalents. </P><PRE>(+ 1 2.0) ==> 3.0
|
||
|
(/ 2 3.0) ==> 0.6666667
|
||
|
(+ #c(1 2) 3) ==> #c(4 2)
|
||
|
(+ #c(1 2) 3/2) ==> #c(5/2 2)
|
||
|
(+ #c(1 1) #c(2 -1)) ==> 3</PRE><P>Because <CODE><B>/</B></CODE> doesn't truncate, Common Lisp provides four flavors of
|
||
|
truncating and rounding for converting a real number (rational or
|
||
|
floating point) to an integer: <CODE><B>FLOOR</B></CODE> truncates toward negative
|
||
|
infinity, returning the largest integer less than or equal to the
|
||
|
argument. <CODE><B>CEILING</B></CODE> truncates toward positive infinity, returning
|
||
|
the smallest integer greater than or equal to the argument.
|
||
|
<CODE><B>TRUNCATE</B></CODE> truncates toward zero, making it equivalent to
|
||
|
<CODE><B>FLOOR</B></CODE> for positive arguments and to <CODE><B>CEILING</B></CODE> for negative
|
||
|
arguments. And <CODE><B>ROUND</B></CODE> rounds to the nearest integer. If the
|
||
|
argument is exactly halfway between two integers, it rounds to the
|
||
|
nearest even integer. </P><P>Two related functions are <CODE><B>MOD</B></CODE> and <CODE><B>REM</B></CODE>, which return the
|
||
|
modulus and remainder of a truncating division on real numbers. These
|
||
|
two functions are related to the <CODE><B>FLOOR</B></CODE> and <CODE><B>TRUNCATE</B></CODE>
|
||
|
functions as follows:</P><PRE>(+ (* (floor (/ x y)) y) (mod x y)) === x
|
||
|
(+ (* (truncate (/ x y)) y) (rem x y)) === x</PRE><P>Thus, for positive quotients they're equivalent, but for negative
|
||
|
quotients they produce different results.<SUP>10</SUP></P><P>The functions <CODE><B>1+</B></CODE> and <CODE><B>1-</B></CODE> provide a shorthand way to express
|
||
|
adding and subtracting one from a number. Note that these are
|
||
|
different from the macros <CODE><B>INCF</B></CODE> and <CODE><B>DECF</B></CODE>. <CODE><B>1+</B></CODE> and
|
||
|
<CODE><B>1-</B></CODE> are just functions that return a new value, but <CODE><B>INCF</B></CODE> and
|
||
|
<CODE><B>DECF</B></CODE> modify a place. The following equivalences show the
|
||
|
relation between <CODE><B>INCF</B></CODE>/<CODE><B>DECF</B></CODE>, <CODE><B>1+</B></CODE>/<CODE><B>1-</B></CODE>, and
|
||
|
<CODE><B>+</B></CODE>/<CODE><B>-</B></CODE>:</P><PRE>(incf x) === (setf x (1+ x)) === (setf x (+ x 1))
|
||
|
(decf x) === (setf x (1- x)) === (setf x (- x 1))
|
||
|
(incf x 10) === (setf x (+ x 10))
|
||
|
(decf x 10) === (setf x (- x 10))</PRE><A NAME="numeric-comparisons"><H2>Numeric Comparisons</H2></A><P>The function <CODE><B>=</B></CODE> is the numeric equality predicate. It compares
|
||
|
numbers by mathematical value, ignoring differences in type. Thus,
|
||
|
<CODE><B>=</B></CODE> will consider mathematically equivalent values of different
|
||
|
types equivalent while the generic equality predicate <CODE><B>EQL</B></CODE> would
|
||
|
consider them inequivalent because of the difference in type. (The
|
||
|
generic equality predicate <CODE><B>EQUALP</B></CODE>, however, uses <CODE><B>=</B></CODE> to
|
||
|
compare numbers.) If it's called with more than two arguments, it
|
||
|
returns true only if they all have the same value. Thus:</P><PRE>(= 1 1) ==> T
|
||
|
(= 10 20/2) ==> T
|
||
|
(= 1 1.0 #c(1.0 0.0) #c(1 0)) ==> T</PRE><P>The <CODE><B>/=</B></CODE> function, conversely, returns true only if <I>all</I> its
|
||
|
arguments are different values.</P><PRE>(/= 1 1) ==> NIL
|
||
|
(/= 1 2) ==> T
|
||
|
(/= 1 2 3) ==> T
|
||
|
(/= 1 2 3 1) ==> NIL
|
||
|
(/= 1 2 3 1.0) ==> NIL</PRE><P>The functions <CODE><B><</B></CODE>, <CODE><B>></B></CODE>, <CODE><B><=</B></CODE>, and <CODE><B>>=</B></CODE> order rationals
|
||
|
and floating-point numbers (in other words, the real numbers.) Like
|
||
|
<CODE><B>=</B></CODE> and <CODE><B>/=</B></CODE>, these functions can be called with more than two
|
||
|
arguments, in which case each argument is compared to the argument to
|
||
|
its right. </P><PRE>(< 2 3) ==> T
|
||
|
(> 2 3) ==> NIL
|
||
|
(> 3 2) ==> T
|
||
|
(< 2 3 4) ==> T
|
||
|
(< 2 3 3) ==> NIL
|
||
|
(<= 2 3 3) ==> T
|
||
|
(<= 2 3 3 4) ==> T
|
||
|
(<= 2 3 4 3) ==> NIL</PRE><P>To pick out the smallest or largest of several numbers, you can use
|
||
|
the function <CODE><B>MIN</B></CODE> or <CODE><B>MAX</B></CODE>, which takes any number of real
|
||
|
number arguments and returns the minimum or maximum value.</P><PRE>(max 10 11) ==> 11
|
||
|
(min -12 -10) ==> -12
|
||
|
(max -1 2 -3) ==> 2</PRE><P>Some other handy functions are <CODE><B>ZEROP</B></CODE>, <CODE><B>MINUSP</B></CODE>, and
|
||
|
<CODE><B>PLUSP</B></CODE>, which test whether a single real number is equal to, less
|
||
|
than, or greater than zero. Two other predicates, <CODE><B>EVENP</B></CODE> and
|
||
|
<CODE><B>ODDP</B></CODE>, test whether a single integer argument is even or odd. The
|
||
|
<I>P</I> suffix on the names of these functions is a standard naming
|
||
|
convention for predicate functions, functions that test some
|
||
|
condition and return a boolean. </P><A NAME="higher-math"><H2>Higher Math</H2></A><P>The functions you've seen so far are the beginning of the built-in
|
||
|
mathematical functions. Lisp also supports logarithms: <CODE><B>LOG</B></CODE>;
|
||
|
exponentiation: <CODE><B>EXP</B></CODE> and <CODE><B>EXPT</B></CODE>; the basic trigonometric
|
||
|
functions: <CODE><B>SIN</B></CODE>, <CODE><B>COS</B></CODE>, and <CODE><B>TAN</B></CODE>; their inverses:
|
||
|
<CODE><B>ASIN</B></CODE>, <CODE><B>ACOS</B></CODE>, and <CODE><B>ATAN</B></CODE>; hyperbolic functions: <CODE><B>SINH</B></CODE>,
|
||
|
<CODE><B>COSH</B></CODE>, and <CODE><B>TANH</B></CODE>; and their inverses: <CODE><B>ASINH</B></CODE>, <CODE><B>ACOSH</B></CODE>,
|
||
|
and <CODE><B>ATANH</B></CODE>. It also provides functions to get at the individual
|
||
|
bits of an integer and to extract the parts of a ratio or a complex
|
||
|
number. For a complete list, see any Common Lisp reference.</P><A NAME="characters"><H2>Characters</H2></A><P>Common Lisp characters are a distinct type of object from numbers.
|
||
|
That's as it should be--characters are <I>not</I> numbers, and
|
||
|
languages that treat them as if they are tend to run into problems
|
||
|
when character encodings change, say, from 8-bit ASCII to 21-bit
|
||
|
Unicode.<SUP>11</SUP> Because the Common Lisp
|
||
|
standard didn't mandate a particular representation for characters,
|
||
|
today several Lisp implementations use Unicode as their "native"
|
||
|
character encoding despite Unicode being only a gleam in a standards
|
||
|
body's eye at the time Common Lisp's own standardization was being
|
||
|
wrapped up.</P><P>The read syntax for characters objects is simple: <CODE>#\</CODE> followed
|
||
|
by the desired character. Thus, <CODE>#\x</CODE> is the character <CODE>x</CODE>.
|
||
|
Any character can be used after the <CODE>#\</CODE>, including otherwise
|
||
|
special characters such as <CODE>"</CODE>, <CODE>(</CODE>, and whitespace.
|
||
|
However, writing whitespace characters this way isn't very (human)
|
||
|
readable; an alternative syntax for certain characters is <CODE>#\</CODE>
|
||
|
followed by the character's name. Exactly what names are supported
|
||
|
depends on the character set and on the Lisp implementation, but all
|
||
|
implementations support the names <I>Space</I> and <I>Newline</I>. Thus,
|
||
|
you should write <CODE>#\Space</CODE> instead of <CODE>#\ </CODE>, though the
|
||
|
latter is technically legal. Other semistandard names (that
|
||
|
implementations must use if the character set has the appropriate
|
||
|
characters) are <I>Tab</I>, <I>Page</I>, <I>Rubout</I>, <I>Linefeed</I>,
|
||
|
<I>Return</I>, and <I>Backspace</I>. </P><A NAME="character-comparisons"><H2>Character Comparisons</H2></A><P>The main thing you can do with characters, other than putting them
|
||
|
into strings (which I'll get to later in this chapter), is to compare
|
||
|
them with other characters. Since characters aren't numbers, you
|
||
|
can't use the numeric comparison functions, such as <CODE><B><</B></CODE> and
|
||
|
<CODE><B>></B></CODE>. Instead, two sets of functions provide character-specific
|
||
|
analogs to the numeric comparators; one set is case-sensitive and
|
||
|
the other case-insensitive.</P><P>The case-sensitive analog to the numeric <CODE><B>=</B></CODE> is the function
|
||
|
<CODE><B>CHAR=</B></CODE>. Like <CODE><B>=</B></CODE>, <CODE><B>CHAR=</B></CODE> can take any number of arguments
|
||
|
and returns true only if they're all the same character. The case-
|
||
|
insensitive version is <CODE><B>CHAR-EQUAL</B></CODE>.</P><P>The rest of the character comparators follow this same naming scheme:
|
||
|
the case-sensitive comparators are named by prepending the analogous
|
||
|
numeric comparator with <CODE>CHAR</CODE>; the case-insensitive versions
|
||
|
spell out the comparator name, separated from the <CODE>CHAR</CODE> with a
|
||
|
hyphen. Note, however, that <CODE><=</CODE> and <CODE>>=</CODE> are "spelled out"
|
||
|
with the logical equivalents <CODE><B>NOT-GREATERP</B></CODE> and <CODE><B>NOT-LESSP</B></CODE>
|
||
|
rather than the more verbose <CODE>LESSP-OR-EQUALP</CODE> and
|
||
|
<CODE>GREATERP-OR-EQUALP</CODE>. Like their numeric counterparts, all these
|
||
|
functions can take one or more arguments. Table 10-1 summarizes the
|
||
|
relation between the numeric and character comparison functions. </P><P><DIV CLASS="table-caption">Table 10-1. Character Comparison Functions</DIV></P><TABLE CLASS="book-table"><TR><TD><B>Numeric Analog</B></TD><TD><B>Case-Sensitive</B></TD><TD><B>Case-Insensitive</B></TD></TR><TR><TD><CODE><B>=</B></CODE></TD><TD><CODE><B>CHAR=</B></CODE></TD><TD><CODE><B>CHAR-EQUAL</B></CODE></TD></TR><TR><TD><CODE><B>/=</B></CODE></TD><TD><CODE><B>CHAR/=</B></CODE></TD><TD><CODE><B>CHAR-NOT-EQUAL</B></CODE></TD></TR><TR><TD><CODE><B><</B></CODE></TD><TD><CODE><B>CHAR<</B></CODE></TD><TD><CODE><B>CHAR-LESSP</B></CODE></TD></TR><TR><TD><CODE><B>></B></CODE></TD><TD><CODE><B>CHAR></B></CODE></TD><TD><CODE><B>CHAR-GREATERP</B></CODE></TD></TR><TR><TD><CODE><B><=</B></CODE></TD><TD><CODE><B>CHAR<=</B></CODE></TD><TD><CODE><B>CHAR-NOT-GREATERP</B></CODE></TD></TR><TR><TD><CODE><B>>=</B></CODE></TD><TD><CODE><B>CHAR>=</B></CODE></TD><TD><CODE><B>CHAR-NOT-LESSP</B></CODE></TD></TR></TABLE><P>Other functions that deal with characters provide functions for, among other things, testing whether a given character is alphabetic or
|
||
|
a digit character, testing the case of a character, obtaining a
|
||
|
corresponding character in a different case, and translating between
|
||
|
numeric values representing character codes and actual character
|
||
|
objects. Again, for complete details, see your favorite Common Lisp
|
||
|
reference. </P><A NAME="strings"><H2>Strings</H2></A><P>As mentioned earlier, strings in Common Lisp are really a composite
|
||
|
data type, namely, a one-dimensional array of characters.
|
||
|
Consequently, I'll cover many of the things you can do with strings in
|
||
|
the next chapter when I discuss the many functions for manipulating
|
||
|
sequences, of which strings are just one type. But strings also have
|
||
|
their own literal syntax and a library of functions for performing
|
||
|
string-specific operations. I'll discuss these aspects of strings in
|
||
|
this chapter and leave the others for Chapter 11.</P><P>As you've seen, literal strings are written enclosed in double
|
||
|
quotes. You can include any character supported by the character set
|
||
|
in a literal string except double quote (<CODE>"</CODE>) and backslash (\).
|
||
|
And you can include these two as well if you escape them with a
|
||
|
backslash. In fact, backslash always escapes the next character,
|
||
|
whatever it is, though this isn't necessary for any character except
|
||
|
for <CODE>"</CODE> and itself. Table 10-2 shows how various literal
|
||
|
strings will be read by the Lisp reader.</P><P><DIV CLASS="table-caption">Table 10-2. Literal Strings</DIV></P><TABLE CLASS="book-table"><TR><TD><B>Literal</B></TD><TD><B>Contents</B></TD><TD><B>Comment</B></TD></TR><TR><TD><CODE>"foobar"</CODE></TD><TD>foobar</TD><TD>Plain string.</TD></TR><TR><TD><CODE>"foo\"bar"</CODE></TD><TD>foo"bar</TD><TD>The backslash escapes quote.</TD></TR><TR><TD><CODE>"foo\\bar"</CODE></TD><TD>foo\bar</TD><TD>The first backslash escapes second backslash.</TD></TR><TR><TD><CODE>"\"foobar\""</CODE></TD><TD>"foobar"</TD><TD>The backslashes escape quotes.</TD></TR><TR><TD><CODE>"foo\bar"</CODE></TD><TD>foobar</TD><TD>The backslash "escapes" <I>b</I></TD></TR></TABLE><P>Note that the REPL will ordinarily print strings in readable form,
|
||
|
adding the enclosing quotation marks and any necessary escaping
|
||
|
backslashes, so if you want to see the actual contents of a string,
|
||
|
you need to use function such as <CODE><B>FORMAT</B></CODE> designed to print
|
||
|
human-readable output. For example, here's what you see if you type a
|
||
|
string containing an embedded quotation mark at the REPL: </P><PRE>CL-USER> "foo\"bar"
|
||
|
"foo\"bar"</PRE><P><CODE><B>FORMAT</B></CODE>, on the other hand, will show you the actual string
|
||
|
contents:<SUP>12</SUP> </P><PRE>CL-USER> (format t "foo\"bar")
|
||
|
foo"bar
|
||
|
NIL</PRE><A NAME="string-comparisons"><H2>String Comparisons</H2></A><P>You can compare strings using a set of functions that follow the same
|
||
|
naming convention as the character comparison functions except with
|
||
|
<CODE>STRING</CODE> as the prefix rather than <CODE>CHAR</CODE> (see Table 10-3).</P><P><DIV CLASS="table-caption">Table 10-3. String Comparison Functions</DIV></P><TABLE CLASS="book-table"><TR><TD><B>Numeric Analog</B></TD><TD><B>Case-Sensitive</B></TD><TD><B>Case-Insensitive</B></TD></TR><TR><TD><CODE><B>=</B></CODE></TD><TD><CODE><B>STRING=</B></CODE></TD><TD><CODE><B>STRING-EQUAL</B></CODE></TD></TR><TR><TD><CODE><B>/=</B></CODE></TD><TD><CODE><B>STRING/=</B></CODE></TD><TD><CODE><B>STRING-NOT-EQUAL</B></CODE></TD></TR><TR><TD><CODE><B><</B></CODE></TD><TD><CODE><B>STRING<</B></CODE></TD><TD><CODE><B>STRING-LESSP</B></CODE></TD></TR><TR><TD><CODE><B>></B></CODE></TD><TD><CODE><B>STRING></B></CODE></TD><TD><CODE><B>STRING-GREATERP</B></CODE></TD></TR><TR><TD><CODE><B><=</B></CODE></TD><TD><CODE><B>STRING<=</B></CODE></TD><TD><CODE><B>STRING-NOT-GREATERP</B></CODE></TD></TR><TR><TD><CODE><B>>=</B></CODE></TD><TD><CODE><B>STRING>=</B></CODE></TD><TD><CODE><B>STRING-NOT-LESSP</B></CODE></TD></TR></TABLE><P>However, unlike the character and number comparators, the string
|
||
|
comparators can compare only two strings. That's because they also
|
||
|
take keyword arguments that allow you to restrict the comparison to a
|
||
|
substring of either or both strings. The arguments--<CODE>:start1</CODE>,
|
||
|
<CODE>:end1</CODE>, <CODE>:start2</CODE>, and <CODE>:end2</CODE>--specify the starting
|
||
|
(inclusive) and ending (exclusive) indices of substrings in the first
|
||
|
and second string arguments. Thus, the following: </P><PRE>(string= "foobarbaz" "quuxbarfoo" :start1 3 :end1 6 :start2 4 :end2 7)</PRE><P>compares the substring "bar" in the two arguments and returns true.
|
||
|
The <CODE>:end1</CODE> and <CODE>:end2</CODE> arguments can be <CODE><B>NIL</B></CODE> (or the
|
||
|
keyword argument omitted altogether) to indicate that the
|
||
|
corresponding substring extends to the end of the string.</P><P>The comparators that return true when their arguments differ--that
|
||
|
is, all of them except <CODE><B>STRING=</B></CODE> and <CODE><B>STRING-EQUAL</B></CODE>--return the
|
||
|
index in the first string where the mismatch was detected.</P><PRE>(string/= "lisp" "lissome") ==> 3</PRE><P>If the first string is a prefix of the second, the return value will
|
||
|
be the length of the first string, that is, one greater than the
|
||
|
largest valid index into the string.</P><PRE>(string< "lisp" "lisper") ==> 4</PRE><P>When comparing substrings, the resulting value is still an index into
|
||
|
the string as a whole. For instance, the following compares the
|
||
|
substrings "bar" and "baz" but returns 5 because that's the index of
|
||
|
the <I>r</I> in the first string: </P><PRE>(string< "foobar" "abaz" :start1 3 :start2 1) ==> 5 ; N.B. not 2</PRE><P>Other string functions allow you to convert the case of strings and
|
||
|
trim characters from one or both ends of a string. And, as I
|
||
|
mentioned previously, since strings are really a kind of sequence,
|
||
|
all the sequence functions I'll discuss in the next chapter can be
|
||
|
used with strings. For instance, you can discover the length of a
|
||
|
string with the <CODE><B>LENGTH</B></CODE> function and can get and set individual
|
||
|
characters of a string with the generic sequence element accessor
|
||
|
function, <CODE><B>ELT</B></CODE>, or the generic array element accessor function,
|
||
|
<CODE><B>AREF</B></CODE>. Or you can use the string-specific accessor, <CODE><B>CHAR</B></CODE>.
|
||
|
But those functions, and others, are the topic of the next chapter,
|
||
|
so let's move on.
|
||
|
</P><HR/><DIV CLASS="notes"><P><SUP>1</SUP>Fred Brooks, <I>The Mythical Man-Month</I>,
|
||
|
20th Anniversary Edition (Boston: Addison-Wesley, 1995), p. 103.
|
||
|
Emphasis in original.</P><P><SUP>2</SUP>Mattel's Teen Talk Barbie</P><P><SUP>3</SUP>Obviously, the size of a number that can be represented on
|
||
|
a computer with finite memory is still limited in practice;
|
||
|
furthermore, the actual representation of <I>bignums</I> used in a
|
||
|
particular Common Lisp implementation may place other limits on the
|
||
|
size of number that can be represented. But these limits are going to
|
||
|
be well beyond "astronomically" large numbers. For instance, the
|
||
|
number of atoms in the universe is estimated to be less than 2^269;
|
||
|
current Common Lisp implementations can easily handle numbers up to
|
||
|
and beyond 2^262144.</P><P><SUP>4</SUP>Folks interested in using Common Lisp for
|
||
|
intensive numeric computation should note that a naive comparison of
|
||
|
the performance of numeric code in Common Lisp and languages such as
|
||
|
C or FORTRAN will probably show Common Lisp to be much slower. This
|
||
|
is because something as simple as <CODE>(+ a b)</CODE> in Common Lisp is
|
||
|
doing a lot more than the seemingly equivalent <CODE>a + b</CODE> in one of
|
||
|
those languages. Because of Lisp's dynamic typing and support for
|
||
|
things such as arbitrary precision rationals and complex numbers, a
|
||
|
seemingly simple addition is doing a lot more than an addition of two
|
||
|
numbers that are known to be represented by machine words. However,
|
||
|
you can use declarations to give Common Lisp information about the
|
||
|
types of numbers you're using that will enable it to generate code
|
||
|
that does only as much work as the code that would be generated by a
|
||
|
C or FORTRAN compiler. Tuning numeric code for this kind of
|
||
|
performance is beyond the scope of this book, but it's certainly
|
||
|
possible.</P><P><SUP>5</SUP>While the
|
||
|
standard doesn't require it, many Common Lisp implementations support
|
||
|
the IEEE standard for floating-point arithmetic, <I>IEEE Standard for
|
||
|
Binary Floating-Point Arithmetic, ANSI/ IEEE Std 754-1985</I> (Institute
|
||
|
of Electrical and Electronics Engineers, 1985).</P><P><SUP>6</SUP>It's
|
||
|
also possible to change the default base the reader uses for numbers
|
||
|
without a specific radix marker by changing the value of the global
|
||
|
variable <CODE><B>*READ-BASE*</B></CODE>. However, it's not clear that's the path to
|
||
|
anything other than complete insanity.</P><P><SUP>7</SUP>Since the purpose of
|
||
|
floating-point numbers is to make efficient use of floating-point
|
||
|
hardware, each Lisp implementation is allowed to map these four
|
||
|
subtypes onto the native floating-point types as appropriate. If the
|
||
|
hardware supports fewer than four distinct representations, one or
|
||
|
more of the types may be equivalent.</P><P><SUP>8</SUP>"Computerized
|
||
|
scientific notation" is in scare quotes because, while commonly used
|
||
|
in computer languages since the days of FORTRAN, it's actually quite
|
||
|
different from real scientific notation. In particular, something
|
||
|
like <CODE>1.0e4</CODE> means <CODE>10000.0</CODE>, but in true scientific
|
||
|
notation that would be written as 1.0 x 10^4. And to further confuse
|
||
|
matters, in true scientific notation the letter <I>e</I> stands for the
|
||
|
base of the natural logarithm, so something like 1.0 x <I>e</I>^4, while
|
||
|
superficially similar to <CODE>1.0e4</CODE>, is a completely different
|
||
|
value, approximately 54.6.</P><P><SUP>9</SUP>For
|
||
|
mathematical consistency, <CODE><B>+</B></CODE> and <CODE><B>*</B></CODE> can also be called with no
|
||
|
arguments, in which case they return the appropriate identity: 0 for
|
||
|
<CODE><B>+</B></CODE> and 1 for <CODE><B>*</B></CODE>.</P><P><SUP>10</SUP>Roughly speaking,
|
||
|
<CODE><B>MOD</B></CODE> is equivalent to the <CODE>%</CODE> operator in Perl and Python,
|
||
|
and <CODE><B>REM</B></CODE> is equivalent to the % in C and Java. (Technically, the
|
||
|
exact behavior of % in C wasn't specified until the C99 standard.)</P><P><SUP>11</SUP>Even Java, which was designed from the beginning to use
|
||
|
Unicode characters on the theory that Unicode was the going to be
|
||
|
<I>the</I> character encoding of the future, has run into trouble since
|
||
|
Java characters are defined to be a 16-bit quantity and the Unicode
|
||
|
3.1 standard extended the range of the Unicode character set to
|
||
|
require a 21-bit representation. Ooops.</P><P><SUP>12</SUP>Note, however, that not all literal strings can be
|
||
|
printed by passing them as the second argument to <CODE><B>FORMAT</B></CODE> since
|
||
|
certain sequences of characters have a special meaning to
|
||
|
<CODE><B>FORMAT</B></CODE>. To safely print an arbitrary string--say, the value of a
|
||
|
variable s--with <CODE><B>FORMAT</B></CODE> you should write (format t "~a" s).</P></DIV></BODY></HTML>
|