<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<!Converted with LaTeX2HTML 0.6.5 (Tue Nov 15 1994) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
<HEAD>
<TITLE>18.3. String Construction and Manipulation</TITLE>
</HEAD>
<BODY>
<meta name="description" value=" String Construction and Manipulation">
<meta name="keywords" value="clm">
<meta name="resource-type" value="document">
<meta name="distribution" value="global">
<P>
<b>Common Lisp the Language, 2nd Edition</b>
<BR> <HR><A NAME=tex2html3608 HREF="node168.html"><IMG ALIGN=BOTTOM ALT="next" SRC="icons/next_motif.gif"></A> <A NAME=tex2html3606 HREF="node164.html"><IMG ALIGN=BOTTOM ALT="up" SRC="icons/up_motif.gif"></A> <A NAME=tex2html3602 HREF="node166.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="icons/previous_motif.gif"></A> <A NAME=tex2html3610 HREF="node1.html"><IMG ALIGN=BOTTOM ALT="contents" SRC="icons/contents_motif.gif"></A> <A NAME=tex2html3611 HREF="index.html"><IMG ALIGN=BOTTOM ALT="index" SRC="icons/index_motif.gif"></A> <BR>
<B> Next:</B> <A NAME=tex2html3609 HREF="node168.html"> Structures</A>
<B>Up:</B> <A NAME=tex2html3607 HREF="node164.html"> Strings</A>
<B> Previous:</B> <A NAME=tex2html3603 HREF="node166.html"> String Comparison</A>
<HR> <P>
<H1><A NAME=SECTION002230000000000000000>18.3. String Construction and Manipulation</A></H1>
<P>
Most of the interesting operations on strings may be performed
with the generic sequence functions described in chapter <A HREF="node141.html#KSEQUE">14</A>.
The following functions perform additional operations that are specific
to strings.
<p>
<img align=bottom alt="change_begin" src="gif/change_begin.gif"><br>
Note that this remark, predating the design of the Common Lisp Object System,
uses the term "generic" in a generic sense and not necessarily
in the technical sense used by CLOS
(see chapter <A HREF="node15.html#DTYPES">2</A>).
<br><img align=bottom alt="change_end" src="gif/change_end.gif">
<P>
<img align=bottom alt="old_change_begin" src="gif/old_change_begin.gif"><br>
<BR><b>[Function]</b><BR>
<tt>make-string <i>size</i> &key :initial-element</tt><P>This returns a string (in fact a simple string)
of length <i>size</i>, each of whose characters
has been initialized to the <tt>:initial-element</tt> argument.
If an <tt>:initial-element</tt> argument is not specified, then the string will
be initialized in an implementation-dependent way.
<P>
<hr>
<b>Implementation note:</b> It may be convenient to initialize the string
to null characters, or to spaces, or to garbage ("whatever was there").
<hr>
<P>
A string is really just a one-dimensional array of "string
characters" (that is, those characters that are members of type
<tt>string-char</tt>). More complex character arrays may be constructed using the
function <tt>make-array</tt>.
<br><img align=bottom alt="old_change_end" src="gif/old_change_end.gif">
<P>
<img align=bottom alt="change_begin" src="gif/change_begin.gif"><br>
X3J13 voted in March 1989 (CHARACTER-PROPOSAL) <A NAME=18277> </A>
to eliminate the type <tt>string-char</tt> and to add a keyword
argument <tt>:element-type</tt> to <tt>make-string</tt>. The new function
description is as follows.
<P>
<BR><b>[Function]</b><BR>
<tt>make-string <i>size</i> &key :initial-element :element-type</tt><P>This returns a simple string
of length <i>size</i>, each of whose characters
has been initialized to the <tt>:initial-element</tt> argument.
If an <tt>:initial-element</tt> argument is not specified, then the string will
be initialized in an implementation-dependent way.
<P>
The <tt>:element-type</tt> argument names the type of the elements
of the string; a string is constructed of the most specialized type
that can accommodate elements of the given type. If <tt>:element-type</tt>
is omitted, the type <tt>character</tt> is the default.
<P>
X3J13 voted in January 1989
(ARGUMENTS-UNDERSPECIFIED) <A NAME=18288> </A>
to clarify that the <i>size</i> argument
must be a non-negative integer less than the value of
<tt>array-dimension-limit</tt>.
<br><img align=bottom alt="change_end" src="gif/change_end.gif">
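<P>
As a brief illustration of the revised description, the following sketch builds strings
with and without an explicit initial element. The variable names are hypothetical,
chosen only for this example. Because the contents of a string made without
<tt>:initial-element</tt> are implementation-dependent, the sketch relies only on its length.
<P><pre>
;; Hypothetical variable names, for illustration only.
(defparameter *filled* (make-string 5 :initial-element #\x))   ; "xxxxx"

;; Without :initial-element the contents are implementation-dependent,
;; but the length is always the requested size.
(defparameter *blank* (make-string 3))
(length *blank*)                                                ; 3

;; An explicit :element-type; CHARACTER is also the default.
(defparameter *typed*
  (make-string 2 :initial-element #\a :element-type 'character)) ; "aa"
</pre><P>
Code that reads a string before filling it should supply <tt>:initial-element</tt> explicitly
rather than depend on what a particular implementation happens to store.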
<P>
<BR><b>[Function]</b><BR>
<tt>string-trim <i>character-bag</i> <i>string</i> <BR></tt><tt>string-left-trim <i>character-bag</i> <i>string</i> <BR></tt><tt>string-right-trim <i>character-bag</i> <i>string</i></tt><P><tt>string-trim</tt> returns a substring of <i>string</i>, with all characters in
<i>character-bag</i> stripped off the beginning and end.
The function <tt>string-left-trim</tt> is similar but strips characters
off only the beginning; <tt>string-right-trim</tt> strips off only the end.
The argument <i>character-bag</i> may be any sequence containing
characters.
For example:
<P><pre>
(string-trim '(#\Space #\Tab #\Newline) " garbanzo beans
        ") => "garbanzo beans"
(string-trim " (*)" " ( *three (silly) words* ) ")
   => "three (silly) words"
(string-left-trim " (*)" " ( *three (silly) words* ) ")
   => "three (silly) words* ) "
(string-right-trim " (*)" " ( *three (silly) words* ) ")
   => " ( *three (silly) words"
</pre><P>
If no characters need to be trimmed from the <i>string</i>,
then either the argument <i>string</i> itself or a copy of it may
be returned, at the discretion of the implementation.
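<P>
One consequence of this latitude is that the result of a trimming function may share
structure with its argument, so it is safest to copy the result before modifying it
destructively. The following is a minimal sketch of that precaution; the helper
<tt>trimmed-copy</tt> and the variable names are hypothetical and not part of Common Lisp.
<P><pre>
;; TRIMMED-COPY is a hypothetical helper, not part of Common Lisp.
;; It always returns a fresh string, even when nothing was trimmed.
(defun trimmed-copy (character-bag string)
  (copy-seq (string-trim character-bag string)))

(defparameter *original* (copy-seq "clean"))          ; nothing to trim here
(defparameter *fresh* (trimmed-copy " " *original*))

(nstring-upcase *fresh*)                              ; modifies only the copy
;; *original* is still "clean"; *fresh* is now "CLEAN".
</pre><P>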
<P>
<img align=bottom alt="change_begin" src="gif/change_begin.gif"><br>
X3J13 voted in June 1989 (STRING-COERCION) <A NAME=18308> </A>
to clarify string coercion (see <tt>string</tt>).
<br><img align=bottom alt="change_end" src="gif/change_end.gif">
<P>
<BR><b>[Function]</b><BR>
<tt>string-upcase <i>string</i> &key :start :end <BR></tt><tt>string-downcase <i>string</i> &key :start :end <BR></tt><tt>string-capitalize <i>string</i> &key :start :end</tt><P><tt>string-upcase</tt> returns a string just like <i>string</i> with all lowercase
characters replaced by the corresponding uppercase characters. More
precisely, each character of the result string is produced by applying
the function <tt>char-upcase</tt> to the corresponding character of
<i>string</i>.
<P>
<tt>string-downcase</tt> is similar, except that uppercase characters are
converted to lowercase characters (using <tt>char-downcase</tt>).
<P>
The keyword arguments <tt>:start</tt> and <tt>:end</tt> delimit the portion
of the string to be affected. The result is always of the same length
as <i>string</i>, however.
<P>
The argument is not destroyed. However, if no characters in the argument
require conversion, the result may be either the argument or a copy of it,
at the implementation's discretion.
For example:
<P><pre>
(string-upcase "Dr. Livingstone, I presume?")
   => "DR. LIVINGSTONE, I PRESUME?"
(string-downcase "Dr. Livingstone, I presume?")
   => "dr. livingstone, i presume?"
(string-upcase "Dr. Livingstone, I presume?" :start 6 :end 10)
   => "Dr. LiVINGstone, I presume?"
</pre><P>
<P>
<tt>string-capitalize</tt> produces a copy of <i>string</i> such that,
for every word in the copy, the first character of the word,
if case-modifiable, is uppercase and
any other case-modifiable characters in the word are lowercase.
For the purposes of <tt>string-capitalize</tt>,
a word is defined to be a
consecutive subsequence consisting of alphanumeric characters (that is, letters or digits),
delimited at each end either by a non-alphanumeric character
or by an end of the string.
For example:
<P><pre>
(string-capitalize " hello ") => " Hello "
(string-capitalize
  "occlUDeD cASEmenTs FOreSTAll iNADVertent DEFenestraTION")
   => "Occluded Casements Forestall Inadvertent Defenestration"
(string-capitalize 'kludgy-hash-search) => "Kludgy-Hash-Search"
(string-capitalize "DON'T!") => "Don'T!"     ;<i>not</i> "Don't!"
(string-capitalize "pipe 13a, foo16c") => "Pipe 13a, Foo16c"
</pre><P>
<P>
<img align=bottom alt="change_begin" src="gif/change_begin.gif"><br>
X3J13 voted in June 1989 (STRING-COERCION) <A NAME=18333> </A>
to clarify string coercion (see <tt>string</tt>).
<br><img align=bottom alt="change_end" src="gif/change_end.gif">
<P>
<hr>
<b>Compatibility note:</b> Some very approximate Interlisp equivalents to
<tt>string-upcase</tt>, <tt>string-downcase</tt>, and <tt>string-capitalize</tt>
are <tt>u-case</tt>, <tt>l-case</tt> with second argument <tt>nil</tt>,
and <tt>l-case</tt> with second argument <tt>t</tt>.
<hr>
<P>
<BR><b>[Function]</b><BR>
<tt>nstring-upcase <i>string</i> &key :start :end <BR></tt><tt>nstring-downcase <i>string</i> &key :start :end <BR></tt><tt>nstring-capitalize <i>string</i> &key :start :end</tt><P>These three functions are just like <tt>string-upcase</tt>,
<tt>string-downcase</tt>, and <tt>string-capitalize</tt>
but destructively modify the argument <i>string</i> by altering
case-modifiable characters as necessary.
<P>
The keyword arguments <tt>:start</tt> and <tt>:end</tt> delimit the portion
of the string to be affected. The argument <i>string</i> is returned as
the result.
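<P>
The destructive behavior can be sketched as follows. The variable name is hypothetical,
and the string is created with <tt>copy-seq</tt> on the assumption that a literal string
constant should not be modified in place.
<P><pre>
;; *TITLE* is a hypothetical variable, for illustration only.
;; COPY-SEQ gives a fresh string that is safe to modify destructively.
(defparameter *title* (copy-seq "the quick brown fox"))

(nstring-capitalize *title*)               ; *title* is now "The Quick Brown Fox"
(nstring-upcase *title* :start 4 :end 9)   ; *title* is now "The QUICK Brown Fox"

;; The value returned is the argument string itself, not a copy.
(eq (nstring-downcase *title*) *title*)    ; true; *title* is now "the quick brown fox"
</pre><P>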
<P>
<BR><b>[Function]</b><BR>
<tt>string <i>x</i></tt><P>Most of the string
functions effectively apply <tt>string</tt>
to such of their arguments as are supposed to be
strings.
If <i>x</i> is a string, it is returned.
If <i>x</i> is a symbol, its print name is returned.
<p>
<img align=bottom alt="old_change_begin" src="gif/old_change_begin.gif"><br>
If <i>x</i> is a string character (a character of type <tt>string-char</tt>),
then a string containing that one character is returned.
<br><img align=bottom alt="old_change_end" src="gif/old_change_end.gif">
<P>
<img align=bottom alt="change_begin" src="gif/change_begin.gif"><br>
X3J13 voted in March 1989 (CHARACTER-PROPOSAL) <A NAME=18365> </A>
to eliminate the type <tt>string-char</tt> and to redefine the type
<tt>string</tt> to be the union of one or more specialized vector
types, the types of whose elements are subtypes of the type <tt>character</tt>.
Presumably converting a character to a string always works according
to this vote.
<br><img align=bottom alt="change_end" src="gif/change_end.gif">
<P>
In any other situation, an error is signaled.
<P>
To convert a sequence of characters to a string, use <tt>coerce</tt>.
(Note that <tt>(coerce x 'string)</tt> will not succeed if <tt>x</tt> is a symbol.
Conversely, <tt>string</tt> will not convert a list or other sequence
to a string.)
<P>
To get the string representation of a number or any other Lisp
object, use <tt>prin1-to-string</tt>, <tt>princ-to-string</tt>,
or <tt>format</tt>.
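<P>
The distinctions drawn above may be easier to see side by side. The results shown
in the comments assume the standard reader, which converts symbol names to uppercase,
and the default printer settings.
<P><pre>
;; STRING accepts strings, symbols, and characters.
(string "abc")                     ; => "abc"  (the argument itself)
(string 'abc)                      ; => "ABC"  (the symbol's print name)
(string #\a)                       ; => "a"

;; COERCE converts a sequence of characters, which STRING does not do.
(coerce '(#\a #\b #\c) 'string)    ; => "abc"

;; Printer-based conversions for numbers and other objects.
(prin1-to-string 42)               ; => "42"
(princ-to-string "hi")             ; => "hi"
(prin1-to-string "hi")             ; => "\"hi\""  (escape characters are written)
(format nil "~D items" 42)         ; => "42 items"
</pre><P>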
<P>
<img align=bottom alt="change_begin" src="gif/change_begin.gif"><br>
X3J13 voted in June 1989 (STRING-COERCION) <A NAME=18378> </A>
to specify that the following functions perform coercion
on their <i>string</i> arguments identical to that performed
by the function <tt>string</tt>.
<P>
<pre>
string=      string-equal          string-trim
string<      string-lessp          string-left-trim
string>      string-greaterp       string-right-trim
string<=     string-not-greaterp   string-upcase
string>=     string-not-lessp      string-downcase
string/=     string-not-equal      string-capitalize
</pre>
<P>
Note that <tt>nstring-upcase</tt>, <tt>nstring-downcase</tt>, and
<tt>nstring-capitalize</tt> are absent from this list; because they modify their argument destructively,
the argument must be a string.
<P>
As part of the same vote X3J13 specified that <tt>string</tt>
may perform additional implementation-dependent coercions
but the returned value must be of type <tt>string</tt>.
Only when no coercion is defined, whether standard or implementation-dependent,
is <tt>string</tt> required to signal an error, in which case the error condition
must be of type <tt>type-error</tt>.
<br><img align=bottom alt="change_end" src="gif/change_end.gif">
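<P>
The coercion rule and the error behavior described by this vote might be exercised as
in the sketch below. The helper <tt>safe-string</tt> is hypothetical and not part of
Common Lisp; the results in the comments assume the standard reader, which converts
symbol names to uppercase.
<P><pre>
;; Symbols are coerced to their print names by the listed functions.
(string-upcase 'foo)          ; => "FOO"
(string-capitalize 'foo)      ; => "Foo"
(string= 'abc "ABC")          ; true

;; SAFE-STRING is a hypothetical helper, not part of Common Lisp.
;; It returns NIL rather than signaling when no coercion is defined.
(defun safe-string (x)
  (handler-case (string x)
    (type-error () nil)))

(safe-string 'foo)            ; => "FOO"
</pre><P>
What <tt>(safe-string 42)</tt> returns depends on the implementation: it is <tt>nil</tt> where
no coercion from numbers is defined, but an implementation is permitted to define one.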
<HR>
<P><ADDRESS>
AI.Repository@cs.cmu.edu
</ADDRESS>
</BODY>