emacs.d/clones/lisp/www.cs.cmu.edu/Groups/AI/html/cltl/clm/node137.html

323 lines
16 KiB
HTML

<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<!Converted with LaTeX2HTML 0.6.5 (Tue Nov 15 1994) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
<HEAD>
<TITLE>13.2. Predicates on Characters</TITLE>
</HEAD>
<BODY>
<meta name="description" value=" Predicates on Characters">
<meta name="keywords" value="clm">
<meta name="resource-type" value="document">
<meta name="distribution" value="global">
<P>
<b>Common Lisp the Language, 2nd Edition</b>
<BR> <HR><A NAME=tex2html3238 HREF="node138.html"><IMG ALIGN=BOTTOM ALT="next" SRC="icons/next_motif.gif"></A> <A NAME=tex2html3236 HREF="node135.html"><IMG ALIGN=BOTTOM ALT="up" SRC="icons/up_motif.gif"></A> <A NAME=tex2html3230 HREF="node136.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="icons/previous_motif.gif"></A> <A NAME=tex2html3240 HREF="node1.html"><IMG ALIGN=BOTTOM ALT="contents" SRC="icons/contents_motif.gif"></A> <A NAME=tex2html3241 HREF="index.html"><IMG ALIGN=BOTTOM ALT="index" SRC="icons/index_motif.gif"></A> <BR>
<B> Next:</B> <A NAME=tex2html3239 HREF="node138.html"> Character Construction and </A>
<B>Up:</B> <A NAME=tex2html3237 HREF="node135.html"> Characters</A>
<B> Previous:</B> <A NAME=tex2html3231 HREF="node136.html"> Character Attributes</A>
<HR> <P>
<H1><A NAME=SECTION001720000000000000000>13.2. Predicates on Characters</A></H1>
<P>
The predicate <tt>characterp</tt> may be used to determine
whether any Lisp object is a character object.
<P>
<BR><b>[Function]</b><BR>
<tt>standard-char-p <i>char</i></tt><P>The argument <i>char</i> must be a character object.
<tt>standard-char-p</tt> is true if the argument is a ``standard character,''
that is, an object of type <tt>standard-char</tt>.
<P>
Note that any character with a non-zero bits or
font attribute is non-standard.
<P>
<BR><b>[Function]</b><BR>
<tt>graphic-char-p <i>char</i></tt><P>The argument <i>char</i> must be a character object.
<tt>graphic-char-p</tt> is true if the argument is a ``graphic'' (printing)
character, and false if it is a ``non-graphic'' (formatting or control)
character. Graphic characters have a standard textual representation
as a single glyph, such as <tt>A</tt> or <tt>*</tt> or <tt>=</tt>.
By convention, the space character is considered to be graphic.
Of the standard characters
all but <tt>#\Newline</tt> are graphic.
The semi-standard characters
<tt>#\Backspace</tt>, <tt>#\Tab</tt>, <tt>#\Rubout</tt>, <tt>#\Linefeed</tt>, <tt>#\Return</tt>,
and <tt>#\Page</tt> are not graphic.
<P>
Programs may assume that
graphic characters of font 0 are all of the same width
when printed, for example, for purposes of columnar
formatting. (This does not prohibit the use of a variable-pitch font
as font 0, but merely implies that every implementation of Common Lisp
must provide <i>some</i> mode of operation in which font 0 is
a fixed-pitch font.)
Portable programs should assume that, in general,
non-graphic characters and characters of
other fonts may be of varying widths.
<P>
Any character with a non-zero bits attribute is non-graphic.
<P>
<img align=bottom alt="old_change_begin" src="gif/old_change_begin.gif">
<BR><b>[Function]</b><BR>
<tt>string-char-p <i>char</i></tt><P>The argument <i>char</i> must be a character object.
<tt>string-char-p</tt> is true if <i>char</i> can be stored into
a string, and otherwise is false.
Any character that satisfies <tt>standard-char-p</tt>
also satisfies <tt>string-char-p</tt>; others may also.
<br><img align=bottom alt="old_change_end" src="gif/old_change_end.gif">
<P>
<img align=bottom alt="change_begin" src="gif/change_begin.gif"><br>
X3J13 voted in March 1989 (CHARACTER-PROPOSAL) <A NAME=14529>&#160;</A>
to eliminate <tt>string-char-p</tt>.
<br><img align=bottom alt="change_end" src="gif/change_end.gif">
<P>
<BR><b>[Function]</b><BR>
<tt>alpha-char-p <i>char</i></tt><P>The argument <i>char</i> must be a character object.
<tt>alpha-char-p</tt> is true if the argument is an alphabetic
character, and otherwise is false.
<P>
If a character is alphabetic, then it is perforce graphic.
Therefore any character with a non-zero bits attribute cannot be alphabetic.
Whether a character is alphabetic may depend on its font number.
<P>
Of the standard characters (as defined by <tt>standard-char-p</tt>),
the letters <tt>A</tt> through <tt>Z</tt> and <tt>a</tt> through <tt>z</tt> are alphabetic.
<P>
<BR><b>[Function]</b><BR>
<tt>upper-case-p <i>char</i> <BR></tt><tt>lower-case-p <i>char</i> <BR></tt><tt>both-case-p <i>char</i></tt><P>The argument <i>char</i> must be a character object.
<P>
<tt>upper-case-p</tt> is true if the argument is an uppercase
character, and otherwise is false.
<P>
<tt>lower-case-p</tt> is true if the argument is a lowercase
character, and otherwise is false.
<P>
<tt>both-case-p</tt> is true if the argument is an uppercase character and
there is a corresponding lowercase character (which can be obtained
using <tt>char-downcase</tt>), or if the argument is a lowercase character and
there is a corresponding uppercase character (which can be obtained
using <tt>char-upcase</tt>).
<P>
If a character is either uppercase or lowercase, it is necessarily
alphabetic (and therefore is graphic, and therefore has a zero bits
attribute). However, it is permissible in theory for an alphabetic
character to be neither uppercase nor lowercase (in a non-Roman font,
for example).
<P>
Of the standard characters (as defined by <tt>standard-char-p</tt>),
the letters <tt>A</tt> through <tt>Z</tt> are uppercase and <tt>a</tt>
through <tt>z</tt> are lowercase.
<P>
<BR><b>[Function]</b><BR>
<tt>digit-char-p <i>char</i> &amp;optional (<i>radix</i> 10)</tt><P>The argument <i>char</i> must be a character object,
and <i>radix</i> must be a non-negative integer.
If <i>char</i> is not a digit of the radix
specified by <i>radix</i>, then <tt>digit-char-p</tt> is
false; otherwise it returns
a non-negative integer that is the ``weight'' of <i>char</i> in that radix.
<P>
Digits are necessarily graphic characters.
<P>
Of the standard characters (as defined by <tt>standard-char-p</tt>),
the characters <tt>0</tt> through <tt>9</tt>, <tt>A</tt> through <tt>Z</tt>,
and <tt>a</tt> through <tt>z</tt>
are digits. The weights of <tt>0</tt> through <tt>9</tt> are the integers 0 through 9,
and of <tt>A</tt> through <tt>Z</tt> (and also <tt>a</tt> through <tt>z</tt>) are 10 through 35.
<tt>digit-char-p</tt> returns the weight for one of these digits if and only if
its weight is strictly less than <i>radix</i>. Thus, for example,
the digits for radix 16 are
<P><pre>
0 1 2 3 4 5 6 7 8 9 A B C D E F
</pre><P>
<P>
Here is an example of the use of <tt>digit-char-p</tt>:
<P><pre>
(defun convert-string-to-integer (str &amp;optional (radix 10))
&quot;Given a digit string and optional radix, return an integer.&quot;
(do ((j 0 (+ j 1))
(n 0 (+ (* n radix)
(or (digit-char-p (char str j) radix)
(error &quot;Bad radix-~D digit: ~C&quot;
radix
(char str j))))))
((= j (length str)) n)))
</pre><P>
<P>
<BR><b>[Function]</b><BR>
<tt>alphanumericp <i>char</i></tt><P>The argument <i>char</i> must be a character object.
<tt>alphanumericp</tt> is true if <i>char</i> is either alphabetic
or numeric. By definition,
<P><pre>
(alphanumericp x)
== (or (alpha-char-p x) (not (null (digit-char-p x))))
</pre><P>
Alphanumeric characters are therefore necessarily graphic
(as defined by the predicate <tt>graphic-char-p</tt>).
<P>
Of the standard characters (as defined by <tt>standard-char-p</tt>),
the characters <tt>0</tt> through <tt>9</tt>, <tt>A</tt> through <tt>Z</tt>,
and <tt>a</tt> through <tt>z</tt> are alphanumeric.
<P>
<BR><b>[Function]</b><BR>
<tt>char= <i>character</i> &amp;rest <i>more-characters</i> <BR></tt><tt>char/= <i>character</i> &amp;rest <i>more-characters</i> <BR></tt><tt>char&lt; <i>character</i> &amp;rest <i>more-characters</i> <BR></tt><tt>char&gt; <i>character</i> &amp;rest <i>more-characters</i> <BR></tt><tt>char&lt;= <i>character</i> &amp;rest <i>more-characters</i> <BR></tt><tt>char&gt;= <i>character</i> &amp;rest <i>more-characters</i></tt><P>The arguments must all be character objects.
These functions compare the objects using the implementation-dependent
total ordering on characters, in a manner analogous to numeric
comparisons by <tt>=</tt> and related functions.
<P>
The total ordering on characters is guaranteed to have the following
properties:
<UL><LI>
The standard alphanumeric characters obey the following partial ordering:
<P><pre>
A&lt;B&lt;C&lt;D&lt;E&lt;F&lt;G&lt;H&lt;I&lt;J&lt;K&lt;L&lt;M&lt;N&lt;O&lt;P&lt;Q&lt;R&lt;S&lt;T&lt;U&lt;V&lt;W&lt;X&lt;Y&lt;Z
to 0pta&lt;b&lt;c&lt;d&lt;e&lt;f&lt;g&lt;h&lt;i&lt;j&lt;k&lt;l&lt;m&lt;n&lt;o&lt; p&lt;q&lt;r&lt;s&lt;t&lt;u&lt;v&lt;w&lt;x&lt;y&lt;z
0&lt;1&lt;2&lt;3&lt;4&lt;5&lt;6&lt;7&lt;8&lt;9
<i>either</i> 9&lt;A <i>or</i> Z&lt;0
<i>either</i> 9&lt;a <i>or</i> z&lt;0
</pre><P>
This implies that alphabetic ordering holds within each case (upper and
lower), and that the digits as a group
are not interleaved with letters. However, the ordering
or possible interleaving of
uppercase letters and lowercase letters is unspecified.
(Note that both the ASCII and the EBCDIC character sets
conform to this specification. As it happens, neither ordering
interleaves uppercase and lowercase letters:
in the ASCII ordering, <tt>9&lt;A</tt> and <tt>Z&lt;a</tt>,
whereas in the EBCDIC ordering <tt>z&lt;A</tt> and <tt>Z&lt;0</tt>.)
</UL>
<P>
<img align=bottom alt="old_change_begin" src="gif/old_change_begin.gif"><br>
<UL><LI>
If two characters have the same bits and font attributes,
then their ordering by <tt>char&lt;</tt> is consistent with the numerical
ordering by the predicate <tt>&lt;</tt> on their code attributes.
<P>
<LI>
If two characters differ in any attribute (code, bits, or font), then they
are different.
</UL>
<br><img align=bottom alt="old_change_end" src="gif/old_change_end.gif">
<P>
<img align=bottom alt="change_begin" src="gif/change_begin.gif"><br>
X3J13 voted in March 1989 (CHARACTER-PROPOSAL) <A NAME=14622>&#160;</A>
to replace the notion of bits and font attributes with
that of implementation-defined attributes.
<P>
<UL><LI>
If two characters have identical implementation-defined attributes,
then their ordering by <tt>char&lt;</tt> is consistent with the numerical
ordering by the predicate <tt>&lt;</tt> on their codes, and similarly
for <tt>char&gt;</tt>, <tt>char&lt;=</tt>, and <tt>char&gt;=</tt>.
<P>
<LI>
If two characters differ in any implementation-defined
attribute, then they are not <tt>char=</tt>.
</UL>
<br><img align=bottom alt="change_end" src="gif/change_end.gif">
<P>
The total ordering is not necessarily the same as the total
ordering on the integers produced by applying <tt>char-int</tt> to the
characters (although it is a reasonable implementation technique to
use that ordering).
<P>
While alphabetic characters of a given case must be
properly ordered, they need not be contiguous; thus <tt>(char&lt;= #\a x
#\z)</tt> is <i>not</i> a valid way of determining whether or not <tt>x</tt> is a
lowercase letter. That is why a separate
<tt>lower-case-p</tt> predicate is provided.
<P>
<P><pre>
(char= #\d #\d) is true.
(char/= #\d #\d) is false.
(char= #\d #\x) is false.
(char/= #\d #\x) is true.
(char= #\d #\D) is false.
(char/= #\d #\D) is true.
(char= #\d #\d #\d #\d) is true.
(char/= #\d #\d #\d #\d) is false.
(char= #\d #\d #\x #\d) is false.
(char/= #\d #\d #\x #\d) is false.
(char= #\d #\y #\x #\c) is false.
(char/= #\d #\y #\x #\c) is true.
(char= #\d #\c #\d) is false.
(char/= #\d #\c #\d) is false.
(char&lt; #\d #\x) is true.
(char&lt;= #\d #\x) is true.
(char&lt; #\d #\d) is false.
(char&lt;= #\d #\d) is true.
(char&lt; #\a #\e #\y #\z) is true.
(char&lt;= #\a #\e #\y #\z) is true.
(char&lt; #\a #\e #\e #\y) is false.
(char&lt;= #\a #\e #\e #\y) is true.
(char&gt; #\e #\d) is true.
(char&gt;= #\e #\d) is true.
(char&gt; #\d #\c #\b #\a) is true.
(char&gt;= #\d #\c #\b #\a) is true.
(char&gt; #\d #\d #\c #\a) is false.
(char&gt;= #\d #\d #\c #\a) is true.
(char&gt; #\e #\d #\b #\c #\a) is false.
(char&gt;= #\e #\d #\b #\c #\a) is false.
(char&gt; #\z #\A) may be true or false.
(char&gt; #\Z #\a) may be true or false.
</pre><P>
<P>
There is no requirement that <tt>(eq c1 c2)</tt> be true merely because
<tt>(char= c1 c2)</tt> is true. While <tt>eq</tt> may distinguish two character
objects that <tt>char=</tt> does not, it is distinguishing them not
as <i>characters</i>, but in some sense on the basis of a lower-level
implementation characteristic.
(Of course, if <tt>(eq c1 c2)</tt> is true,
then one may expect <tt>(char= c1 c2)</tt> to be true.)
However, <tt>eql</tt> and <tt>equal</tt>
compare character objects in the same
way that <tt>char=</tt> does.
<P>
<BR><b>[Function]</b><BR>
<tt>char-equal <i>character</i> &amp;rest <i>more-characters</i> <BR></tt><tt>char-not-equal <i>character</i> &amp;rest <i>more-characters</i> <BR></tt><tt>char-lessp <i>character</i> &amp;rest <i>more-characters</i> <BR></tt><tt>char-greaterp <i>character</i> &amp;rest <i>more-characters</i> <BR></tt><tt>char-not-greaterp <i>character</i> &amp;rest <i>more-characters</i> <BR></tt><tt>char-not-lessp <i>character</i> &amp;rest <i>more-characters</i></tt>
<P>
<img align=bottom alt="old_change_begin" src="gif/old_change_begin.gif"><br>
The predicate <tt>char-equal</tt> is like <tt>char=</tt>, and similarly
for the others, except according to a different ordering such that
differences of bits attributes and case are ignored,
and font information is taken into account in an implementation-dependent
manner.
<br><img align=bottom alt="old_change_end" src="gif/old_change_end.gif">
<P>
<img align=bottom alt="change_begin" src="gif/change_begin.gif"><br>
X3J13 voted in March 1989 (CHARACTER-PROPOSAL) <A NAME=14789>&#160;</A>
to replace the notion of bits and font attributes with
that of implementation-defined attributes. The effect, if any,
of each such attribute on the behavior of
<tt>char-equal</tt>, <tt>char-not-equal</tt>, <tt>char-lessp</tt>, <tt>char-greaterp</tt>,
<tt>char-not-greaterp</tt>, and <tt>char-not-lessp</tt> must be specified
as part of the definition of that attribute.
<br><img align=bottom alt="change_end" src="gif/change_end.gif">
<P>
For the standard characters, the ordering is such that
<tt>A=a</tt>, <tt>B=b</tt>, and so on, up to <tt>Z=z</tt>, and furthermore either
<tt>9&lt;A</tt> or <tt>Z&lt;0</tt>.
For example:
<P><pre>
(char-equal #\A #\a) is true.
(char= #\A #\a) is false.
(char-equal #\A #\Control-A) is true.
</pre><P>
<img align=bottom alt="old_change_begin" src="gif/old_change_begin.gif"><br>
The ordering may depend on the font information. For example, an implementation
might decree that <tt>(char-equal #\p #\<i>p</i>)</tt> be true, but that
<tt>(char-equal #\p #\</tt><b>pi</b><tt>)</tt> be false
(where <tt>#\</tt><b>pi</b> is a
lowercase <tt>p</tt> in some font). Assuming italics to be in font 1
and the Greek alphabet in font 2, this is the same as saying that
<tt>(char-equal #0\p #1\p)</tt> may be true and at the same time
<tt>(char-equal #0\p #2\p)</tt> may be false.
<br><img align=bottom alt="old_change_end" src="gif/old_change_end.gif">
<P>
<BR> <HR><A NAME=tex2html3238 HREF="node138.html"><IMG ALIGN=BOTTOM ALT="next" SRC="icons/next_motif.gif"></A> <A NAME=tex2html3236 HREF="node135.html"><IMG ALIGN=BOTTOM ALT="up" SRC="icons/up_motif.gif"></A> <A NAME=tex2html3230 HREF="node136.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="icons/previous_motif.gif"></A> <A NAME=tex2html3240 HREF="node1.html"><IMG ALIGN=BOTTOM ALT="contents" SRC="icons/contents_motif.gif"></A> <A NAME=tex2html3241 HREF="index.html"><IMG ALIGN=BOTTOM ALT="index" SRC="icons/index_motif.gif"></A> <BR>
<B> Next:</B> <A NAME=tex2html3239 HREF="node138.html"> Character Construction and </A>
<B>Up:</B> <A NAME=tex2html3237 HREF="node135.html"> Characters</A>
<B> Previous:</B> <A NAME=tex2html3231 HREF="node136.html"> Character Attributes</A>
<HR> <P>
<HR>
<P><ADDRESS>
AI.Repository@cs.cmu.edu
</ADDRESS>
</BODY>