<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<!Converted with LaTeX2HTML 0.6.5 (Tue Nov 15 1994) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
<HEAD>
<TITLE>13. Characters</TITLE>
</HEAD>
<BODY>
<meta name="description" value=" Characters">
<meta name="keywords" value="clm">
<meta name="resource-type" value="document">
<meta name="distribution" value="global">
<P>
<b>Common Lisp the Language, 2nd Edition</b>
<BR> <HR><A NAME=tex2html3209 HREF="node136.html"><IMG ALIGN=BOTTOM ALT="next" SRC="icons/next_motif.gif"></A> <A NAME=tex2html3207 HREF="clm.html"><IMG ALIGN=BOTTOM ALT="up" SRC="icons/up_motif.gif"></A> <A NAME=tex2html3201 HREF="node134.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="icons/previous_motif.gif"></A> <A NAME=tex2html3211 HREF="node1.html"><IMG ALIGN=BOTTOM ALT="contents" SRC="icons/contents_motif.gif"></A> <A NAME=tex2html3212 HREF="index.html"><IMG ALIGN=BOTTOM ALT="index" SRC="icons/index_motif.gif"></A> <BR>
<B> Next:</B> <A NAME=tex2html3210 HREF="node136.html"> Character Attributes</A>
<B>Up:</B> <A NAME=tex2html3208 HREF="clm.html">Common Lisp the Language</A>
<B> Previous:</B> <A NAME=tex2html3202 HREF="node134.html"> Implementation Parameters</A>
<HR> <P>
<H1><A NAME=SECTION001700000000000000000>13. Characters</A></H1>
<P>
Common Lisp provides a character data type; objects of this type
represent printed symbols such as letters.
<P>
In general, characters in Common Lisp are not true objects; <tt>eq</tt> cannot
be counted upon to operate on them reliably. In particular,
it is possible that the expression
<P><pre>
(let ((x z) (y z)) (eq x y))
</pre><P>
may be false rather than true, if the value of <tt>z</tt> is a character.
<P>
<hr>
<b>Rationale:</b> This odd breakdown of <tt>eq</tt> in the case of characters
allows the implementor enough design freedom to produce exceptionally
efficient code on conventional architectures. In this respect the
treatment of characters exactly parallels that of numbers, as described
in chapter <A HREF="node121.html#NUMBER">12</A>.
<hr>
<P>
<A NAME=STANDARDCHARREPERTOIRETABLE> </A><listing>
------------------------------------------------------------------------------------------
Table 13-1: Standard Character Labels, Glyphs, and Descriptions

                               SM05  @  commercial at         SD13  `  grave accent
SP02  !  exclamation mark      LA02  A  capital A             LA01  a  small a
SP04  "  quotation mark        LB02  B  capital B             LB01  b  small b
SM01  #  number sign           LC02  C  capital C             LC01  c  small c
SC03  $  dollar sign           LD02  D  capital D             LD01  d  small d
SM02  %  percent sign          LE02  E  capital E             LE01  e  small e
SM03  &  ampersand             LF02  F  capital F             LF01  f  small f
SP05  '  apostrophe            LG02  G  capital G             LG01  g  small g
SP06  (  left parenthesis      LH02  H  capital H             LH01  h  small h
SP07  )  right parenthesis     LI02  I  capital I             LI01  i  small i
SM04  *  asterisk              LJ02  J  capital J             LJ01  j  small j
SA01  +  plus sign             LK02  K  capital K             LK01  k  small k
SP08  ,  comma                 LL02  L  capital L             LL01  l  small l
SP10  -  hyphen or minus sign  LM02  M  capital M             LM01  m  small m
SP11  .  period or full stop   LN02  N  capital N             LN01  n  small n
SP12  /  solidus               LO02  O  capital O             LO01  o  small o
ND10  0  digit 0               LP02  P  capital P             LP01  p  small p
ND01  1  digit 1               LQ02  Q  capital Q             LQ01  q  small q
ND02  2  digit 2               LR02  R  capital R             LR01  r  small r
ND03  3  digit 3               LS02  S  capital S             LS01  s  small s
ND04  4  digit 4               LT02  T  capital T             LT01  t  small t
ND05  5  digit 5               LU02  U  capital U             LU01  u  small u
ND06  6  digit 6               LV02  V  capital V             LV01  v  small v
ND07  7  digit 7               LW02  W  capital W             LW01  w  small w
ND08  8  digit 8               LX02  X  capital X             LX01  x  small x
ND09  9  digit 9               LY02  Y  capital Y             LY01  y  small y
SP13  :  colon                 LZ02  Z  capital Z             LZ01  z  small z
SP14  ;  semicolon             SM06  [  left square bracket   SM11  {  left curly bracket
SA03  <  less-than sign        SM07  \  reverse solidus       SM13  |  vertical bar
SA04  =  equals sign           SM08  ]  right square bracket  SM14  }  right curly bracket
SA05  >  greater-than sign     SD15  ^  circumflex accent     SD19  ~  tilde
SP15  ?  question mark         SP09  _  low line
------------------------------------------------------------------------------------------
</listing>

<p>
If two objects are to be compared for "identity," but either might be
a character, then the predicate <tt>eql</tt> is probably appropriate.
<P>
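As a minimal sketch of this advice, the following comparison tolerates
characters by relying on <tt>eql</tt>. The name <tt>same-object-p</tt> is
hypothetical, introduced here only for illustration; it is not part of Common Lisp.
<P><pre>
;; SAME-OBJECT-P is a hypothetical helper, not a standard function.
;; EQL compares characters by value, so its result is reliable
;; even where EQ on characters is not.
(defun same-object-p (a b)
  (eql a b))

(same-object-p #\a #\a)          ; true
(same-object-p #\a #\A)          ; false: different characters
(let ((z #\a))
  (let ((x z) (y z))
    (same-object-p x y)))        ; true, whereas EQ might yield false here
</pre><P>
Replacing <tt>eql</tt> by <tt>eq</tt> in the helper would make the first and
third results implementation dependent.
<P>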
<img align=bottom alt="change_begin" src="gif/change_begin.gif"><br>
X3J13 voted in March 1989 (CHARACTER-PROPOSAL) <A NAME=14406> </A>
to approve the following definitions and terminology for use in
discussing character facilities in Common Lisp.
<P>
A <i>character repertoire</i> defines a collection of characters
independent of their specific rendered image or font. (This corresponds
to the mathematical notion of a set, but the term <i>character set</i>
is avoided here because it has been used in the past to mean
both what is here called a repertoire and what is here called a coded
character set.)
Character repertoires are specified independently of coding, and their characters
are identified only by a unique <i>character label</i>,
a graphic symbol, and a character description.
As an example, table <A HREF="node135.html#STANDARDCHARREPERTOIRETABLE">13-1</A>
shows the character labels, graphic symbols, and character descriptions for
all of the characters in the repertoire <tt>standard-char</tt>
except for <tt>#\Space</tt> and <tt>#\Newline</tt>.
<P>
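Membership in the repertoire <tt>standard-char</tt> can be probed with the
predicate <tt>standard-char-p</tt>. The following is only an illustrative sketch;
the variable name <tt>*sample-glyphs*</tt> and the choice of sample characters
are assumptions made here, not part of the proposal.
<P><pre>
;; *SAMPLE-GLYPHS* is a hypothetical name; the string holds a few
;; glyphs copied from table 13-1.
(defparameter *sample-glyphs* "Aa09!@~")

(every #'standard-char-p *sample-glyphs*)   ; true
(standard-char-p #\Space)                   ; true, though not listed in the table
(standard-char-p #\Newline)                 ; true, though not listed in the table
</pre><P>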
Every Common Lisp implementation must support the standard character repertoire
as well as repertoires named <tt>base-character</tt>, <tt>extended-character</tt>,
and <tt>character</tt>. Other repertoires may be supported as well.
X3J13 voted in June 1989 (MORE-CHARACTER-PROPOSAL) <A NAME=14417> </A> to specify that names of
repertoires may be used as type specifiers. Such types must be subtypes of <tt>character</tt>;
that is, in a given implementation
the repertoire named <tt>character</tt> must encompass all the character objects supported
by that implementation.
<P>
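The use of repertoire names as type specifiers might look as follows. This
sketch restricts itself to <tt>standard-char</tt> and <tt>character</tt>, on the
assumption that these two names are accepted by any implementation at hand; the
other repertoire names may be spelled differently in a particular implementation.
<P><pre>
(typep #\a 'standard-char)              ; true
(typep #\a 'character)                  ; true
(typep 97 'character)                   ; false: an integer is not a character
(subtypep 'standard-char 'character)    ; true: every repertoire type is a subtype
</pre><P>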
A <i>coded character set</i> is a character repertoire plus an <i>encoding</i>
that provides a bijective mapping between each character in the set and a number
(typically a non-negative integer)
that serves as the character representation.
There are numerous internationally standardized coded character sets.
<P>
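The bijective mapping can be observed, as a rough sketch, through
<tt>char-code</tt> and <tt>code-char</tt>. The helper name <tt>round-trip-p</tt>
is hypothetical, and the numbers actually produced depend on the encoding that
an implementation uses, so none are shown.
<P><pre>
;; ROUND-TRIP-P is a hypothetical helper, not a standard function.
;; It maps a character to its number and back again.
(defun round-trip-p (char)
  (char= char (code-char (char-code char))))

(every #'round-trip-p "Aa09!@~")         ; true: the mapping is invertible
(/= (char-code #\a) (char-code #\A))     ; true: distinct characters, distinct numbers
</pre><P>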
A character may be included in one or more character repertoires.
Similarly, a character may be included in one or more coded character sets.
<P>
To ensure that each character is uniquely defined, we may use a universal registry of
characters that incorporates a collection of distinguished repertoires
called <i>character scripts</i> that form an exhaustive partition of all characters.
That is, each character is included in exactly one character script.
(Draft ISO 10646 Coded Character Set Standard, if eventually approved as a standard,
may become the practical realization of this universal registry.)
<P>
(X3J13 voted in June 1989 (MORE-CHARACTER-PROPOSAL) <A NAME=14423> </A> to specify that
an implementation must document the character scripts it supports.
For each script the documentation should discuss character labels,
glyphs, and descriptions; any canonicalization processes performed
by the reader that result in treating distinct characters as equivalent;
any canonicalization performed by <tt>format</tt> in processing directives;
the behavior of <tt>char-upcase</tt>, <tt>char-downcase</tt>, and the predicates
<tt>alpha-char-p</tt>, <tt>upper-case-p</tt>, <tt>lower-case-p</tt>, <tt>both-case-p</tt>,
<tt>graphic-char-p</tt>, <tt>alphanumericp</tt>, <tt>char-equal</tt>, <tt>char-not-equal</tt>,
<tt>char-lessp</tt>, <tt>char-greaterp</tt>, <tt>char-not-greaterp</tt>, and <tt>char-not-lessp</tt>
for characters in the script; and behavior with respect to input and output,
including coded character sets and external coding schemes.)
<P>
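For the standard characters the behavior of these functions is already fixed,
and a few sample calls may suggest what such documentation has to cover for
other scripts. The calls below are only a sketch and use standard characters
exclusively; nothing is implied about any other script.
<P><pre>
(char-upcase #\a)          ; #\A
(char-downcase #\A)        ; #\a
(char-upcase #\1)          ; #\1: a character without case is returned unchanged
(both-case-p #\a)          ; true
(alpha-char-p #\1)         ; false
(alphanumericp #\1)        ; true
(char-equal #\a #\A)       ; true: case is ignored
(char-lessp #\a #\B)       ; true: case is ignored
</pre><P>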
In Common Lisp a <i>character</i> data object is identified by its <i>character code</i>,
a unique numerical code. Each character code is composed from a character script
and a character label. The convention by which a character script and
character label compose a character code is implementation dependent.
[X3J13 did not approve all parts of the proposal from its Subcommittee
on Characters. As a result, some features that were approved appear to
have no purpose. X3J13 wished to support the standardization by ISO of character
scripts and coded character sets but declined to design facilities for use in
Common Lisp until there has been more progress by ISO in this area.
The approval of the terminology for scripts and labels gives a hint
to implementors of likely directions for Common Lisp in the future.]
<P>
A character object that is classified as <i>graphic</i>, or displayable,
has an associated <i>glyph</i>. The glyph is the visual representation
of the character. All other character data objects are classified as
<i>non-graphic</i>.
<P>
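The classification can be tested with <tt>graphic-char-p</tt>. As a brief
sketch, limited to standard characters:
<P><pre>
(graphic-char-p #\a)          ; true
(graphic-char-p #\Space)      ; true: the space character counts as graphic
(graphic-char-p #\Newline)    ; false: a non-graphic character
</pre><P>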
This terminology assigns names to Common Lisp concepts
in a manner consistent with
related concepts discussed in various ISO standards for coded
character sets and provides a demarcation between standardization
activities. For example, facilities for manipulating characters,
character scripts, and coded character sets are properly defined
by a Common Lisp standard, but Common Lisp should not define
standard character sets or standard character scripts.
<br><img align=bottom alt="change_end" src="gif/change_end.gif">
<P>
<HR>
<UL>
<LI> <A NAME=tex2html3213 HREF="node136.html#SECTION001710000000000000000"> Character Attributes</A>
<LI> <A NAME=tex2html3214 HREF="node137.html#SECTION001720000000000000000"> Predicates on Characters</A>
<LI> <A NAME=tex2html3215 HREF="node138.html#SECTION001730000000000000000"> Character Construction and Selection</A>
<LI> <A NAME=tex2html3216 HREF="node139.html#SECTION001740000000000000000"> Character Conversions</A>
<LI> <A NAME=tex2html3217 HREF="node140.html#SECTION001750000000000000000"> Character Control-Bit Functions</A>
</UL>
<HR>
<P><ADDRESS>
AI.Repository@cs.cmu.edu
</ADDRESS>
</BODY>