cl-sites/www.cs.cmu.edu/Groups/AI/html/cltl/clm/node135.html

<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<!Converted with LaTeX2HTML 0.6.5 (Tue Nov 15 1994) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
<HEAD>
<TITLE>13. Characters</TITLE>
</HEAD>
<BODY>
<meta name="description" value=" Characters">
<meta name="keywords" value="clm">
<meta name="resource-type" value="document">
<meta name="distribution" value="global">
<P>
<b>Common Lisp the Language, 2nd Edition</b>
 <BR> <HR><A NAME=tex2html3209 HREF="node136.html"><IMG ALIGN=BOTTOM ALT="next" SRC="icons/next_motif.gif"></A> <A NAME=tex2html3207 HREF="clm.html"><IMG ALIGN=BOTTOM ALT="up" SRC="icons/up_motif.gif"></A> <A NAME=tex2html3201 HREF="node134.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="icons/previous_motif.gif"></A> <A NAME=tex2html3211 HREF="node1.html"><IMG ALIGN=BOTTOM ALT="contents" SRC="icons/contents_motif.gif"></A> <A NAME=tex2html3212 HREF="index.html"><IMG ALIGN=BOTTOM ALT="index" SRC="icons/index_motif.gif"></A> <BR>
<B> Next:</B> <A NAME=tex2html3210 HREF="node136.html"> Character Attributes</A>
<B>Up:</B> <A NAME=tex2html3208 HREF="clm.html">Common Lisp the Language</A>
<B> Previous:</B> <A NAME=tex2html3202 HREF="node134.html"> Implementation Parameters</A>
<HR> <P>
<H1><A NAME=SECTION001700000000000000000>13. Characters</A></H1>
<P>
Common Lisp provides a character data type; objects of this type
represent printed symbols such as letters.
<P>
In general, characters in Common Lisp are not true objects; <tt>eq</tt> cannot
be counted upon to operate on them reliably.  In particular,
it is possible that the expression
<P><pre>
(let ((x z) (y z)) (eq x y))
</pre><P>
may be false rather than true, if the value of <tt>z</tt> is a character.
<P>
<hr>
<b>Rationale:</b> This odd breakdown of <tt>eq</tt> in the case of characters
allows the implementor enough design freedom to produce exceptionally
efficient code on conventional architectures.  In this respect the
treatment of characters exactly parallels that of numbers, as described
in chapter <A HREF="node121.html#NUMBER">12</A>.
<hr>
<P>
<A NAME=STANDARDCHARREPERTOIRETABLE>&#160;</A><listing>
------------------------------------------------------------------------------
Table 13-1: Standard Character Labels, Glyphs, and Descriptions

                            SM05 @ commercial at        SD13 ` grave accent 
SP02 ! exclamation mark     LA02 A capital A            LA01 a small a 
SP04 " quotation mark       LB02 B capital B            LB01 b small b 
SM01 # number sign          LC02 C capital C            LC01 c small c 
SC03 $ dollar sign          LD02 D capital D            LD01 d small d 
SM02 % percent sign         LE02 E capital E            LE01 e small e 
SM03 & ampersand            LF02 F capital F            LF01 f small f 
SP05 ' apostrophe           LG02 G capital G            LG01 g small g 
SP06 ( left parenthesis     LH02 H capital H            LH01 h small h 
SP07 ) right parenthesis    LI02 I capital I            LI01 i small i 
SM04 * asterisk             LJ02 J capital J            LJ01 j small j 
SA01 + plus sign            LK02 K capital K            LK01 k small k 
SP08 , comma                LL02 L capital L            LL01 l small l 
SP10 - hyphen or minus sign LM02 M capital M            LM01 m small m 
SP11 . period or full stop  LN02 N capital N            LN01 n small n 
SP12 / solidus              LO02 O capital O            LO01 o small o 
ND10 0 digit 0              LP02 P capital P            LP01 p small p 
ND01 1 digit 1              LQ02 Q capital Q            LQ01 q small q 
ND02 2 digit 2              LR02 R capital R            LR01 r small r 
ND03 3 digit 3              LS02 S capital S            LS01 s small s 
ND04 4 digit 4              LT02 T capital T            LT01 t small t 
ND05 5 digit 5              LU02 U capital U            LU01 u small u 
ND06 6 digit 6              LV02 V capital V            LV01 v small v 
ND07 7 digit 7              LW02 W capital W            LW01 w small w 
ND08 8 digit 8              LX02 X capital X            LX01 x small x 
ND09 9 digit 9              LY02 Y capital Y            LY01 y small y 
SP13 : colon                LZ02 Z capital Z            LZ01 z small z 
SP14 ; semicolon            SM06 [ left square bracket  SM11 { left curly bracket 
SA03 < less-than sign       SM07 \ reverse solidus      SM13 | vertical bar 
SA04 = equals sign          SM08 ] right square bracket SM14 } right curly bracket 
SA05 > greater-than sign    SD15 ^ circumflex accent    SD19 ~ tilde 
SP15 ? question mark        SP09 _ low line        
------------------------------------------------------------------------------
</listing>

<p>

If two objects are to be compared for ``identity,'' but either might be
a character, then the predicate <tt>eql</tt> is probably appropriate.
<P>
<img align=bottom alt="change_begin" src="gif/change_begin.gif"><br>
X3J13 voted in March 1989 (CHARACTER-PROPOSAL) <A NAME=14406>&#160;</A> 
to approve the following definitions and terminology for use in
discussing character facilities in Common Lisp.
<P>
A <i>character repertoire</i> defines a collection of characters
independent of their specific rendered image or font.  (This corresponds
to the mathematical notion of a set, but the term <i>character set</i>
is avoided here because it has been used in the past to mean
both what is here called a repertoire and what is here called a coded
character set.)
Character repertoires are specified independent of coding and their characters
are identified only with a unique <i>character label</i>,
a graphic symbol, and a character description.
As an example, table <A HREF="node135.html#STANDARDCHARREPERTOIRETABLE">13-1</A>
shows the character labels, graphic symbols, and character descriptions for
all of the characters in the repertoire <tt>standard-char</tt>
except for <tt>#\Space</tt> and <tt>#\Newline</tt>.
<P>
Every Common Lisp implementation must support the standard character repertoire
as well as repertoires named <tt>base-character</tt>, <tt>extended-character</tt>,
and <tt>character</tt>.  Other repertoires may be supported as well.
X3J13 voted in June 1989 (MORE-CHARACTER-PROPOSAL) <A NAME=14417>&#160;</A>  to specify that names of
repertoires may be used as type specifiers.  Such types must be subtypes of <tt>character</tt>;
that is, in a given implementation
the repertoire named <tt>character</tt> must encompass all the character objects supported
by that implementation.
<P>
A <i>coded character set</i> is a character repertoire plus an <i>encoding</i>
that provides a bijective mapping between each character in the set and a number
(typically a non-negative integer)
that serves as the character representation.
There are numerous internationally standardized coded character sets.
<P>
A character may be included in one or more character repertoires.
Similarly, a character may be included in one or more coded character sets.
<P>
To ensure that each character is uniquely defined, we may use a universal registry of
characters that incorporates a collection of distinguished repertoires
called <i>character scripts</i> that form an exhaustive partition of all characters.
That is, each character is included in exactly one character script.
(Draft ISO 10646 Coded Character Set Standard, if eventually approved as a standard,
may become the practical realization of this universal registry.)
<P>
(X3J13 voted in June 1989 (MORE-CHARACTER-PROPOSAL) <A NAME=14423>&#160;</A>  to specify that
an implementation must document the character scripts it supports.
For each script the documentation should discuss character labels,
glyphs, and descriptions; any canonicalization processes performed
by the reader that result in treating distinct characters as equivalent;
any canonicalization performed by <tt>format</tt> in processing directives;
the behavior of <tt>char-upcase</tt>, <tt>char-downcase</tt>, and the predicates
<tt>alpha-char-p</tt>, <tt>upper-case-p</tt>, <tt>lower-case-p</tt>, <tt>both-case-p</tt>,
<tt>graphic-char-p</tt>, <tt>alphanumericp</tt>, <tt>char-equal</tt>, <tt>char-not-equal</tt>,
<tt>char-lessp</tt>, <tt>char-greaterp</tt>, <tt>char-not-greaterp</tt>, and <tt>char-not-lessp</tt>
for characters in the script; and behavior with respect to input and output,
including coded character sets and external coding schemes.)
<P>
In Common Lisp a <i>character</i> data object is identified by its <i>character code</i>,
a unique numerical code.  Each character code is composed from a character script
and a character label.  The convention by which a character script and
character label compose a character code is implementation dependent.
[X3J13 did not approve all parts of the proposal from its Subcommittee
on Characters.  As a result, some features that were approved appear to
have no purpose.  X3J13 wished to support the standardization by ISO of character
scripts and coded character sets but declined to design facilities for use in
Common Lisp until there has been more progress by ISO in this area.
The approval of the terminology for scripts and labels gives a hint
to implementors of likely directions for Common Lisp in the future.]
<P>
A character object that is classified as <i>graphic</i>, or displayable,
has an associated <i>glpyh</i>.  The glyph is the visual representation
of the character.  All other character data objects are classified as
<i>non-graphic</i>.
<P>
This terminology assigns names to Common Lisp concepts
in a manner consistent with
related concepts discussed in various ISO standards for coded
character sets and provides a demarcation between standardization
activities.  For example, facilities for manipulating characters,
character scripts, and coded character sets are properly defined
by a Common Lisp standard, but Common Lisp should not define
standard character sets or standard character scripts.
<br><img align=bottom alt="change_end" src="gif/change_end.gif">
<P>
<HR>
<UL> 
<LI> <A NAME=tex2html3213 HREF="node136.html#SECTION001710000000000000000"> Character Attributes</A>
<LI> <A NAME=tex2html3214 HREF="node137.html#SECTION001720000000000000000"> Predicates on Characters</A>
<LI> <A NAME=tex2html3215 HREF="node138.html#SECTION001730000000000000000"> Character Construction and Selection</A>
<LI> <A NAME=tex2html3216 HREF="node139.html#SECTION001740000000000000000"> Character Conversions</A>
<LI> <A NAME=tex2html3217 HREF="node140.html#SECTION001750000000000000000"> Character Control-Bit Functions</A>
</UL>
<BR> <HR><A NAME=tex2html3209 HREF="node136.html"><IMG ALIGN=BOTTOM ALT="next" SRC="icons/next_motif.gif"></A> <A NAME=tex2html3207 HREF="clm.html"><IMG ALIGN=BOTTOM ALT="up" SRC="icons/up_motif.gif"></A> <A NAME=tex2html3201 HREF="node134.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="icons/previous_motif.gif"></A> <A NAME=tex2html3211 HREF="node1.html"><IMG ALIGN=BOTTOM ALT="contents" SRC="icons/contents_motif.gif"></A> <A NAME=tex2html3212 HREF="index.html"><IMG ALIGN=BOTTOM ALT="index" SRC="icons/index_motif.gif"></A> <BR>
<B> Next:</B> <A NAME=tex2html3210 HREF="node136.html"> Character Attributes</A>
<B>Up:</B> <A NAME=tex2html3208 HREF="clm.html">Common Lisp the Language</A>
<B> Previous:</B> <A NAME=tex2html3202 HREF="node134.html"> Implementation Parameters</A>
<HR> <P>
<HR>
<P><ADDRESS>
AI.Repository@cs.cmu.edu
</ADDRESS>
</BODY>
Init commit 2023-10-25 11:23:21 +02:00			`<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">`
			`<!Converted with LaTeX2HTML 0.6.5 (Tue Nov 15 1994) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >`
			`<HEAD>`
			`<TITLE>13. Characters</TITLE>`
			`</HEAD>`
			`<BODY>`
			`<meta name="description" value=" Characters">`
			`<meta name="keywords" value="clm">`
			`<meta name="resource-type" value="document">`
			`<meta name="distribution" value="global">`
			`<P>`
			`<b>Common Lisp the Language, 2nd Edition</b>`
			<BR> <HR><A NAME=tex2html3209 HREF="node136.html"><IMG ALIGN=BOTTOM ALT="next" SRC="icons/next_motif.gif"></A> <A NAME=tex2html3207 HREF="clm.html"><IMG ALIGN=BOTTOM ALT="up" SRC="icons/up_motif.gif"></A> <A NAME=tex2html3201 HREF="node134.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="icons/previous_motif.gif"></A> <A NAME=tex2html3211 HREF="node1.html"><IMG ALIGN=BOTTOM ALT="contents" SRC="icons/contents_motif.gif"></A> <A NAME=tex2html3212 HREF="index.html"><IMG ALIGN=BOTTOM ALT="index" SRC="icons/index_motif.gif"></A> <BR>
			`<B> Next:</B> <A NAME=tex2html3210 HREF="node136.html"> Character Attributes</A>`
			`<B>Up:</B> <A NAME=tex2html3208 HREF="clm.html">Common Lisp the Language</A>`
			`<B> Previous:</B> <A NAME=tex2html3202 HREF="node134.html"> Implementation Parameters</A>`
			`<HR> <P>`
			`<H1><A NAME=SECTION001700000000000000000>13. Characters</A></H1>`
			`<P>`
			`Common Lisp provides a character data type; objects of this type`
			`represent printed symbols such as letters.`
			`<P>`
			`In general, characters in Common Lisp are not true objects; <tt>eq</tt> cannot`
			`be counted upon to operate on them reliably. In particular,`
			`it is possible that the expression`
			`<P><pre>`
			`(let ((x z) (y z)) (eq x y))`
			`</pre><P>`
			`may be false rather than true, if the value of <tt>z</tt> is a character.`
			`<P>`
			`<hr>`
			`<b>Rationale:</b> This odd breakdown of <tt>eq</tt> in the case of characters`
			`allows the implementor enough design freedom to produce exceptionally`
			`efficient code on conventional architectures. In this respect the`
			`treatment of characters exactly parallels that of numbers, as described`
			`in chapter <A HREF="node121.html#NUMBER">12</A>.`
			`<hr>`
			`<P>`
			`<A NAME=STANDARDCHARREPERTOIRETABLE> </A><listing>`
			`------------------------------------------------------------------------------`
			`Table 13-1: Standard Character Labels, Glyphs, and Descriptions`

			SM05 @ commercial at SD13 ` grave accent
			`SP02 ! exclamation mark LA02 A capital A LA01 a small a`
			`SP04 " quotation mark LB02 B capital B LB01 b small b`
			`SM01 # number sign LC02 C capital C LC01 c small c`
			`SC03 $ dollar sign LD02 D capital D LD01 d small d`
			`SM02 % percent sign LE02 E capital E LE01 e small e`
			`SM03 & ampersand LF02 F capital F LF01 f small f`
			`SP05 ' apostrophe LG02 G capital G LG01 g small g`
			`SP06 ( left parenthesis LH02 H capital H LH01 h small h`
			`SP07 ) right parenthesis LI02 I capital I LI01 i small i`
			`SM04 * asterisk LJ02 J capital J LJ01 j small j`
			`SA01 + plus sign LK02 K capital K LK01 k small k`
			`SP08 , comma LL02 L capital L LL01 l small l`
			`SP10 - hyphen or minus sign LM02 M capital M LM01 m small m`
			`SP11 . period or full stop LN02 N capital N LN01 n small n`
			`SP12 / solidus LO02 O capital O LO01 o small o`
			`ND10 0 digit 0 LP02 P capital P LP01 p small p`
			`ND01 1 digit 1 LQ02 Q capital Q LQ01 q small q`
			`ND02 2 digit 2 LR02 R capital R LR01 r small r`
			`ND03 3 digit 3 LS02 S capital S LS01 s small s`
			`ND04 4 digit 4 LT02 T capital T LT01 t small t`
			`ND05 5 digit 5 LU02 U capital U LU01 u small u`
			`ND06 6 digit 6 LV02 V capital V LV01 v small v`
			`ND07 7 digit 7 LW02 W capital W LW01 w small w`
			`ND08 8 digit 8 LX02 X capital X LX01 x small x`
			`ND09 9 digit 9 LY02 Y capital Y LY01 y small y`
			`SP13 : colon LZ02 Z capital Z LZ01 z small z`
			`SP14 ; semicolon SM06 [ left square bracket SM11 { left curly bracket`
			`SA03 < less-than sign SM07 \ reverse solidus SM13 \| vertical bar`
			`SA04 = equals sign SM08 ] right square bracket SM14 } right curly bracket`
			`SA05 > greater-than sign SD15 ^ circumflex accent SD19 ~ tilde`
			`SP15 ? question mark SP09 _ low line`
			`------------------------------------------------------------------------------`
			`</listing>`

			`<p>`

			If two objects are to be compared for ``identity,'' but either might be
			`a character, then the predicate <tt>eql</tt> is probably appropriate.`
			`<P>`
			`<img align=bottom alt="change_begin" src="gif/change_begin.gif"><br>`
			`X3J13 voted in March 1989 (CHARACTER-PROPOSAL) <A NAME=14406> </A>`
			`to approve the following definitions and terminology for use in`
			`discussing character facilities in Common Lisp.`
			`<P>`
			`A <i>character repertoire</i> defines a collection of characters`
			`independent of their specific rendered image or font. (This corresponds`
			`to the mathematical notion of a set, but the term <i>character set</i>`
			`is avoided here because it has been used in the past to mean`
			`both what is here called a repertoire and what is here called a coded`
			`character set.)`
			`Character repertoires are specified independent of coding and their characters`
			`are identified only with a unique <i>character label</i>,`
			`a graphic symbol, and a character description.`
			`As an example, table <A HREF="node135.html#STANDARDCHARREPERTOIRETABLE">13-1</A>`
			`shows the character labels, graphic symbols, and character descriptions for`
			`all of the characters in the repertoire <tt>standard-char</tt>`
			`except for <tt>#\Space</tt> and <tt>#\Newline</tt>.`
			`<P>`
			`Every Common Lisp implementation must support the standard character repertoire`
			`as well as repertoires named <tt>base-character</tt>, <tt>extended-character</tt>,`
			`and <tt>character</tt>. Other repertoires may be supported as well.`
			`X3J13 voted in June 1989 (MORE-CHARACTER-PROPOSAL) <A NAME=14417> </A> to specify that names of`
			`repertoires may be used as type specifiers. Such types must be subtypes of <tt>character</tt>;`
			`that is, in a given implementation`
			`the repertoire named <tt>character</tt> must encompass all the character objects supported`
			`by that implementation.`
			`<P>`
			`A <i>coded character set</i> is a character repertoire plus an <i>encoding</i>`
			`that provides a bijective mapping between each character in the set and a number`
			`(typically a non-negative integer)`
			`that serves as the character representation.`
			`There are numerous internationally standardized coded character sets.`
			`<P>`
			`A character may be included in one or more character repertoires.`
			`Similarly, a character may be included in one or more coded character sets.`
			`<P>`
			`To ensure that each character is uniquely defined, we may use a universal registry of`
			`characters that incorporates a collection of distinguished repertoires`
			`called <i>character scripts</i> that form an exhaustive partition of all characters.`
			`That is, each character is included in exactly one character script.`
			`(Draft ISO 10646 Coded Character Set Standard, if eventually approved as a standard,`
			`may become the practical realization of this universal registry.)`
			`<P>`
			`(X3J13 voted in June 1989 (MORE-CHARACTER-PROPOSAL) <A NAME=14423> </A> to specify that`
			`an implementation must document the character scripts it supports.`
			`For each script the documentation should discuss character labels,`
			`glyphs, and descriptions; any canonicalization processes performed`
			`by the reader that result in treating distinct characters as equivalent;`
			`any canonicalization performed by <tt>format</tt> in processing directives;`
			`the behavior of <tt>char-upcase</tt>, <tt>char-downcase</tt>, and the predicates`
			`<tt>alpha-char-p</tt>, <tt>upper-case-p</tt>, <tt>lower-case-p</tt>, <tt>both-case-p</tt>,`
			`<tt>graphic-char-p</tt>, <tt>alphanumericp</tt>, <tt>char-equal</tt>, <tt>char-not-equal</tt>,`
			`<tt>char-lessp</tt>, <tt>char-greaterp</tt>, <tt>char-not-greaterp</tt>, and <tt>char-not-lessp</tt>`
			`for characters in the script; and behavior with respect to input and output,`
			`including coded character sets and external coding schemes.)`
			`<P>`
			`In Common Lisp a <i>character</i> data object is identified by its <i>character code</i>,`
			`a unique numerical code. Each character code is composed from a character script`
			`and a character label. The convention by which a character script and`
			`character label compose a character code is implementation dependent.`
			`[X3J13 did not approve all parts of the proposal from its Subcommittee`
			`on Characters. As a result, some features that were approved appear to`
			`have no purpose. X3J13 wished to support the standardization by ISO of character`
			`scripts and coded character sets but declined to design facilities for use in`
			`Common Lisp until there has been more progress by ISO in this area.`
			`The approval of the terminology for scripts and labels gives a hint`
			`to implementors of likely directions for Common Lisp in the future.]`
			`<P>`
			`A character object that is classified as <i>graphic</i>, or displayable,`
			`has an associated <i>glpyh</i>. The glyph is the visual representation`
			`of the character. All other character data objects are classified as`
			`<i>non-graphic</i>.`
			`<P>`
			`This terminology assigns names to Common Lisp concepts`
			`in a manner consistent with`
			`related concepts discussed in various ISO standards for coded`
			`character sets and provides a demarcation between standardization`
			`activities. For example, facilities for manipulating characters,`
			`character scripts, and coded character sets are properly defined`
			`by a Common Lisp standard, but Common Lisp should not define`
			`standard character sets or standard character scripts.`
			`<br><img align=bottom alt="change_end" src="gif/change_end.gif">`
			`<P>`
			`<HR>`
			`<UL>`
			`<LI> <A NAME=tex2html3213 HREF="node136.html#SECTION001710000000000000000"> Character Attributes</A>`
			`<LI> <A NAME=tex2html3214 HREF="node137.html#SECTION001720000000000000000"> Predicates on Characters</A>`
			`<LI> <A NAME=tex2html3215 HREF="node138.html#SECTION001730000000000000000"> Character Construction and Selection</A>`
			`<LI> <A NAME=tex2html3216 HREF="node139.html#SECTION001740000000000000000"> Character Conversions</A>`
			`<LI> <A NAME=tex2html3217 HREF="node140.html#SECTION001750000000000000000"> Character Control-Bit Functions</A>`
			`</UL>`
			<BR> <HR><A NAME=tex2html3209 HREF="node136.html"><IMG ALIGN=BOTTOM ALT="next" SRC="icons/next_motif.gif"></A> <A NAME=tex2html3207 HREF="clm.html"><IMG ALIGN=BOTTOM ALT="up" SRC="icons/up_motif.gif"></A> <A NAME=tex2html3201 HREF="node134.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="icons/previous_motif.gif"></A> <A NAME=tex2html3211 HREF="node1.html"><IMG ALIGN=BOTTOM ALT="contents" SRC="icons/contents_motif.gif"></A> <A NAME=tex2html3212 HREF="index.html"><IMG ALIGN=BOTTOM ALT="index" SRC="icons/index_motif.gif"></A> <BR>
			`<B> Next:</B> <A NAME=tex2html3210 HREF="node136.html"> Character Attributes</A>`
			`<B>Up:</B> <A NAME=tex2html3208 HREF="clm.html">Common Lisp the Language</A>`
			`<B> Previous:</B> <A NAME=tex2html3202 HREF="node134.html"> Implementation Parameters</A>`
			`<HR> <P>`
			`<HR>`
			`<P><ADDRESS>`
			`AI.Repository@cs.cmu.edu`
			`</ADDRESS>`
			`</BODY>`