emacs.d/clones/lisp/docs.racket-lang.org/guide/encodings.html

2022-08-15 11:06:56 +02:00
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"/><meta name="viewport" content="width=device-width, initial-scale=0.8"/><title>8.5&nbsp;Bytes, Characters, and Encodings</title><link rel="stylesheet" type="text/css" href="../scribble.css" title="default"/><link rel="stylesheet" type="text/css" href="../racket.css" title="default"/><link rel="stylesheet" type="text/css" href="../manual-style.css" title="default"/><link rel="stylesheet" type="text/css" href="../manual-racket.css" title="default"/><link rel="stylesheet" type="text/css" href="../manual-racket.css" title="default"/><link rel="stylesheet" type="text/css" href="../doc-site.css" title="default"/><script type="text/javascript" src="../scribble-common.js"></script><script type="text/javascript" src="../manual-racket.js"></script><script type="text/javascript" src="../manual-racket.js"></script><script type="text/javascript" src="../doc-site.js"></script><script type="text/javascript" src="../local-redirect/local-redirect.js"></script><script type="text/javascript" src="../local-redirect/local-user-redirect.js"></script><!--[if IE 6]><style type="text/css">.SIEHidden { overflow: hidden; }</style><![endif]--></head><body id="doc-racket-lang-org"><div class="tocset"><div class="tocview"><div class="tocviewlist tocviewlisttopspace"><div class="tocviewtitle"><table cellspacing="0" cellpadding="0"><tr><td style="width: 1em;"><a href="javascript:void(0);" title="Expand/Collapse" class="tocviewtoggle" onclick="TocviewToggle(this,&quot;tocview_0&quot;);">&#9658;</a></td><td></td><td><a href="index.html" class="tocviewlink" data-pltdoc="x">The Racket Guide</a></td></tr></table></div><div class="tocviewsublisttop" style="display: none;" id="tocview_0"><table cellspacing="0" cellpadding="0"><tr><td align="right">1&nbsp;</td><td><a href="intro.html" class="tocviewlink" data-pltdoc="x">Welcome to Racket</a></td></tr><tr><td align="right">2&nbsp;</td><td><a href="to-scheme.html" 
class="tocviewlink" data-pltdoc="x">Racket Essentials</a></td></tr><tr><td align="right">3&nbsp;</td><td><a href="datatypes.html" class="tocviewlink" data-pltdoc="x">Built-<wbr></wbr>In Datatypes</a></td></tr><tr><td align="right">4&nbsp;</td><td><a href="scheme-forms.html" class="tocviewlink" data-pltdoc="x">Expressions and Definitions</a></td></tr><tr><td align="right">5&nbsp;</td><td><a href="define-struct.html" class="tocviewlink" data-pltdoc="x">Programmer-<wbr></wbr>Defined Datatypes</a></td></tr><tr><td align="right">6&nbsp;</td><td><a href="modules.html" class="tocviewlink" data-pltdoc="x">Modules</a></td></tr><tr><td align="right">7&nbsp;</td><td><a href="contracts.html" class="tocviewlink" data-pltdoc="x">Contracts</a></td></tr><tr><td align="right">8&nbsp;</td><td><a href="i_o.html" class="tocviewselflink" data-pltdoc="x">Input and Output</a></td></tr><tr><td align="right">9&nbsp;</td><td><a href="regexp.html" class="tocviewlink" data-pltdoc="x">Regular Expressions</a></td></tr><tr><td align="right">10&nbsp;</td><td><a href="control.html" class="tocviewlink" data-pltdoc="x">Exceptions and Control</a></td></tr><tr><td align="right">11&nbsp;</td><td><a href="for.html" class="tocviewlink" data-pltdoc="x">Iterations and Comprehensions</a></td></tr><tr><td align="right">12&nbsp;</td><td><a href="match.html" class="tocviewlink" data-pltdoc="x">Pattern Matching</a></td></tr><tr><td align="right">13&nbsp;</td><td><a href="classes.html" class="tocviewlink" data-pltdoc="x">Classes and Objects</a></td></tr><tr><td align="right">14&nbsp;</td><td><a href="units.html" class="tocviewlink" data-pltdoc="x">Units</a></td></tr><tr><td align="right">15&nbsp;</td><td><a href="reflection.html" class="tocviewlink" data-pltdoc="x">Reflection and Dynamic Evaluation</a></td></tr><tr><td align="right">16&nbsp;</td><td><a href="macros.html" class="tocviewlink" data-pltdoc="x">Macros</a></td></tr><tr><td align="right">17&nbsp;</td><td><a href="languages.html" class="tocviewlink" 
data-pltdoc="x">Creating Languages</a></td></tr><tr><td align="right">18&nbsp;</td><td><a href="concurrency.html"
and <span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Writing.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._write%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">write</a></span> all work in terms of <a href="characters.html#%28tech._character%29" class="techoutside" data-pltdoc="x"><span class="techinside">characters</span></a> (which
correspond to Unicode scalar values). Conceptually, they are
implemented in terms of <span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Byte_and_String_Input.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._read-char%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">read-char</a></span> and <span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Byte_and_String_Output.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._write-char%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">write-char</a></span>.</p><p>More primitively, ports read and write <a href="bytestrings.html#%28tech._byte%29" class="techoutside" data-pltdoc="x"><span class="techinside">bytes</span></a>, instead of
<a href="characters.html#%28tech._character%29" class="techoutside" data-pltdoc="x"><span class="techinside">characters</span></a>. The functions <span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Byte_and_String_Input.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._read-byte%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">read-byte</a></span> and
<span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Byte_and_String_Output.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._write-byte%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">write-byte</a></span> read and write raw bytes. Other functions, such as
<span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Byte_and_String_Input.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._read-bytes-line%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">read-bytes-line</a></span>, build on top of byte operations instead of
character operations.</p><p>In fact, the <span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Byte_and_String_Input.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._read-char%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">read-char</a></span> and <span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Byte_and_String_Output.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._write-char%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">write-char</a></span> functions are
conceptually implemented in terms of <span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Byte_and_String_Input.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._read-byte%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">read-byte</a></span> and
<span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Byte_and_String_Output.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._write-byte%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">write-byte</a></span>. When a single byte&rsquo;s value is less than 128, then
it corresponds to an ASCII character. Any other byte is treated as
part of a multi-byte UTF-8 sequence; UTF-8 is a standard encoding of
Unicode scalar values as sequences of one to four bytes, with the useful
property that ASCII characters are encoded as themselves. Thus, a single
<span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Byte_and_String_Input.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._read-char%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">read-char</a></span> may call <span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Byte_and_String_Input.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._read-byte%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">read-byte</a></span> multiple times, and a
single <span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Byte_and_String_Output.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._write-char%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">write-char</a></span> may generate multiple output bytes.</p><p>The <span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Byte_and_String_Input.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._read-char%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">read-char</a></span> and <span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Byte_and_String_Output.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._write-char%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">write-char</a></span> operations
<span class="emph">always</span> use a UTF-8 encoding. If you have a text stream that
uses a different encoding, or if you want to generate a text stream in
a different encoding, use <span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=port-lib.html%23%2528def._%2528%2528lib._racket%252Fport..rkt%2529._reencode-input-port%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">reencode-input-port</a></span> or
<span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=port-lib.html%23%2528def._%2528%2528lib._racket%252Fport..rkt%2529._reencode-output-port%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">reencode-output-port</a></span>. The <span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=port-lib.html%23%2528def._%2528%2528lib._racket%252Fport..rkt%2529._reencode-input-port%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">reencode-input-port</a></span>
function converts an input stream from an encoding that you specify
into a UTF-8 stream; that way, <span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Byte_and_String_Input.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._read-char%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">read-char</a></span> sees UTF-8
encoded bytes, even though the original stream used a different encoding. Beware,
however, that <span class="RktSym"><a href="https://download.racket-lang.org/releases/8.6/doc/local-redirect/index.html?doc=reference&amp;rel=Byte_and_String_Input.html%23%2528def._%2528%2528quote._%7E23%7E25kernel%2529._read-byte%2529%2529&amp;version=8.6" class="RktValLink Sq" data-pltdoc="x">read-byte</a></span> also sees the re-encoded data,
instead of the original byte stream.</p><div class="navsetbottom"><span class="navleft"><form class="searchform"><input class="searchbox" id="searchbox" type="text" tabindex="1" placeholder="...search manuals..." title="Enter a search string to search the manuals" onkeypress="return DoSearchKey(event, this, &quot;8.6&quot;, &quot;../&quot;);"/></form>&nbsp;&nbsp;<a href="https://docs.racket-lang.org/index.html" title="up to the documentation top" data-pltdoc="x" onclick="return GotoPLTRoot(&quot;8.6&quot;);">top</a><span class="tocsettoggle">&nbsp;&nbsp;<a href="javascript:void(0);" title="show/hide table of contents" onclick="TocsetToggle();">contents</a></span></span><span class="navright">&nbsp;&nbsp;<a href="serialization.html" title="backward to &quot;8.4 Datatypes and Serialization&quot;" data-pltdoc="x">&larr; prev</a>&nbsp;&nbsp;<a href="i_o.html" title="up to &quot;8 Input and Output&quot;" data-pltdoc="x">up</a>&nbsp;&nbsp;<a href="io-patterns.html" title="forward to &quot;8.6 I/O Patterns&quot;" data-pltdoc="x">next &rarr;</a></span>&nbsp;</div></div></div><div id="contextindicator">&nbsp;</div></body></html>