1
0
Fork 0
cl-sites/guile.html_node/BOM-Handling.html

137 lines
6.8 KiB
HTML
Raw Normal View History

2024-12-17 12:49:28 +01:00
<!DOCTYPE html>
<html>
<!-- Created by GNU Texinfo 7.1, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This manual documents Guile version 3.0.10.
Copyright (C) 1996-1997, 2000-2005, 2009-2023 Free Software Foundation,
Inc.
Copyright (C) 2021 Maxime Devos
Copyright (C) 2024 Tomas Volf
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
copy of the license is included in the section entitled "GNU Free
Documentation License." -->
<title>BOM Handling (Guile Reference Manual)</title>
<meta name="description" content="BOM Handling (Guile Reference Manual)">
<meta name="keywords" content="BOM Handling (Guile Reference Manual)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content=".texi2any-real">
<meta name="viewport" content="width=device-width,initial-scale=1">
<link href="index.html" rel="start" title="Top">
<link href="Concept-Index.html" rel="index" title="Concept Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Input-and-Output.html" rel="up" title="Input and Output">
<link href="Non_002dBlocking-I_002fO.html" rel="prev" title="Non-Blocking I/O">
<style type="text/css">
<!--
a.copiable-link {visibility: hidden; text-decoration: none; line-height: 0em}
span:hover a.copiable-link {visibility: visible}
ul.mark-bullet {list-style-type: disc}
-->
</style>
<link rel="stylesheet" type="text/css" href="https://www.gnu.org/software/gnulib/manual.css">
</head>
<body lang="en">
<div class="subsection-level-extent" id="BOM-Handling">
<div class="nav-panel">
<p>
Previous: <a href="Non_002dBlocking-I_002fO.html" accesskey="p" rel="prev">Non-Blocking I/O</a>, Up: <a href="Input-and-Output.html" accesskey="u" rel="up">Input and Output</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<h4 class="subsection" id="Handling-of-Unicode-Byte-Order-Marks"><span>6.12.14 Handling of Unicode Byte Order Marks<a class="copiable-link" href="#Handling-of-Unicode-Byte-Order-Marks"> &para;</a></span></h4>
<a class="index-entry-id" id="index-BOM"></a>
<a class="index-entry-id" id="index-byte-order-mark"></a>
<p>This section documents the finer points of Guile&rsquo;s handling of Unicode
byte order marks (BOMs). A byte order mark (U+FEFF) is typically found
at the start of a UTF-16 or UTF-32 stream, to allow readers to reliably
determine the byte order. Occasionally, a BOM is found at the start of
a UTF-8 stream, but this is much less common and not generally
recommended.
</p>
<p>Guile attempts to handle BOMs automatically, and in accordance with the
recommendations of the Unicode Standard, when the port encoding is set
to <code class="code">UTF-8</code>, <code class="code">UTF-16</code>, or <code class="code">UTF-32</code>. In brief, Guile
automatically writes a BOM at the start of a UTF-16 or UTF-32 stream,
and automatically consumes one from the start of a UTF-8, UTF-16, or
UTF-32 stream.
</p>
<p>As specified in the Unicode Standard, a BOM is only handled specially at
the start of a stream, and only if the port encoding is set to
<code class="code">UTF-8</code>, <code class="code">UTF-16</code> or <code class="code">UTF-32</code>. If the port encoding is
set to <code class="code">UTF-16BE</code>, <code class="code">UTF-16LE</code>, <code class="code">UTF-32BE</code>, or
<code class="code">UTF-32LE</code>, then BOMs are <em class="emph">not</em> handled specially, and none of
the special handling described in this section applies.
</p>
<ul class="itemize mark-bullet">
<li>To ensure that Guile will properly detect the byte order of a UTF-16 or
UTF-32 stream, you must perform a textual read before any writes, seeks,
or binary I/O. Guile will not attempt to read a BOM unless a read is
explicitly requested at the start of the stream.
</li><li>If a textual write is performed before the first read, then an arbitrary
byte order will be chosen. Currently, big endian is the default on all
platforms, but that may change in the future. If you wish to explicitly
control the byte order of an output stream, set the port encoding to
<code class="code">UTF-16BE</code>, <code class="code">UTF-16LE</code>, <code class="code">UTF-32BE</code>, or <code class="code">UTF-32LE</code>,
and explicitly write a BOM (<code class="code">#\xFEFF</code>) if desired.
</li><li>If <code class="code">set-port-encoding!</code> is called in the middle of a stream, Guile
treats this as a new logical &ldquo;start of stream&rdquo; for purposes of BOM
handling, and will forget about any BOMs that had previously been seen.
Therefore, it may choose a different byte order than had been used
previously. This is intended to support multiple logical text streams
embedded within a larger binary stream.
</li><li>Binary I/O operations are not guaranteed to update Guile&rsquo;s notion of
whether the port is at the &ldquo;start of the stream&rdquo;, nor are they
guaranteed to produce or consume BOMs.
</li><li>For ports that support seeking (e.g. normal files), the input and output
streams are considered linked: if the user reads first, then a BOM will
be consumed (if appropriate), but later writes will <em class="emph">not</em> produce a
BOM. Similarly, if the user writes first, then later reads will
<em class="emph">not</em> consume a BOM.
</li><li>For ports that are not random access (e.g. pipes, sockets, and
terminals), the input and output streams are considered
<em class="emph">independent</em> for purposes of BOM handling: the first read will
consume a BOM (if appropriate), and the first write will <em class="emph">also</em>
produce a BOM (if appropriate). However, the input and output streams
will always use the same byte order.
</li><li>Seeks to the beginning of a file will set the &ldquo;start of stream&rdquo; flags.
Therefore, a subsequent textual read or write will consume or produce a
BOM. However, unlike <code class="code">set-port-encoding!</code>, if a byte order had
already been chosen for the port, it will remain in effect after a seek,
and cannot be changed by the presence of a BOM. Seeks anywhere other
than the beginning of a file clear the &ldquo;start of stream&rdquo; flags.
</li></ul>
</div>
<hr>
<div class="nav-panel">
<p>
Previous: <a href="Non_002dBlocking-I_002fO.html">Non-Blocking I/O</a>, Up: <a href="Input-and-Output.html">Input and Output</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>