137 lines
6.8 KiB
HTML
137 lines
6.8 KiB
HTML
|
<!DOCTYPE html>
|
||
|
<html>
|
||
|
<!-- Created by GNU Texinfo 7.1, https://www.gnu.org/software/texinfo/ -->
|
||
|
<head>
|
||
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
|
||
|
<!-- This manual documents Guile version 3.0.10.
|
||
|
|
||
|
Copyright (C) 1996-1997, 2000-2005, 2009-2023 Free Software Foundation,
|
||
|
Inc.
|
||
|
|
||
|
Copyright (C) 2021 Maxime Devos
|
||
|
|
||
|
Copyright (C) 2024 Tomas Volf
|
||
|
|
||
|
|
||
|
Permission is granted to copy, distribute and/or modify this document
|
||
|
under the terms of the GNU Free Documentation License, Version 1.3 or
|
||
|
any later version published by the Free Software Foundation; with no
|
||
|
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
|
||
|
copy of the license is included in the section entitled "GNU Free
|
||
|
Documentation License." -->
|
||
|
<title>BOM Handling (Guile Reference Manual)</title>
|
||
|
|
||
|
<meta name="description" content="BOM Handling (Guile Reference Manual)">
|
||
|
<meta name="keywords" content="BOM Handling (Guile Reference Manual)">
|
||
|
<meta name="resource-type" content="document">
|
||
|
<meta name="distribution" content="global">
|
||
|
<meta name="Generator" content=".texi2any-real">
|
||
|
<meta name="viewport" content="width=device-width,initial-scale=1">
|
||
|
|
||
|
<link href="index.html" rel="start" title="Top">
|
||
|
<link href="Concept-Index.html" rel="index" title="Concept Index">
|
||
|
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
|
||
|
<link href="Input-and-Output.html" rel="up" title="Input and Output">
|
||
|
<link href="Non_002dBlocking-I_002fO.html" rel="prev" title="Non-Blocking I/O">
|
||
|
<style type="text/css">
|
||
|
<!--
|
||
|
a.copiable-link {visibility: hidden; text-decoration: none; line-height: 0em}
|
||
|
span:hover a.copiable-link {visibility: visible}
|
||
|
ul.mark-bullet {list-style-type: disc}
|
||
|
-->
|
||
|
</style>
|
||
|
<link rel="stylesheet" type="text/css" href="https://www.gnu.org/software/gnulib/manual.css">
|
||
|
|
||
|
|
||
|
</head>
|
||
|
|
||
|
<body lang="en">
|
||
|
<div class="subsection-level-extent" id="BOM-Handling">
|
||
|
<div class="nav-panel">
|
||
|
<p>
|
||
|
Previous: <a href="Non_002dBlocking-I_002fO.html" accesskey="p" rel="prev">Non-Blocking I/O</a>, Up: <a href="Input-and-Output.html" accesskey="u" rel="up">Input and Output</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html" title="Index" rel="index">Index</a>]</p>
|
||
|
</div>
|
||
|
<hr>
|
||
|
<h4 class="subsection" id="Handling-of-Unicode-Byte-Order-Marks"><span>6.12.14 Handling of Unicode Byte Order Marks<a class="copiable-link" href="#Handling-of-Unicode-Byte-Order-Marks"> ¶</a></span></h4>
|
||
|
<a class="index-entry-id" id="index-BOM"></a>
|
||
|
<a class="index-entry-id" id="index-byte-order-mark"></a>
|
||
|
|
||
|
<p>This section documents the finer points of Guile’s handling of Unicode
|
||
|
byte order marks (BOMs). A byte order mark (U+FEFF) is typically found
|
||
|
at the start of a UTF-16 or UTF-32 stream, to allow readers to reliably
|
||
|
determine the byte order. Occasionally, a BOM is found at the start of
|
||
|
a UTF-8 stream, but this is much less common and not generally
|
||
|
recommended.
|
||
|
</p>
|
||
|
<p>Guile attempts to handle BOMs automatically, and in accordance with the
|
||
|
recommendations of the Unicode Standard, when the port encoding is set
|
||
|
to <code class="code">UTF-8</code>, <code class="code">UTF-16</code>, or <code class="code">UTF-32</code>. In brief, Guile
|
||
|
automatically writes a BOM at the start of a UTF-16 or UTF-32 stream,
|
||
|
and automatically consumes one from the start of a UTF-8, UTF-16, or
|
||
|
UTF-32 stream.
|
||
|
</p>
|
||
|
<p>As specified in the Unicode Standard, a BOM is only handled specially at
|
||
|
the start of a stream, and only if the port encoding is set to
|
||
|
<code class="code">UTF-8</code>, <code class="code">UTF-16</code> or <code class="code">UTF-32</code>. If the port encoding is
|
||
|
set to <code class="code">UTF-16BE</code>, <code class="code">UTF-16LE</code>, <code class="code">UTF-32BE</code>, or
|
||
|
<code class="code">UTF-32LE</code>, then BOMs are <em class="emph">not</em> handled specially, and none of
|
||
|
the special handling described in this section applies.
|
||
|
</p>
|
||
|
<ul class="itemize mark-bullet">
|
||
|
<li>To ensure that Guile will properly detect the byte order of a UTF-16 or
|
||
|
UTF-32 stream, you must perform a textual read before any writes, seeks,
|
||
|
or binary I/O. Guile will not attempt to read a BOM unless a read is
|
||
|
explicitly requested at the start of the stream.
|
||
|
|
||
|
</li><li>If a textual write is performed before the first read, then an arbitrary
|
||
|
byte order will be chosen. Currently, big endian is the default on all
|
||
|
platforms, but that may change in the future. If you wish to explicitly
|
||
|
control the byte order of an output stream, set the port encoding to
|
||
|
<code class="code">UTF-16BE</code>, <code class="code">UTF-16LE</code>, <code class="code">UTF-32BE</code>, or <code class="code">UTF-32LE</code>,
|
||
|
and explicitly write a BOM (<code class="code">#\xFEFF</code>) if desired.
|
||
|
|
||
|
</li><li>If <code class="code">set-port-encoding!</code> is called in the middle of a stream, Guile
|
||
|
treats this as a new logical “start of stream” for purposes of BOM
|
||
|
handling, and will forget about any BOMs that had previously been seen.
|
||
|
Therefore, it may choose a different byte order than had been used
|
||
|
previously. This is intended to support multiple logical text streams
|
||
|
embedded within a larger binary stream.
|
||
|
|
||
|
</li><li>Binary I/O operations are not guaranteed to update Guile’s notion of
|
||
|
whether the port is at the “start of the stream”, nor are they
|
||
|
guaranteed to produce or consume BOMs.
|
||
|
|
||
|
</li><li>For ports that support seeking (e.g. normal files), the input and output
|
||
|
streams are considered linked: if the user reads first, then a BOM will
|
||
|
be consumed (if appropriate), but later writes will <em class="emph">not</em> produce a
|
||
|
BOM. Similarly, if the user writes first, then later reads will
|
||
|
<em class="emph">not</em> consume a BOM.
|
||
|
|
||
|
</li><li>For ports that are not random access (e.g. pipes, sockets, and
|
||
|
terminals), the input and output streams are considered
|
||
|
<em class="emph">independent</em> for purposes of BOM handling: the first read will
|
||
|
consume a BOM (if appropriate), and the first write will <em class="emph">also</em>
|
||
|
produce a BOM (if appropriate). However, the input and output streams
|
||
|
will always use the same byte order.
|
||
|
|
||
|
</li><li>Seeks to the beginning of a file will set the “start of stream” flags.
|
||
|
Therefore, a subsequent textual read or write will consume or produce a
|
||
|
BOM. However, unlike <code class="code">set-port-encoding!</code>, if a byte order had
|
||
|
already been chosen for the port, it will remain in effect after a seek,
|
||
|
and cannot be changed by the presence of a BOM. Seeks anywhere other
|
||
|
than the beginning of a file clear the “start of stream” flags.
|
||
|
</li></ul>
|
||
|
|
||
|
|
||
|
</div>
|
||
|
<hr>
|
||
|
<div class="nav-panel">
|
||
|
<p>
|
||
|
Previous: <a href="Non_002dBlocking-I_002fO.html">Non-Blocking I/O</a>, Up: <a href="Input-and-Output.html">Input and Output</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html" title="Index" rel="index">Index</a>]</p>
|
||
|
</div>
|
||
|
|
||
|
|
||
|
|
||
|
</body>
|
||
|
</html>
|