cl-sites/guile.html_node/Conversion-to_002ffrom-C.html
2024-12-17 12:49:28 +01:00


<!DOCTYPE html>
<html>
<!-- Created by GNU Texinfo 7.1, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This manual documents Guile version 3.0.10.
Copyright (C) 1996-1997, 2000-2005, 2009-2023 Free Software Foundation,
Inc.
Copyright (C) 2021 Maxime Devos
Copyright (C) 2024 Tomas Volf
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
copy of the license is included in the section entitled "GNU Free
Documentation License." -->
<title>Conversion to/from C (Guile Reference Manual)</title>
<meta name="description" content="Conversion to/from C (Guile Reference Manual)">
<meta name="keywords" content="Conversion to/from C (Guile Reference Manual)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content=".texi2any-real">
<meta name="viewport" content="width=device-width,initial-scale=1">
<link href="index.html" rel="start" title="Top">
<link href="Concept-Index.html" rel="index" title="Concept Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Strings.html" rel="up" title="Strings">
<link href="String-Internals.html" rel="next" title="String Internals">
<link href="Representing-Strings-as-Bytes.html" rel="prev" title="Representing Strings as Bytes">
<style type="text/css">
<!--
a.copiable-link {visibility: hidden; text-decoration: none; line-height: 0em}
span:hover a.copiable-link {visibility: visible}
strong.def-name {font-family: monospace; font-weight: bold; font-size: larger}
-->
</style>
<link rel="stylesheet" type="text/css" href="https://www.gnu.org/software/gnulib/manual.css">
</head>
<body lang="en">
<div class="subsubsection-level-extent" id="Conversion-to_002ffrom-C">
<div class="nav-panel">
<p>
Next: <a href="String-Internals.html" accesskey="n" rel="next">String Internals</a>, Previous: <a href="Representing-Strings-as-Bytes.html" accesskey="p" rel="prev">Representing Strings as Bytes</a>, Up: <a href="Strings.html" accesskey="u" rel="up">Strings</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<h4 class="subsubsection" id="Conversion-to_002ffrom-C-1"><span>6.6.5.14 Conversion to/from C<a class="copiable-link" href="#Conversion-to_002ffrom-C-1"> &para;</a></span></h4>
<p>When creating a Scheme string from a C string or when converting a
Scheme string to a C string, the concept of character encoding becomes
important.
</p>
<p>In C, a string is just a sequence of bytes, and the character encoding
describes the relation between these bytes and the actual characters
that make up the string. For Scheme strings, character encoding is not
an issue (most of the time), since in Scheme you usually treat strings
as character sequences, not byte sequences.
</p>
<p>Converting to C and converting from C each have their own challenges.
</p>
<p>When converting from C to Scheme, it is important that the sequence of
bytes in the C string be valid with respect to its encoding. ASCII
strings, for example, can&rsquo;t have any bytes greater than 127. An ASCII
byte greater than 127 is considered <em class="emph">ill-formed</em> and cannot be
converted into a Scheme character.
</p>
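<p>As a rough illustration of what &ldquo;ill-formed&rdquo; means for ASCII, the
following sketch checks a byte sequence before it is handed to a conversion
function. The helper name <code class="code">is_well_formed_ascii</code> is hypothetical and
is not part of Guile; it only restates the rule above in plain C.
</p>

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if every byte in buf[0..len) is a valid ASCII byte,
   that is, no byte is greater than 127.  Hypothetical helper, not a
   Guile function.  */
bool
is_well_formed_ascii (const char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if ((unsigned char) buf[i] > 127)
      return false;
  return true;
}
```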
<p>Problems can occur in the reverse operation as well. Not all character
encodings can hold all possible Scheme characters. Some encodings, like
ASCII for example, can only describe a small subset of all possible
characters. So, when converting to C, one must first decide what to do
with Scheme characters that can&rsquo;t be represented in the C string.
</p>
<p>Converting a Scheme string to a C string will often allocate fresh
memory to hold the result. You must take care that this memory is
properly freed eventually. In many cases, this can be achieved by
using <code class="code">scm_dynwind_free</code> inside an appropriate dynwind context;
see <a class="xref" href="Dynamic-Wind.html">Dynamic Wind</a>.
</p>
<dl class="first-deftypefn">
<dt class="deftypefn" id="index-scm_005ffrom_005flocale_005fstring"><span class="category-def">C Function: </span><span><code class="def-type">SCM</code> <strong class="def-name">scm_from_locale_string</strong> <code class="def-code-arguments">(const char *str)</code><a class="copiable-link" href="#index-scm_005ffrom_005flocale_005fstring"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-scm_005ffrom_005flocale_005fstringn"><span class="category-def">C Function: </span><span><code class="def-type">SCM</code> <strong class="def-name">scm_from_locale_stringn</strong> <code class="def-code-arguments">(const char *str, size_t len)</code><a class="copiable-link" href="#index-scm_005ffrom_005flocale_005fstringn"> &para;</a></span></dt>
<dd><p>Creates a new Scheme string that has the same contents as <var class="var">str</var> when
interpreted in the character encoding of the current locale.
</p>
<p>For <code class="code">scm_from_locale_string</code>, <var class="var">str</var> must be null-terminated.
</p>
<p>For <code class="code">scm_from_locale_stringn</code>, <var class="var">len</var> specifies the length of
<var class="var">str</var> in bytes, and <var class="var">str</var> does not need to be null-terminated.
If <var class="var">len</var> is <code class="code">(size_t)-1</code>, then <var class="var">str</var> does need to be
null-terminated and the real length will be found with <code class="code">strlen</code>.
</p>
<p>If the C string is ill-formed, an error will be raised.
</p>
<p>Note that these functions should <em class="emph">not</em> be used to convert C string
constants, because there is no guarantee that the encoding of the current
locale will match the execution character set used for string and character
constants. Most modern C compilers use UTF-8 by default, so to convert
C string constants we recommend <code class="code">scm_from_utf8_string</code>.
</p></dd></dl>
<dl class="first-deftypefn">
<dt class="deftypefn" id="index-scm_005ftake_005flocale_005fstring"><span class="category-def">C Function: </span><span><code class="def-type">SCM</code> <strong class="def-name">scm_take_locale_string</strong> <code class="def-code-arguments">(char *str)</code><a class="copiable-link" href="#index-scm_005ftake_005flocale_005fstring"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-scm_005ftake_005flocale_005fstringn"><span class="category-def">C Function: </span><span><code class="def-type">SCM</code> <strong class="def-name">scm_take_locale_stringn</strong> <code class="def-code-arguments">(char *str, size_t len)</code><a class="copiable-link" href="#index-scm_005ftake_005flocale_005fstringn"> &para;</a></span></dt>
<dd><p>Like <code class="code">scm_from_locale_string</code> and <code class="code">scm_from_locale_stringn</code>,
respectively, but also frees <var class="var">str</var> with <code class="code">free</code> eventually.
Thus, you can use this function when you would free <var class="var">str</var> anyway
immediately after creating the Scheme string. In certain cases, Guile
can then use <var class="var">str</var> directly as its internal representation.
</p></dd></dl>
<dl class="first-deftypefn">
<dt class="deftypefn" id="index-scm_005fto_005flocale_005fstring"><span class="category-def">C Function: </span><span><code class="def-type">char *</code> <strong class="def-name">scm_to_locale_string</strong> <code class="def-code-arguments">(SCM str)</code><a class="copiable-link" href="#index-scm_005fto_005flocale_005fstring"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-scm_005fto_005flocale_005fstringn"><span class="category-def">C Function: </span><span><code class="def-type">char *</code> <strong class="def-name">scm_to_locale_stringn</strong> <code class="def-code-arguments">(SCM str, size_t *lenp)</code><a class="copiable-link" href="#index-scm_005fto_005flocale_005fstringn"> &para;</a></span></dt>
<dd><p>Returns a C string with the same contents as <var class="var">str</var> in the character
encoding of the current locale. The C string must eventually be freed with
<code class="code">free</code>, for example by using <code class="code">scm_dynwind_free</code>;
see <a class="xref" href="Dynamic-Wind.html">Dynamic Wind</a>.
</p>
<p>For <code class="code">scm_to_locale_string</code>, the returned string is
null-terminated and an error is signaled when <var class="var">str</var> contains
<code class="code">#\nul</code> characters.
</p>
<p>For <code class="code">scm_to_locale_stringn</code> and <var class="var">lenp</var> not <code class="code">NULL</code>,
<var class="var">str</var> might contain <code class="code">#\nul</code> characters and the length of the
returned string in bytes is stored in <code class="code">*<var class="var">lenp</var></code>. The
returned string will not be null-terminated in this case. If
<var class="var">lenp</var> is <code class="code">NULL</code>, <code class="code">scm_to_locale_stringn</code> behaves like
<code class="code">scm_to_locale_string</code>.
</p>
<p>If a character in <var class="var">str</var> cannot be represented in the character
encoding of the current locale, the default port conversion strategy is
used. See <a class="xref" href="Ports.html">Ports</a>, for more on conversion strategies.
</p>
<p>If the conversion strategy is <code class="code">error</code>, an error will be raised. If
it is <code class="code">substitute</code>, a replacement character, such as a question
mark, will be inserted in its place. If it is <code class="code">escape</code>, a hex
escape will be inserted in its place.
</p></dd></dl>
<dl class="first-deftypefn">
<dt class="deftypefn" id="index-scm_005fto_005flocale_005fstringbuf"><span class="category-def">C Function: </span><span><code class="def-type">size_t</code> <strong class="def-name">scm_to_locale_stringbuf</strong> <code class="def-code-arguments">(SCM str, char *buf, size_t max_len)</code><a class="copiable-link" href="#index-scm_005fto_005flocale_005fstringbuf"> &para;</a></span></dt>
<dd><p>Puts <var class="var">str</var> as a C string in the current locale encoding into the
memory pointed to by <var class="var">buf</var>. The buffer at <var class="var">buf</var> has room for
<var class="var">max_len</var> bytes and <code class="code">scm_to_locale_stringbuf</code> will never store
more than that. No terminating <code class="code">'\0'</code> will be stored.
</p>
<p>The return value of <code class="code">scm_to_locale_stringbuf</code> is the number of
bytes that are needed for all of <var class="var">str</var>, regardless of whether
<var class="var">buf</var> was large enough to hold them. Thus, when the return value
is larger than <var class="var">max_len</var>, only <var class="var">max_len</var> bytes have been
stored and you probably need to try again with a larger buffer.
</p></dd></dl>
<p>For most situations, string conversion should occur using the current
locale, such as with the functions above. But there may be cases where
one wants to convert strings from a character encoding other than the
locale&rsquo;s character encoding. For these cases, the lower-level functions
<code class="code">scm_to_stringn</code> and <code class="code">scm_from_stringn</code> are provided. These
functions should seldom be necessary if one is properly using locales.
</p>
<dl class="first-deftp">
<dt class="deftp" id="index-scm_005ft_005fstring_005ffailed_005fconversion_005fhandler"><span class="category-def">C Type: </span><span><strong class="def-name">scm_t_string_failed_conversion_handler</strong><a class="copiable-link" href="#index-scm_005ft_005fstring_005ffailed_005fconversion_005fhandler"> &para;</a></span></dt>
<dd><p>This is an enumerated type that can take one of three values:
<code class="code">SCM_FAILED_CONVERSION_ERROR</code>,
<code class="code">SCM_FAILED_CONVERSION_QUESTION_MARK</code>, and
<code class="code">SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE</code>. They are used to indicate
a strategy for handling characters that cannot be converted to or from a
given character encoding. <code class="code">SCM_FAILED_CONVERSION_ERROR</code> indicates
that a conversion should throw an error if some characters cannot be
converted. <code class="code">SCM_FAILED_CONVERSION_QUESTION_MARK</code> indicates that a
conversion should replace unconvertible characters with the question
mark character. Finally, <code class="code">SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE</code>
requests that a conversion should replace an unconvertible character
with an escape sequence.
</p>
<p>While all three strategies apply when converting Scheme strings to C,
only <code class="code">SCM_FAILED_CONVERSION_ERROR</code> and
<code class="code">SCM_FAILED_CONVERSION_QUESTION_MARK</code> can be used when converting C
strings to Scheme.
</p></dd></dl>
<dl class="first-deftypefn">
<dt class="deftypefn" id="index-_002ascm_005fto_005fstringn"><span class="category-def">C Function: </span><span><code class="def-type">char</code> <strong class="def-name">*scm_to_stringn</strong> <code class="def-code-arguments">(SCM str, size_t *lenp, const char *encoding, scm_t_string_failed_conversion_handler handler)</code><a class="copiable-link" href="#index-_002ascm_005fto_005fstringn"> &para;</a></span></dt>
<dd><p>This function returns a newly allocated C string from the Guile string
<var class="var">str</var>. The length of the returned string in bytes will be returned in
<var class="var">lenp</var>. The character encoding of the C string is passed as the ASCII,
null-terminated C string <var class="var">encoding</var>. The <var class="var">handler</var> parameter
gives a strategy for dealing with characters that cannot be converted
into <var class="var">encoding</var>.
</p>
<p>If <var class="var">lenp</var> is <code class="code">NULL</code>, this function will return a null-terminated C
string. It will throw an error if the string contains a null
character.
</p>
<p>The Scheme interface to this function is <code class="code">string-&gt;bytevector</code>, from the
<code class="code">ice-9 iconv</code> module. See <a class="xref" href="Representing-Strings-as-Bytes.html">Representing Strings as Bytes</a>.
</p></dd></dl>
<dl class="first-deftypefn">
<dt class="deftypefn" id="index-scm_005ffrom_005fstringn"><span class="category-def">C Function: </span><span><code class="def-type">SCM</code> <strong class="def-name">scm_from_stringn</strong> <code class="def-code-arguments">(const char *str, size_t len, const char *encoding, scm_t_string_failed_conversion_handler handler)</code><a class="copiable-link" href="#index-scm_005ffrom_005fstringn"> &para;</a></span></dt>
<dd><p>This function returns a Scheme string from the C string <var class="var">str</var>. The
length in bytes of the C string is input as <var class="var">len</var>. The encoding of the C
string is passed as the ASCII, null-terminated C string <var class="var">encoding</var>.
The <var class="var">handler</var> parameter suggests a strategy for dealing with
unconvertible characters.
</p>
<p>The Scheme interface to this function is <code class="code">bytevector-&gt;string</code>.
See <a class="xref" href="Representing-Strings-as-Bytes.html">Representing Strings as Bytes</a>.
</p></dd></dl>
<p>The following conversion functions are provided as a convenience for the
most commonly used encodings.
</p>
<dl class="first-deftypefn">
<dt class="deftypefn" id="index-scm_005ffrom_005flatin1_005fstring"><span class="category-def">C Function: </span><span><code class="def-type">SCM</code> <strong class="def-name">scm_from_latin1_string</strong> <code class="def-code-arguments">(const char *str)</code><a class="copiable-link" href="#index-scm_005ffrom_005flatin1_005fstring"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-scm_005ffrom_005futf8_005fstring"><span class="category-def">C Function: </span><span><code class="def-type">SCM</code> <strong class="def-name">scm_from_utf8_string</strong> <code class="def-code-arguments">(const char *str)</code><a class="copiable-link" href="#index-scm_005ffrom_005futf8_005fstring"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-scm_005ffrom_005futf32_005fstring"><span class="category-def">C Function: </span><span><code class="def-type">SCM</code> <strong class="def-name">scm_from_utf32_string</strong> <code class="def-code-arguments">(const scm_t_wchar *str)</code><a class="copiable-link" href="#index-scm_005ffrom_005futf32_005fstring"> &para;</a></span></dt>
<dd><p>Return a Scheme string from the null-terminated C string <var class="var">str</var>,
which is ISO-8859-1-, UTF-8-, or UTF-32-encoded. These functions should
be used to convert hard-coded C string constants into Scheme strings.
</p></dd></dl>
<dl class="first-deftypefn">
<dt class="deftypefn" id="index-scm_005ffrom_005flatin1_005fstringn"><span class="category-def">C Function: </span><span><code class="def-type">SCM</code> <strong class="def-name">scm_from_latin1_stringn</strong> <code class="def-code-arguments">(const char *str, size_t len)</code><a class="copiable-link" href="#index-scm_005ffrom_005flatin1_005fstringn"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-scm_005ffrom_005futf8_005fstringn"><span class="category-def">C Function: </span><span><code class="def-type">SCM</code> <strong class="def-name">scm_from_utf8_stringn</strong> <code class="def-code-arguments">(const char *str, size_t len)</code><a class="copiable-link" href="#index-scm_005ffrom_005futf8_005fstringn"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-scm_005ffrom_005futf32_005fstringn"><span class="category-def">C Function: </span><span><code class="def-type">SCM</code> <strong class="def-name">scm_from_utf32_stringn</strong> <code class="def-code-arguments">(const scm_t_wchar *str, size_t len)</code><a class="copiable-link" href="#index-scm_005ffrom_005futf32_005fstringn"> &para;</a></span></dt>
<dd><p>Return a Scheme string from C string <var class="var">str</var>, which is ISO-8859-1-,
UTF-8-, or UTF-32-encoded, of length <var class="var">len</var>. <var class="var">len</var> is the number
of bytes pointed to by <var class="var">str</var> for <code class="code">scm_from_latin1_stringn</code> and
<code class="code">scm_from_utf8_stringn</code>; it is the number of elements (code points)
in <var class="var">str</var> in the case of <code class="code">scm_from_utf32_stringn</code>.
</p></dd></dl>
<dl class="first-deftypefn">
<dt class="deftypefn" id="index-_002ascm_005fto_005flatin1_005fstringn"><span class="category-def">C Function: </span><span><code class="def-type">char</code> <strong class="def-name">*scm_to_latin1_stringn</strong> <code class="def-code-arguments">(SCM str, size_t *lenp)</code><a class="copiable-link" href="#index-_002ascm_005fto_005flatin1_005fstringn"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-_002ascm_005fto_005futf8_005fstringn"><span class="category-def">C Function: </span><span><code class="def-type">char</code> <strong class="def-name">*scm_to_utf8_stringn</strong> <code class="def-code-arguments">(SCM str, size_t *lenp)</code><a class="copiable-link" href="#index-_002ascm_005fto_005futf8_005fstringn"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-_002ascm_005fto_005futf32_005fstringn"><span class="category-def">C Function: </span><span><code class="def-type">scm_t_wchar</code> <strong class="def-name">*scm_to_utf32_stringn</strong> <code class="def-code-arguments">(SCM str, size_t *lenp)</code><a class="copiable-link" href="#index-_002ascm_005fto_005futf32_005fstringn"> &para;</a></span></dt>
<dd><p>Return a newly allocated, ISO-8859-1-, UTF-8-, or UTF-32-encoded C string
from Scheme string <var class="var">str</var>. An error is thrown when <var class="var">str</var>
cannot be converted to the specified encoding. If <var class="var">lenp</var> is
<code class="code">NULL</code>, the returned C string will be null-terminated, and an error
will be thrown if the C string would otherwise contain null
characters. If <var class="var">lenp</var> is not <code class="code">NULL</code>, the string is not null-terminated,
and its length is stored in <code class="code">*<var class="var">lenp</var></code>. The length
returned is the number of bytes for <code class="code">scm_to_latin1_stringn</code> and
<code class="code">scm_to_utf8_stringn</code>; it is the number of elements (code points)
for <code class="code">scm_to_utf32_stringn</code>.
</p></dd></dl>
<p>Occasionally, when dealing with the implementation details of a port,
you need to encode and decode strings according to the encoding and
conversion strategy of the port. There
are some convenience functions for that purpose as well.
</p>
<dl class="first-deftypefn">
<dt class="deftypefn" id="index-scm_005ffrom_005fport_005fstring"><span class="category-def">C Function: </span><span><code class="def-type">SCM</code> <strong class="def-name">scm_from_port_string</strong> <code class="def-code-arguments">(const char *str, SCM port)</code><a class="copiable-link" href="#index-scm_005ffrom_005fport_005fstring"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-scm_005ffrom_005fport_005fstringn"><span class="category-def">C Function: </span><span><code class="def-type">SCM</code> <strong class="def-name">scm_from_port_stringn</strong> <code class="def-code-arguments">(const char *str, size_t len, SCM port)</code><a class="copiable-link" href="#index-scm_005ffrom_005fport_005fstringn"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-scm_005fto_005fport_005fstring"><span class="category-def">C Function: </span><span><code class="def-type">char*</code> <strong class="def-name">scm_to_port_string</strong> <code class="def-code-arguments">(SCM str, SCM port)</code><a class="copiable-link" href="#index-scm_005fto_005fport_005fstring"> &para;</a></span></dt>
<dt class="deftypefnx def-cmd-deftypefn" id="index-scm_005fto_005fport_005fstringn"><span class="category-def">C Function: </span><span><code class="def-type">char*</code> <strong class="def-name">scm_to_port_stringn</strong> <code class="def-code-arguments">(SCM str, size_t *lenp, SCM port)</code><a class="copiable-link" href="#index-scm_005fto_005fport_005fstringn"> &para;</a></span></dt>
<dd><p>Like <code class="code">scm_from_stringn</code> and friends, except they take their
encoding and conversion strategy from a given port object.
</p></dd></dl>
</div>
<hr>
<div class="nav-panel">
<p>
Next: <a href="String-Internals.html">String Internals</a>, Previous: <a href="Representing-Strings-as-Bytes.html">Representing Strings as Bytes</a>, Up: <a href="Strings.html">Strings</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>