<!DOCTYPE html>
<html lang="en">
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.2.0">
<title>Strings</title>
<meta charset="utf-8">
<meta name="description" content="A collection of examples of using Common Lisp">
<meta name="viewport" content=
"width=device-width, initial-scale=1">
<link rel="icon" href=
"assets/cl-logo-blue.png"/>
<link rel="stylesheet" href=
"assets/style.css">
<script type="text/javascript" src=
"assets/highlight-lisp.js">
</script>
<script type="text/javascript" src=
"assets/jquery-3.2.1.min.js">
</script>
<script type="text/javascript" src=
"assets/jquery.toc/jquery.toc.min.js">
</script>
<script type="text/javascript" src=
"assets/toggle-toc.js">
</script>
<link rel="stylesheet" href=
"assets/github.css">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
</head>
<body>
<h1 id="title-xs"><a href="index.html">The Common Lisp Cookbook</a> &ndash; Strings</h1>
<div id="logo-container">
<a href="index.html">
<img id="logo" src="assets/cl-logo-blue.png"/>
</a>
<div id="searchform-container">
<form onsubmit="duckSearch()" action="javascript:void(0)">
<input id="searchField" type="text" value="" placeholder="Search...">
</form>
</div>
<div id="toc-container" class="toc-close">
<div id="toc-title">Table of Contents</div>
<ul id="toc" class="list-unstyled"></ul>
</div>
</div>
<div id="content-container">
<h1 id="title-non-xs"><a href="index.html">The Common Lisp Cookbook</a> &ndash; Strings</h1>
<p class="announce">
📢 New videos: <a href="https://www.youtube.com/watch?v=h_noB1sI_e8">web dev demo part 1</a>, <a href="https://www.youtube.com/watch?v=xnwc7irnc8k">dynamic page with HTMX</a>, <a href="https://www.youtube.com/watch?v=Zpn86AQRVN8">Weblocks demo</a>
</p>
<p class="announce-neutral">
📕 <a href="index.html#download-in-epub">Get the EPUB and PDF</a>
</p>
<div id="content">
<p>The most important thing to know about strings in Common Lisp is probably that
they are arrays and thus also sequences. This implies that all concepts that are
applicable to arrays and sequences also apply to strings. If you can't find a
particular string function, make sure you've also searched for the more general
<a href="http://www.gigamonkeys.com/book/collections.html">array or sequence functions</a>. We'll only cover a fraction of what can be done
with and to strings here.</p>
<p>ASDF3, which is included with almost all Common Lisp implementations,
includes
<a href="https://gitlab.common-lisp.net/asdf/asdf/blob/master/uiop/README.md">Utilities for Implementation- and OS- Portability (UIOP)</a>,
which defines functions to work on strings (<code>strcat</code>,
<code>string-prefix-p</code>, <code>string-enclosed-p</code>, <code>first-char</code>, <code>last-char</code>,
<code>split-string</code>, <code>stripln</code>).</p>
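<p>As a rough illustration of these helpers, here is a short sketch. It assumes UIOP is available through <code>(require "asdf")</code>, and the results in the comments are what we expect from the UIOP docstrings:</p>
<pre><code class="language-lisp">(require "asdf") ; loads ASDF and, with it, UIOP

(uiop:strcat "foo" "bar")                   ;; =&gt; "foobar"
(uiop:string-prefix-p "foo" "foobar")       ;; =&gt; T (the prefix comes first)
(uiop:string-enclosed-p "(" "(foo)" ")")    ;; =&gt; T
(uiop:first-char "foo")                     ;; =&gt; #\f
(uiop:last-char "foo")                      ;; =&gt; #\o
(uiop:split-string "a,b,c" :separator ",")  ;; =&gt; ("a" "b" "c")
(uiop:stripln (format nil "foo~%"))         ;; =&gt; "foo" (first return value)
</code></pre>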
<p>Some external libraries available on Quicklisp bring more
functionality or shorter ways to do common things.</p>
<ul>
<li><a href="https://github.com/vindarel/cl-str">str</a> defines <code>trim</code>, <code>words</code>,
<code>unwords</code>, <code>lines</code>, <code>unlines</code>, <code>concat</code>, <code>split</code>, <code>shorten</code>, <code>repeat</code>,
<code>replace-all</code>, <code>starts-with?</code>, <code>ends-with?</code>, <code>blankp</code>, <code>emptyp</code>, …</li>
<li><a href="https://github.com/ruricolist/serapeum/blob/master/REFERENCE.md#strings">Serapeum</a> is a large set of utilities with many string manipulation functions.</li>
<li><a href="https://github.com/rudolfochrist/cl-change-case">cl-change-case</a>
has functions to convert strings between camelCase, param-case,
snake_case and more. They are also included in <code>str</code>.</li>
<li><a href="https://github.com/cbaggers/mk-string-metrics">mk-string-metrics</a>
has functions to calculate various string metrics efficiently
(Damerau-Levenshtein, Hamming, Jaro, Jaro-Winkler, Levenshtein, etc.).</li>
<li><code>cl-ppcre</code> can also come in handy, for example
<code>ppcre:regex-replace-all</code>. See the <a href="regexp.html">regexp</a> section.</li>
</ul>
<p>Last but not least, when you need to tackle the <code>format</code> construct,
don't miss the following resources:</p>
<ul>
<li>the official <a href="http://www.lispworks.com/documentation/HyperSpec/Body/22_c.htm">CLHS documentation</a></li>
<li>a <a href="http://clqr.boundp.org/">quick reference</a></li>
<li>a <a href="https://www.hexstreamsoft.com/articles/common-lisp-format-reference/clhs-summary/#subsections-summary-table">CLHS summary on HexstreamSoft</a></li>
<li>the list of all format directives at the end of this document.</li>
<li>plus a Slime tip: type <code>C-c C-d ~</code> plus a letter of a format directive to open up its documentation. Use TAB-completion to list them all. This is even more useful with <code>ivy-mode</code> or <code>helm-mode</code>.</li>
</ul>
<h2 id="creating-strings">Creating strings</h2>
<p>A string literal is written between double quotes, but there are
other ways to create a string:</p>
<ul>
<li>using <code>format nil</code> doesn't <em>print</em> but returns a new string (see
more examples of <code>format</code> below):</li>
</ul>
<pre><code class="language-lisp">(defparameter *person* "you")
(format nil "hello ~a" *person*) ;; =&gt; "hello you"
</code></pre>
<ul>
<li><code>make-string count</code> creates a string of the given length. The
<code>:initial-element</code> character is repeated <code>count</code> times:</li>
</ul>
<pre><code class="language-lisp">(make-string 3 :initial-element #\♥) ;; =&gt; "♥♥♥"
</code></pre>
<h2 id="accessing-substrings">Accessing Substrings</h2>
<p>As a string is a sequence, you can access substrings with the SUBSEQ
function. The index into the string is, as always, zero-based. The third,
optional, argument is the index of the first character which is not a part of
the substring; it is not the length of the substring.</p>
<pre><code class="language-lisp">* (defparameter *my-string* (string "Groucho Marx"))
*MY-STRING*
* (subseq *my-string* 8)
"Marx"
* (subseq *my-string* 0 7)
"Groucho"
* (subseq *my-string* 1 5)
"rouc"
</code></pre>
<p>You can also manipulate the substring if you use SUBSEQ together with SETF.</p>
<pre><code class="language-lisp">* (defparameter *my-string* (string "Harpo Marx"))
*MY-STRING*
* (subseq *my-string* 0 5)
"Harpo"
* (setf (subseq *my-string* 0 5) "Chico")
"Chico"
* *my-string*
"Chico Marx"
</code></pre>
<p>But note that the string isn't “stretchable”. To cite from the HyperSpec: “If
the subsequence and the new sequence are not of equal length, the shorter length
determines the number of elements that are replaced.” For example:</p>
<pre><code class="language-lisp">* (defparameter *my-string* (string "Karl Marx"))
*MY-STRING*
* (subseq *my-string* 0 4)
"Karl"
* (setf (subseq *my-string* 0 4) "Harpo")
"Harpo"
* *my-string*
"Harp Marx"
* (subseq *my-string* 4)
" Marx"
* (setf (subseq *my-string* 4) "o Marx")
"o Marx"
* *my-string*
"Harpo Mar"
</code></pre>
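<p>If you do need the result to grow or shrink, one option is to build a new string rather than modify the old one. A minimal sketch (the helper name <code>replace-prefix</code> is ours, not a standard function):</p>
<pre><code class="language-lisp">(defun replace-prefix (string end new-prefix)
  "Return a fresh string where the first END characters of STRING
are replaced by NEW-PREFIX, whatever its length."
  (concatenate 'string new-prefix (subseq string end)))

(replace-prefix "Karl Marx" 4 "Harpo") ;; =&gt; "Harpo Marx"
(replace-prefix "Harpo Marx" 5 "Karl") ;; =&gt; "Karl Marx"
</code></pre>
<p>The original string is left untouched, since both SUBSEQ and CONCATENATE return new sequences.</p>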
<h2 id="accessing-individual-characters">Accessing Individual Characters</h2>
<p>You can use the function CHAR to access individual characters of a string. CHAR
can also be used in conjunction with SETF.</p>
<pre><code class="language-lisp">* (defparameter *my-string* (string "Groucho Marx"))
*MY-STRING*
* (char *my-string* 11)
#\x
* (char *my-string* 7)
#\Space
* (char *my-string* 6)
#\o
* (setf (char *my-string* 6) #\y)
#\y
* *my-string*
"Grouchy Marx"
</code></pre>
<p>Note that there's also SCHAR. If efficiency is important, SCHAR can be a bit
faster where appropriate.</p>
<p>Because strings are arrays and thus sequences, you can also use the more generic
functions AREF and ELT (which are more general while CHAR might be implemented
more efficiently).</p>
<pre><code class="language-lisp">* (defparameter *my-string* (string "Groucho Marx"))
*MY-STRING*
* (aref *my-string* 3)
#\u
* (elt *my-string* 8)
#\M
</code></pre>
<p>Each character in a string has an integer code. The range of recognized codes
and Lisp's ability to print them is directly related to your implementation's
character set support, e.g. ISO-8859-1, or Unicode. Here are some examples in
SBCL with UTF-8, which encodes characters as one to four 8-bit bytes. The first example
shows a character outside the first 128 characters (the ASCII range). The second
example shows a character whose code lies beyond 255. Notice the Lisp reader
can round-trip characters by name.</p>
<pre><code class="language-lisp">* (stream-external-format *standard-output*)
:UTF-8
* (code-char 200)
#\LATIN_CAPITAL_LETTER_E_WITH_GRAVE
* (char-code #\LATIN_CAPITAL_LETTER_E_WITH_GRAVE)
200
* (code-char 2048)
#\SAMARITAN_LETTER_ALAF
* (char-code #\SAMARITAN_LETTER_ALAF)
2048
</code></pre>
<p>Check out the UTF-8 Wikipedia article for the range of supported characters and
their encodings.</p>
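<p>To see the bytes behind a character code, here is a small SBCL-only sketch. It relies on <code>sb-ext:string-to-octets</code>, and the helper name <code>utf-8-octets</code> is ours:</p>
<pre><code class="language-lisp">#+sbcl
(defun utf-8-octets (code)
  "Return the UTF-8 bytes of the character with the given code, as a list."
  (coerce (sb-ext:string-to-octets (string (code-char code))
                                   :external-format :utf-8)
          'list))

;; (utf-8-octets 65)   ;; =&gt; (65)           one byte, plain ASCII
;; (utf-8-octets 200)  ;; =&gt; (195 136)      two bytes
;; (utf-8-octets 2048) ;; =&gt; (224 160 128)  three bytes
</code></pre>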
<h2 id="remove-or-replace-characters-from-a-string">Remove or replace characters from a string</h2>
<p>There's a slew of (sequence) functions that can be used to manipulate a string
and we'll only provide some examples here. See the sequences dictionary in the
HyperSpec for more.</p>
<p><code>remove</code> one character from a string:</p>
<pre><code class="language-lisp">* (remove #\o "Harpo Marx")
"Harp Marx"
* (remove #\a "Harpo Marx")
"Hrpo Mrx"
* (remove #\a "Harpo Marx" :start 2)
"Harpo Mrx"
* (remove-if #'upper-case-p "Harpo Marx")
"arpo arx"
</code></pre>
<p>Replace one character with <code>substitute</code> (non-destructive), or overwrite part of a string with <code>replace</code> (destructive):</p>
<pre><code class="language-lisp">* (substitute #\u #\o "Groucho Marx")
"Gruuchu Marx"
* (substitute-if #\_ #'upper-case-p "Groucho Marx")
"_roucho _arx"
* (defparameter *my-string* (string "Zeppo Marx"))
*MY-STRING*
* (replace *my-string* "Harpo" :end1 5)
"Harpo Marx"
* *my-string*
"Harpo Marx"
</code></pre>
<h2 id="concatenating-strings">Concatenating Strings</h2>
<p>The name says it all: CONCATENATE is your friend. Note that this is a generic
sequence function and you have to provide the result type as the first argument.</p>
<pre><code class="language-lisp">* (concatenate 'string "Karl" " " "Marx")
"Karl Marx"
* (concatenate 'list "Karl" " " "Marx")
(#\K #\a #\r #\l #\Space #\M #\a #\r #\x)
</code></pre>
<p>With UIOP, use <code>strcat</code>:</p>
<pre><code class="language-lisp">* (uiop:strcat "karl" " " "marx")
"karl marx"
</code></pre>
<p>or with the library <code>str</code>, use <code>concat</code>:</p>
<pre><code class="language-lisp">* (str:concat "foo" "bar")
</code></pre>
<p>If you have to construct a string out of many parts, all of these calls to
CONCATENATE seem wasteful, though. There are at least three other good ways to
construct a string piecemeal, depending on what exactly your data is. If you
build your string one character at a time, make it an adjustable VECTOR (a
one-dimensional ARRAY) of type character with a fill-pointer of zero, then use
VECTOR-PUSH-EXTEND on it. That way, you can also give hints to the system if you
can estimate how long the string will be. (See the optional third argument to
VECTOR-PUSH-EXTEND.)</p>
<pre><code class="language-lisp">* (defparameter *my-string* (make-array 0
                                        :element-type 'character
                                        :fill-pointer 0
                                        :adjustable t))
*MY-STRING*
* *my-string*
""
* (dolist (char '(#\Z #\a #\p #\p #\a))
    (vector-push-extend char *my-string*))
NIL
* *my-string*
"Zappa"
</code></pre>
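<p>The size hints mentioned above could look like the following sketch. The function <code>collect-chars</code> and its <code>size-hint</code> parameter are made up for the illustration: the hint is used both as the initial capacity and as the extension passed to VECTOR-PUSH-EXTEND.</p>
<pre><code class="language-lisp">(defun collect-chars (chars size-hint)
  "Build a string from the list CHARS. SIZE-HINT is our estimate of the
final length: the buffer starts with that capacity and, when full,
grows by at least that many elements."
  (let ((buffer (make-array size-hint
                            :element-type 'character
                            :fill-pointer 0
                            :adjustable t)))
    (dolist (char chars buffer)
      (vector-push-extend char buffer size-hint))))

(collect-chars '(#\Z #\a #\p #\p #\a) 16) ;; =&gt; "Zappa"
</code></pre>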
<p>If the string will be constructed out of (the printed representations of)
arbitrary objects, (symbols, numbers, characters, strings, …), you can use
FORMAT with an output stream argument of NIL. This directs FORMAT to return the
indicated output as a string.</p>
<pre><code class="language-lisp">* (format nil "This is a string with a list ~A in it"
          '(1 2 3))
"This is a string with a list (1 2 3) in it"
</code></pre>
<p>We can use the looping constructs of the FORMAT mini language to emulate
CONCATENATE.</p>
<pre><code class="language-lisp">* (format nil "The Marx brothers are:~{ ~A~}."
          '("Groucho" "Harpo" "Chico" "Zeppo" "Karl"))
"The Marx brothers are: Groucho Harpo Chico Zeppo Karl."
</code></pre>
<p>FORMAT can do a lot more processing but it has a relatively arcane syntax. After
this last example, you can find the details in the CLHS section about formatted
output.</p>
<pre><code class="language-lisp">* (format nil "The Marx brothers are:~{ ~A~^,~}."
          '("Groucho" "Harpo" "Chico" "Zeppo" "Karl"))
"The Marx brothers are: Groucho, Harpo, Chico, Zeppo, Karl."
</code></pre>
<p>Another way to create a string out of the printed representation of various
objects is using WITH-OUTPUT-TO-STRING. The value of this handy macro is a string
containing everything that was output to the string stream within the body of
the macro. This means you also have the full power of FORMAT at your disposal,
should you need it.</p>
<pre><code class="language-lisp">* (with-output-to-string (stream)
    (dolist (char '(#\Z #\a #\p #\p #\a #\, #\Space))
      (princ char stream))
    (format stream "~S - ~S" 1940 1993))
"Zappa, 1940 - 1993"
</code></pre>
<h2 id="processing-a-string-one-character-at-a-time">Processing a String One Character at a Time</h2>
<p>Use the MAP function to process a string one character at a time.</p>
<pre><code class="language-lisp">* (defparameter *my-string* (string "Groucho Marx"))
*MY-STRING*
* (map 'string (lambda (c) (print c)) *my-string*)
#\G
#\r
#\o
#\u
#\c
#\h
#\o
#\Space
#\M
#\a
#\r
#\x
"Groucho Marx"
</code></pre>
<p>Or do it with LOOP.</p>
<pre><code class="language-lisp">* (loop for char across "Zeppo"
        collect char)
(#\Z #\e #\p #\p #\o)
</code></pre>
<h2 id="reversing-a-string-by-word-or-character">Reversing a String by Word or Character</h2>
<p>Reversing a string by character is easy using the built-in REVERSE function (or
its destructive counterpart NREVERSE).</p>
<pre><code class="language-lisp">* (defparameter *my-string* (string "DSL"))
*MY-STRING*
* (reverse *my-string*)
"LSD"
</code></pre>
<p>There's no one-liner in CL to reverse a string by word (like you would do it in
Perl with split and join). You either have to use functions from an external
library like SPLIT-SEQUENCE or you have to roll your own solution.</p>
<p>Here's an attempt with the <code>str</code> library:</p>
<pre><code class="language-lisp">* (defparameter *singing* "singing in the rain")
*SINGING*
* (str:words *SINGING*)
("singing" "in" "the" "rain")
* (reverse *)
("rain" "the" "in" "singing")
* (str:unwords *)
"rain the in singing"
</code></pre>
<p>And here's another one with no external dependencies:</p>
<pre><code class="language-lisp">* (defun split-by-one-space (string)
    "Returns a list of substrings of string
divided by ONE space each.
Note: Two consecutive spaces will be seen as
if there were an empty string between them."
    (loop for i = 0 then (1+ j)
          as j = (position #\Space string :start i)
          collect (subseq string i j)
          while j))
SPLIT-BY-ONE-SPACE
* (split-by-one-space "Singing in the rain")
("Singing" "in" "the" "rain")
* (split-by-one-space "Singing in the  rain")
("Singing" "in" "the" "" "rain")
* (split-by-one-space "Cool")
("Cool")
* (split-by-one-space " Cool ")
("" "Cool" "")
* (defun join-string-list (string-list)
    "Concatenates a list of strings
and puts spaces between the elements."
    (format nil "~{~A~^ ~}" string-list))
JOIN-STRING-LIST
* (join-string-list '("We" "want" "better" "examples"))
"We want better examples"
* (join-string-list '("Really"))
"Really"
* (join-string-list '())
""
* (join-string-list
    (nreverse
      (split-by-one-space
        "Reverse this sentence by word")))
"word by sentence this Reverse"
</code></pre>
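<p>If runs of several spaces should not produce empty strings, a variant along these lines may be enough. This is only a sketch: the names <code>split-words</code> and <code>reverse-words</code> are ours, and only the space character is treated as a separator.</p>
<pre><code class="language-lisp">(defun split-words (string)
  "Return the substrings of STRING separated by one or more spaces."
  (flet ((non-space-p (char) (char/= char #\Space)))
    (loop for start = (position-if #'non-space-p string)
            then (and end (position-if #'non-space-p string :start end))
          for end = (and start (position #\Space string :start start))
          while start
          collect (subseq string start end))))

(defun reverse-words (string)
  "Return a new string with the words of STRING in reverse order."
  (format nil "~{~A~^ ~}" (reverse (split-words string))))

(split-words "  Singing  in the   rain ")  ;; =&gt; ("Singing" "in" "the" "rain")
(reverse-words "Singing  in the   rain")   ;; =&gt; "rain the in Singing"
</code></pre>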
<h2 id="dealing-with-unicode-strings">Dealing with unicode strings</h2>
<p>We'll use <a href="http://www.sbcl.org/manual/index.html#String-operations">SBCL's string operations</a> here. More generally, see <a href="http://www.sbcl.org/manual/index.html#Unicode-Support">SBCL's unicode support</a>.</p>
<h3 id="sorting-unicode-strings-alphabetically">Sorting unicode strings alphabetically</h3>
<p>Sorting unicode strings with <code>string-lessp</code> as the comparison function
isn't satisfying:</p>
<pre><code class="language-lisp">(sort '("Aaa" "Ééé" "Zzz") #'string-lessp)
;; ("Aaa" "Zzz" "Ééé")
</code></pre>
<p>With <a href="http://www.sbcl.org/manual/#String-operations">SBCL</a>, use <code>sb-unicode:unicode&lt;</code>:</p>
<pre><code class="language-lisp">(sort '("Aaa" "Ééé" "Zzz") #'sb-unicode:unicode&lt;)
;; ("Aaa" "Ééé" "Zzz")
</code></pre>
<h3 id="breaking-strings-into-graphenes-sentences-lines-and-words">Breaking strings into graphemes, sentences, lines and words</h3>
<p>These functions use SBCL's <a href="http://www.sbcl.org/manual/#String-operations"><code>sb-unicode</code></a>: they are SBCL specific.</p>
<p>Use <code>sb-unicode:sentences</code> to break a string into sentences according
to the default sentence breaking rules.</p>
<p>Use <code>sb-unicode:lines</code> to break a string into lines that are no wider
than the <code>:margin</code> keyword argument. Combining marks will always be kept together with their base characters, and spaces (but not other types of whitespace) will be removed from the end of lines. If <code>:margin</code> is unspecified, it defaults to 80 characters.</p>
<pre><code class="language-lisp">(sb-unicode:lines "A first sentence. A second somewhat long one." :margin 10)
;; =&gt; ("A first"
;;     "sentence."
;;     "A second"
;;     "somewhat"
;;     "long one.")
</code></pre>
<p>See also <code>sb-unicode:words</code> and <code>sb-unicode:graphemes</code>.</p>
<p>Tip: you can ensure these functions are run only in SBCL with a feature flag:</p>
<pre><code>#+sbcl
(runs on sbcl)
#-sbcl
(runs on other implementations)
</code></pre>
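<p>Applied to the sorting example above, the feature flags could be used like this. It is a sketch: <code>sort-strings</code> is a name we made up, and the fallback to <code>string-lessp</code> on other implementations is our choice.</p>
<pre><code class="language-lisp">(defun sort-strings (strings)
  "Return a sorted copy of the list STRINGS.
Uses the Unicode-aware predicate on SBCL, STRING-LESSP elsewhere."
  (sort (copy-list strings)        ; SORT is destructive, so work on a copy
        #+sbcl #'sb-unicode:unicode&lt;
        #-sbcl #'string-lessp))

(sort-strings '("Zzz" "Aaa")) ;; =&gt; ("Aaa" "Zzz")
</code></pre>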
<h2 id="controlling-case">Controlling Case</h2>
<p>Common Lisp has a couple of functions to control the case of a string.</p>
<pre><code class="language-lisp">* (string-upcase "cool")
"COOL"
* (string-upcase "Cool")
"COOL"
* (string-downcase "COOL")
"cool"
* (string-downcase "Cool")
"cool"
* (string-capitalize "cool")
"Cool"
* (string-capitalize "cool example")
"Cool Example"
</code></pre>
<p>These functions take the <code>:start</code> and <code>:end</code> keyword arguments so you can optionally
manipulate only a part of the string. They also have destructive counterparts
whose names start with “N”.</p>
<pre><code class="language-lisp">* (string-capitalize "cool example" :start 5)
"cool Example"
* (string-capitalize "cool example" :end 5)
"Cool example"
* (defparameter *my-string* (string "BIG"))
*MY-STRING*
* (defparameter *my-downcase-string* (nstring-downcase *my-string*))
*MY-DOWNCASE-STRING*
* *my-downcase-string*
"big"
* *my-string*
"big"
</code></pre>
<p>Note this potential caveat: according to the HyperSpec,</p>
<blockquote>
<p>for STRING-UPCASE, STRING-DOWNCASE, and STRING-CAPITALIZE, string is not modified. However, if no characters in string require conversion, the result may be either string or a copy of it, at the implementation's discretion.</p>
</blockquote>
<p>This implies that the last result in
the following example is implementation-dependent: it may either be “BIG” or
“BUG”. If you want to be sure to get a fresh string, use COPY-SEQ.</p>
<pre><code class="language-lisp">* (defparameter *my-string* (string "BIG"))
*MY-STRING*
* (defparameter *my-upcase-string* (string-upcase *my-string*))
*MY-UPCASE-STRING*
* (setf (char *my-string* 1) #\U)
#\U
* *my-string*
"BUG"
* *my-upcase-string*
"BIG"
</code></pre>
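<p>A sketch of the COPY-SEQ approach follows (the variable names are ours). Copying the result guarantees that the two strings never share storage, and copying the literal also avoids modifying literal data:</p>
<pre><code class="language-lisp">(defparameter *big* (copy-seq "BIG"))
;; STRING-UPCASE may hand back *BIG* itself, so copy its result.
(defparameter *big-upcased* (copy-seq (string-upcase *big*)))

(setf (char *big* 1) #\U)
*big*         ;; =&gt; "BUG"
*big-upcased* ;; =&gt; "BIG", on every implementation
</code></pre>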
<h3 id="with-the-format-function">With the format function</h3>
<p>The format function has directives to change the case of words:</p>
<h4 id="to-lower-case--">To lower case: ~( ~)</h4>
<pre><code class="language-lisp">(format t "~(~a~)" "HELLO WORLD")
;; =&gt; hello world
</code></pre>
<h4 id="capitalize-every-word--">Capitalize every word: ~:( ~)</h4>
<pre><code class="language-lisp">(format t "~:(~a~)" "HELLO WORLD")
Hello World
NIL
</code></pre>
<h4 id="capitalize-the-first-word--">Capitalize the first word: ~@( ~)</h4>
<pre><code class="language-lisp">(format t "~@(~a~)" "hello world")
Hello world
NIL
</code></pre>
<h4 id="to-upper-case--">To upper case: ~@:( ~)</h4>
<p>Where we re-use the colon and the @:</p>
<pre><code class="language-lisp">(format t "~@:(~a~)" "hello world")
HELLO WORLD
NIL
</code></pre>
<h2 id="trimming-blanks-from-the-ends-of-a-string">Trimming Blanks from the Ends of a String</h2>
<p>Not only can you trim blanks, but you can get rid of arbitrary characters. The
functions STRING-TRIM, STRING-LEFT-TRIM and STRING-RIGHT-TRIM return a substring
of their second argument where all characters that are in the first argument are
removed off the beginning and/or the end. The first argument can be any sequence
of characters.</p>
<pre><code class="language-lisp">* (string-trim " " " trim me ")
"trim me"
* (string-trim " et" " trim me ")
"rim m"
* (string-left-trim " et" " trim me ")
"rim me "
* (string-right-trim " et" " trim me ")
" trim m"
* (string-right-trim '(#\Space #\e #\t) " trim me ")
" trim m"
* (string-right-trim '(#\Space #\e #\t #\m) " trim me ")
" tri"
</code></pre>
<p>Note: The caveat mentioned in the section about Controlling Case also applies
here.</p>
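<p>A common use is trimming all usual whitespace at once. Here is a small sketch; <code>*whitespace-chars*</code> and <code>trim-whitespace</code> are names we chose, and the character names <code>#\Tab</code> and <code>#\Return</code> are semi-standard, though widely supported:</p>
<pre><code class="language-lisp">(defparameter *whitespace-chars* '(#\Space #\Tab #\Newline #\Return)
  "Characters that TRIM-WHITESPACE removes from both ends.")

(defun trim-whitespace (string)
  "Return STRING without leading and trailing whitespace."
  (string-trim *whitespace-chars* string))

(trim-whitespace (format nil "  trim me ~%")) ;; =&gt; "trim me"
</code></pre>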
<h2 id="converting-between-symbols-and-strings">Converting between Symbols and Strings</h2>
<p>The function INTERN will “convert” a string to a symbol. Actually, it will check
whether the symbol denoted by the string (its first argument) is already
accessible in the package (its second, optional, argument which defaults to the
current package) and enter it, if necessary, into this package. It is beyond the
scope of this chapter to explain all the concepts involved and to address the
second return value of this function. See the CLHS chapter about packages for
details.</p>
<p>Note that the case of the string is relevant.</p>
<pre><code class="language-lisp">* (in-package "COMMON-LISP-USER")
#&lt;The COMMON-LISP-USER package, 35/44 internal, 0/9 external&gt;
* (intern "MY-SYMBOL")
MY-SYMBOL
NIL
* (intern "MY-SYMBOL")
MY-SYMBOL
:INTERNAL
* (export 'MY-SYMBOL)
T
* (intern "MY-SYMBOL")
MY-SYMBOL
:EXTERNAL
* (intern "My-Symbol")
|My-Symbol|
NIL
* (intern "MY-SYMBOL" "KEYWORD")
:MY-SYMBOL
NIL
* (intern "MY-SYMBOL" "KEYWORD")
:MY-SYMBOL
:EXTERNAL
</code></pre>
<p>To do the opposite, convert from a symbol to a string, use SYMBOL-NAME or
STRING.</p>
<pre><code class="language-lisp">* (symbol-name 'MY-SYMBOL)
"MY-SYMBOL"
* (symbol-name 'my-symbol)
"MY-SYMBOL"
* (symbol-name '|my-symbol|)
"my-symbol"
* (string 'howdy)
"HOWDY"
</code></pre>
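<p>Since case matters, a typical pattern is to upcase the string before interning it. The following sketch turns a string into a keyword; the helper name <code>string-to-keyword</code> is ours:</p>
<pre><code class="language-lisp">(defun string-to-keyword (name)
  "Return the keyword whose name is NAME, upcased."
  ;; VALUES drops the second return value of INTERN.
  (values (intern (string-upcase name) "KEYWORD")))

(string-to-keyword "my-symbol")               ;; =&gt; :MY-SYMBOL
(symbol-name (string-to-keyword "my-symbol")) ;; =&gt; "MY-SYMBOL"
</code></pre>
<p>Keep in mind that every new name interned this way stays in the package, so this is best reserved for a known set of names.</p>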
<h2 id="converting-between-characters-and-strings">Converting between Characters and Strings</h2>
<p>You can use COERCE to convert a string of length 1 to a character. You can also
use COERCE to convert any sequence of characters into a string. You cannot use
COERCE to convert a character to a string, though: you'll have to use STRING
instead.</p>
<pre><code class="language-lisp">* (coerce "a" 'character)
#\a
* (coerce (subseq "cool" 2 3) 'character)
#\o
* (coerce "cool" 'list)
(#\c #\o #\o #\l)
* (coerce '(#\h #\e #\y) 'string)
"hey"
* (coerce (nth 2 '(#\h #\e #\y)) 'character)
#\y
* (defparameter *my-array* (make-array 5 :initial-element #\x))
*MY-ARRAY*
* *my-array*
#(#\x #\x #\x #\x #\x)
* (coerce *my-array* 'string)
"xxxxx"
* (string 'howdy)
"HOWDY"
* (string #\y)
"y"
* (coerce #\y 'string)
#\y can't be converted to type STRING.
[Condition of type SIMPLE-TYPE-ERROR]
</code></pre>
<h2 id="finding-an-element-of-a-string">Finding an Element of a String</h2>
<p>Use <code>find</code>, <code>position</code>, and their <code>…-if</code> counterparts to find characters in a string, with the appropriate <code>:test</code> parameter:</p>
<pre><code class="language-lisp">* (find #\t "Tea time." :test #'equal)
#\t
* (find #\t "Tea time." :test #'equalp)
#\T
* (find #\z "Tea time." :test #'equalp)
NIL
* (find-if #'digit-char-p "Tea time is at 5'00.")
#\5
* (find-if #'digit-char-p "Tea time is at 5'00." :from-end t)
#\0
* (position #\t "Tea time." :test #'equal)
4 ;; &lt;= the first lowercase t
* (position #\t "Tea time." :test #'equalp)
0 ;; &lt;= the first capital T
* (position-if #'digit-char-p "Tea time is at 5'00.")
15
* (position-if #'digit-char-p "Tea time is at 5'00." :from-end t)
18
</code></pre>
<p>Or use <code>count</code> and friends to count characters in a string:</p>
<pre><code class="language-lisp">(count #\t "Tea time." :test #'equal)
1 ;; &lt;= equal ignores the capital T
(count #\t "Tea time." :test #'equalp)
2 ;; &lt;= equalp counts the capital T
(count-if #'digit-char-p "Tea time is at 5'00.")
3
(count-if #'digit-char-p "Tea time is at 5'00." :start 18)
1
</code></pre>
<h2 id="finding-a-substring-of-a-string">Finding a Substring of a String</h2>
<p>The function <code>search</code> can find substrings of a string.</p>
<pre><code class="language-lisp">* (search "we" "If we can't be free we can at least be cheap")
3
* (search "we" "If we can't be free we can at least be cheap"
          :from-end t)
20
* (search "we" "If we can't be free we can at least be cheap"
          :start2 4)
20
* (search "we" "If we can't be free we can at least be cheap"
          :end2 5 :from-end t)
3
* (search "FREE" "If we can't be free we can at least be cheap")
NIL
* (search "FREE" "If we can't be free we can at least be cheap"
          :test #'char-equal)
15
</code></pre>
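<p><code>search</code> returns only one position. To collect every occurrence, one can call it repeatedly with an increasing <code>:start2</code>, as in this sketch (<code>all-matches</code> is a hypothetical helper; overlapping matches are included):</p>
<pre><code class="language-lisp">(defun all-matches (needle haystack)
  "Return the list of all positions where NEEDLE occurs in HAYSTACK."
  (loop for start = 0 then (1+ position)
        for position = (search needle haystack :start2 start)
        while position
        collect position))

(all-matches "we" "If we can't be free we can at least be cheap") ;; =&gt; (3 20)
(all-matches "aa" "aaaa")                                         ;; =&gt; (0 1 2)
</code></pre>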
<h2 id="converting-a-string-to-a-number">Converting a String to a Number</h2>
<h3 id="to-an-integer-parse-integer">To an integer: parse-integer</h3>
<p>CL provides the <code>parse-integer</code> function to convert a string representation of an integer
to the corresponding numeric value. The second return value is the index into
the string where the parsing stopped.</p>
<pre><code class="language-lisp">(parse-integer "42")
42
2
(parse-integer "42" :start 1)
2
2
(parse-integer "42" :end 1)
4
1
(parse-integer "42" :radix 8)
34
2
(parse-integer " 42 ")
42
3
(parse-integer " 42 is forty-two" :junk-allowed t)
42
3
(parse-integer " 42 is forty-two")
Error in function PARSE-INTEGER:
There's junk in this string: " 42 is forty-two".
</code></pre>
<p><code>parse-integer</code> doesn't understand radix specifiers like <code>#X</code>, nor is there a
built-in function to parse other numeric types. You could use <code>read-from-string</code>
in this case.</p>
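<p>For integers in another base, the <code>:radix</code> argument is enough. The sketch below also shows one way to get NIL instead of an error on bad input; <code>parse-integer-or-nil</code> is a name we made up, and it rejects any string that is not entirely consumed:</p>
<pre><code class="language-lisp">(defun parse-integer-or-nil (string radix)
  "Return the integer written in STRING in base RADIX,
or NIL if STRING as a whole is not such an integer."
  (multiple-value-bind (value end)
      (parse-integer string :radix radix :junk-allowed t)
    (and value
         (= end (length string))
         value)))

(parse-integer-or-nil "FF" 16)    ;; =&gt; 255
(parse-integer-or-nil "42" 10)    ;; =&gt; 42
(parse-integer-or-nil "42abc" 10) ;; =&gt; NIL
(parse-integer-or-nil "" 10)      ;; =&gt; NIL
</code></pre>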
<h3 id="extracting-many-integers-from-a-string-ppcreall-matches-as-strings">Extracting many integers from a string: <code>ppcre:all-matches-as-strings</code></h3>
<p>We show this in the Regular Expressions chapter, but it is worth mentioning here because it is so useful:</p>
<pre><code class="language-lisp">* (ppcre:all-matches-as-strings "-?\\d+" "42 is 41 plus 1")
;; ("42" "41" "1")
* (mapcar #'parse-integer *)
;; (42 41 1)
</code></pre>
<h3 id="to-any-number-read-from-string">To any number: <code>read-from-string</code></h3>
<p>Be aware that the full reader is in effect if you're using this
function. This can lead to security vulnerabilities. You should use a
library like <code>parse-number</code> or <code>parse-float</code> instead.</p>
<pre><code class="language-lisp">(read-from-string "#X23")
35
4
(read-from-string "4.5")
4.5
3
(read-from-string "6/8")
3/4
3
(read-from-string "#C(6/8 1)")
#C(3/4 1)
9
(read-from-string "1.2e2")
120.00001
5
(read-from-string "symbol")
SYMBOL
6
(defparameter *foo* 42)
*FOO*
(read-from-string "#.(setq *foo* \"gotcha\")")
"gotcha"
23
*foo*
"gotcha"
</code></pre>
<h3 id="protecting-read-from-string">Protecting <code>read-from-string</code></h3>
<p>At the very least, if you are reading data coming from the outside, use this:</p>
<pre><code class="language-lisp">(let ((cl:*read-eval* nil))
  (read-from-string "…"))
</code></pre>
<p>This prevents code from being evaluated at read time. That way our last example, using the <code>#.</code> reader macro, would not work. You'll get the error “can't read #. while *READ-EVAL* is NIL”.</p>
<p>Better yet, for more protection from a possibly custom readtable that would introduce another reader macro:</p>
<pre><code class="language-lisp">(with-standard-io-syntax
  (let ((cl:*read-eval* nil))
    (read-from-string "…")))
</code></pre>
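<p>Putting these precautions together, a number-only reader might be sketched as follows. <code>read-number-safely</code> is a hypothetical helper, not a standard function, and a dedicated parsing library remains the safer choice for untrusted data:</p>
<pre><code class="language-lisp">(defun read-number-safely (string)
  "Read one object from STRING with read-time evaluation disabled
and signal an error unless that object is a number."
  (let ((value (with-standard-io-syntax
                 ;; WITH-STANDARD-IO-SYNTAX binds *READ-EVAL* to T,
                 ;; so it must be rebound inside.
                 (let ((*read-eval* nil))
                   (read-from-string string)))))
    (if (numberp value)
        value
        (error "Not a number: ~S" string))))

(read-number-safely "#X23") ;; =&gt; 35
(read-number-safely "6/8")  ;; =&gt; 3/4
;; (read-number-safely "symbol")    signals an error
;; (read-number-safely "#.(+ 1 2)") signals an error
</code></pre>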
<h3 id="to-a-float-the-parse-float-library">To a float: the parse-float library</h3>
<p>There is no built-in function similar to <code>parse-integer</code> to parse
other number types. The external library
<a href="https://github.com/soemraws/parse-float">parse-float</a> does exactly
that. It doesn't use <code>read-from-string</code> so it is safe to use.</p>
<pre><code class="language-lisp">(ql:quickload "parse-float")
(parse-float:parse-float "1.2e2")
;; 120.00001
;; 5
</code></pre>
<p>LispWorks also has a <a href="http://www.lispworks.com/documentation/lw51/LWRM/html/lwref-228.htm">parse-float</a> function.</p>
<p>See also <a href="https://github.com/sharplispers/parse-number">parse-number</a>.</p>
<h2 id="converting-a-number-to-a-string">Converting a Number to a String</h2>
<p>The general function WRITE-TO-STRING or one of its simpler variants
PRIN1-TO-STRING or PRINC-TO-STRING may be used to convert a number to a
string. With WRITE-TO-STRING, the :base keyword argument may be used to change
the output base for a single call. To change the output base globally, set
<code>*print-base*</code>, which defaults to 10. Remember that in Lisp, rational numbers are
represented as quotients of two integers even when converted to strings.</p>
<pre><code class="language-lisp">(write-to-string 250)
"250"
(write-to-string 250.02)
"250.02"
(write-to-string 250 :base 5)
"2000"
(write-to-string (/ 1 3))
"1/3"
</code></pre>
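<p>A few more conversions, as a sketch of how the base can be changed for one call or for a whole dynamic extent by binding <code>*print-base*</code>:</p>
<pre><code class="language-lisp">(write-to-string 255 :base 16) ;; =&gt; "FF"
(write-to-string 1/3 :base 2)  ;; =&gt; "1/11"

;; Binding *PRINT-BASE* affects every printing function called inside the LET.
(let ((*print-base* 2))
  (list (princ-to-string 5)
        (prin1-to-string 6)))  ;; =&gt; ("101" "110")
</code></pre>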
<h2 id="comparing-strings">Comparing Strings</h2>
<p>The general functions EQUAL and EQUALP can be used to test whether two strings
are equal. The strings are compared element-by-element, either in a
case-sensitive manner (EQUAL) or not (EQUALP). There's also a bunch of
string-specific comparison functions. You'll want to use these if you're
deploying implementation-defined attributes of characters. Check your vendor's
documentation in this case.</p>
<p>Here are a few examples. Note that all functions that test for inequality return the position of the first mismatch as a generalized boolean. You can also use the generic sequence function MISMATCH if you need more versatility.</p>
<pre><code class="language-lisp">(string= "Marx" "Marx")
T
(string= "Marx" "marx")
NIL
(string-equal "Marx" "marx")
T
(string&lt; "Groucho" "Zeppo")
0
(string&lt; "groucho" "Zeppo")
NIL
(string-lessp "groucho" "Zeppo")
0
(mismatch "Harpo Marx" "Zeppo Marx" :from-end t :test #'char=)
3
</code></pre>
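<p>The returned position is easier to see when the mismatch is not at index 0. The string-specific functions also accept <code>:start1</code>, <code>:end1</code>, <code>:start2</code> and <code>:end2</code> to compare only parts of the strings. A short sketch:</p>
<pre><code class="language-lisp">(string&lt; "Groucho" "Grouchy") ;; =&gt; 6, the index of the first difference
(string/= "Marx" "Marx")      ;; =&gt; NIL
(string/= "Marx" "marx")      ;; =&gt; 0

;; Compare only the last names.
(string= "Karl Marx" "Groucho Marx" :start1 5 :start2 8) ;; =&gt; T
</code></pre>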
<h2 id="string-formatting">String formatting</h2>
<p>The <code>format</code> function has a lot of directives to print strings,
numbers, lists, going recursively, even calling Lisp functions,
etc. We'll focus here on a few things to print and format strings.</p>
<p>Our examples come from a common need: printing many strings and
justifying them. Let's work with this list of movies:</p>
<pre><code class="language-lisp">(defparameter movies '(
    (1 "Matrix" 5)
    (10 "Matrix Trilogy swe sub" 3.3)
    ))
</code></pre>
<p>We want an aligned and justified result like this:</p>
<pre><code> 1 Matrix                    5
10 Matrix Trilogy swe sub    3.3
</code></pre>
<p>We'll use <code>mapcar</code> to iterate over our movies and experiment with the
format constructs.</p>
<pre><code class="language-lisp">(mapcar (lambda (it)
          (format t "~a ~a ~a~%" (first it) (second it) (third it)))
        movies)
</code></pre>
<p>which prints:</p>
<pre><code>1 Matrix 5
10 Matrix Trilogy swe sub 3.3
</code></pre>
<h3 id="structure-of-format">Structure of format</h3>
<p>Format directives start with <code>~</code>. A final character like <code>A</code> or <code>a</code>
(directives are case-insensitive) defines the directive. In between, it can
accept comma-separated prefix parameters and the modifiers <code>:</code> and <code>@</code>.</p>
<p>Print a tilde with <code>~~</code>, or 10 tildes with <code>~10~</code>.</p>
<p>Other directives include:</p>
<ul>
<li><code>R</code>: Radix. Without parameters it prints the number in English words: <code>(format t "~R" 20)</code> =&gt; “twenty”. Use <code>~@R</code> for Roman numerals.</li>
<li><code>$</code>: monetary: <code>(format t "~$" 21982)</code> =&gt; 21982.00</li>
<li><code>D</code>, <code>B</code>, <code>O</code>, <code>X</code>: Decimal, Binary, Octal, Hexadecimal.</li>
<li><code>F</code>: fixed-format Floating point.</li>
<li><code>P</code>: plural: <code>(format nil "~D famil~:@P/~D famil~:@P" 7 1)</code> =&gt; “7 families/1 family”</li>
</ul>
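<p>As a quick illustration of the numeric directives listed above, here is a small sketch using only standard <code>format</code> features. The <code>:</code> modifier and the padding parameters shown for <code>~D</code> are additions to the list, included to show how prefix parameters and modifiers combine:</p>
<pre><code class="language-lisp">(format nil "~D" 42)        ;; =&gt; "42"
(format nil "~B" 5)         ;; =&gt; "101"
(format nil "~O" 8)         ;; =&gt; "10"
(format nil "~X" 16)        ;; =&gt; "10"
;; The : modifier groups digits with commas.
(format nil "~:D" 1000000)  ;; =&gt; "1,000,000"
;; Width 5, padded with the character 0.
(format nil "~5,'0D" 42)    ;; =&gt; "00042"
;; The @ modifier on ~R gives Roman numerals.
(format nil "~@R" 20)       ;; =&gt; "XX"
</code></pre>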
<h3 id="basic-primitive-a-or-a-aesthetics">Basic primitive: ~A or ~a (Aesthetics)</h3>
<p><code>(format t "~a" movies)</code> is the most basic primitive.</p>
<pre><code class="language-lisp">(format nil "~a" movies)
;; =&gt; "((1 Matrix 5) (10 Matrix Trilogy swe sub 3.3))"
</code></pre>
<h3 id="newlines--and-">Newlines: ~% and ~&amp;</h3>
<p><code>~%</code> is the newline directive. <code>~10%</code> prints 10 newlines.</p>
<p><code>~&amp;</code> is the fresh-line directive: it prints a newline only if the output stream is not already at the start of a line.</p>
<h3 id="tabs">Tabs</h3>
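<p>Tabulate with <code>~T</code>. A prefix parameter gives the target column, so <code>~10T</code> moves to column 10.</p>
<p><code>~I</code> sets the indentation inside a pretty-printer logical block.</p>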
<h3 id="justifying-text--add-padding-on-the-right">Justifying text / add padding on the right</h3>
<p>Use a number as parameter, like <code>~2a</code>:</p>
<pre><code class="language-lisp">(format nil "~20a" "yo")
;; "yo                  "
</code></pre>
<pre><code class="language-lisp">(mapcar (lambda (it)
          (format t "~2a ~a ~a~%" (first it) (second it) (third it)))
        movies)
</code></pre>
<pre><code>1  Matrix 5
10 Matrix Trilogy swe sub 3.3
</code></pre>
<p>So, expanding:</p>
<pre><code class="language-lisp">(mapcar (lambda (it)
          (format t "~2a ~25a ~2a~%" (first it) (second it) (third it)))
        movies)
</code></pre>
<pre><code>1  Matrix                    5
10 Matrix Trilogy swe sub    3.3
</code></pre>
<p>The padding is added on the right, so the text is left-aligned.</p>
<h4 id="justifying-on-the-left-">Adding padding on the left: @</h4>
<p>Use a <code>@</code>, as in <code>~2@A</code>, to add the padding on the left, which right-aligns the text:</p>
<pre><code class="language-lisp">(format nil "~20@a" "yo")
;; "                  yo"
</code></pre>
<pre><code class="language-lisp">(mapcar (lambda (it)
          (format t "~2@a ~25@a ~2a~%" (first it) (second it) (third it)))
        movies)
</code></pre>
<pre><code> 1                    Matrix 5
10    Matrix Trilogy swe sub 3.3
</code></pre>
<h3 id="justifying-decimals">Justifying decimals</h3>
<p>In <code>~,2F</code>, 2 is the number of decimals and F the floats directive:
<code>(format t "~,2F" 20.1)</code> =&gt; “20.10”.</p>
<p>With <code>~2,2f</code>:</p>
<pre><code class="language-lisp">(mapcar (lambda (it)
          (format t "~2@a ~25a ~2,2f~%" (first it) (second it) (third it)))
        movies)
</code></pre>
<pre><code> 1 Matrix                    5.00
10 Matrix Trilogy swe sub    3.30
</code></pre>
<p>And we're happy with this result.</p>
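<p>Note that the first prefix parameter of <code>~F</code> is the total field width and the second is the number of digits after the decimal point. A width of 2 is too small to have a visible effect on the numbers above. A short sketch with a wider field, so the numbers are padded on the left:</p>
<pre><code class="language-lisp">;; Width 6, 2 digits after the decimal point.
(format nil "~6,2f" 5)        ;; =&gt; "  5.00"
(format nil "~6,2f" 3.14159)  ;; =&gt; "  3.14"
;; No width: only the number of decimals is fixed.
(format nil "~,2f" 3.14159)   ;; =&gt; "3.14"
</code></pre>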
<h3 id="iteration">Iteration</h3>
<p>Create a string from a list with iteration construct <code>~{str~}</code>:</p>
<pre><code class="language-lisp">(format nil "~{~A, ~}" '(a b c))
;; "A, B, C, "
</code></pre>
<p>using <code>~^</code> to avoid printing the comma and space after the last element:</p>
<pre><code class="language-lisp">(format nil "~{~A~^, ~}" '(a b c))
;; "A, B, C"
</code></pre>
<p><code>~:{str~}</code> is similar but for a list of sublists:</p>
<pre><code class="language-lisp">(format nil "~:{~S are ~S. ~}" '((pigeons birds) (dogs mammals)))
;; "PIGEONS are BIRDS. DOGS are MAMMALS. "
</code></pre>
<p><code>~@{str~}</code> is similar to <code>~{str~}</code>, but instead of using one argument that is a list, all the remaining arguments are used as the list of arguments for the iteration:</p>
<pre><code class="language-lisp">(format nil "~@{~S are ~S. ~}" 'pigeons 'birds 'dogs 'mammals)
;; "PIGEONS are BIRDS. DOGS are MAMMALS. "
</code></pre>
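<p>Since the movies used earlier are a list of sublists, the <code>~:{str~}</code> construct can print the whole table in one call, without <code>mapcar</code>. This is only a sketch that reuses the directives from the justification sections; the list is repeated inline so that the example stands on its own:</p>
<pre><code class="language-lisp">;; Each sublist supplies the arguments for one pass over the body.
(format nil "~:{~2@a ~25a ~,2f~%~}"
        '((1 "Matrix" 5)
          (10 "Matrix Trilogy swe sub" 3.3)))
;; =&gt;
;; " 1 Matrix                    5.00
;; 10 Matrix Trilogy swe sub    3.30
;; "
</code></pre>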
<h3 id="formatting-a-format-string-v-">Formatting a format string (<code>~v</code>, <code>~?</code>)</h3>
<p>Sometimes you want to justify a string, but the length is a variable
itself. You can't hardcode its value as in <code>(format nil "~30a"
"foo")</code>. Enter the <code>v</code> prefix parameter. It can be used in place of any
comma-separated prefix parameter and takes its value from the next argument:</p>
<pre><code class="language-lisp">(let ((padding 30))
  (format nil "~va" padding "foo"))
;; "foo                           "
</code></pre>
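<p>A possible use of <code>v</code> is to size a column from the data instead of guessing a width such as 25. The function below is a minimal sketch; <code>print-movies</code> is a hypothetical helper, not part of any library, and it expects a list of <code>(id title rating)</code> sublists like the movies used earlier:</p>
<pre><code class="language-lisp">(defun print-movies (movies &amp;optional (stream t))
  "Print MOVIES, a list of (id title rating), in aligned columns."
  ;; The title column is as wide as the longest title.
  (let ((width (reduce #'max movies
                       :key (lambda (movie) (length (second movie)))
                       :initial-value 0)))
    (dolist (movie movies)
      ;; WIDTH is consumed by the v in ~va, before the title itself.
      (format stream "~2@a ~va ~,2f~%"
              (first movie) width (second movie) (third movie)))))

(print-movies '((1 "Matrix" 5)
                (10 "Matrix Trilogy swe sub" 3.3)))
;; prints:
;;  1 Matrix                 5.00
;; 10 Matrix Trilogy swe sub 3.30
</code></pre>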
<p>Other times, you would like to insert a complete format directive
at run time. Enter the <code>?</code> directive.</p>
<pre><code class="language-lisp">(format nil "~?" "~30a" '("foo"))
;;                      ^ a list
</code></pre>
<p>or, using <code>~@?</code>:</p>
<pre><code class="language-lisp">(format nil "~@?" "~30a" "foo")
;;                       ^ not a list
</code></pre>
<p>Of course, it is always possible to format a format string beforehand:</p>
<pre><code class="language-lisp">(let* ((length 30)
       (directive (format nil "~~~aa" length)))
  (format nil directive "foo"))
</code></pre>
<h3 id="conditional-formatting">Conditional Formatting</h3>
<p>Choose one value out of many options by specifying a number:</p>
<pre><code class="language-lisp">(format nil "~[dog~;cat~;bird~:;default~]" 0)
;; "dog"
(format nil "~[dog~;cat~;bird~:;default~]" 1)
;; "cat"
</code></pre>
<p>If the number is out of range, the default option (after <code>~:;</code>) is returned:</p>
<pre><code class="language-lisp">(format nil "~[dog~;cat~;bird~:;default~]" 9)
;; "default"
</code></pre>
<p>Combine it with <code>~:*</code>, which backs up one argument so it can be used again, to implement irregular plurals:</p>
<pre><code class="language-lisp">(format nil "I saw ~r el~:*~[ves~;f~:;ves~]." 0)
;; =&gt; "I saw zero elves."
(format nil "I saw ~r el~:*~[ves~;f~:;ves~]." 1)
;; =&gt; "I saw one elf."
(format nil "I saw ~r el~:*~[ves~;f~:;ves~]." 2)
;; =&gt; "I saw two elves."
</code></pre>
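<p>The conditional directive has two other standard forms that are not shown above: <code>~:[false~;true~]</code> chooses between two clauses based on a boolean, and <code>~@[str~]</code> processes its clause only if the argument is true, leaving that argument available to the clause. A short sketch:</p>
<pre><code class="language-lisp">;; Boolean choice: the first clause is used for NIL, the second otherwise.
(format nil "~:[no~;yes~]" nil)   ;; =&gt; "no"
(format nil "~:[no~;yes~]" t)     ;; =&gt; "yes"

;; Print the clause only when the argument is non-NIL.
(format nil "~@[found ~a~]" 42)   ;; =&gt; "found 42"
(format nil "~@[found ~a~]" nil)  ;; =&gt; ""
</code></pre>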
<h2 id="capturing-what-is-is-printed-into-a-stream">Capturing what is printed into a stream</h2>
<p>Inside <code>(with-output-to-string (mystream) …)</code>, everything that is
printed into the stream <code>mystream</code> is captured and returned as a
string:</p>
<pre><code class="language-lisp">(defun greet (name &amp;key (stream t))
  ;; by default, print to standard output.
  (format stream "hello ~a" name))

(let ((output (with-output-to-string (stream)
                (greet "you" :stream stream))))
  (format t "Output is: '~a'. It is indeed a ~a, aka a string.~&amp;" output (type-of output)))
;; Output is: 'hello you'. It is indeed a (SIMPLE-ARRAY CHARACTER (9)), aka a string.
;; NIL
</code></pre>
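<p>When a function does not accept a stream argument, one option is to bind <code>*standard-output*</code> itself as the stream variable, so that anything printed to standard output inside the body is captured. In this sketch, <code>shout</code> is a hypothetical function made up for the illustration:</p>
<pre><code class="language-lisp">(defun shout (name)
  ;; Hypothetical helper that always prints to standard output.
  (format t "HELLO ~a" name))

;; *standard-output* is rebound to the string stream for the body.
(with-output-to-string (*standard-output*)
  (shout "you"))
;; =&gt; "HELLO you"
</code></pre>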
<h2 id="cleaning-up-strings">Cleaning up strings</h2>
<p>The following examples use the
<a href="https://github.com/EuAndreh/cl-slug/">cl-slug</a> library which,
internally, iterates over the characters of the string and uses
<code>ppcre:regex-replace-all</code>.</p>
<pre><code>(ql:quickload "cl-slug")
</code></pre>
<p>Then it can be used with the <code>slug</code> prefix.</p>
<p>Its main function is to transform a string into a slug, suitable for a website's URL:</p>
<pre><code class="language-lisp">(slug:slugify "My new cool article, for the blog (V. 2).")
;; "my-new-cool-article-for-the-blog-v-2"
</code></pre>
<h3 id="removing-accentuated-letters">Removing accented letters</h3>
<p>Use <code>slug:asciify</code> to replace accented letters with their ASCII equivalent:</p>
<pre><code class="language-lisp">(slug:asciify "ñ é ß ğ ö")
;; =&gt; "n e ss g o"
</code></pre>
<p>This function supports many (western) languages:</p>
<pre><code>slug:*available-languages*
((:TR . "Türkçe (Turkish)") (:SV . "Svenska (Swedish)") (:FI . "Suomi (Finnish)")
(:UK . "українська (Ukrainian)") (:RU . "Ру́сский (Russian)") (:RO . "Română (Romanian)")
(:RM . "Rumàntsch (Romansh)") (:PT . "Português (Portuguese)") (:PL . "Polski (Polish)")
(:NO . "Norsk (Norwegian)") (:LT . "Lietuvių (Lithuanian)") (:LV . "Latviešu (Latvian)")
(:LA . "Lingua Latīna (Latin)") (:IT . "Italiano (Italian)") (:EL . "ελληνικά (Greek)")
(:FR . "Français (French)") (:EO . "Esperanto") (:ES . "Español (Spanish)") (:EN . "English")
(:DE . "Deutsch (German)") (:DA . "Dansk (Danish)") (:CS . "Čeština (Czech)")
(:CURRENCY . "Currency"))
</code></pre>
<h3 id="removing-punctuation">Removing punctuation</h3>
<p>Use <code>(str:remove-punctuation s)</code> or <code>(str:no-case s)</code> (same as
<code>(cl-change-case:no-case s)</code>):</p>
<pre><code class="language-lisp">(str:remove-punctuation "HEY! What's up ??")
;; "HEY What s up"
(str:no-case "HEY! What's up ??")
;; "hey what s up"
</code></pre>
<p>They strip the punctuation with one ppcre unicode regexp
(<code>(ppcre:regex-replace-all "[^\\p{L}\\p{N}]+"</code> where <code>p{L}</code> is the
“letter” category and <code>p{N}</code> any kind of numeric character).</p>
<h2 id="appendix">Appendix</h2>
<h3 id="all-format-directives">All format directives</h3>
<p>All directives are case-insensitive: <code>~A</code> is the same as <code>~a</code>.</p>
<pre><code>$ - Monetary Floating-Point
% - Newline
&amp; - Fresh-line
( - Case Conversion
) - End of Case Conversion
* - Go-To
/ - Call Function
; - Clause Separator
&lt; - Justification
&lt; - Logical Block
&gt; - End of Justification
? - Recursive Processing
A - Aesthetic
B - Binary
C - Character
D - Decimal
E - Exponential Floating-Point
F - Fixed-Format Floating-Point
G - General Floating-Point
I - Indent
Missing and Additional FORMAT Arguments
Nesting of FORMAT Operations
Newline: Ignored Newline
O - Octal
P - Plural
R - Radix
S - Standard
T - Tabulate
W - Write
X - Hexadecimal
[ - Conditional Expression
] - End of Conditional Expression
^ - Escape Upward
_ - Conditional Newline
{ - Iteration
| - Page
} - End of Iteration
~ - Tilde
</code></pre>
<h2 id="see-also">See also</h2>
<ul>
<li><a href="https://gist.github.com/WetHat/a49e6f2140b401a190d45d31e052af8f">Pretty printing table data</a>, in ASCII art, a tutorial as a Jupyter notebook.</li>
</ul>
<p class="page-source">
Page source: <a href="https://github.com/LispCookbook/cl-cookbook/blob/master/strings.md">strings.md</a>
</p>
</div>
<script type="text/javascript">
// Don't write the TOC on the index.
if (window.location.pathname != "/cl-cookbook/") {
$("#toc").toc({
content: "#content", // will ignore the first h1 with the site+page title.
headings: "h1,h2,h3,h4"});
}
$("#two-cols + ul").css({
"column-count": "2",
});
$("#contributors + ul").css({
"column-count": "4",
});
</script>
<div>
<footer class="footer">
<hr/>
&copy; 2002&ndash;2023 the Common Lisp Cookbook Project
</footer>
</div>
<div id="toc-btn">T<br>O<br>C</div>
</div>
<script type="text/javascript">
HighlightLisp.highlight_auto({className: null});
</script>
<script type="text/javascript">
function duckSearch() {
var searchField = document.getElementById("searchField");
if (searchField && searchField.value) {
var query = escape("site:lispcookbook.github.io/cl-cookbook/ " + searchField.value);
window.location.href = "https://duckduckgo.com/?kj=b2&kf=-1&ko=1&q=" + query;
// https://duckduckgo.com/params
// kj=b2: blue header in results page
// kf=-1: no favicons
}
}
</script>
<script async defer data-domain="lispcookbook.github.io/cl-cookbook" src="https://plausible.io/js/plausible.js"></script>
</body>
</html>