emacs.d/clones/lisp/clojure-doc.org/articles/cookbooks/strings/index.html

464 lines
24 KiB
HTML

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta charset="utf-8"/>
<title>Clojure Guides: Strings</title>
<meta name="description" content="This cookbook covers working with strings in Clojure using built-in
functions, standard and contrib libraries, and parts of the JDK via
interoperability.This work is licensed under a Creative Commons
Attribution 3.0 Unported License (including images &amp;
stylesheets). The source is available on
Github.">
<meta property="og:description" content="This cookbook covers working with strings in Clojure using built-in
functions, standard and contrib libraries, and parts of the JDK via
interoperability.This work is licensed under a Creative Commons
Attribution 3.0 Unported License (including images &amp;
stylesheets). The source is available on
Github.">
<meta property="og:url" content="https://clojure-doc.github.io/articles/cookbooks/strings/" />
<meta property="og:title" content="Strings" />
<meta property="og:type" content="article" />
<link rel="canonical" href="https://clojure-doc.github.io/articles/cookbooks/strings/">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link href="https://fonts.googleapis.com/css?family=Alegreya:400italic,700italic,400,700" rel="stylesheet"
type="text/css">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.0/css/bootstrap.min.css">
<link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css" rel="stylesheet">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.7.0/styles/default.min.css">
<link href="../../../css/screen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<nav class="navbar navbar-default">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="../../../index.html">Clojure Guides</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav navbar-right">
<li ><a href="../../../index.html">Home</a></li>
<li><a href="https://github.com/clojure-doc/clojure-doc.github.io">Contribute</a></li>
</ul>
</div><!--/.nav-collapse -->
</div><!--/.container-fluid -->
</nav>
<div class="container">
<div class="row">
<div class="col-lg-9">
<div id="content">
<div id="custom-page">
<div id="page-header">
<h2>Strings</h2>
</div>
<p>This cookbook covers working with strings in Clojure using built-in
functions, standard and contrib libraries, and parts of the JDK via
interoperability.</p><p>This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">Creative Commons
Attribution 3.0 Unported License</a> (including images &amp;
stylesheets). The source is available <a href="https://github.com/clojure-doc/clojure-doc.github.io">on
Github</a>.</p><h2 id="overview">Overview</h2><ul><li>Strings are <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/String.html">plain Java
strings</a>.
You can use anything which operates on them.</li><li>Java strings are immutable, so they're convenient to use in Clojure.</li><li>You can't add metadata to Java strings.</li><li>Clojure supports some convenient notations:</li></ul><pre><code> "foo" java.lang.String
#"\d" java.util.regex.Pattern (in this case, one which matches a single digit)
\f java.lang.Character (in this case, the letter 'f')
</code></pre><ul><li><strong>Caveat:</strong> Human brains and electronic computers are rather different
devices. So Java strings (sequences of <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#unicode">UTF-16
characters</a>)
don't always map nicely to user-perceived characters. For example, a
single Unicode "code point" doesn't necessarily equal a user-perceived
character. (Like Korean Hangul Jamo, where user-perceived characters
are composed from two or three Unicode code points.) Also, a Unicode
code point may sometimes require 2 UTF-16 characters to encode it.</li></ul><h2 id="preliminaries">Preliminaries</h2><p>Some examples use
<a href="http://clojure.github.io/clojure/clojure.string-api.html">clojure.string</a>,
<a href="https://github.com/edn-format/edn">clojure.edn</a> and
<a href="http://clojure.github.io/clojure/clojure.pprint-api.html">clojure.pprint</a>. We'll
assume your <code>ns</code> macro contains:</p><pre><code class="clojure">(:require [clojure.string :as str]
[clojure.edn :as edn]
[clojure.pprint :as pp])
</code></pre><p>or else in the repl you've loaded it:</p><pre><code class="clojure">(require '[clojure.string :as str])
(require '[clojure.edn :as edn])
(require '[clojure.pprint :as pp])
</code></pre><h2 id="recipes">Recipes</h2><h3 id="basics">Basics</h3><pre><code class="clojure">;; Size measurements
(count "0123") ;=&gt; 4
(empty? "0123") ;=&gt; false
(empty? "") ;=&gt; true
(str/blank? " ") ;=&gt; true
;; Concatenate
(str "foo" "bar") ;=&gt; "foobar"
(str/join ["0" "1" "2"]) ;=&gt; "012"
(str/join "." ["0" "1" "2"]) ;=&gt; "0.1.2"
;; Matching using plain Java methods.
;;
;; You might prefer regexes for these. For instance, failure returns
;; -1, which you have to test for. And characters like \o are
;; instances of java.lang.Character, which you may have to convert to
;; int or String.
(.indexOf "foo" "oo") ;=&gt; 1
(.indexOf "foo" "x") ;=&gt; -1
(.lastIndexOf "foo" (int \o)) ;=&gt; 2
;; Substring
(subs "0123" 1) ;=&gt; "123"
(subs "0123" 1 3) ;=&gt; "12"
(str/trim " foo ") ;=&gt; "foo"
(str/triml " foo ") ;=&gt; "foo "
(str/trimr " foo ") ;=&gt; " foo"
;; Multiple substrings
(seq "foo") ;=&gt; (\f \o \o)
(str/split "foo/bar/quux" #"/") ;=&gt; ["foo" "bar" "quux"]
(str/split "foo/bar/quux" #"/" 2) ;=&gt; ["foo" "bar/quux"]
(str/split-lines "foo
bar") ;=&gt; ["foo" "bar"]
;; Case
(str/lower-case "fOo") ;=&gt; "foo"
(str/upper-case "fOo") ;=&gt; "FOO"
(str/capitalize "fOo") ;=&gt; "Foo"
;; Escaping
(str/escape "foo|bar|quux" {\| "||"}) ;=&gt; "foo||bar||quux"
;; Get byte array of given encoding.
;; (The output will likely have a different number than "3c3660".)
(.getBytes "foo" "UTF-8") ;=&gt; #&lt;byte[] [B@3c3660&gt;
;; Parsing keywords
(keyword "foo") ;=&gt; :foo
;; Parsing numbers
(bigint "20000000000000000000000000000") ;=&gt; 20000000000000000000000000000N
(bigdec "20000000000000000000.00000000") ;=&gt; 20000000000000000000.00000000M
(Integer/parseInt "2") ;=&gt; 2
(Float/parseFloat "2") ;=&gt; 2.0
;; Parsing edn, a subset of Clojure forms.
(edn/read-string "0xffff") ;=&gt; 65535
;; The sledgehammer approach to reading Clojure forms.
;;
;; SECURITY WARNING: Ensure *read-eval* is false when dealing with
;; strings you don't 100% trust. Even though *read-eval* is false by
;; default since Clojure 1.5, be paranoid and set it to false right
;; before you use it, because anything could've re-bound it to true.
(binding [*read-eval* false]
(read-string "#\"[abc]\""))
;=&gt; #"[abc]"
</code></pre><h3 id="parsing-complex-strings">Parsing complex strings</h3><h4 id="regexes">Regexes</h4><p>Regexes offer a boost in string-matching power. You can express ideas
like repetition, alternatives, etc.</p><p><a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Regex
reference.</a></p><p><strong>Groups:</strong> Regex groups are useful, when we want to match more than
one substring. (Or refer to matches later.) In the regex <code>#"(group-1) (group-2)"</code>, the 0th group is the whole match. The 1st group is
started by the left-most <code>(</code>, the 2nd group is started by the
second-left-most <code>(</code>, etc. You can even nest groups. You can refer to
groups later using <code>$0</code>, <code>$1</code>, etc.</p><p><strong>Matching</strong></p><pre><code class="clojure">;; Simple matching
(re-find #"\d+" "foo 123 bar") ;=&gt; "123"
;; What happens when a match fails.
(re-find #"\d+" "foobar") ;=&gt; nil
;; Return only the first groups which satisfy match.
(re-matches #"(@\w+)\s([.0-9]+)%"
"@shanley 19.8%")
;=&gt;["@shanley 19.8%" "@shanley" "19.8"]
;; Return seq of all matching groups which occur in string.
(re-seq #"(@\w+)\s([.0-9]+)%"
"@davidgraeber 12.3%,@shanley 19.8%")
;=&gt; (["@davidgraeber 12.3%" "@davidgraeber" "12.3"]
; ["@shanley 19.8%" "@shanley" "19.8"])
</code></pre><p><strong>Replacing</strong></p><p>We use <code>str/replace</code>. Aside from the first arg (the initial string),
the next two args are match and replacement:</p><pre><code> match / replacement can be:
string / string
char / char
pattern / (string or function of match).
</code></pre><pre><code class="clojure">;; In the replacement string, $0, $1, etc refer to matched groups.
(str/replace "@davidgraeber 12.3%,@shanley 19.8%"
#"(@\S+)\s([.0-9]+)%"
"$2 ($1)")
;=&gt; "12.3 (@davidgraeber),19.8 (@shanley)"
;; Using a function to replace text gives us power.
(println
(str/replace "@davidgraeber 12.3%,@shanley 19.8%"
#"(@\w+)\s([.0-9]+)%,?"
(fn [[_ person percent]]
(let [points (-&gt; percent Float/parseFloat (* 100) Math/round)]
(str person "'s followers grew " points " points.\n")))))
;print=&gt; @davidgraeber's followers grew 1230 points.
;print=&gt; @shanley's followers grew 1980 points.
;print=&gt;
</code></pre><h4 id="context-free-grammars">Context-free grammars</h4><p>Context-free grammars offer yet another boost in expressive matching
power, compared to regexes. You can express ideas like nesting.</p><p>We'll use <a href="https://github.com/Engelberg/instaparse">Instaparse</a> on
<a href="http://www.json.org/">JSON's grammar</a>. (This example isn't seriously
tested nor a featureful parser. Use
<a href="https://github.com/clojure/data.json">data.json</a> instead.)</p><pre><code class="clojure">;; Your project.clj should contain this (you may need to restart your JVM):
;; :dependencies [[instaparse "1.2.4"]]
;;
;; We'll assume your ns macro contains:
;; (:require [instaparse.core :as insta])
;; or else in the repl you've loaded it:
;; (require '[instaparse.core :as insta])
(def barely-tested-json-parser
(insta/parser
"object = &lt;'{'&gt; &lt;w*&gt; (members &lt;w*&gt;)* &lt;'}'&gt;
&lt;members&gt; = pair (&lt;w*&gt; &lt;','&gt; &lt;w*&gt; members)*
&lt;pair&gt; = string &lt;w*&gt; &lt;':'&gt; &lt;w*&gt; value
&lt;value&gt; = string | number | object | array | 'true' | 'false' | 'null'
array = &lt;'['&gt; elements* &lt;']'&gt;
&lt;elements&gt; = value &lt;w*&gt; (&lt;','&gt; &lt;w*&gt; elements)*
number = int frac? exp?
&lt;int&gt; = '-'? digits
&lt;frac&gt; = '.' digits
&lt;exp&gt; = e digits
&lt;e&gt; = ('e' | 'E') (&lt;'+'&gt; | '-')?
&lt;digits&gt; = #'[0-9]+'
(* First sketched state machine; then it was easier to figure out
regex syntax and all the maddening escape-backslashes. *)
string = &lt;'\\\"'&gt; #'([^\"\\\\]|\\\\.)*' &lt;'\\\"'&gt;
&lt;w&gt; = #'\\s+'"))
(barely-tested-json-parser "{\"foo\": {\"bar\": 99.9e-9, \"quux\": [1, 2, -3]}}")
;=&gt; [:object
; [:string "foo"]
; [:object
; [:string "bar"]
; [:number "99" "." "9" "e" "-" "9"]
; [:string "quux"]
; [:array [:number "1"] [:number "2"] [:number "-" "3"]]]]
;; That last output is a bit verbose. Let's process it further.
(-&gt;&gt; (barely-tested-json-parser "{\"foo\": {\"bar\": 99.9e-9, \"quux\": [1, 2, -3]}}")
(insta/transform {:object hash-map
:string str
:array vector
:number (comp edn/read-string str)}))
;=&gt; {"foo" {"quux" [1 2 -3], "bar" 9.99E-8}}
;; Now we can appreciate what those &lt;angle-brackets&gt; were all about.
;;
;; When to the right of the grammar's =, it totally hides the enclosed
;; thing in the output. For example, we don't care about whitespace,
;; so we hide it with &lt;w*&gt;.
;;
;; When to the left of the grammar's =, it merely prevents a level of
;; nesting in the output. For example, "members" is a rather
;; artificial entity, so we prevent a pointless level of nesting with
;; &lt;members&gt;.
</code></pre><h3 id="building-complex-strings">Building complex strings</h3><h4 id="redirecting-streams">Redirecting streams</h4><p><code>with-out-str</code> provides a simple way to build strings. It redirects
standard output (<code>*out*</code>) to a fresh <code>StringWriter</code>, then returns the
resulting string. So you can use functions like <code>print</code>, <em>even in
nested functions</em>, and get the resulting string at the end.</p><pre><code class="clojure">(let [shrimp-varieties ["shrimp-kabobs" "shrimp creole" "shrimp gumbo"]]
(with-out-str
(print "We have ")
(doseq [name (str/join ", " shrimp-varieties)]
(print name))
(print "...")))
;=&gt; "We have shrimp-kabobs, shrimp creole, shrimp gumbo..."
</code></pre><h4 id="format-strings">Format strings</h4><p>Java's templating mini-language helps you build many strings
conveniently. <a href="http://docs.oracle.com/javase/7/docs/api/java/util/Formatter.html">Reference.</a></p><pre><code class="clojure">;; %s is most commonly used to print args. Escape %'s with %%.
(format "%s enjoyed %s%%." "Mozambique" 19.8) ;=&gt; "Mozambique enjoyed 19.8%."
;; The 1$ prefix allows you to keep referring to the first arg.
(format "%1$tY-%1$tm-%1$td" #inst"2000-01-02T00:00:00") ;=&gt; "2000-01-02"
;; Again, 1$, 2$, etc prefixes let us refer to args in arbitrary orders.
(format "New year: %2$tY. Old year: %1$tY"
#inst"2000-01-02T00:00:00"
#inst"3111-12-31T00:00:00")
;=&gt; "New year: 3111. Old year: 2000"
</code></pre><h4 id="cl-format">CL-Format</h4><p><code>cl-format</code> is a port of Common Lisp's notorious, powerful string
formatting mini-language. For example, you can build strings from
sequences. (As well as oddities like print numbers in English or two
varieties of Roman numerals.) However, it's weaker than plain <code>format</code>
with printing dates and referring to args in arbitrary order.</p><p>Remember that <code>cl-format</code> represents a (potentially unreadable)
language which your audience didn't sign up to learn. If you're the
sort of person who likes it, try to only use it in sweetspots where it
provides clarity for little complexity.</p><p><a href="http://www.gigamonkeys.com/book/a-few-format-recipes.html">Tutorial</a>
in Practical Common
Lisp. <a href="http://www.lispworks.com/documentation/HyperSpec/Body/22_c.htm">Reference</a>
in Common Lisp's Hyperspec.</p><pre><code class="clojure">;; The first param prints to *out* if true. To string if false.
;; To a stream if it's a stream.
(pp/cl-format true "~{~{~a had ~s percentage point~:p.~}~^~%~}"
{"@davidgraeber" 12.3
"@shanley" 19.8
"@tjgabbour" 1})
;print=&gt; @davidgraeber had 12.3 percentage points.
;print=&gt; @tjgabbour had 1 percentage point.
;print=&gt; @shanley had 19.8 percentage points.
(def format-string "~{~#[~;~a~;~a and ~a~:;~@{~a~#[~;, and ~:;, ~]~}~]~}")
(pp/cl-format nil format-string [])
;=&gt; ""
(pp/cl-format nil format-string ["@shanley"])
;=&gt; "@shanley"
(pp/cl-format nil format-string ["@shanley", "@davidgraeber"])
;=&gt; "@shanley and @davidgraeber"
(pp/cl-format nil format-string ["@shanley", "@davidgraeber", "@sarahkendzior"])
;=&gt; "@shanley, @davidgraeber, and @sarahkendzior"
</code></pre><h2 id="contributors">Contributors</h2><p>Tj Gabbour <a href="mailto:tjg@simplevalue.de">tjg@simplevalue.de</a>, 2013 (original author)</p>
<div id="prev-next">
<a href="../data_structures/index.html">&laquo; Data Structures (Help wanted)</a>
||
<a href="../math/index.html">Mathematics with Clojure &raquo;</a>
</div>
</div>
</div>
</div>
<div class="col-md-3">
<div id="sidebar">
<h3>Links</h3>
<ul id="links">
<li><a href="../../about/index.html">About</a></li>
<li><a href="../../content/index.html">Table of Contents</a></li>
<li><a href="../../tutorials/getting_started/index.html">Getting Started with Clojure</a></li>
<li><a href="../../tutorials/introduction/index.html">Introduction to Clojure</a></li>
<li><a href="../../tutorials/emacs/index.html">Clojure with Emacs</a></li>
<li><a href="../../tutorials/vim_fireplace/index.html">Clojure with Vim and fireplace.vim</a></li>
<li><a href="../../tutorials/eclipse/index.html">Starting with Eclipse and Counterclockwise For Clojure Development</a></li>
<li><a href="../../tutorials/basic_web_development/index.html">Basic Web Development</a></li>
<li><a href="../../tutorials/parsing_xml_with_zippers/index.html">Parsing XML in Clojure</a></li>
<li><a href="../../tutorials/growing_a_dsl_with_clojure/index.html">Growing a DSL with Clojure</a></li>
<li><a href="../../language/core_overview/index.html">Overview of clojure.core, the standard Clojure library</a></li>
<li><a href="../../language/namespaces/index.html">Clojure Namespaces and Vars</a></li>
<li><a href="../../language/collections_and_sequences/index.html">Collections and Sequences in Clojure</a></li>
<li><a href="../../language/functions/index.html">Functions in Clojure</a></li>
<li><a href="../../language/laziness/index.html">Laziness in Clojure</a></li>
<li><a href="../../language/interop/index.html">Clojure interoperability with Java</a></li>
<li><a href="../../language/macros/index.html">Clojure Macros and Metaprogramming</a></li>
<li><a href="../../language/polymorphism/index.html">Polymorphism in Clojure: Protocols and Multimethods</a></li>
<li><a href="../../language/concurrency_and_parallelism/index.html">Concurrency and Parallelism in Clojure</a></li>
<li><a href="../../language/glossary/index.html">Clojure Terminology Guide</a></li>
<li><a href="../../ecosystem/libraries_directory/index.html">A Directory of Clojure Libraries</a></li>
<li><a href="../../ecosystem/libraries_authoring/index.html">Library Development and Distribution</a></li>
<li><a href="../../ecosystem/generating_documentation/index.html">Generating Documentation</a></li>
<li><a href="../../ecosystem/data_processing/index.html">Data Processing (Help Wanted)</a></li>
<li><a href="../../ecosystem/web_development/index.html">Web Development (Overview)</a></li>
<li><a href="../../ecosystem/maven/index.html">How to use Maven to build Clojure projects</a></li>
<li><a href="../../ecosystem/community/index.html">Clojure Community</a></li>
<li><a href="../../ecosystem/user_groups/index.html">Clojure User Groups</a></li>
<li><a href="../../ecosystem/running_cljug/index.html">Running a Clojure User Group</a></li>
<li><a href="../../ecosystem/books/index.html">Books about Clojure and ClojureScript</a></li>
<li><a href="../data_structures/index.html">Data Structures (Help wanted)</a></li>
<li><a href="index.html">Strings</a></li>
<li><a href="../math/index.html">Mathematics with Clojure</a></li>
<li><a href="../date_and_time/index.html">Date and Time (Help wanted)</a></li>
<li><a href="../files_and_directories/index.html">Working with Files and Directories in Clojure</a></li>
<li><a href="../middleware/index.html">Middleware in Clojure</a></li>
<li><a href="../../ecosystem/java_jdbc/home.html">java.jdbc - Getting Started</a></li>
<li><a href="../../ecosystem/java_jdbc/using_sql.html">java.jdbc - Manipulating data with SQL</a></li>
<li><a href="../../ecosystem/java_jdbc/using_ddl.html">java.jdbc - Using DDL and Metadata</a></li>
<li><a href="../../ecosystem/java_jdbc/reusing_connections.html">java.jdbc - How to reuse database connections</a></li>
<li><a href="../../ecosystem/core_typed/home/index.html">core.typed - User Documentation Home</a></li>
<li><a href="../../ecosystem/core_typed/user_documentation/index.html">core.typed - User Documentation</a></li>
<li><a href="../../ecosystem/core_typed/rationale/index.html">core.typed - Rationale</a></li>
<li><a href="../../ecosystem/core_typed/quick_guide.html">core.typed - Quick Guide</a></li>
<li><a href="../../ecosystem/core_typed/start/introduction_and_motivation/index.html">core.typed - Getting Started: Introduction and Motivation</a></li>
<li><a href="../../ecosystem/core_typed/types/index.html">core.typed - Types</a></li>
<li><a href="../../ecosystem/core_typed/start/annotations/index.html">core.typed - Annotations</a></li>
<li><a href="../../ecosystem/core_typed/poly_fn/index.html">core.typed - Polymorphic Functions</a></li>
<li><a href="../../ecosystem/core_typed/filters/index.html">core.typed - Filters</a></li>
<li><a href="../../ecosystem/core_typed/mm_protocol_datatypes/index.html">core.typed - Protocols</a></li>
<li><a href="../../ecosystem/core_typed/loops/index.html">core.typed - Looping constructs</a></li>
<li><a href="../../ecosystem/core_typed/function_types/index.html">core.typed - Functions</a></li>
<li><a href="../../ecosystem/core_typed/limitations/index.html">core.typed - Limitations</a></li>
</ul>
</div>
</div>
</div>
<footer>Copyright &copy; 2021 Multiple Authors
<p style="text-align: center;">Powered by <a href="http://cryogenweb.org">Cryogen</a></p></footer>
</div>
<script src="https://code.jquery.com/jquery-1.11.0.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.0/js/bootstrap.min.js"></script>
<script src="../../../js/highlight.pack.js" type="application/javascript"></script>
<script>hljs.initHighlightingOnLoad();</script>
</body>
</html>