<!DOCTYPE html>
<html lang="en">
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.2.0">
<title>Regular Expressions</title>
<meta charset="utf-8">
<meta name="description" content="A collection of examples of using Common Lisp">
<meta name="viewport" content=
"width=device-width, initial-scale=1">
<link rel="icon" href=
"assets/cl-logo-blue.png"/>
<link rel="stylesheet" href=
"assets/style.css">
<script type="text/javascript" src=
"assets/highlight-lisp.js">
</script>
<script type="text/javascript" src=
"assets/jquery-3.2.1.min.js">
</script>
<script type="text/javascript" src=
"assets/jquery.toc/jquery.toc.min.js">
</script>
<script type="text/javascript" src=
"assets/toggle-toc.js">
</script>

<link rel="stylesheet" href=
"assets/github.css">

<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
</head>
<body>
<h1 id="title-xs"><a href="index.html">The Common Lisp Cookbook</a> – Regular Expressions</h1>
<div id="logo-container">
<a href="index.html">
<img id="logo" src="assets/cl-logo-blue.png"/>
</a>

<div id="searchform-container">
<form onsubmit="duckSearch()" action="javascript:void(0)">
<input id="searchField" type="text" value="" placeholder="Search...">
</form>
</div>

<div id="toc-container" class="toc-close">
<div id="toc-title">Table of Contents</div>
<ul id="toc" class="list-unstyled"></ul>
</div>
</div>

<div id="content-container">
<h1 id="title-non-xs"><a href="index.html">The Common Lisp Cookbook</a> – Regular Expressions</h1>

<!-- Announcement we can keep for 1 month or more. I remove it and re-add it from time to time. -->
<!-- <p class="announce"> -->
<!-- 📢 🤶 ⭐ -->
<!-- <a style="font-size: 120%" href="https://www.udemy.com/course/common-lisp-programming/?couponCode=LISPY-XMAS2023" title="This course is under a paywall on the Udemy platform. Several videos are freely available so you can judge before diving in. vindarel is (I am) the main contributor to this Cookbook."> Discover our contributor's Lisp course with this Christmas coupon.</a> -->
<!-- <strong> -->
<!-- Recently added: 18 videos on MACROS. -->
<!-- </strong> -->
<!-- <a style="font-size: 90%" href="https://github.com/vindarel/common-lisp-course-in-videos/">Learn more</a>. -->
<!-- </p> -->

<p class="announce">
📢 New videos: <a href="https://www.youtube.com/watch?v=h_noB1sI_e8">web dev demo part 1</a>, <a href="https://www.youtube.com/watch?v=xnwc7irnc8k">dynamic page with HTMX</a>, <a href="https://www.youtube.com/watch?v=Zpn86AQRVN8">Weblocks demo</a>
</p>

<p class="announce-neutral">
📕 <a href="index.html#download-in-epub">Get the EPUB and PDF</a>
</p>

<div id="content">
<p>The <a href="http://www.lispworks.com/documentation/HyperSpec/index.html">ANSI Common Lisp
standard</a>
does not include facilities for regular expressions, but third-party
libraries exist for this task, for instance
<a href="https://github.com/edicl/cl-ppcre">cl-ppcre</a>.</p>
<p>See also the <a href="http://www.cliki.net/Regular%20Expression">Cliki
regexp</a> page for more
links.</p>

<p>Note that some CL implementations include regexp facilities, notably
<a href="http://clisp.sourceforge.net/impnotes.html#regexp">CLISP</a> and
<a href="https://franz.com/support/documentation/current/doc/regexp.htm">Allegro
CL</a>. If
in doubt, check your manual or ask your vendor.</p>
<p>The description below is far from complete, so be sure
to check the reference manual that comes with the CL-PPCRE
library.</p>
<h2 id="ppcre">PPCRE</h2>

<p><a href="https://github.com/edicl/cl-ppcre">CL-PPCRE</a> (short for
Portable Perl-compatible Regular Expressions) is a portable regular
expression library for Common Lisp with a broad set of features and
good performance. It has been ported to a number of Common Lisp
implementations and can be easily installed (or added as a dependency)
via Quicklisp:</p>

<pre><code class="language-lisp">(ql:quickload "cl-ppcre")
</code></pre>
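<p>For the dependency route, a system definition can list the library under <code>:depends-on</code>. The sketch below assumes a hypothetical system named <code>my-app</code> with a single source file <code>main.lisp</code>; both names are placeholders, not part of CL-PPCRE:</p>

<pre><code class="language-lisp">;; my-app.asd (hypothetical system, shown only to illustrate :depends-on)
(asdf:defsystem "my-app"
  :depends-on ("cl-ppcre")
  :components ((:file "main")))
</code></pre>

<p>With such a definition in place, loading <code>my-app</code> through Quicklisp or ASDF should load CL-PPCRE as well.</p>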
<p>Basic operations with the CL-PPCRE library functions are described
below.</p>
<h3 id="looking-for-matching-patterns-scan-create-scanner">Looking for matching patterns: scan, create-scanner</h3>

<p>The <code>scan</code> function tries to match the given pattern and on success
returns four values: the start of the match, the end
of the match, and two arrays denoting the beginnings and ends of
register matches. On failure it returns <code>NIL</code>.</p>
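<p>As a minimal sketch of how these four values can be captured, the snippet below uses the standard <code>multiple-value-bind</code>. The variable names are illustrative only:</p>

<pre><code class="language-lisp">(multiple-value-bind (start end reg-starts reg-ends)
    (ppcre:scan "(a)*b" "xaaabd")
  ;; START and END delimit the whole match ("aaab");
  ;; the two arrays hold the bounds of the last match of register 1.
  (list start end reg-starts reg-ends))
;; => (1 5 #(3) #(4))

;; No match: the primary value is NIL.
(ppcre:scan "z" "xaaabd")
;; => NIL
</code></pre>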
<p>A regular expression pattern can be compiled with the <code>create-scanner</code>
function. It returns a “scanner” that can be passed to the other
functions in place of the pattern string.</p>

<p>For example:</p>

<pre><code class="language-lisp">(let ((ptrn (ppcre:create-scanner "(a)*b")))
  (ppcre:scan ptrn "xaaabd"))
</code></pre>

<p>will yield the same results as:</p>

<pre><code class="language-lisp">(ppcre:scan "(a)*b" "xaaabd")
</code></pre>

<p>but will require less time for repeated <code>scan</code> calls, as the
expression is parsed and compiled only once.</p>
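<p>To illustrate the reuse, here is a small sketch that compiles a scanner once and applies it to several strings. The names <code>*digits-scanner*</code> and <code>lines-with-digits</code> are made up for this example:</p>

<pre><code class="language-lisp">;; Compile the pattern once, when this form is evaluated.
(defparameter *digits-scanner* (ppcre:create-scanner "\\d+"))

(defun lines-with-digits (lines)
  "Return the elements of LINES that contain at least one digit."
  (remove-if-not (lambda (line) (ppcre:scan *digits-scanner* line))
                 lines))

(lines-with-digits '("abc" "a1" "42" ""))
;; => ("a1" "42")
</code></pre>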
<h3 id="extracting-information">Extracting information</h3>

<p>CL-PPCRE provides several ways to extract matching fragments.</p>

<h4 id="all-matches-all-matches-as-strings">all-matches, all-matches-as-strings</h4>

<p>The function <code>all-matches-as-strings</code> is very handy: it returns a list of matches:</p>

<pre><code class="language-lisp">(ppcre:all-matches-as-strings "\\d+" "numbers: 1 10 42")
;; => ("1" "10" "42")
</code></pre>

<p>The function <code>all-matches</code> is similar, but it returns a list of positions:</p>

<pre><code class="language-lisp">(ppcre:all-matches "\\d+" "numbers: 1 10 42")
;; => (9 10 11 13 14 16)
</code></pre>
<p>Look carefully: it actually returns a flat list containing the start and end
positions of all matches. Here 9 and 10 are the start and end of the first
number (1), and so on. Each end position is exclusive, as with <code>subseq</code>.</p>
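<p>One way to consume such a flat list is to walk it two elements at a time. The following sketch relies only on standard <code>loop</code> and <code>subseq</code>; the variable names are illustrative:</p>

<pre><code class="language-lisp">(let* ((text "numbers: 1 10 42")
       (positions (ppcre:all-matches "\\d+" text)))
  ;; Take (start end) pairs from the flat list and cut out each match.
  (loop for (start end) on positions by #'cddr
        collect (subseq text start end)))
;; => ("1" "10" "42")
</code></pre>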
<p>To extract integers from this example string, map
<code>parse-integer</code> over the result:</p>

<pre><code class="language-lisp">CL-USER> (ppcre:all-matches-as-strings "\\d+" "numbers: 1 10 42")
("1" "10" "42")
CL-USER> (mapcar #'parse-integer *)
(1 10 42)
</code></pre>
<p>Both functions accept the usual <code>:start</code> and <code>:end</code> keyword arguments. Additionally, <code>all-matches-as-strings</code> accepts a <code>:sharedp</code> argument:</p>

<blockquote>
<p>If SHAREDP is true, the substrings may share structure with TARGET-STRING.</p>
</blockquote>
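<p>As a short illustration of the bounding arguments, the calls below restrict the search to part of the same example string. Note that the positions returned by <code>all-matches</code> are still relative to the whole string:</p>

<pre><code class="language-lisp">;; Skip the first number by starting the search at index 10.
(ppcre:all-matches-as-strings "\\d+" "numbers: 1 10 42" :start 10)
;; => ("10" "42")

;; Stop before the last number as well.
(ppcre:all-matches-as-strings "\\d+" "numbers: 1 10 42" :start 10 :end 13)
;; => ("10")

(ppcre:all-matches "\\d+" "numbers: 1 10 42" :start 10)
;; => (11 13 14 16)
</code></pre>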
<h4 id="count-matches-new-in-212-april-2024">count-matches (new in 2.1.2, April 2024)</h4>

<p><code>(count-matches regex target-string)</code> returns a count of all matches of <code>regex</code> against <code>target-string</code>:</p>

<pre><code class="language-lisp">CL-USER> (ppcre:count-matches "a" "foo bar baz")
2

CL-USER> (ppcre:count-matches "\\w*" "foo bar baz")
6
</code></pre>
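<p>The second result may look surprising for a string of three words. The reason is that <code>\w*</code> also matches the empty string, so a zero-length match is found right after each word. The sketch below makes these matches visible with <code>all-matches-as-strings</code> and contrasts them with <code>\w+</code>, which requires at least one character:</p>

<pre><code class="language-lisp">(ppcre:all-matches-as-strings "\\w*" "foo bar baz")
;; => ("foo" "" "bar" "" "baz" "")

(ppcre:all-matches-as-strings "\\w+" "foo bar baz")
;; => ("foo" "bar" "baz")
</code></pre>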
<h4 id="scan-to-strings-register-groups-bind">scan-to-strings, register-groups-bind</h4>

<p>The <code>scan-to-strings</code> function is similar to <code>scan</code> but returns
substrings of target-string instead of positions. This function
returns two values on success: the whole match as a string plus an
array of substrings (or NILs) corresponding to the matched registers.</p>
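<p>A brief sketch of both return values follows. The date-like pattern and input string are invented for the illustration:</p>

<pre><code class="language-lisp">(multiple-value-bind (match registers)
    (ppcre:scan-to-strings "(\\d{4})-(\\d{2})" "released 2024-04")
  (list match registers))
;; => ("2024-04" #("2024" "04"))

;; A register that does not take part in the match is NIL.
(ppcre:scan-to-strings "(a)|(b)" "b")
;; => "b", #(NIL "b")
</code></pre>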
<p>The <code>register-groups-bind</code> macro tries to match the given pattern
against the target string and binds the matching fragments to the given
variables.</p>

<pre><code class="language-lisp">(ppcre:register-groups-bind (first second third fourth)
    ("((a)|(b)|(c))+" "abababc" :sharedp t)
  (list first second third fourth))
;; => ("c" "a" "b" "c")
</code></pre>
<p>CL-PPCRE also provides a shortcut for calling a function on the
matching fragment before binding the result to the variable:</p>

<pre><code class="language-lisp">(ppcre:register-groups-bind
    (fname lname (#'parse-integer date month year))
    ("(\\w+)\\s+(\\w+)\\s+(\\d{1,2})\\.(\\d{1,2})\\.(\\d{4})"
     "Frank Zappa 21.12.1940")
  (list fname lname date month year))
;; => ("Frank" "Zappa" 21 12 1940)
</code></pre>
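<p>When the pattern does not match, the body is not executed and the form returns <code>NIL</code>, which makes the macro convenient for small parsers. The helper below, <code>parse-setting</code>, is a hypothetical name used only for this sketch:</p>

<pre><code class="language-lisp">(defun parse-setting (line)
  "Return (KEY . VALUE) for a line of the form key=value, or NIL."
  (ppcre:register-groups-bind (key value)
      ("^(\\w+)=(.*)$" line)
    (cons key value)))

(parse-setting "color=blue")
;; => ("color" . "blue")

(parse-setting "no equals sign")
;; => NIL
</code></pre>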
<h3 id="replacing-text-regex-replace-regex-replace-all">Replacing text: regex-replace, regex-replace-all</h3>

<pre><code class="language-lisp">(ppcre:regex-replace "a" "abc" "A") ;; => "Abc"
;; or
(let ((pat (ppcre:create-scanner "a")))
  (ppcre:regex-replace pat "abc" "A"))
</code></pre>
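<p>The difference between the two functions named in the heading can be sketched as follows: <code>regex-replace</code> changes only the first match, while <code>regex-replace-all</code> changes every match. The last call also shows that <code>\N</code> in the replacement string refers to the N-th register. The input strings are invented for the example, and only the first return value is shown:</p>

<pre><code class="language-lisp">;; Only the first "a" is replaced.
(ppcre:regex-replace "a" "banana" "A")
;; => "bAnana"

;; Every "a" is replaced.
(ppcre:regex-replace-all "a" "banana" "A")
;; => "bAnAnA"

;; \1 and \2 stand for the first and second register.
(ppcre:regex-replace-all "(\\w+)@(\\w+)" "user@host" "\\2:\\1")
;; => "host:user"
</code></pre>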
<h2 id="see-more">See more</h2>

<ul>
<li><a href="https://common-lisp-libraries.readthedocs.io/cl-ppcre/">cl-ppcre on common-lisp-libraries.readthedocs.io</a>, to read about <code>do-matches</code>, <code>do-matches-as-strings</code>,
<code>do-register-groups</code>, <code>do-scans</code>, <code>parse-string</code>, <code>regex-apropos</code>,
<code>quote-meta-chars</code>, <code>split</code>…</li>
</ul>

<p class="page-source">
Page source: <a href="https://github.com/LispCookbook/cl-cookbook/blob/master/regexp.md">regexp.md</a>
</p>
</div>
<script type="text/javascript">

// Don't write the TOC on the index.
if (window.location.pathname != "/cl-cookbook/") {
  $("#toc").toc({
    content: "#content", // will ignore the first h1 with the site+page title.
    headings: "h1,h2,h3,h4"});
}

$("#two-cols + ul").css({
  "column-count": "2",
});
$("#contributors + ul").css({
  "column-count": "4",
});
</script>

<div>
<footer class="footer">
<hr/>
© 2002–2023 the Common Lisp Cookbook Project
<div>
📹 Discover <a style="color: darkgrey; text-decoration: underline" href="https://www.udemy.com/course/common-lisp-programming/?referralCode=2F3D698BBC4326F94358">our contributor's Common Lisp video course on Udemy</a>
</div>
</footer>

</div>
<div id="toc-btn">T<br>O<br>C</div>
</div>
<script type="text/javascript">
HighlightLisp.highlight_auto({className: null});
</script>
<script type="text/javascript">
function duckSearch() {
  var searchField = document.getElementById("searchField");
  if (searchField && searchField.value) {
    // encodeURIComponent replaces the deprecated escape(), which mis-encodes non-ASCII input.
    var query = encodeURIComponent("site:lispcookbook.github.io/cl-cookbook/ " + searchField.value);
    window.location.href = "https://duckduckgo.com/?kj=b2&kf=-1&ko=1&q=" + query;
    // https://duckduckgo.com/params
    // kj=b2: blue header in results page
    // kf=-1: no favicons
  }
}
</script>

<script async defer data-domain="lispcookbook.github.io/cl-cookbook" src="https://plausible.io/js/plausible.js"></script>

</body>
</html>