<!DOCTYPE html>
<html lang="en">
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.2.0">
<title>Regular Expressions</title>
<meta charset="utf-8">
<meta name="description" content="A collection of examples of using Common Lisp">
<meta name="viewport" content=
"width=device-width, initial-scale=1">
<link rel="icon" href=
"assets/cl-logo-blue.png"/>
<link rel="stylesheet" href=
"assets/style.css">
<script type="text/javascript" src=
"assets/highlight-lisp.js">
</script>
<script type="text/javascript" src=
"assets/jquery-3.2.1.min.js">
</script>
<script type="text/javascript" src=
"assets/jquery.toc/jquery.toc.min.js">
</script>
<script type="text/javascript" src=
"assets/toggle-toc.js">
</script>
<link rel="stylesheet" href=
"assets/github.css">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
</head>
<body>
<h1 id="title-xs"><a href="index.html">The Common Lisp Cookbook</a> &ndash; Regular Expressions</h1>
<div id="logo-container">
<a href="index.html">
<img id="logo" src="assets/cl-logo-blue.png"/>
</a>
<div id="searchform-container">
<form onsubmit="duckSearch()" action="javascript:void(0)">
<input id="searchField" type="text" value="" placeholder="Search...">
</form>
</div>
<div id="toc-container" class="toc-close">
<div id="toc-title">Table of Contents</div>
<ul id="toc" class="list-unstyled"></ul>
</div>
</div>
<div id="content-container">
<h1 id="title-non-xs"><a href="index.html">The Common Lisp Cookbook</a> &ndash; Regular Expressions</h1>
<p class="announce">
📢 New videos: <a href="https://www.youtube.com/watch?v=h_noB1sI_e8">web dev demo part 1</a>, <a href="https://www.youtube.com/watch?v=xnwc7irnc8k">dynamic page with HTMX</a>, <a href="https://www.youtube.com/watch?v=Zpn86AQRVN8">Weblocks demo</a>
</p>
<p class="announce-neutral">
📕 <a href="index.html#download-in-epub">Get the EPUB and PDF</a>
</p>
<div id="content">
<p>The <a href="http://www.lispworks.com/documentation/HyperSpec/index.html">ANSI Common Lisp
standard</a>
does not include facilities for regular expressions, but a couple of
libraries exist for this task, for instance:
<a href="https://github.com/edicl/cl-ppcre">cl-ppcre</a>.</p>
<p>See also the respective <a href="http://www.cliki.net/Regular%20Expression">Cliki:
regexp</a> page for more
links.</p>
<p>Note that some CL implementations include regexp facilities, notably
<a href="http://clisp.sourceforge.net/impnotes.html#regexp">CLISP</a> and
<a href="https://franz.com/support/documentation/current/doc/regexp.htm">Allegro
CL</a>. If
in doubt, check your manual or ask your vendor.</p>
<p>The description provided below is far from complete, so don't forget
to check the reference manual that comes along with the CL-PPCRE
library.</p>
<h2 id="ppcre">PPCRE</h2>
<p><a href="https://github.com/edicl/cl-ppcre">CL-PPCRE</a> (abbreviation for
Portable Perl-compatible regular expressions) is a portable regular
expression library for Common Lisp with a broad set of features and
good performance. It has been ported to a number of Common Lisp
implementations and can be easily installed (or added as a dependency)
via Quicklisp:</p>
<pre><code class="language-lisp">(ql:quickload "cl-ppcre")
</code></pre>
<p>Basic operations with the CL-PPCRE library functions are described
below.</p>
<h3 id="looking-for-matching-patterns-scan-create-scanner">Looking for matching patterns: scan, create-scanner</h3>
<p>The <code>scan</code> function tries to match the given pattern and, on success,
returns four values: the start of the match, the end
of the match, and two arrays denoting the beginnings and ends of
register matches. On failure, it returns <code>NIL</code>.</p>
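<p>A minimal sketch of reading those values with <code>multiple-value-bind</code>
(the variable names are ours, chosen for illustration):</p>
<pre><code class="language-lisp">(multiple-value-bind (start end reg-starts reg-ends)
    (ppcre:scan "(a)*b" "xaaabd")
  (list start end reg-starts reg-ends))
;; =&gt; (1 5 #(3) #(4))

;; No match: scan returns NIL.
(ppcre:scan "z" "xaaabd")
;; =&gt; NIL
</code></pre>
<p>The match spans positions 1 to 5 (the substring "aaab"); the two arrays
report where the group <code>(a)</code> last matched, here from 3 to 4.</p>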
<p>A regular expression pattern can be compiled with the <code>create-scanner</code>
function call. A “scanner” will be created that can be used by other
functions.</p>
<p>For example:</p>
<pre><code class="language-lisp">(let ((ptrn (ppcre:create-scanner "(a)*b")))
(ppcre:scan ptrn "xaaabd"))
</code></pre>
<p>will yield the same results as:</p>
<pre><code class="language-lisp">(ppcre:scan "(a)*b" "xaaabd")
</code></pre>
<p>but will require less time for repeated <code>scan</code> calls as parsing the
expression and compiling it is done only once.</p>
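<p>As a sketch of where this pays off, assume one pattern has to be checked
against several strings (the sample strings and the name <code>digits</code> are
made up for illustration). Only the first value of <code>scan</code>, the start
position, is collected here:</p>
<pre><code class="language-lisp">;; Compile the pattern once, then reuse the scanner for every string.
(let ((digits (ppcre:create-scanner "\\d+")))
  (loop for line in '("abc 123" "no digits" "x7")
        collect (ppcre:scan digits line)))
;; =&gt; (4 NIL 1)
</code></pre>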
<h3 id="extracting-information">Extracting information</h3>
<p>CL-PPCRE provides several ways to extract matching fragments.</p>
<h4 id="all-matches-all-matches-as-strings">all-matches, all-matches-as-strings</h4>
<p>The function <code>all-matches-as-strings</code> is very handy: it returns a list of matches:</p>
<pre><code class="language-lisp">(ppcre:all-matches-as-strings "\\d+" "numbers: 1 10 42")
;; =&gt; ("1" "10" "42")
</code></pre>
<p>The function <code>all-matches</code> is similar, but it returns a list of positions:</p>
<pre><code class="language-lisp">(ppcre:all-matches "\\d+" "numbers: 1 10 42")
;; =&gt; (9 10 11 13 14 16)
</code></pre>
<p>Look carefully: it actually returns a flat list containing the start and end
positions of all matches: 9 and 10 are the start and end of the first
number (1), and so on.</p>
<p>If you want to extract integers from this example string, simply map
<code>parse-integer</code> over the result:</p>
<pre><code class="language-lisp">CL-USER&gt; (ppcre:all-matches-as-strings "\\d+" "numbers: 1 10 42")
("1" "10" "42")
CL-USER&gt; (mapcar #'parse-integer *)
(1 10 42)
</code></pre>
<p>The two functions accept the usual <code>:start</code> and <code>:end</code> keyword arguments. Additionally, <code>all-matches-as-strings</code> accepts a <code>:sharedp</code> argument:</p>
<blockquote>
<p>If SHAREDP is true, the substrings may share structure with TARGET-STRING.</p>
</blockquote>
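<p>A short sketch of <code>:start</code> and <code>:end</code> on the same example
string; the index values 11 and 13 are picked by hand for this string:</p>
<pre><code class="language-lisp">;; Start scanning at index 11, skipping the first number.
(ppcre:all-matches-as-strings "\\d+" "numbers: 1 10 42" :start 11)
;; =&gt; ("10" "42")

;; Stop at index 13, before the last number.
(ppcre:all-matches-as-strings "\\d+" "numbers: 1 10 42" :end 13)
;; =&gt; ("1" "10")

;; Positions returned by all-matches still refer to the whole string.
(ppcre:all-matches "\\d+" "numbers: 1 10 42" :start 11)
;; =&gt; (11 13 14 16)
</code></pre>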
<h4 id="count-matches-new-in-212-april-2024">count-matches (new in 2.1.2, April 2024)</h4>
<p><code>(count-matches regex target-string)</code> returns a count of all matches of <code>regex</code> against <code>target-string</code>:</p>
<pre><code class="language-lisp">CL-USER&gt; (ppcre:count-matches "a" "foo bar baz")
2
CL-USER&gt; (ppcre:count-matches "\\w*" "foo bar baz")
6
</code></pre>
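<p>The second result may look surprising: <code>\w*</code> can also match the
empty string, so the empty matches found after each word are counted as well.
A small sketch comparing it with <code>\w+</code>, which requires at least one
character (this assumes a CL-PPCRE version that provides
<code>count-matches</code>):</p>
<pre><code class="language-lisp">(ppcre:count-matches "\\w+" "foo bar baz")
;; =&gt; 3

;; Counting the strings returned by all-matches-as-strings gives the same number.
(length (ppcre:all-matches-as-strings "\\w+" "foo bar baz"))
;; =&gt; 3
</code></pre>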
<h4 id="scan-to-strings-register-groups-bind">scan-to-strings, register-groups-bind</h4>
<p>The <code>scan-to-strings</code> function is similar to <code>scan</code> but returns
substrings of target-string instead of positions. This function
returns two values on success: the whole match as a string plus an
array of substrings (or NILs) corresponding to the matched registers.</p>
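<p>For instance, a minimal sketch that collects both values into a list (the
variable names <code>match</code> and <code>registers</code> are ours):</p>
<pre><code class="language-lisp">(multiple-value-bind (match registers)
    (ppcre:scan-to-strings "(([^b])*)b" "aaabd")
  (list match registers))
;; =&gt; ("aaab" #("aaa" "a"))
</code></pre>
<p>The first register holds everything before the "b"; the second holds the
last single character matched by the inner group.</p>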
<p>The <code>register-groups-bind</code> function tries to match the given pattern
against the target string and binds matching fragments with the given
variables.</p>
<pre><code class="language-lisp">(ppcre:register-groups-bind (first second third fourth)
("((a)|(b)|(c))+" "abababc" :sharedp t)
(list first second third fourth))
;; =&gt; ("c" "a" "b" "c")
</code></pre>
<p>CL-PPCRE also provides a shortcut for calling a function before
assigning the matching fragment to the variable:</p>
<pre><code class="language-lisp">(ppcre:register-groups-bind
(fname lname (#'parse-integer date month year))
("(\\w+)\\s+(\\w+)\\s+(\\d{1,2})\\.(\\d{1,2})\\.(\\d{4})"
"Frank Zappa 21.12.1940")
(list fname lname date month year))
;; =&gt; ("Frank" "Zappa" 21 12 1940)
</code></pre>
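<p>When the pattern does not match, the body of
<code>register-groups-bind</code> is not executed and the form returns
<code>NIL</code>, which makes it convenient for small parsers. A sketch, where
<code>parse-setting</code> and the "name=number" input format are hypothetical
and only serve the illustration:</p>
<pre><code class="language-lisp">(defun parse-setting (string)
  "Return (name value) for a string like \"width=42\", or NIL if it does not match."
  (ppcre:register-groups-bind (name (#'parse-integer value))
      ("^(\\w+)=(\\d+)$" string)
    (list name value)))

(parse-setting "width=42")   ;; =&gt; ("width" 42)
(parse-setting "width=wide") ;; =&gt; NIL
</code></pre>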
<h3 id="replacing-text-regex-replace-regex-replace-all">Replacing text: regex-replace, regex-replace-all</h3>
<pre><code class="language-lisp">(ppcre:regex-replace "a" "abc" "A") ;; =&gt; "Abc"
;; or
(let ((pat (ppcre:create-scanner "a")))
(ppcre:regex-replace pat "abc" "A"))
</code></pre>
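<p><code>regex-replace</code> only replaces the first match, while
<code>regex-replace-all</code> replaces every match. The replacement string can
also refer to registers. A short sketch with made-up input strings:</p>
<pre><code class="language-lisp">;; Replace every occurrence, not just the first one.
(ppcre:regex-replace-all "a" "banana" "A")
;; =&gt; "bAnAnA"

;; \\1 and \\2 in the replacement string refer to the first and second register.
(ppcre:regex-replace "(\\w+) (\\w+)" "hello world" "\\2 \\1")
;; =&gt; "world hello"
</code></pre>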
<h2 id="see-more">See more</h2>
<ul>
<li><a href="https://common-lisp-libraries.readthedocs.io/cl-ppcre/">cl-ppcre on common-lisp-libraries.readthedocs.io</a>, where you can read on about <code>do-matches</code>, <code>do-matches-as-strings</code>,
<code>do-register-groups</code>, <code>do-scans</code>, <code>parse-string</code>, <code>regex-apropos</code>,
<code>quote-meta-chars</code> and <code>split</code></li>
</ul>
<p class="page-source">
Page source: <a href="https://github.com/LispCookbook/cl-cookbook/blob/master/regexp.md">regexp.md</a>
</p>
</div>
<script type="text/javascript">
// Don't write the TOC on the index.
if (window.location.pathname != "/cl-cookbook/") {
$("#toc").toc({
content: "#content", // will ignore the first h1 with the site+page title.
headings: "h1,h2,h3,h4"});
}
$("#two-cols + ul").css({
"column-count": "2",
});
$("#contributors + ul").css({
"column-count": "4",
});
</script>
<div>
<footer class="footer">
<hr/>
&copy; 2002&ndash;2023 the Common Lisp Cookbook Project
<div>
📹 Discover <a style="color: darkgrey; text-decoration: underline" href="https://www.udemy.com/course/common-lisp-programming/?referralCode=2F3D698BBC4326F94358">our contributor's Common Lisp video course on Udemy</a>
</div>
</footer>
</div>
<div id="toc-btn">T<br>O<br>C</div>
</div>
<script type="text/javascript">
HighlightLisp.highlight_auto({className: null});
</script>
<script type="text/javascript">
function duckSearch() {
var searchField = document.getElementById("searchField");
if (searchField && searchField.value) {
var query = escape("site:lispcookbook.github.io/cl-cookbook/ " + searchField.value);
window.location.href = "https://duckduckgo.com/?kj=b2&kf=-1&ko=1&q=" + query;
// https://duckduckgo.com/params
// kj=b2: blue header in results page
// kf=-1: no favicons
}
}
</script>
<script async defer data-domain="lispcookbook.github.io/cl-cookbook" src="https://plausible.io/js/plausible.js"></script>
</body>
</html>