<!DOCTYPE html>
<html lang="en">
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.2.0">
<title>Regular Expressions</title>
<meta charset="utf-8">
<meta name="description" content="A collection of examples of using Common Lisp">
<meta name="viewport" content=
"width=device-width, initial-scale=1">
<link rel="icon" href=
"assets/cl-logo-blue.png"/>
<link rel="stylesheet" href=
"assets/style.css">
<script type="text/javascript" src=
"assets/highlight-lisp.js">
</script>
<script type="text/javascript" src=
"assets/jquery-3.2.1.min.js">
</script>
<script type="text/javascript" src=
"assets/jquery.toc/jquery.toc.min.js">
</script>
<script type="text/javascript" src=
"assets/toggle-toc.js">
</script>
<link rel="stylesheet" href=
"assets/github.css">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
</head>
<body>
<h1 id="title-xs"><a href="index.html">The Common Lisp Cookbook</a> &ndash; Regular Expressions</h1>
<div id="logo-container">
<a href="index.html">
<img id="logo" src="assets/cl-logo-blue.png"/>
</a>
<div id="searchform-container">
<form onsubmit="duckSearch()" action="javascript:void(0)">
<input id="searchField" type="text" value="" placeholder="Search...">
</form>
</div>
<div id="toc-container" class="toc-close">
<div id="toc-title">Table of Contents</div>
<ul id="toc" class="list-unstyled"></ul>
</div>
</div>
<div id="content-container">
<h1 id="title-non-xs"><a href="index.html">The Common Lisp Cookbook</a> &ndash; Regular Expressions</h1>
<p class="announce">
📢 New videos: <a href="https://www.youtube.com/watch?v=h_noB1sI_e8">web dev demo part 1</a>, <a href="https://www.youtube.com/watch?v=xnwc7irnc8k">dynamic page with HTMX</a>, <a href="https://www.youtube.com/watch?v=Zpn86AQRVN8">Weblocks demo</a>
</p>
<p class="announce-neutral">
📕 <a href="index.html#download-in-epub">Get the EPUB and PDF</a>
</p>
<div id="content">
<p>The <a href="http://www.lispworks.com/documentation/HyperSpec/index.html">ANSI Common Lisp
standard</a>
does not include facilities for regular expressions, but several
libraries exist for this task, for instance
<a href="https://github.com/edicl/cl-ppcre">cl-ppcre</a>.</p>
<p>See also the respective <a href="http://www.cliki.net/Regular%20Expression">Cliki:
regexp</a> page for more
links.</p>
<p>Note that some CL implementations include regexp facilities, notably
<a href="http://clisp.sourceforge.net/impnotes.html#regexp">CLISP</a> and
<a href="https://franz.com/support/documentation/current/doc/regexp.htm">Allegro
CL</a>. If
in doubt, check your manual or ask your vendor.</p>
<p>The description provided below is far from complete, so don't forget
to check the reference manual that comes along with the CL-PPCRE
library.</p>
<h2 id="ppcre">PPCRE</h2>
<p><a href="https://github.com/edicl/cl-ppcre">CL-PPCRE</a> (abbreviation for
Portable Perl-compatible regular expressions) is a portable regular
expression library for Common Lisp with a broad set of features and
good performance. It has been ported to a number of Common Lisp
implementations and can be easily installed (or added as a dependency)
via Quicklisp:</p>
<pre><code class="language-lisp">(ql:quickload "cl-ppcre")
</code></pre>
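<p>As a minimal sketch of the "added as a dependency" route, an ASDF system definition could list CL-PPCRE under <code>:depends-on</code>. The system name <code>my-app</code> and the file <code>main.lisp</code> are hypothetical placeholders:</p>
<pre><code class="language-lisp">;; my-app.asd: a hypothetical system, shown only to illustrate the dependency.
(asdf:defsystem "my-app"
  :depends-on ("cl-ppcre")
  :components ((:file "main")))
</code></pre>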
<p>Basic operations with the CL-PPCRE library functions are described
below.</p>
<h3 id="looking-for-matching-patterns-scan-create-scanner">Looking for matching patterns: scan, create-scanner</h3>
<p>The <code>scan</code> function tries to match the given pattern and, on success,
returns four values: the start of the match, the end
of the match, and two arrays denoting the beginnings and ends of
register matches. On failure it returns <code>NIL</code>.</p>
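<p>One way to get at all four values is <code>multiple-value-bind</code>. The following is a small sketch with an illustrative pattern and string; the variable names are arbitrary:</p>
<pre><code class="language-lisp">;; Bind the four values returned by SCAN and collect them in a list.
(multiple-value-bind (start end reg-starts reg-ends)
    (ppcre:scan "(a)*b" "xaaabd")
  (list start end reg-starts reg-ends))
;; =&gt; (1 5 #(3) #(4))
</code></pre>
<p>The match covers positions 1 to 5 ("aaab"), and the single register last matched the "a" at positions 3 to 4.</p>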
<p>A regular expression pattern can be compiled with the <code>create-scanner</code>
function call. A “scanner” will be created that can be used by other
functions.</p>
<p>For example:</p>
<pre><code class="language-lisp">(let ((ptrn (ppcre:create-scanner "(a)*b")))
  (ppcre:scan ptrn "xaaabd"))
</code></pre>
<p>will yield the same results as:</p>
<pre><code class="language-lisp">(ppcre:scan "(a)*b" "xaaabd")
</code></pre>
<p>but will require less time for repeated <code>scan</code> calls as parsing the
expression and compiling it is done only once.</p>
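<p>The benefit shows up when one scanner is applied to many strings. A minimal sketch, with made-up input strings, that keeps only the start position of each match (the primary value of <code>scan</code>):</p>
<pre><code class="language-lisp">;; Compile the pattern once, then reuse the scanner for every string.
(let ((scanner (ppcre:create-scanner "(a)*b")))
  (mapcar (lambda (string) (ppcre:scan scanner string))
          '("xaaabd" "b" "xyz")))
;; =&gt; (1 0 NIL)
</code></pre>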
<h3 id="extracting-information">Extracting information</h3>
<p>CL-PPCRE provides several ways to extract matching fragments.</p>
<h4 id="all-matches-all-matches-as-strings">all-matches, all-matches-as-strings</h4>
<p>The function <code>all-matches-as-strings</code> is very handy; it returns a list of all matching substrings:</p>
<pre><code class="language-lisp">(ppcre:all-matches-as-strings "\\d+" "numbers: 1 10 42")
;; =&gt; ("1" "10" "42")
</code></pre>
<p>The function <code>all-matches</code> is similar, but it returns a list of positions:</p>
<pre><code class="language-lisp">(ppcre:all-matches "\\d+" "numbers: 1 10 42")
;; =&gt; (9 10 11 13 14 16)
</code></pre>
<p>Look carefully: it actually returns a flat list containing the start and end
positions of all matches: 9 and 10 are the start and end of the first
number (1), 11 and 13 those of the second (10), and so on.</p>
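<p>Since the positions come as one flat list, they can be consumed two at a time. A short sketch that walks the list in steps of two and recovers the substrings with <code>subseq</code>:</p>
<pre><code class="language-lisp">(let* ((text "numbers: 1 10 42")
       (positions (ppcre:all-matches "\\d+" text)))
  ;; Destructure the list as (start end start end ...), two items per step.
  (loop for (start end) on positions by #'cddr
        collect (subseq text start end)))
;; =&gt; ("1" "10" "42")
</code></pre>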
<p>If you wanted to extract integers from this example string, simply map
<code>parse-integer</code> over the result:</p>
<pre><code class="language-lisp">CL-USER&gt; (ppcre:all-matches-as-strings "\\d+" "numbers: 1 10 42")
("1" "10" "42")
CL-USER&gt; (mapcar #'parse-integer *)
(1 10 42)
</code></pre>
<p>Both functions accept the usual <code>:start</code> and <code>:end</code> keyword arguments. Additionally, <code>all-matches-as-strings</code> accepts a <code>:sharedp</code> argument:</p>
<blockquote>
<p>If SHAREDP is true, the substrings may share structure with TARGET-STRING.</p>
</blockquote>
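<p>As a quick illustration of the keyword arguments, the sketch below restricts the search to part of the example string. The reported positions remain relative to the whole string:</p>
<pre><code class="language-lisp">;; Skip the first number by starting the search at index 10.
(ppcre:all-matches-as-strings "\\d+" "numbers: 1 10 42" :start 10)
;; =&gt; ("10" "42")

;; Stop the search at index 13, just after "10".
(ppcre:all-matches "\\d+" "numbers: 1 10 42" :end 13)
;; =&gt; (9 10 11 13)
</code></pre>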
<h4 id="count-matches-new-in-212-april-2024">count-matches (new in 2.1.2, April 2024)</h4>
<p><code>(count-matches regex target-string)</code> returns a count of all matches of <code>regex</code> against <code>target-string</code>:</p>
<pre><code class="language-lisp">CL-USER&gt; (ppcre:count-matches "a" "foo bar baz")
2
CL-USER&gt; (ppcre:count-matches "\\w*" "foo bar baz")
6
</code></pre>
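<p>The second count may look surprising. The pattern <code>\w*</code> also matches the empty string, so an empty match is found after each word. Listing the matches with <code>all-matches-as-strings</code> makes this visible:</p>
<pre><code class="language-lisp">;; Three words plus three empty matches: six matches in total.
(ppcre:all-matches-as-strings "\\w*" "foo bar baz")
;; =&gt; ("foo" "" "bar" "" "baz" "")
</code></pre>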
<h4 id="scan-to-strings-register-groups-bind">scan-to-strings, register-groups-bind</h4>
<p>The <code>scan-to-strings</code> function is similar to <code>scan</code> but returns
substrings of target-string instead of positions. This function
returns two values on success: the whole match as a string plus an
array of substrings (or NILs) corresponding to the matched registers.</p>
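<p>For instance, here is a small sketch with a made-up input string, collecting both return values in a list:</p>
<pre><code class="language-lisp">;; MATCH is the whole match, REGISTERS an array with one entry per group.
(multiple-value-bind (match registers)
    (ppcre:scan-to-strings "(\\d{4})-(\\d{2})" "released 2024-04")
  (list match registers))
;; =&gt; ("2024-04" #("2024" "04"))
</code></pre>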
<p>The <code>register-groups-bind</code> function tries to match the given pattern
against the target string and binds matching fragments with the given
variables.</p>
<pre><code class="language-lisp">(ppcre:register-groups-bind (first second third fourth)
    ("((a)|(b)|(c))+" "abababc" :sharedp t)
  (list first second third fourth))
;; =&gt; ("c" "a" "b" "c")
</code></pre>
<p>CL-PPCRE also provides a shortcut for calling a function before
assigning the matching fragment to the variable:</p>
<pre><code class="language-lisp">(ppcre:register-groups-bind
    (fname lname (#'parse-integer date month year))
    ("(\\w+)\\s+(\\w+)\\s+(\\d{1,2})\\.(\\d{1,2})\\.(\\d{4})"
     "Frank Zappa 21.12.1940")
  (list fname lname date month year))
;; =&gt; ("Frank" "Zappa" 21 12 1940)
</code></pre>
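<p>The body is only executed when the pattern matches; otherwise <code>register-groups-bind</code> returns <code>NIL</code>. A minimal sketch with an illustrative key/value pattern:</p>
<pre><code class="language-lisp">;; The target string has no "key=value" pair, so the body never runs.
(ppcre:register-groups-bind (key value)
    ("(\\w+)=(\\w+)" "no pairs here")
  (list key value))
;; =&gt; NIL

;; With a matching string, both variables are bound.
(ppcre:register-groups-bind (key value)
    ("(\\w+)=(\\w+)" "lang=lisp")
  (list key value))
;; =&gt; ("lang" "lisp")
</code></pre>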
<h3 id="replacing-text-regex-replace-regex-replace-all">Replacing text: regex-replace, regex-replace-all</h3>
<pre><code class="language-lisp">(ppcre:regex-replace "a" "abc" "A") ;; =&gt; "Abc"
;; or
(let ((pat (ppcre:create-scanner "a")))
  (ppcre:regex-replace pat "abc" "A"))
</code></pre>
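<p>While <code>regex-replace</code> only replaces the first match, <code>regex-replace-all</code> replaces every match. The replacement string may also refer to registers with <code>\1</code>, <code>\2</code>, and so on. A short sketch with illustrative strings:</p>
<pre><code class="language-lisp">;; Replace every occurrence, not just the first one.
(ppcre:regex-replace-all "a" "banana" "A")
;; =&gt; "bAnAnA"

;; Swap the two registers of each match.
(ppcre:regex-replace-all "(\\w+)@(\\w+)" "a@b c@d" "\\2@\\1")
;; =&gt; "b@a d@c"
</code></pre>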
<h2 id="see-more">See more</h2>
<ul>
<li><a href="https://common-lisp-libraries.readthedocs.io/cl-ppcre/">cl-ppcre on common-lisp-libraries.readthedocs.io</a>, which documents more functions and macros: <code>do-matches</code>, <code>do-matches-as-strings</code>,
<code>do-register-groups</code>, <code>do-scans</code>, <code>parse-string</code>, <code>regex-apropos</code>,
<code>quote-meta-chars</code>, <code>split</code></li>
</ul>
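<p>As a small taste of one of these, <code>split</code> cuts a string at every match of the pattern. A minimal sketch with an illustrative string:</p>
<pre><code class="language-lisp">;; Split on runs of whitespace.
(ppcre:split "\\s+" "foo   bar baz")
;; =&gt; ("foo" "bar" "baz")
</code></pre>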
<p class="page-source">
Page source: <a href="https://github.com/LispCookbook/cl-cookbook/blob/master/regexp.md">regexp.md</a>
</p>
</div>
<script type="text/javascript">
// Don't write the TOC on the index.
if (window.location.pathname != "/cl-cookbook/") {
$("#toc").toc({
content: "#content", // will ignore the first h1 with the site+page title.
headings: "h1,h2,h3,h4"});
}
$("#two-cols + ul").css({
"column-count": "2",
});
$("#contributors + ul").css({
"column-count": "4",
});
</script>
<div>
<footer class="footer">
<hr/>
&copy; 2002&ndash;2023 the Common Lisp Cookbook Project
<div>
📹 Discover <a style="color: darkgrey; text-decoration: underline" href="https://www.udemy.com/course/common-lisp-programming/?referralCode=2F3D698BBC4326F94358">our contributor's Common Lisp video course on Udemy</a>
</div>
</footer>
</div>
<div id="toc-btn">T<br>O<br>C</div>
</div>
<script type="text/javascript">
HighlightLisp.highlight_auto({className: null});
</script>
<script type="text/javascript">
function duckSearch() {
var searchField = document.getElementById("searchField");
if (searchField && searchField.value) {
var query = escape("site:lispcookbook.github.io/cl-cookbook/ " + searchField.value);
window.location.href = "https://duckduckgo.com/?kj=b2&kf=-1&ko=1&q=" + query;
// https://duckduckgo.com/params
// kj=b2: blue header in results page
// kf=-1: no favicons
}
}
</script>
<script async defer data-domain="lispcookbook.github.io/cl-cookbook" src="https://plausible.io/js/plausible.js"></script>
</body>
</html>