<!DOCTYPE html>
<html lang="en">
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.2.0">
<title>Regular Expressions</title>
<meta charset="utf-8">
<meta name="description" content="A collection of examples of using Common Lisp">
<meta name="viewport" content=
"width=device-width, initial-scale=1">
<link rel="stylesheet" href=
"assets/style.css">
<script type="text/javascript" src=
"assets/highlight-lisp.js">
</script>
<script type="text/javascript" src=
"assets/jquery-3.2.1.min.js">
</script>
<script type="text/javascript" src=
"assets/jquery.toc/jquery.toc.min.js">
</script>
<script type="text/javascript" src=
"assets/toggle-toc.js">
</script>
<link rel="stylesheet" href=
"assets/github.css">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
</head>
<body>
<h1 id="title-xs"><a href="index.html">The Common Lisp Cookbook</a> &ndash; Regular Expressions</h1>
<div id="logo-container">
<a href="index.html">
<img id="logo" src="assets/cl-logo-blue.png"/>
</a>
<div id="searchform-container">
<form onsubmit="duckSearch()" action="javascript:void(0)">
<input id="searchField" type="text" value="" placeholder="Search...">
</form>
</div>
<div id="toc-container" class="toc-close">
<div id="toc-title">Table of Contents</div>
<ul id="toc" class="list-unstyled"></ul>
</div>
</div>
<div id="content-container">
<h1 id="title-non-xs"><a href="index.html">The Common Lisp Cookbook</a> &ndash; Regular Expressions</h1>
<!-- Announcement we can keep for 1 month or more. I remove it and re-add it from time to time. -->
<p class="announce">
📹 <a href="https://www.udemy.com/course/common-lisp-programming/?couponCode=6926D599AA-LISP4ALL">NEW! Learn Lisp in videos and support our contributors with this 40% discount.</a>
</p>
<p class="announce-neutral">
📕 <a href="index.html#download-in-epub">Get the EPUB and PDF</a>
</p>
<div id="content">
<p>The <a href="http://www.lispworks.com/documentation/HyperSpec/index.html">ANSI Common Lisp
standard</a> does not include facilities for regular expressions, but a couple of
libraries exist for this task, for instance
<a href="https://github.com/edicl/cl-ppcre">cl-ppcre</a>.</p>
<p>See also the <a href="http://www.cliki.net/Regular%20Expression">Cliki:
regexp</a> page for more links.</p>
<p>Note that some CL implementations include regexp facilities, notably
<a href="http://clisp.sourceforge.net/impnotes.html#regexp">CLISP</a> and
<a href="https://franz.com/support/documentation/current/doc/regexp.htm">Allegro
CL</a>. If in doubt, check your manual or ask your vendor.</p>
<p>The description provided below is far from complete, so don't forget
to check the reference manual that comes with the CL-PPCRE
library.</p>
<h2 id="ppcre">PPCRE</h2>
<h3 id="using-ppcre">Using PPCRE</h3>
<p><a href="https://github.com/edicl/cl-ppcre">CL-PPCRE</a> (short for
Portable Perl-Compatible Regular Expressions) is a portable regular
expression library for Common Lisp with a broad set of features and
good performance. It runs on many Common Lisp implementations and can
be installed (or added as a dependency) via Quicklisp:</p>
<pre><code class="language-lisp">(ql:quickload "cl-ppcre")
</code></pre>
<p>Basic operations with the CL-PPCRE library functions are described
below.</p>
<h3 id="looking-for-matching-patterns">Looking for matching patterns</h3>
<p>The <code>scan</code> function tries to match the given pattern against
a target string. On success, it returns four values: the start of the
match, the end of the match, and two arrays holding the start and end
positions of the register (capture group) matches. On failure, it returns
<code>NIL</code>.</p>
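<p>As a rough illustration (the phone-number pattern and input strings below are made up for the example), <code>multiple-value-bind</code> can capture all four values returned by <code>scan</code>:</p>
<pre><code class="language-lisp">(ql:quickload "cl-ppcre")

;; Bind the four values returned by SCAN: match start, match end,
;; and the arrays of register start and end positions.
(multiple-value-bind (match-start match-end reg-starts reg-ends)
    (ppcre:scan "(\\d+)-(\\d+)" "tel: 555-1234")
  (list match-start match-end reg-starts reg-ends))
;; =&gt; (5 13 #(5 9) #(8 13))

;; No match: SCAN returns NIL.
(ppcre:scan "\\d+" "no digits here")
;; =&gt; NIL
</code></pre>
<p>The positions can be passed to <code>subseq</code> to get the matched text; the <code>scan-to-strings</code> function described below does that for you.</p>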
<p>A regular expression pattern can be compiled with the <code>create-scanner</code>
function. It returns a “scanner” that other CL-PPCRE functions accept in
place of the pattern string.</p>
<p>For example:</p>
<pre><code class="language-lisp">(let ((ptrn (ppcre:create-scanner "(a)*b")))
  (ppcre:scan ptrn "xaaabd"))
</code></pre>
<p>will yield the same results as:</p>
<pre><code class="language-lisp">(ppcre:scan "(a)*b" "xaaabd")
</code></pre>
<p>but repeated <code>scan</code> calls will take less time, because the
expression is parsed and compiled only once.</p>
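<p>Here is a small sketch of that reuse. The names <code>*date-scanner*</code> and <code>count-dates</code> are hypothetical, and the pattern only checks the <code>YYYY-MM-DD</code> shape, not whether the date is valid. The scanner is built once and applied to every string in the list. <code>create-scanner</code> also accepts keyword arguments such as <code>:case-insensitive-mode</code>, shown in the last form:</p>
<pre><code class="language-lisp">(ql:quickload "cl-ppcre")

;; Parse and compile the pattern once, when the variable is defined.
(defparameter *date-scanner*
  (ppcre:create-scanner "^\\d{4}-\\d{2}-\\d{2}$"))

(defun count-dates (strings)
  "Return how many of STRINGS have the shape YYYY-MM-DD."
  ;; The same compiled scanner is reused for every string.
  (count-if (lambda (s) (ppcre:scan *date-scanner* s)) strings))

(count-dates '("2022-08-02" "yesterday" "1999-12-31"))
;; =&gt; 2

;; A scanner that ignores case.
(ppcre:scan (ppcre:create-scanner "hello" :case-insensitive-mode t) "Say HELLO")
;; =&gt; 4, 9, #(), #()
</code></pre>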
<h3 id="replacing-text">Replacing text</h3>
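<p>The <code>regex-replace</code> function replaces the first match of a
pattern in a target string. Like <code>scan</code>, it accepts either a
pattern string or a scanner:</p>
<pre><code class="language-lisp">(ppcre:regex-replace "a" "abc" "A") ;; =&gt; "Abc"
;; or
(let ((pat (ppcre:create-scanner "a")))
  (ppcre:regex-replace pat "abc" "A"))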
</code></pre>
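<p>A few more replacement cases, sketched with made-up strings: <code>regex-replace</code> only replaces the first match, <code>regex-replace-all</code> replaces every match, and <code>\1</code>, <code>\2</code> and so on in the replacement string refer to registers. The comments show only the first return value:</p>
<pre><code class="language-lisp">(ql:quickload "cl-ppcre")

;; Only the first match is replaced.
(ppcre:regex-replace "a" "banana" "A")
;; =&gt; "bAnana"

;; Every match is replaced.
(ppcre:regex-replace-all "a" "banana" "A")
;; =&gt; "bAnAnA"

;; \2 and \1 insert the second and first registers. The backslashes
;; are doubled because they appear inside Lisp string literals.
(ppcre:regex-replace "(\\w+) (\\w+)" "hello world" "\\2 \\1")
;; =&gt; "world hello"
</code></pre>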
<h3 id="extracting-information">Extracting information</h3>
<p>CL-PPCRE provides several ways to extract matching fragments, among
them the <code>scan-to-strings</code> function and the <code>register-groups-bind</code> macro.</p>
<p>The <code>scan-to-strings</code> function is similar to <code>scan</code> but returns
substrings of the target string instead of positions. On success, it
returns two values: the whole match as a string, and an array of
substrings (or <code>NIL</code> for registers that did not match)
corresponding to the registers.</p>
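<p>For example, with made-up input strings (the last form shows one way to pick a single register out of the returned array):</p>
<pre><code class="language-lisp">(ql:quickload "cl-ppcre")

(ppcre:scan-to-strings "(\\d+)-(\\d+)" "tel: 555-1234")
;; =&gt; "555-1234", #("555" "1234")

;; A register that took no part in the match is reported as NIL.
(ppcre:scan-to-strings "(a)|(b)" "b")
;; =&gt; "b", #(NIL "b")

;; Bind both values and return the second register.
(multiple-value-bind (whole registers)
    (ppcre:scan-to-strings "(\\w+)@(\\w+)\\.com" "mail bob@example.com")
  (declare (ignore whole))
  (aref registers 1))
;; =&gt; "example"
</code></pre>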
<p>The <code>register-groups-bind</code> macro tries to match the given pattern
against the target string and binds the matching register fragments to the
given variables.</p>
<pre><code class="language-lisp">(ppcre:register-groups-bind (first second third fourth)
    ("((a)|(b)|(c))+" "abababc" :sharedp t)
  (list first second third fourth))
;; =&gt; ("c" "a" "b" "c")
</code></pre>
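<p><code>register-groups-bind</code> also provides a shortcut for calling a function
on a matching fragment before binding it to a variable: in the variable list, write
a list whose first element is the function and whose remaining elements are the
variables it applies to.</p>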
<pre><code class="language-lisp">(ppcre:register-groups-bind (fname lname (#'parse-integer date month year))
    ("(\\w+)\\s+(\\w+)\\s+(\\d{1,2})\\.(\\d{1,2})\\.(\\d{4})" "Frank Zappa 21.12.1940")
  (list fname lname (encode-universal-time 0 0 0 date month year 0)))
;; =&gt; ("Frank" "Zappa" 1292889600)
</code></pre>
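<p>The function in such a list applies only to the variables grouped with it, so different registers can be converted differently. In this sketch (the <code>key=value</code> input is made up), one register is upcased and the other parsed as an integer. The second form shows that when the pattern does not match, the body is not run and <code>NIL</code> is returned:</p>
<pre><code class="language-lisp">(ql:quickload "cl-ppcre")

;; STRING-UPCASE applies to KEY only; PARSE-INTEGER applies to VALUE only.
(ppcre:register-groups-bind ((#'string-upcase key) (#'parse-integer value))
    ("(\\w+)=(\\d+)" "width=640")
  (list key value))
;; =&gt; ("WIDTH" 640)

;; No match: the body is skipped and the form returns NIL.
(ppcre:register-groups-bind (key value)
    ("(\\w+)=(\\d+)" "no assignment here")
  (list key value))
;; =&gt; NIL
</code></pre>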
<h3 id="syntactic-sugar">Syntactic sugar</h3>
<p>It might be more convenient to use CL-PPCRE with the
<a href="https://github.com/edicl/cl-interpol">CL-INTERPOL</a>
library. CL-INTERPOL is a library for Common Lisp that modifies the
reader to support interpolation within strings, similar to Perl, Scala,
or Unix shell scripts.</p>
<p>In addition to loading the CL-INTERPOL library, an initialization call
must be made to configure the Lisp reader. You can either call the
<code>enable-interpol-syntax</code> function from the REPL or place that
call in the source file before using any of its features:</p>
<pre><code class="language-lisp">(interpol:enable-interpol-syntax)
</code></pre>
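<p>A minimal sketch of the two libraries together, assuming both are installed via Quicklisp; the variable <code>*separator*</code> is hypothetical. Inside <code>#?"..."</code>, <code>${...}</code> interpolates the value of a Lisp form, while <code>\\</code> still stands for a single backslash, as in an ordinary Lisp string. Evaluate the forms one at a time at the REPL, so that the new syntax is enabled before the <code>#?</code> form is read:</p>
<pre><code class="language-lisp">(ql:quickload '("cl-ppcre" "cl-interpol"))

;; Install the #? reader syntax in the current readtable.
(interpol:enable-interpol-syntax)

(defparameter *separator* "-")

;; ${*separator*} is replaced by "-", so this is the same pattern
;; as the plain Lisp string "(\\d+)-(\\d+)".
(ppcre:scan-to-strings #?"(\\d+)${*separator*}(\\d+)" "tel: 555-1234")
;; =&gt; "555-1234", #("555" "1234")
</code></pre>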
<p class="page-source">
Page source: <a href="https://github.com/LispCookbook/cl-cookbook/blob/master/regexp.md">regexp.md</a>
</p>
</div>
<script type="text/javascript">
// Don't write the TOC on the index.
if (window.location.pathname != "/cl-cookbook/") {
$("#toc").toc({
content: "#content", // will ignore the first h1 with the site+page title.
headings: "h1,h2,h3,h4"});
}
$("#two-cols + ul").css({
"column-count": "2",
});
$("#contributors + ul").css({
"column-count": "4",
});
</script>
<div>
<footer class="footer">
<hr/>
&copy; 2002&ndash;2021 the Common Lisp Cookbook Project
</footer>
</div>
<div id="toc-btn">T<br>O<br>C</div>
</div>
<script type="text/javascript">
HighlightLisp.highlight_auto({className: null});
</script>
<script type="text/javascript">
function duckSearch() {
var searchField = document.getElementById("searchField");
if (searchField && searchField.value) {
var query = encodeURIComponent("site:lispcookbook.github.io/cl-cookbook/ " + searchField.value);
window.location.href = "https://duckduckgo.com/?kj=b2&kf=-1&ko=1&q=" + query;
// https://duckduckgo.com/params
// kj=b2: blue header in results page
// kf=-1: no favicons
}
}
</script>
<script async defer data-domain="lispcookbook.github.io/cl-cookbook" src="https://plausible.io/js/plausible.js"></script>
</body>
</html>