<!DOCTYPE html>
<html lang="en">
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.2.0">
<title>Regular Expressions</title>
<meta charset="utf-8">
<meta name="description" content="A collection of examples of using Common Lisp">
<meta name="viewport" content=
"width=device-width, initial-scale=1">
<link rel="icon" href=
"assets/cl-logo-blue.png"/>
<link rel="stylesheet" href=
"assets/style.css">
<script type="text/javascript" src=
"assets/highlight-lisp.js">
</script>
<script type="text/javascript" src=
"assets/jquery-3.2.1.min.js">
</script>
<script type="text/javascript" src=
"assets/jquery.toc/jquery.toc.min.js">
</script>
<script type="text/javascript" src=
"assets/toggle-toc.js">
</script>

<link rel="stylesheet" href=
"assets/github.css">

<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
</head>
<body>
<h1 id="title-xs"><a href="index.html">The Common Lisp Cookbook</a> – Regular Expressions</h1>
<div id="logo-container">
<a href="index.html">
<img id="logo" src="assets/cl-logo-blue.png"/>
</a>

<div id="searchform-container">
<form onsubmit="duckSearch()" action="javascript:void(0)">
<input id="searchField" type="text" value="" placeholder="Search...">
</form>
</div>

<div id="toc-container" class="toc-close">
<div id="toc-title">Table of Contents</div>
<ul id="toc" class="list-unstyled"></ul>
</div>
</div>

<div id="content-container">
<h1 id="title-non-xs"><a href="index.html">The Common Lisp Cookbook</a> – Regular Expressions</h1>
<div id="content">
<p>The <a href="http://www.lispworks.com/documentation/HyperSpec/index.html">ANSI Common Lisp
standard</a>
does not include facilities for regular expressions, but several
libraries exist for this task, for instance
<a href="https://github.com/edicl/cl-ppcre">cl-ppcre</a>.</p>

<p>See also the <a href="http://www.cliki.net/Regular%20Expression">Cliki
regexp</a> page for more
links.</p>

<p>Note that some CL implementations include regexp facilities, notably
<a href="http://clisp.sourceforge.net/impnotes.html#regexp">CLISP</a> and
<a href="https://franz.com/support/documentation/current/doc/regexp.htm">Allegro
CL</a>. If
in doubt, check your manual or ask your vendor.</p>

<p>The description provided below is far from complete, so don’t forget
to check the reference manual that comes along with the CL-PPCRE
library.</p>

<h2 id="ppcre">PPCRE</h2>

<h3 id="using-ppcre">Using PPCRE</h3>

<p><a href="https://github.com/edicl/cl-ppcre">CL-PPCRE</a> (abbreviation for
Portable Perl-compatible regular expressions) is a portable regular
expression library for Common Lisp with a broad set of features and
good performance. It has been ported to a number of Common Lisp
implementations and can be easily installed (or added as a dependency)
via Quicklisp:</p>

<pre><code class="language-lisp">(ql:quickload "cl-ppcre")
</code></pre>
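
<p>For the "added as a dependency" route, a system definition might list
CL-PPCRE under <code>:depends-on</code>. This is only a minimal sketch: the system
name <code>my-app</code> and the file <code>main.lisp</code> are hypothetical placeholders.</p>

<pre><code class="language-lisp">;; my-app.asd -- hypothetical system definition, for illustration only
(asdf:defsystem "my-app"
  :depends-on ("cl-ppcre")        ; CL-PPCRE is loaded before my-app's own files
  :components ((:file "main")))   ; main.lisp is a placeholder file name
</code></pre>

<p>With such a definition visible to ASDF, <code>(ql:quickload "my-app")</code> should
fetch and load CL-PPCRE along with the system.</p>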

<p>Basic operations with the CL-PPCRE library functions are described
below.</p>

<h3 id="looking-for-matching-patterns">Looking for matching patterns</h3>

<p>The <code>scan</code> function tries to match the given pattern and on success
returns four values: the start of the match, the end
of the match, and two arrays denoting the beginnings and ends of
register matches. On failure it returns <code>NIL</code>.</p>

<p>A regular expression pattern can be compiled with the <code>create-scanner</code>
function call. A “scanner” will be created that can be used by other
functions.</p>

<p>For example:</p>

<pre><code class="language-lisp">(let ((ptrn (ppcre:create-scanner "(a)*b")))
  (ppcre:scan ptrn "xaaabd"))
</code></pre>

<p>will yield the same results as:</p>

<pre><code class="language-lisp">(ppcre:scan "(a)*b" "xaaabd")
</code></pre>

<p>but will require less time for repeated <code>scan</code> calls as parsing the
expression and compiling it is done only once.</p>
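
<p>To make the four return values of <code>scan</code> concrete, the sketch below
captures them with <code>multiple-value-bind</code>. It assumes Quicklisp is
installed; the variable names are illustrative only.</p>

<pre><code class="language-lisp">(ql:quickload "cl-ppcre")   ; assumes Quicklisp is available

(multiple-value-bind (match-start match-end reg-starts reg-ends)
    (ppcre:scan "(a)*b" "xaaabd")
  ;; MATCH-START and MATCH-END delimit the whole match "aaab";
  ;; REG-STARTS and REG-ENDS are vectors with one entry per register.
  (list match-start match-end reg-starts reg-ends))
;; => (1 5 #(3) #(4))

;; No match: the single value NIL is returned.
(ppcre:scan "z" "xaaabd")
;; => NIL
</code></pre>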

<h3 id="replacing-text">Replacing text</h3>

<pre><code class="language-lisp">(ppcre:regex-replace "a" "abc" "A") ;; => "Abc"
;; or
(let ((pat (ppcre:create-scanner "a")))
  (ppcre:regex-replace pat "abc" "A"))
</code></pre>
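
<p><code>regex-replace</code> rewrites only the first match. As a hedged sketch of two
common variations: CL-PPCRE's <code>regex-replace-all</code> rewrites every match, and
the replacement string can refer to registers with <code>\N</code> (written
<code>"\\N"</code> inside a Lisp string). The sample strings are made up for
illustration.</p>

<pre><code class="language-lisp">(ql:quickload "cl-ppcre")   ; assumes Quicklisp is available

;; Replace every occurrence, not just the first one.
(ppcre:regex-replace-all "a" "banana" "A")
;; => "bAnAnA"

;; Swap two words by referring to the first and second registers.
(ppcre:regex-replace "(\\w+) (\\w+)" "hello world" "\\2 \\1")
;; => "world hello"
</code></pre>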

<h3 id="extracting-information">Extracting information</h3>

<p>CL-PPCRE provides several ways to extract matching fragments, among
them the <code>scan-to-strings</code> function and the <code>register-groups-bind</code> macro.</p>

<p>The <code>scan-to-strings</code> function is similar to <code>scan</code> but returns
substrings of target-string instead of positions. This function
returns two values on success: the whole match as a string plus an
array of substrings (or NILs) corresponding to the matched registers.</p>
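
<p>The two values returned by <code>scan-to-strings</code> can be seen in a short
sketch. It assumes Quicklisp is installed; the patterns and inputs are
illustrative.</p>

<pre><code class="language-lisp">(ql:quickload "cl-ppcre")   ; assumes Quicklisp is available

;; First value: the whole match. Second value: one entry per register.
(ppcre:scan-to-strings "(([^b])*)b" "aaabd")
;; => "aaab"
;;    #("aaa" "a")

;; A register that did not take part in the match yields NIL.
(ppcre:scan-to-strings "(a)|(b)" "b")
;; => "b"
;;    #(NIL "b")
</code></pre>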

<p>The <code>register-groups-bind</code> macro tries to match the given pattern
against the target string and binds the matching fragments to the given
variables.</p>

<pre><code class="language-lisp">(ppcre:register-groups-bind (first second third fourth)
    ("((a)|(b)|(c))+" "abababc" :sharedp t)
  (list first second third fourth))
;; => ("c" "a" "b" "c")
</code></pre>

<p>CL-PPCRE also provides a shortcut for applying a function to the
matching fragment before binding it to the variable:</p>

<pre><code class="language-lisp">(ppcre:register-groups-bind (fname lname (#'parse-integer date month year))
    ("(\\w+)\\s+(\\w+)\\s+(\\d{1,2})\\.(\\d{1,2})\\.(\\d{4})" "Frank Zappa 21.12.1940")
  (list fname lname (encode-universal-time 0 0 0 date month year 0)))
;; => ("Frank" "Zappa" 1292889600)
</code></pre>
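
<p>When the pattern does not match, the body of <code>register-groups-bind</code> is
not executed and the form returns <code>NIL</code>, which makes it convenient for
small parsers. The helper below is a hypothetical example (the name
<code>parse-dotted-date</code> is not part of CL-PPCRE) and assumes Quicklisp is
installed.</p>

<pre><code class="language-lisp">(ql:quickload "cl-ppcre")   ; assumes Quicklisp is available

;; Hypothetical helper: parse "DD.MM.YYYY" into a list of integers.
(defun parse-dotted-date (string)
  "Return (day month year) as integers, or NIL when STRING does not match."
  (ppcre:register-groups-bind ((#'parse-integer day month year))
      ("^(\\d{1,2})\\.(\\d{1,2})\\.(\\d{4})$" string)
    (list day month year)))

(parse-dotted-date "21.12.1940")
;; => (21 12 1940)

(parse-dotted-date "not a date")
;; => NIL
</code></pre>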

<h3 id="syntactic-sugar">Syntactic sugar</h3>

<p>It might be more convenient to use CL-PPCRE with the
<a href="https://github.com/edicl/cl-interpol">CL-INTERPOL</a>
library. CL-INTERPOL is a library for Common Lisp which modifies the
reader in a way that introduces interpolation within strings similar
to Perl, Scala, or Unix Shell scripts.</p>

<p>In addition to loading the CL-INTERPOL library, an initialization call
must be made to properly configure the Lisp reader. This is
accomplished by either invoking the <code>enable-interpol-syntax</code> macro
from the REPL or placing that call in the source file before using any
of its features:</p>

<pre><code class="language-lisp">(interpol:enable-interpol-syntax)
</code></pre>
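
<p>As a rough sketch of what this enables, based on CL-INTERPOL's documented
syntax: <code>#?"..."</code> strings interpolate <code>${...}</code> expressions, and
<code>#?/.../</code> reads a string in regex mode, where <code>\d</code> does not need a
doubled backslash. The example assumes Quicklisp is installed; the variable
name and sample strings are illustrative.</p>

<pre><code class="language-lisp">(ql:quickload '("cl-ppcre" "cl-interpol"))   ; assumes Quicklisp is available
(interpol:enable-interpol-syntax)

;; String interpolation: ${...} is replaced by the value of the expression.
(let ((library "CL-PPCRE"))
  #?"Regexps via ${library}")
;; => "Regexps via CL-PPCRE"

;; Regex mode: \d reaches CL-PPCRE as written, without doubling the backslash.
(ppcre:scan-to-strings #?/(\d{4})-(\d{2})-(\d{2})/ "born on 1940-12-21")
;; => "1940-12-21"
;;    #("1940" "12" "21")
</code></pre>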

<p class="page-source">
Page source: <a href="https://github.com/LispCookbook/cl-cookbook/blob/master/regexp.md">regexp.md</a>
</p>
</div>

<script type="text/javascript">

// Don't write the TOC on the index.
if (window.location.pathname != "/cl-cookbook/") {
  $("#toc").toc({
    content: "#content", // will ignore the first h1 with the site+page title.
    headings: "h1,h2,h3,h4"});
}

$("#two-cols + ul").css({
  "column-count": "2",
});
$("#contributors + ul").css({
  "column-count": "4",
});
</script>

<div>
<footer class="footer">
<hr/>
© 2002–2021 the Common Lisp Cookbook Project
</footer>

</div>
<div id="toc-btn">T<br>O<br>C</div>
</div>

<script type="text/javascript">
HighlightLisp.highlight_auto({className: null});
</script>

<script type="text/javascript">
function duckSearch() {
  var searchField = document.getElementById("searchField");
  if (searchField && searchField.value) {
    // encodeURIComponent replaces the deprecated escape(), which mis-encodes non-ASCII input.
    var query = encodeURIComponent("site:lispcookbook.github.io/cl-cookbook/ " + searchField.value);
    window.location.href = "https://duckduckgo.com/?kj=b2&kf=-1&ko=1&q=" + query;
    // https://duckduckgo.com/params
    // kj=b2: blue header in results page
    // kf=-1: no favicons
  }
}
</script>

<script async defer data-domain="lispcookbook.github.io/cl-cookbook" src="https://plausible.io/js/plausible.js"></script>

</body>
</html>