<!DOCTYPE html>
< html lang = "en" >
< head >
< meta name = "generator" content =
"HTML Tidy for HTML5 for Linux version 5.2.0">
< title > Regular Expressions< / title >
< meta charset = "utf-8" >
< meta name = "description" content = "A collection of examples of using Common Lisp" >
< meta name = "viewport" content =
"width=device-width, initial-scale=1">
< link rel = "icon" href =
"assets/cl-logo-blue.png"/>
< link rel = "stylesheet" href =
"assets/style.css">
< script type = "text/javascript" src =
"assets/highlight-lisp.js">
< / script >
< script type = "text/javascript" src =
"assets/jquery-3.2.1.min.js">
< / script >
< script type = "text/javascript" src =
"assets/jquery.toc/jquery.toc.min.js">
< / script >
< script type = "text/javascript" src =
"assets/toggle-toc.js">
< / script >
< link rel = "stylesheet" href =
"assets/github.css">
< link rel = "stylesheet" href = "https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity = "sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin = "anonymous" >
< / head >
< body >
< h1 id = "title-xs" > < a href = "index.html" > The Common Lisp Cookbook< / a > – Regular Expressions< / h1 >
< div id = "logo-container" >
< a href = "index.html" >
< img id = "logo" src = "assets/cl-logo-blue.png" / >
< / a >
< div id = "searchform-container" >
< form onsubmit = "duckSearch()" action = "javascript:void(0)" >
< input id = "searchField" type = "text" value = "" placeholder = "Search..." >
< / form >
< / div >
< div id = "toc-container" class = "toc-close" >
< div id = "toc-title" > Table of Contents< / div >
< ul id = "toc" class = "list-unstyled" > < / ul >
< / div >
< / div >
< div id = "content-container" >
< h1 id = "title-non-xs" > < a href = "index.html" > The Common Lisp Cookbook< / a > – Regular Expressions< / h1 >
< p class = "announce" >
📢 New videos: < a href = "https://www.youtube.com/watch?v=h_noB1sI_e8" > web dev demo part 1< / a > , < a href = "https://www.youtube.com/watch?v=xnwc7irnc8k" > dynamic page with HTMX< / a > , < a href = "https://www.youtube.com/watch?v=Zpn86AQRVN8" > Weblocks demo< / a >
< / p >
< p class = "announce-neutral" >
📕 < a href = "index.html#download-in-epub" > Get the EPUB and PDF< / a >
< / p >
< div id = "content" >
< p > The < a href = "http://www.lispworks.com/documentation/HyperSpec/index.html" > ANSI Common Lisp
standard< / a >
does not include facilities for regular expressions, but a couple of
libraries exist for this task, for instance
< a href = "https://github.com/edicl/cl-ppcre" > cl-ppcre< / a > .< / p >
< p > See also the respective < a href = "http://www.cliki.net/Regular%20Expression" > Cliki:
regexp< / a > page for more
links.< / p >
< p > Note that some CL implementations include regexp facilities, notably
< a href = "http://clisp.sourceforge.net/impnotes.html#regexp" > CLISP< / a > and
< a href = "https://franz.com/support/documentation/current/doc/regexp.htm" > ALLEGRO
CL< / a > . If
in doubt, check your manual or ask your vendor.< / p >
< p > The description provided below is far from complete, so don’t forget
to check the reference manual that comes along with the CL-PPCRE
library.< / p >
< h2 id = "ppcre" > PPCRE< / h2 >
< p > < a href = "https://github.com/edicl/cl-ppcre" > CL-PPCRE< / a > (abbreviation for
Portable Perl-compatible regular expressions) is a portable regular
expression library for Common Lisp with a broad set of features and
good performance. It has been ported to a number of Common Lisp
implementations and can be easily installed (or added as a dependency)
via Quicklisp:< / p >
< pre > < code class = "language-lisp" > (ql:quickload "cl-ppcre")
< / code > < / pre >
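< p > When CL-PPCRE is a dependency of your own project, it can be declared in the ASDF system definition instead of being loaded by hand. A minimal sketch, assuming a hypothetical system named < code > my-project< / code > with a single source file < code > main.lisp< / code > :< / p >
< pre > < code class = "language-lisp" > ;; my-project.asd (the system and file names are placeholders)
(asdf:defsystem "my-project"
  :depends-on ("cl-ppcre")
  :components ((:file "main")))
< / code > < / pre >
< p > With this file visible to ASDF (for example under < code > ~/quicklisp/local-projects/< / code > ), < code > (ql:quickload "my-project")< / code > should also fetch and load CL-PPCRE.< / p >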
< p > Basic operations with the CL-PPCRE library functions are described
below.< / p >
< h3 id = "looking-for-matching-patterns-scan-create-scanner" > Looking for matching patterns: scan, create-scanner< / h3 >
< p > The < code > scan< / code > function tries to match the given pattern and, on success,
returns four values: the start of the match, the end of the match, and
two arrays denoting the beginnings and ends of register matches. On
failure, it returns < code > NIL< / code > .< / p >
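< p > The four values can be captured with < code > multiple-value-bind< / code > . A small sketch (the variable names are only illustrative):< / p >
< pre > < code class = "language-lisp" > (multiple-value-bind (match-start match-end reg-starts reg-ends)
    (ppcre:scan "(a)*b" "xaaabd")
  (list match-start match-end reg-starts reg-ends))
;; => (1 5 #(3) #(4))

;; No match: the first value is NIL.
(ppcre:scan "z" "xaaabd")
;; => NIL
< / code > < / pre >
< p > Here the match spans positions 1 to 5 ("aaab"), and the single register last matched the "a" at positions 3 to 4.< / p >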
< p > A regular expression pattern can be compiled ahead of time with the
< code > create-scanner< / code > function. It returns a “scanner” that the other
functions accept in place of a pattern string.< / p >
< p > For example:< / p >
< pre > < code class = "language-lisp" > (let ((ptrn (ppcre:create-scanner "(a)*b")))
(ppcre:scan ptrn "xaaabd"))
< / code > < / pre >
< p > will yield the same results as:< / p >
< pre > < code class = "language-lisp" > (ppcre:scan "(a)*b" "xaaabd")
< / code > < / pre >
< p > but will require less time for repeated < code > scan< / code > calls as parsing the
expression and compiling it is done only once.< / p >
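< p > The saving matters when one scanner is reused across many calls. A minimal sketch, assuming a hypothetical global named < code > *digits-scanner*< / code > that is compiled once and then used to filter a list of strings:< / p >
< pre > < code class = "language-lisp" > ;; Compile the pattern once (the variable name is a placeholder).
(defparameter *digits-scanner* (ppcre:create-scanner "\\d+"))

;; Reuse the scanner for every element; SCAN returns NIL when nothing matches.
(remove-if-not (lambda (s) (ppcre:scan *digits-scanner* s))
               '("abc" "a1" "42" "none"))
;; => ("a1" "42")
< / code > < / pre >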
< h3 id = "extracting-information" > Extracting information< / h3 >
< p > CL-PPCRE provides several ways to extract matching fragments.< / p >
< h4 id = "all-matches-all-matches-as-strings" > all-matches, all-matches-as-strings< / h4 >
< p > The function < code > all-matches-as-strings< / code > is very handy: it returns a list of matches:< / p >
< pre > < code class = "language-lisp" > (ppcre:all-matches-as-strings "\\d+" "numbers: 1 10 42")
;; => ("1" "10" "42")
< / code > < / pre >
< p > The function < code > all-matches< / code > is similar, but it returns a list of positions:< / p >
< pre > < code class = "language-lisp" > (ppcre:all-matches "\\d+" "numbers: 1 10 42")
;; => (9 10 11 13 14 16)
< / code > < / pre >
< p > Look carefully: it actually returns a flat list containing the start and end
positions of all matches: 9 and 10 are the start and end of the first
number (1), and so on.< / p >
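< p > Since the positions come as one flat list, they can be consumed two at a time. A short sketch that walks the list in pairs and recovers each matched substring with < code > subseq< / code > (the variable names are illustrative):< / p >
< pre > < code class = "language-lisp" > (let ((text "numbers: 1 10 42"))
  ;; Destructure the list two elements at a time: (start end start end ...).
  (loop for (start end) on (ppcre:all-matches "\\d+" text) by #'cddr
        collect (subseq text start end)))
;; => ("1" "10" "42")
< / code > < / pre >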
< p > To extract integers from this example string, map
< code > parse-integer< / code > over the result:< / p >
< pre > < code class = "language-lisp" > CL-USER> (ppcre:all-matches-as-strings "\\d+" "numbers: 1 10 42")
("1" "10" "42")
CL-USER> (mapcar #'parse-integer *)
(1 10 42)
< / code > < / pre >
< p > The two functions accept the usual < code > :start< / code > and < code > :end< / code > key arguments. Additionally, < code > all-matches-as-strings< / code > accepts a < code > :sharedp< / code > argument:< / p >
< blockquote >
< p > If SHAREDP is true, the substrings may share structure with TARGET-STRING.< / p >
< / blockquote >
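< p > As a quick illustration of the bounding arguments, here is the same example string searched with < code > :start< / code > and then with < code > :end< / code > (both are indices into the target string):< / p >
< pre > < code class = "language-lisp" > ;; Only look at the part of the string from index 10 onwards.
(ppcre:all-matches-as-strings "\\d+" "numbers: 1 10 42" :start 10)
;; => ("10" "42")

;; Stop searching at index 13 (exclusive).
(ppcre:all-matches-as-strings "\\d+" "numbers: 1 10 42" :end 13)
;; => ("1" "10")
< / code > < / pre >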
< h4 id = "count-matches-new-in-212-april-2024" > count-matches (new in 2.1.2, April 2024)< / h4 >
< p > < code > (count-matches regex target-string)< / code > returns a count of all matches of < code > regex< / code > against < code > target-string< / code > :< / p >
< pre > < code class = "language-lisp" > CL-USER> (ppcre:count-matches "a" "foo bar baz")
2
CL-USER> (ppcre:count-matches "\\w*" "foo bar baz")
6
< / code > < / pre >
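< p > The second result may look surprising: < code > \w*< / code > also matches the empty string, so each word is followed by an empty match. One way to see this, which should also work with CL-PPCRE versions that predate < code > count-matches< / code > , is to list the matches themselves:< / p >
< pre > < code class = "language-lisp" > (ppcre:all-matches-as-strings "\\w*" "foo bar baz")
;; => ("foo" "" "bar" "" "baz" "")

;; Requiring at least one character drops the empty matches.
(length (ppcre:all-matches-as-strings "\\w+" "foo bar baz"))
;; => 3
< / code > < / pre >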
< h4 id = "scan-to-strings-register-groups-bind" > scan-to-strings, register-groups-bind< / h4 >
< p > The < code > scan-to-strings< / code > function is similar to < code > scan< / code > but returns
substrings of target-string instead of positions. This function
returns two values on success: the whole match as a string plus an
array of substrings (or NILs) corresponding to the matched registers.< / p >
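< p > A brief sketch of the two return values, using a made-up target string; the registers arrive as an array indexed from 0:< / p >
< pre > < code class = "language-lisp" > (multiple-value-bind (match registers)
    (ppcre:scan-to-strings "(\\d{4})-(\\d{2})" "released 2024-04")
  ;; MATCH is the whole match, REGISTERS holds one entry per group.
  (list match (aref registers 0) (aref registers 1)))
;; => ("2024-04" "2024" "04")
< / code > < / pre >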
< p > The < code > register-groups-bind< / code > macro tries to match the given pattern
against the target string and binds the register matches to the given
variables.< / p >
< pre > < code class = "language-lisp" > (ppcre:register-groups-bind (first second third fourth)
("((a)|(b)|(c))+" "abababc" :sharedp t)
(list first second third fourth))
;; => ("c" "a" "b" "c")
< / code > < / pre >
< p > CL-PPCRE also provides a shortcut for calling a function before
assigning the matching fragment to the variable:< / p >
< pre > < code class = "language-lisp" > (ppcre:register-groups-bind
(fname lname (#'parse-integer date month year))
("(\\w+)\\s+(\\w+)\\s+(\\d{1,2})\\.(\\d{1,2})\\.(\\d{4})"
"Frank Zappa 21.12.1940")
(list fname lname date month year))
;; => ("Frank" "Zappa" 21 12 1940)
< / code > < / pre >
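< p > The body of < code > register-groups-bind< / code > only runs when the pattern matches; otherwise the form returns < code > NIL< / code > . This makes it convenient for small parsing helpers. A sketch, assuming a hypothetical function named < code > parse-date< / code > :< / p >
< pre > < code class = "language-lisp" > (defun parse-date (string)
  "Return (day month year) as integers, or NIL if STRING has no DD.MM.YYYY date."
  (ppcre:register-groups-bind ((#'parse-integer day month year))
      ("(\\d{1,2})\\.(\\d{1,2})\\.(\\d{4})" string)
    (list day month year)))

(parse-date "Frank Zappa 21.12.1940") ;; => (21 12 1940)
(parse-date "no date here")           ;; => NIL
< / code > < / pre >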
< h3 id = "replacing-text-regex-replace-regex-replace-all" > Replacing text: regex-replace, regex-replace-all< / h3 >
< pre > < code class = "language-lisp" > (ppcre:regex-replace "a" "abc" "A") ;; => "Abc"
;; or
(let ((pat (ppcre:create-scanner "a")))
(ppcre:regex-replace pat "abc" "A"))
< / code > < / pre >
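< p > < code > regex-replace< / code > replaces only the first match, whereas < code > regex-replace-all< / code > replaces every match. In the replacement string, < code > \1< / code > , < code > \2< / code > and so on refer to the register matches. A short sketch with made-up strings (only the first return value is shown):< / p >
< pre > < code class = "language-lisp" > ;; regex-replace only touches the first match...
(ppcre:regex-replace "a" "banana" "A")
;; => "bAnana"

;; ...while regex-replace-all replaces every match.
(ppcre:regex-replace-all "a" "banana" "A")
;; => "bAnAnA"

;; \1 and \2 in the replacement string refer to the registers.
(ppcre:regex-replace-all "(\\w+)@(\\w+)" "joe@home" "\\2@\\1")
;; => "home@joe"
< / code > < / pre >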
< h2 id = "see-more" > See more< / h2 >
< ul >
< li > < a href = "https://common-lisp-libraries.readthedocs.io/cl-ppcre/" > cl-ppcre on common-lisp-libraries.readthedocs.io< / a > and read on: < code > do-matches< / code > , < code > do-matches-as-strings< / code > ,
< code > do-register-groups< / code > , < code > do-scans< / code > , < code > parse-string< / code > , < code > regex-apropos< / code > ,
< code > quote-meta-chars< / code > , < code > split< / code > …< / li >
< / ul >
< p class = "page-source" >
Page source: < a href = "https://github.com/LispCookbook/cl-cookbook/blob/master/regexp.md" > regexp.md< / a >
< / p >
< / div >
< script type = "text/javascript" >
// Don't write the TOC on the index.
if (window.location.pathname != "/cl-cookbook/") {
$("#toc").toc({
content: "#content", // will ignore the first h1 with the site+page title.
headings: "h1,h2,h3,h4"});
}
$("#two-cols + ul").css({
"column-count": "2",
});
$("#contributors + ul").css({
"column-count": "4",
});
< / script >
< div >
< footer class = "footer" >
< hr / >
© 2002–2023 the Common Lisp Cookbook Project
< div >
📹 Discover < a style = "color: darkgrey; text-decoration: underline" href = "https://www.udemy.com/course/common-lisp-programming/?referralCode=2F3D698BBC4326F94358" > our contributor's Common Lisp video course on Udemy< / a >
< / div >
< / footer >
< / div >
< div id = "toc-btn" > T< br > O< br > C< / div >
< / div >
< script type = "text/javascript" >
HighlightLisp.highlight_auto({className: null});
< / script >
< script type = "text/javascript" >
function duckSearch() {
var searchField = document.getElementById("searchField");
if (searchField && searchField.value) {
var query = encodeURIComponent("site:lispcookbook.github.io/cl-cookbook/ " + searchField.value);
window.location.href = "https://duckduckgo.com/?kj=b2&kf=-1&ko=1&q=" + query;
// https://duckduckgo.com/params
// kj=b2: blue header in results page
// kf=-1: no favicons
}
}
< / script >
< script async defer data-domain = "lispcookbook.github.io/cl-cookbook" src = "https://plausible.io/js/plausible.js" > < / script >
< / body >
< / html >