Fun with Macros: Do-File
Posted on April 19th, 2022.
It's been a while, but it's time to take a look at another fun little Common
Lisp macro with some interesting things inside it: do-file.
Usage
The macro we'll be taking a look at today is called do-file. It's used to open
a file and iterate over its contents using a reader function, saving you some
tedious boilerplate.
First let's look at some examples of how you could use it. Processing each line of a file is the default:
(do-file (line "foo.txt")
  (unless (string= "" line)
    (write-line (string-upcase line))))
Using a different reader function and another macro to gather data from inside the iteration:
(gathering
  (do-file (n "numbers.txt" :reader #'read-integer)
    (when (primep n)
      (gather n))))
Passing along options to the underlying open, and returning early:
(do-file (form "foo.lisp" :reader #'read :external-format :EBCDIC-US)
  (when (eq form :stop)
    (return :stopped-early))
  (print form))
All of these could of course be done in other ways. You could have a separate
function that reads the file into a sequence and then pass that to mapcar or
something else, but it can be wasteful to cons up the entire list if you're only
going to process items and don't need to retain them (or if you're going to stop
early).
You could also write a mapc-file function that takes a function instead of
making this a macro, but sometimes it's nice to not have to wrap things in a
thunk. It's probably worth having that function as an additional tool in the
toolbox though!
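For comparison, here's a minimal sketch of what such a mapc-file function might
look like. The name and lambda list are assumptions (they just mirror
do-file's), not code from this post. Since a function receives evaluated
arguments, the default here is a plain #'read-line rather than the macro's
'#'read-line. And because the caller's code arrives as a closure, a
(loop-finish) inside it can't reach into the internal loop.

```lisp
(defun mapc-file (function path &rest open-options
                  &key (reader #'read-line) &allow-other-keys)
  "Hypothetical function counterpart to DO-FILE: call FUNCTION on each
successive value READER returns from the file at PATH.  Keyword arguments
other than :READER are passed along to OPEN.  Returns NIL."
  (let ((eof (gensym "EOF"))
        ;; Drop :READER, keeping only the options meant for OPEN.
        (open-options (loop :for (key value) :on open-options :by #'cddr
                            :unless (eq key :reader)
                              :append (list key value))))
    (let ((stream (apply #'open path :direction :input open-options)))
      ;; OPEN returns NIL when e.g. :IF-DOES-NOT-EXIST NIL applies.
      (when stream
        (unwind-protect
             (loop :for value = (funcall reader stream nil eof)
                   :until (eq value eof)
                   :do (funcall function value))
          (close stream))))))
```

Usage would then look like (mapc-file (lambda (line) (write-line (string-upcase line))) "foo.txt").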
Implementation
Here's the full implementation of the macro:
(let ((eof (gensym "EOF")))
  (defmacro do-file ((symbol path
                      &rest open-options
                      &key (reader '#'read-line) &allow-other-keys)
                     &body body)
    "Iterate over the contents of `path` using `reader`.

    During iteration, `symbol` will be set to successive values read from the
    file by `reader`.

    `reader` can be any function that conforms to the usual reading interface,
    i.e. anything that can handle `(read-foo stream eof-error-p eof-value)`.

    Any keyword arguments other than `:reader` will be passed along to `open`.
    If `nil` is used for one of the `:if-…` options to `open` and this results
    in `open` returning `nil`, no iteration will take place.

    An implicit block named `nil` surrounds the iteration, so `return` can be
    used to terminate early.

    Returns `nil`.

    Examples:

      (do-file (line \"foo.txt\")
        (print line))

      (do-file (form \"foo.lisp\" :reader #'read :external-format :EBCDIC-US)
        (when (eq form :stop)
          (return :stopped-early))
        (print form))

      (do-file (line \"does-not-exist.txt\" :if-does-not-exist nil)
        (this-will-not-be-executed))

    "
    (let ((open-options (alexandria:remove-from-plist open-options :reader)))
      (alexandria:with-gensyms (stream)
        (alexandria:once-only (path reader)
          `(alexandria:when-let ((,stream (open ,path :direction :input ,@open-options)))
             (unwind-protect
                  (do ((,symbol
                        (funcall ,reader ,stream nil ',eof)
                        (funcall ,reader ,stream nil ',eof)))
                      ((eq ,symbol ',eof))
                    ,@body)
               (close ,stream))))))))
There are a few interesting things to talk about here.
Let Over Defmacro
The very first line is unusual: instead of the defmacro being the top-level
form, we wrap it in a let to generate one single unique EOF sentinel object:
(let ((eof (gensym "EOF")))
  (defmacro do-file (…)
    …))
We could put the let inside the macro, but then we'd be generating a separate
EOF object for every use of the macro, which is wasteful.
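The same let-over-def trick works for plain functions too. Here's a small sketch
(read-all-forms is a hypothetical helper, not part of this post) that closes
over one sentinel created when the let is evaluated. It also shows why a fresh
sentinel is needed at all: if we used nil as the eof-value, a file containing
the form nil would look like the end of the file.

```lisp
(let ((eof (gensym "EOF")))
  (defun read-all-forms (string)
    "Hypothetical helper: return a list of every form READ from STRING.
EOF is created once, when the surrounding LET is evaluated, and is shared
by every call."
    (with-input-from-string (stream string)
      (loop :for form = (read stream nil eof)
            :until (eq form eof)
            :collect form))))
```

Reading "1 nil :stop 2" collects all four forms, including the nil, and even a
symbol named EOF appearing in the text is a different object from the private
sentinel.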
&rest and &key
Note how the argument list of the macro takes both &rest and &key arguments,
and uses &allow-other-keys to let the macro take arbitrary keyword arguments:
(defmacro do-file ((symbol path
                    &rest open-options
                    &key (reader '#'read-line) &allow-other-keys)
                   &body body)
  (let ((open-options (alexandria:remove-from-plist open-options :reader)))
    …
    (alexandria:when-let ((,stream (open ,path :direction :input ,@open-options)))
      …)))
We pass along any keyword arguments we get (aside from the special :reader
argument for this macro) to open. Using &allow-other-keys means we don't need
to hardcode all the possible options to open, and also allows for additional
implementation-specific options to be passed to open if the user wants.
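To see what the &rest and &key parts each receive, we can feed a binding spec
through the same lambda list with destructuring-bind. The helper name
split-do-file-binding is made up for illustration, and the loop over the plist
stands in for alexandria:remove-from-plist so the sketch has no dependencies.

```lisp
(defun split-do-file-binding (binding)
  "Hypothetical helper: destructure a DO-FILE binding spec with the same
lambda list the macro uses and return its pieces as four values."
  (destructuring-bind (var path
                       &rest open-options
                       &key (reader '#'read-line) &allow-other-keys)
      binding
    (values var
            path
            ;; OPEN-OPTIONS still contains :READER at this point, so drop it
            ;; (the macro uses ALEXANDRIA:REMOVE-FROM-PLIST for this).
            (loop :for (key value) :on open-options :by #'cddr
                  :unless (eq key :reader)
                    :append (list key value))
            reader)))
```

For '(form "foo.lisp" :reader #'read :external-format :utf-8) the &rest list
sees every keyword argument, while the &key parameter picks out just the
(still unevaluated) reader form; once :reader is removed, only
:external-format :utf-8 is left for open.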
We could have omitted the keyword arguments entirely, taken the arguments as a
raw &rest, and pulled out :reader ourselves with getf. But doing it this way
means we don't have to fiddle around doing that, and we can also provide
slightly nicer documentation in an editor when it shows the macro's argument
list in the status bar. We'll also get a nicer error if we accidentally pass an
odd number of keyword arguments.
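For contrast, here's roughly what the raw &rest version might look like (again
a hypothetical helper, not the post's code). It works, but we have to remember
to copy the list before the destructive remf, and nothing in this lambda list
checks that the options actually come in key/value pairs.

```lisp
(defun split-binding-with-getf (binding)
  "Hypothetical alternative: treat everything after PATH as a raw &rest list
and pull :READER out of it by hand."
  (destructuring-bind (var path &rest options) binding
    ;; REMF is destructive and BINDING may be literal data, so copy first.
    (let* ((options (copy-list options))
           (reader (getf options :reader '#'read-line)))
      (remf options :reader)
      (values var path options reader))))
```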
One more thing before we move on: note the extra level of quoting for the
(reader '#'read-line) default value. It's important to remember that this is
a macro, so when someone writes (do-file (… :reader #'foo) …) the macro isn't
getting the function foo, because nothing has been evaluated yet; it's getting
the list (function foo). But the default value is evaluated when the argument
is missing, so we need the extra layer of quoting to make the default evaluate
to the list (function read-line), matching what we'd get if the user had
written :reader #'read-line themselves.
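A quick way to check the quoting is to compare two hypothetical destructuring
helpers that differ only in how the default is written. With '#'read-line the
default is the list (function read-line), exactly what the macro receives when
a user writes :reader #'read-line. Without the quote it's a function object,
which would be spliced into the expansion as a literal (and functions aren't
among the objects compile-file is required to be able to dump to a fasl).

```lisp
;; Hypothetical helpers: same lambda-list shape as DO-FILE's binding spec,
;; differing only in how the :READER default is written.
(defun reader-form/quoted-default (binding)
  (destructuring-bind (var path &key (reader '#'read-line) &allow-other-keys)
      binding
    (declare (ignore var path))
    reader))

(defun reader-form/unquoted-default (binding)
  (destructuring-bind (var path &key (reader #'read-line) &allow-other-keys)
      binding
    (declare (ignore var path))
    reader))
```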
Macros Using Macros
We use with-gensyms and once-only from Alexandria to maintain good hygiene in
the macro. We also use when-let to avoid some more boilerplate:
(defmacro do-file (…)
  (alexandria:with-gensyms (stream)
    (alexandria:once-only (path reader)
      `(alexandria:when-let ((,stream (open ,path :direction :input ,@open-options)))
         (unwind-protect
              (do …)
           (close ,stream))))))
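If you'd rather avoid the dependency, here's a sketch of roughly what those
Alexandria macros are doing for us, written out by hand. do-file* is a
hypothetical name for this variant: the gensyms stand in for with-gensyms, the
outer let of temporaries for once-only (so path and reader are each evaluated
exactly once, left to right), and the when for when-let.

```lisp
(let ((eof (gensym "EOF")))
  (defmacro do-file* ((symbol path
                       &rest open-options
                       &key (reader '#'read-line) &allow-other-keys)
                      &body body)
    "Hypothetical Alexandria-free variant of DO-FILE."
    (let ((open-options
            ;; Stand-in for ALEXANDRIA:REMOVE-FROM-PLIST.
            (loop :for (key value) :on open-options :by #'cddr
                  :unless (eq key :reader)
                    :append (list key value)))
          ;; Stand-in for WITH-GENSYMS: a fresh name the body can't capture.
          (stream (gensym "STREAM"))
          ;; Stand-in for ONCE-ONLY: temporaries so PATH and READER are
          ;; evaluated exactly once, left to right.
          (path-var (gensym "PATH"))
          (reader-var (gensym "READER")))
      `(let ((,path-var ,path)
             (,reader-var ,reader))
         ;; Stand-in for WHEN-LET.
         (let ((,stream (open ,path-var :direction :input ,@open-options)))
           (when ,stream
             (unwind-protect
                  (do ((,symbol
                        (funcall ,reader-var ,stream nil ',eof)
                        (funcall ,reader-var ,stream nil ',eof)))
                      ((eq ,symbol ',eof))
                    ,@body)
               (close ,stream))))))))
```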
Don't Loop
Finally we get to the meat of the macro:
(do ((,symbol
      (funcall ,reader ,stream nil ',eof)
      (funcall ,reader ,stream nil ',eof)))
    ((eq ,symbol ',eof))
  ,@body)
Unfortunately we need to use the tedious do instead of loop here to avoid an
annoying bug: if we expanded into a loop call, and the user is calling this
from their own loop, and they use (loop-finish) in the body code, then it would
finish our loop instead of their loop, which would be very confusing.
Imagine the user wrote this very contrived example:
(defun find-the-cat (&rest paths)
  (loop
    :with result = nil
    :for (path . remaining) :on paths
    :for i :from 1
    :do (do-file (line path)
          (when (string= line "meow")
            (setf result path)
            (loop-finish))) ;; This should obviously go to the finally below.
    :finally
       (when result
         (format t "Found cat after searching ~D files (did not search ~D other~:P)."
                 i (length remaining))
         (return result))))
If do-file expanded into a loop form, then the (loop-finish) would only
terminate that inner loop, and the outer loop would carry on searching the
remaining files.
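The problem is easy to reproduce without any files. Here are two hypothetical
list-walking macros with the same interface, one expanding into loop and one
into do, each used inside an outer loop that calls (loop-finish) from the body.

```lisp
;; Two hypothetical macros with identical interfaces.
(defmacro each-via-loop ((var list) &body body)
  `(loop :for ,var :in ,list :do (progn ,@body)))

(defmacro each-via-do ((var list) &body body)
  (let ((tail (gensym "TAIL")))
    `(do ((,tail ,list (cdr ,tail)))
         ((null ,tail))
       (let ((,var (car ,tail)))
         ,@body))))

(defun outer-passes-with-loop-version ()
  "Count how many times the outer LOOP body runs."
  (let ((passes 0))
    (loop :repeat 3
          :do (incf passes)
              (each-via-loop (x '(a b c))
                (when (eq x 'b)
                  (loop-finish))))   ; captured by the inner LOOP
    passes))

(defun outer-passes-with-do-version ()
  "Same, but the inner macro expands into DO."
  (let ((passes 0))
    (loop :repeat 3
          :do (incf passes)
              (each-via-do (x '(a b c))
                (when (eq x 'b)
                  (loop-finish))))   ; reaches the outer LOOP, as intended
    passes))
```

With the loop-based macro the outer loop runs all three passes, because each
(loop-finish) only ends the inner loop; with the do-based one the first
(loop-finish) ends the outer loop after a single pass.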
The same issue kind of applies with the implicit block named nil around do.
But this is much less surprising for a macro named do-…, and we've documented
it in the docstring, so that's probably okay.
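Here's what that looks like in practice, using plain do inside dolist (both
functions are hypothetical examples). The return is meant to end the whole
search but only exits the innermost block named nil, the one around do, so the
first version always returns nil. Naming a block and using return-from avoids
the ambiguity.

```lisp
(defun first-long-word/surprising (word-lists)
  "RETURN exits only the DO, so DOLIST keeps going and returns NIL."
  (dolist (words word-lists)
    (do ((tail words (cdr tail)))
        ((null tail))
      (when (> (length (car tail)) 5)
        (return (car tail))))))

(defun first-long-word (word-lists)
  "Same search with an explicitly named block, so the exit is unambiguous."
  (block outer
    (dolist (words word-lists)
      (do ((tail words (cdr tail)))
          ((null tail))
        (when (> (length (car tail)) 5)
          (return-from outer (car tail)))))))
```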
Repetition Allergies
Using do here is a little annoying because the init form and the step form are
exactly the same. If you're allergic to repeating yourself you could use the
#n= and #n# reader macros to get around it:
(do ((,symbol #1=(funcall ,reader ,stream nil ',eof) #1#))
    ((eq ,symbol ',eof))
  ,@body)
I find this more confusing than helpful, but to each their own.
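Outside of a macro template the trick looks like this. sum-numbers-in-string
is a made-up example: #1= labels the init form when the code is read, and #1#
makes the step position refer to that very same form.

```lisp
(defun sum-numbers-in-string (string)
  "Hypothetical example: sum every number READ from STRING.
The #1#-labelled form is used both as N's init form and as its step form."
  (with-input-from-string (stream string)
    (do ((n #1=(read stream nil :eof) #1#)
         (total 0))
        ((eq n :eof) total)
      (incf total n))))
```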
Result
We've got a nice little macro for easily iterating over files piece by piece.
It can take any reader function that conforms to the usual (read-foo stream
eof-error-p eof-value) interface, which means we can write our own reader
functions that will compose nicely with the macro.
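For example, a reader like the read-integer used in the Usage section could be
sketched as follows. This is an assumption about how it might be written, not
the post's implementation: it reads a line and parses it with parse-integer,
which tolerates surrounding whitespace but signals an error on blank or
non-numeric lines.

```lisp
(defun read-integer (stream &optional (eof-error-p t) eof-value)
  "Hypothetical reader: read one line from STREAM and parse it as an integer.
Follows the (read-foo stream eof-error-p eof-value) convention, so it can be
passed as :READER to DO-FILE."
  ;; READ-LINE returns a string for every real line, so NIL can only mean
  ;; end of file (when EOF-ERROR-P is true it signals END-OF-FILE instead).
  (let ((line (read-line stream eof-error-p nil)))
    (if line
        (parse-integer line)
        eof-value)))
```

It would then plug straight in: (do-file (n "numbers.txt" :reader #'read-integer) (print n)).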
We'll end with an exercise for the reader: figure out how to support declarations correctly. For example:
(do-file (n "numbers.txt" :reader #'read-fixnum)
  (declare (type fixnum n))
  (when (primep n)
    (collect (* n n))))
Hint: you'll need to deal with the sentinel value a bit differently so it doesn't contaminate the type of the bound variable.