cl-sites/gigamonkeys.com/book/practical-an-mp3-database.html

<HTML><HEAD><TITLE>Practical: An MP3 Database</TITLE><LINK REL="stylesheet" TYPE="text/css" HREF="style.css"/></HEAD><BODY><DIV CLASS="copyright">Copyright &copy; 2003-2005, Peter Seibel</DIV><H1>27. Practical: An MP3 Database</H1><P>In this chapter you'll revisit the idea first explored in Chapter 3 of
building an in-memory database out of basic Lisp data structures. This
time your goal is to hold information that you'll extract from a
collection of MP3 files using the ID3v2 library from Chapter 25.
You'll then use this database in Chapters 28 and 29 as part of a
Web-based streaming MP3 server. Of course, this time around you can
use some of the language features you've learned since Chapter 3 to
build a more sophisticated version.</P><A NAME="the-database"><H2>The Database</H2></A><P>The main problem with the database in Chapter 3 is that there's only
one table, the list stored in the variable <CODE>*db*</CODE>. Another is
that the code doesn't know anything about what type of values are
stored in different columns. In Chapter 3 you got away with that by
using the fairly general-purpose <CODE><B>EQUAL</B></CODE> method to compare column
values when selecting rows from the database, but you would've been
in trouble if you had wanted to store values that couldn't be
compared with <CODE><B>EQUAL</B></CODE> or if you had wanted to sort the rows in the
database since there's no ordering function that's as general as
<CODE><B>EQUAL</B></CODE>.</P><P>This time you'll solve both problems by defining a class,
<CODE>table</CODE>, to represent individual database tables. Each
<CODE>table</CODE> instance will consist of two slots--one to hold the
table's data and another to hold information about the columns in the
table that database operations will be able to use. The class looks
like this:</P><PRE>(defclass table ()
  ((rows   :accessor rows   :initarg :rows :initform (make-rows))
   (schema :accessor schema :initarg :schema)))</PRE><P>As in Chapter 3, you can represent the individual rows with plists,
but this time around you'll create an abstraction that will make that
an implementation detail you can change later without too much
trouble. And this time you'll store the rows in a vector rather than
a list since certain operations that you'll want to support, such as
random access to rows by a numeric index and the ability to sort a
table, can be more efficiently implemented with vectors.</P><P>The function <CODE>make-rows</CODE> used to initialize the <CODE>rows</CODE> slot
can be a simple wrapper around <CODE><B>MAKE-ARRAY</B></CODE> that builds an empty,
adjustable,vector with a fill pointer.</P><DIV CLASS="sidebarhead">The Package</DIV><DIV CLASS="sidebar"><P>The package for the code you'll develop in this chapter looks
like this:</P><PRE>(defpackage :com.gigamonkeys.mp3-database
  (:use :common-lisp 
        :com.gigamonkeys.pathnames
        :com.gigamonkeys.macro-utilities
        :com.gigamonkeys.id3v2)
  (:export  :*default-table-size*
            :*mp3-schema*
            :*mp3s*
            :column
            :column-value
            :delete-all-rows
            :delete-rows
            :do-rows
            :extract-schema
            :in
            :insert-row
            :load-database
            :make-column
            :make-schema
            :map-rows
            :matching
            :not-nullable
            :nth-row
            :random-selection
            :schema
            :select
            :shuffle-table
            :sort-rows
            :table
            :table-size
            :with-column-values))</PRE><P>The <CODE>:use</CODE> section gives you access to the functions and macros
whose names are exported from the packages defined in Chapter 15, 8,
and 25 and the <CODE>:export</CODE> section exports the API this library
will provide, which you'll use in Chapter 29.</P></DIV><PRE>(defparameter *default-table-size* 100)

(defun make-rows (&amp;optional (size *default-table-size*))
  (make-array size :adjustable t :fill-pointer 0))</PRE><P>To represent a table's schema, you need to define another class,
<CODE>column</CODE>, each instance of which will contain information about
one column in the table: its name, how to compare values in the column
for equality and ordering, a default value, and a function that will
be used to normalize the column's values when inserting data into the
table and when querying the table. The <CODE>schema</CODE> slot will hold a
list of <CODE>column</CODE> objects. The class definition looks like this:</P><PRE>(defclass column ()
  ((name               
    :reader name
    :initarg :name)

   (equality-predicate
    :reader equality-predicate
    :initarg :equality-predicate)

   (comparator
    :reader comparator
    :initarg :comparator)

   (default-value
    :reader default-value
    :initarg :default-value
    :initform nil)

   (value-normalizer
    :reader value-normalizer
    :initarg :value-normalizer
    :initform #'(lambda (v column) (declare (ignore column)) v))))</PRE><P>The <CODE>equality-predicate</CODE> and <CODE>comparator</CODE> slots of a
<CODE>column</CODE> object hold functions used to compare values from the
given column for equivalence and ordering. Thus, a column containing
string values might have <CODE><B>STRING=</B></CODE> as its
<CODE>equality-predicate</CODE> and <CODE><B>STRING&lt;</B></CODE> as its <CODE>comparator</CODE>,
while a column containing numbers might have <CODE><B>=</B></CODE> and <CODE><B>&lt;</B></CODE>.</P><P>The <CODE>default-value</CODE> and <CODE>value-normalizer</CODE> slots are used
when inserting rows into the database and, in the case of
<CODE>value-normalizer</CODE>, when querying the database. When you insert
a row into the database, if no value is provided for a particular
column, you can use the value stored in the <CODE>column</CODE>'s
<CODE>default-value</CODE> slot. Then the value--defaulted or otherwise--is
normalized by passing it and the column object to the function stored
in the <CODE>value-normalizer</CODE> slot. You pass the column in case the
<CODE>value-normalizer</CODE> function needs to use some data associated
with the column object. (You'll see an example of this in the next
section.) You should also normalize values passed in queries before
comparing them with values in the database.</P><P>Thus, the <CODE>value-normalizer</CODE>'s responsibility is primarily to
return a value that can be safely and correctly passed to the
<CODE>equality-predicate</CODE> and <CODE>comparator</CODE> functions. If the
<CODE>value-normalizer</CODE> can't figure out an appropriate value to
return, it can signal an error.</P><P>The other reason to normalize values before you store them in the
database is to save both memory and CPU cycles. For instance, if you
have a column that's going to contain string values but the number of
distinct strings that will be stored in the column is small--for
instance, the genre column in the MP3 database--you can save space
and speed by using the <CODE>value-normalizer</CODE> to <I>intern</I> the
strings (translate all <CODE><B>STRING=</B></CODE> values to a single string
object). Thus, you'll need only as many strings as there are distinct
values, regardless of how many rows are in the table, and you can use
<CODE><B>EQL</B></CODE> to compare column values rather than the slower
<CODE><B>STRING=</B></CODE>.<SUP>1</SUP></P><A NAME="defining-a-schema"><H2>Defining a Schema</H2></A><P>Thus, to make an instance of <CODE>table</CODE>, you need to build a list
of <CODE>column</CODE> objects. You could build the list by hand, using
<CODE><B>LIST</B></CODE> and <CODE><B>MAKE-INSTANCE</B></CODE>. But you'll soon notice that you're
frequently making a lot column objects with the same comparator and
equality-predicate combinations. This is because the combination of a
comparator and equality predicate essentially defines a column type.
It'd be nice if there was a way to give those types names that would
allow you to say simply that a given column is a string column,
rather than having to specify <CODE><B>STRING&lt;</B></CODE> as its comparator and
<CODE><B>STRING=</B></CODE> as its equality predicate. One way is to define a
generic function, <CODE>make-column</CODE>, like this:</P><PRE>(defgeneric make-column (name type &amp;optional default-value))</PRE><P>Now you can implement methods on this generic function that
specialize on <CODE>type</CODE> with <CODE><B>EQL</B></CODE> specializers and return
<CODE>column</CODE> objects with the slots filled in with appropriate
values. Here's the generic function and methods that define column
types for the type names <CODE>string</CODE> and <CODE>number</CODE>:</P><PRE>(defmethod make-column (name (type (eql 'string)) &amp;optional default-value)
  (make-instance
   'column 
   :name name
   :comparator #'string&lt; 
   :equality-predicate #'string=
   :default-value default-value
   :value-normalizer #'not-nullable))

(defmethod make-column (name (type (eql 'number)) &amp;optional default-value)
  (make-instance 
   'column
   :name name
   :comparator #'&lt; 
   :equality-predicate #'=
   :default-value default-value))</PRE><P>The following function, <CODE>not-nullable</CODE>, used as the
<CODE>value-normalizer</CODE> for <CODE>string</CODE> columns, simply returns the
value it's given unless the value is <CODE><B>NIL</B></CODE>, in which case it
signals an error:</P><PRE>(defun not-nullable (value column)
  (or value (error &quot;Column ~a can't be null&quot; (name column))))</PRE><P>This is important because <CODE><B>STRING&lt;</B></CODE> and <CODE><B>STRING=</B></CODE> will signal
an error if called on <CODE><B>NIL</B></CODE>; it's better to catch bad values
before they go into the table rather than when you try to use
them.<SUP>2</SUP></P><P>Another column type you'll need for the MP3 database is an
<CODE>interned-string</CODE> whose values are interned as discussed
previously. Since you need a hash table in which to intern values,
you should define a subclass of <CODE>column</CODE>,
<CODE>interned-values-column</CODE>, that adds a slot whose value is the
hash table you use to intern.</P><P>To implement the actual interning, you'll also need to provide an
<CODE>:initform</CODE> for <CODE>value-normalizer</CODE> of a function that
interns the value in the column's <CODE>interned-values</CODE> hash table.
And because one of the main reasons to intern values is to allow you
to use <CODE><B>EQL</B></CODE> as the equality predicate, you should also add an
<CODE>:initform</CODE> for the <CODE>equality-predicate</CODE> of <CODE>#'eql</CODE>.</P><PRE>(defclass interned-values-column (column)
  ((interned-values
    :reader interned-values
    :initform (make-hash-table :test #'equal))
   (equality-predicate :initform #'eql)
   (value-normalizer   :initform #'intern-for-column)))

(defun intern-for-column (value column)
  (let ((hash (interned-values column)))
    (or (gethash (not-nullable value column) hash)
        (setf (gethash value hash) value))))</PRE><P>You can then define a <CODE>make-column</CODE> method specialized on the
name <CODE>interned-string</CODE> that returns an instance of
<CODE>interned-values-column</CODE>.</P><PRE>(defmethod make-column (name (type (eql 'interned-string)) &amp;optional default-value)
  (make-instance 
   'interned-values-column
   :name name
   :comparator #'string&lt; 
   :default-value default-value))</PRE><P>With these methods defined on <CODE>make-column</CODE>, you can now define
a function, <CODE>make-schema</CODE>, that builds a list of <CODE>column</CODE>
objects from a list of column specifications consisting of a column
name, a column type name, and, optionally, a default value.</P><PRE>(defun make-schema (spec)
  (mapcar #'(lambda (column-spec) (apply #'make-column column-spec)) spec))</PRE><P>For instance, you can define the schema for the table you'll use to
store data extracted from MP3s like this:</P><PRE>(defparameter *mp3-schema* 
  (make-schema 
   '((:file     string)
     (:genre    interned-string &quot;Unknown&quot;)
     (:artist   interned-string &quot;Unknown&quot;)
     (:album    interned-string &quot;Unknown&quot;)
     (:song     string)
     (:track    number 0)
     (:year     number 0)
     (:id3-size number))))</PRE><P>To make an actual table for holding information about MP3s, you pass
<CODE>*mp3-schema*</CODE> as the <CODE>:schema</CODE> initarg to
<CODE><B>MAKE-INSTANCE</B></CODE>.</P><PRE>(defparameter *mp3s* (make-instance 'table :schema *mp3-schema*))</PRE><A NAME="inserting-values"><H2>Inserting Values</H2></A><P>Now you're ready to define your first table operation,
<CODE>insert-row</CODE>, which takes a plist of names and values and a table
and adds a row to the table containing the given values. The bulk of
the work is done in a helper function, <CODE>normalize-row</CODE>, that
builds a plist with a defaulted, normalized value for each column,
using the values from <CODE>names-and-values</CODE> if available and the
<CODE>default-value</CODE> for the column if not.</P><PRE>(defun insert-row (names-and-values table)
  (vector-push-extend (normalize-row names-and-values (schema table)) (rows table)))

(defun normalize-row (names-and-values schema)
  (loop
     for column in schema
     for name  = (name column)
     for value = (or (getf names-and-values name) (default-value column))
     collect name
     collect (normalize-for-column value column)))</PRE><P>It's worth defining a separate helper function,
<CODE>normalize-for-column</CODE>, that takes a value and a <CODE>column</CODE>
object and returns the normalized value because you'll need to perform
the same normalization on query arguments.</P><PRE>(defun normalize-for-column (value column)
  (funcall (value-normalizer column) value column))</PRE><P>Now you're ready to combine this database code with code from
previous chapters to build a database of data extracted from MP3
files. You can define a function, <CODE>file-&gt;row</CODE>, that uses
<CODE>read-id3</CODE> from the ID3v2 library to extract an ID3 tag from a
file and turns it into a plist that you can pass to
<CODE>insert-row</CODE>.</P><PRE>(defun file-&gt;row (file)
  (let ((id3 (read-id3 file)))
    (list
     :file   (namestring (truename file))
     :genre  (translated-genre id3)
     :artist (artist id3)
     :album  (album id3)
     :song   (song id3)
     :track  (parse-track (track id3))
     :year   (parse-year (year id3))
     :id3-size (size id3))))</PRE><P>You don't have to worry about normalizing the values since
<CODE>insert-row</CODE> takes care of that for you. You do, however, have
to convert the string values returned by the <CODE>track</CODE> and
<CODE>year</CODE> into numbers. The track number in an ID3 tag is sometimes
stored as the ASCII representation of the track number and sometimes
as a number followed by a slash followed by the total number of
tracks on the album. Since you care only about the actual track
number, you should use the <CODE>:end</CODE> argument to <CODE><B>PARSE-INTEGER</B></CODE>
to specify that it should parse only up to the slash, if any.<SUP>3</SUP></P><PRE>(defun parse-track (track)
  (when track (parse-integer track :end (position #\/ track))))

(defun parse-year (year)
  (when year (parse-integer year)))</PRE><P>Finally, you can put all these functions together, along with
<CODE>walk-directory</CODE> from the portable pathnames library and
<CODE>mp3-p</CODE> from the ID3v2 library, to define a function that loads
an MP3 database with data extracted from all the MP3 files it can
find under a given directory.</P><PRE>(defun load-database (dir db)
  (let ((count 0))
    (walk-directory 
     dir 
     #'(lambda (file)
         (princ #\.)
         (incf count)
         (insert-row (file-&gt;row file) db))
     :test #'mp3-p)
    (format t &quot;~&amp;Loaded ~d files into database.&quot; count)))</PRE><A NAME="querying-the-database"><H2>Querying the Database</H2></A><P>Once you've loaded your database with data, you'll need a way to
query it. For the MP3 application you'll need a slightly more
sophisticated query function than you wrote in Chapter 3. This time
around you want not only to be able to select rows matching
particular criteria but also to limit the results to particular
columns, to limit the results to unique rows, and perhaps to sort the
rows by particular columns. In keeping with the spirit of relational
database theory, the result of a query will be a new <CODE>table</CODE>
object containing the desired rows and columns.</P><P>The query function you'll write, <CODE>select</CODE>, is loosely modeled on
the <CODE>SELECT</CODE> statement from Structured Query Language (SQL).
It'll take five keyword parameters: <CODE>:from</CODE>, <CODE>:columns</CODE>,
<CODE>:where</CODE>, <CODE>:distinct</CODE>, and <CODE>:order-by</CODE>. The
<CODE>:from</CODE> argument is the <CODE>table</CODE> object you want to query.
The <CODE>:columns</CODE> argument specifies which columns should be
included in the result. The value should be a list of column names, a
single column name, or a <CODE><B>T</B></CODE>, the default, meaning return all
columns. The <CODE>:where</CODE> argument, if provided, should be a
function that accepts a row and returns true if it should be included
in the results. In a moment, you'll write two functions,
<CODE>matching</CODE> and <CODE>in</CODE>, that return functions appropriate for
use as <CODE>:where</CODE> arguments. The <CODE>:order-by</CODE> argument, if
supplied, should be a list of column names; the results will be
sorted by the named columns. As with the <CODE>:columns</CODE> argument,
you can specify a single column using just the name, which is
equivalent to a one-item list containing the same name. Finally, the
<CODE>:distinct</CODE> argument is a boolean that says whether to eliminate
duplicate rows from the results. The default value for
<CODE>:distinct</CODE> is <CODE><B>NIL</B></CODE>.</P><P>Here are some examples of using <CODE>select</CODE>:</P><PRE>;; Select all rows where the :artist column is &quot;Green Day&quot;
(select :from *mp3s* :where (matching *mp3s* :artist &quot;Green Day&quot;))

;; Select a sorted list of artists with songs in the genre &quot;Rock&quot;
(select
  :columns :artist
  :from *mp3s*
  :where (matching *mp3s* :genre &quot;Rock&quot;)
  :distinct t
  :order-by :artist)</PRE><P>The implementation of <CODE>select</CODE> with its immediate helper
functions looks like this:</P><PRE>(defun select (&amp;key (columns t) from where distinct order-by)
  (let ((rows (rows from))
        (schema (schema from)))

    (when where
      (setf rows (restrict-rows rows where)))

    (unless (eql columns 't)
      (setf schema (extract-schema (mklist columns) schema))
      (setf rows (project-columns rows schema)))

    (when distinct
      (setf rows (distinct-rows rows schema)))

    (when order-by
      (setf rows (sorted-rows rows schema (mklist order-by))))

    (make-instance 'table :rows rows :schema schema)))

(defun mklist (thing)
  (if (listp thing) thing (list thing)))

(defun extract-schema (column-names schema)
  (loop for c in column-names collect (find-column c schema)))

(defun find-column (column-name schema)
  (or (find column-name schema :key #'name)
      (error &quot;No column: ~a in schema: ~a&quot; column-name schema)))

(defun restrict-rows (rows where)
  (remove-if-not where rows))

(defun project-columns (rows schema)
  (map 'vector (extractor schema) rows))

(defun distinct-rows (rows schema)
  (remove-duplicates rows :test (row-equality-tester schema)))

(defun sorted-rows (rows schema order-by)
  (sort (copy-seq rows) (row-comparator order-by schema)))</PRE><P>Of course, the really interesting part of <CODE>select</CODE> is how you
implement the functions <CODE>extractor</CODE>, <CODE>row-equality-tester</CODE>,
and <CODE>row-comparator</CODE>.</P><P>As you can tell by how they're used, each of these functions must
return a function. For instance, <CODE>project-columns</CODE> uses the value
returned by <CODE>extractor</CODE> as the function argument to <CODE><B>MAP</B></CODE>.
Since the purpose of <CODE>project-columns</CODE> is to return a set of rows
with only certain column values, you can infer that <CODE>extractor</CODE>
returns a function that takes a row as an argument and returns a new
row containing only the columns specified in the schema it's passed.
Here's how you can implement it:</P><PRE>(defun extractor (schema)
  (let ((names (mapcar #'name schema)))
    #'(lambda (row)
        (loop for c in names collect c collect (getf row c)))))</PRE><P>Note how you can do the work of extracting the names from the schema
outside the body of the closure: since the closure will be called
many times, you want it to do as little work as possible each time
it's called.</P><P>The functions <CODE>row-equality-tester</CODE> and <CODE>row-comparator</CODE> are
implemented in a similar way. To decide whether two rows are
equivalent, you need to apply the appropriate equality predicate for
each column to the appropriate column values. Recall from Chapter 22
that the <CODE><B>LOOP</B></CODE> clause <CODE>always</CODE> will return <CODE><B>NIL</B></CODE> as soon
as a pair of values fails their test or will cause the <CODE><B>LOOP</B></CODE> to
return <CODE><B>T</B></CODE>.</P><PRE>(defun row-equality-tester (schema)
  (let ((names (mapcar #'name schema))
        (tests (mapcar #'equality-predicate schema)))
    #'(lambda (a b)
        (loop for name in names and test in tests
           always (funcall test (getf a name) (getf b name))))))</PRE><P>Ordering two rows is a bit more complex. In Lisp, comparator functions
return true if their first argument should be sorted ahead of the
second and <CODE><B>NIL</B></CODE> otherwise. Thus, a <CODE><B>NIL</B></CODE> can mean that the
second argument should be sorted ahead of the first <I>or</I> that they're equivalent. You want your row comparators to behave the same way:
return <CODE><B>T</B></CODE> if the first row should be sorted ahead of the second
and <CODE><B>NIL</B></CODE> otherwise.</P><P>Thus, to compare two rows, you should compare the values from the
columns you're sorting by, in order, using the appropriate comparator
for each column. First call the comparator with the value from the
first row as the first argument. If the comparator returns true, that
means the first row should definitely be sorted ahead of the second
row, so you can immediately return <CODE><B>T</B></CODE>.</P><P>But if the column comparator returns <CODE><B>NIL</B></CODE>, then you need to
determine whether that's because the second value should sort ahead
of the first value or because they're equivalent. So you should call
the comparator again with the arguments reversed. If the comparator
returns true this time, it means the second column value sorts ahead
of the first and thus the second row ahead of the first row, so you
can return <CODE><B>NIL</B></CODE> immediately. Otherwise, the column values are
equivalent, and you need to move onto the next column. If you get
through all the columns without one row's value ever winning the
comparison, then the rows are equivalent, and you return <CODE><B>NIL</B></CODE>. A
function that implements this algorithm looks like this:</P><PRE>(defun row-comparator (column-names schema)
  (let ((comparators (mapcar #'comparator (extract-schema column-names schema))))
    #'(lambda (a b)
        (loop
           for name in column-names
           for comparator in comparators
           for a-value = (getf a name)
           for b-value = (getf b name)
           when (funcall comparator a-value b-value) return t
           when (funcall comparator b-value a-value) return nil
           finally (return nil)))))</PRE><A NAME="matching-functions"><H2>Matching Functions</H2></A><P>The <CODE>:where</CODE> argument to <CODE>select</CODE> can be any function that
takes a row object and returns true if it should be included in the
results. In practice, however, you'll rarely need the full power of
arbitrary code to express query criteria. So you should provide two
functions, <CODE>matching</CODE> and <CODE>in</CODE>, that will build query
functions that allow you to express the common kinds of queries and
that take care of using the proper equality predicates and value
normalizers for each column.</P><P>The workhouse query-function constructor will be <CODE>matching</CODE>,
which returns a function that will match rows with specific column
values. You saw how it was used in the earlier examples of
<CODE>select</CODE>. For instance, this call to <CODE>matching</CODE>:</P><PRE>(matching *mp3s* :artist &quot;Green Day&quot;)</PRE><P>returns a function that matches rows whose <CODE>:artist</CODE> value is
&quot;Green Day&quot;. You can also pass multiple names and values; the
returned function matches when all the columns match. For example,
the following returns a closure that matches rows where the artist is
&quot;Green Day&quot; and the album is &quot;American Idiot&quot;:</P><PRE>(matching *mp3s* :artist &quot;Green Day&quot; :album &quot;American Idiot&quot;)</PRE><P>You have to pass <CODE>matching</CODE> the table object because it needs
access to the table's schema in order to get at the equality
predicates and value normalizer functions for the columns it matches
against.</P><P>You build up the function returned by <CODE>matching</CODE> out of smaller
functions, each responsible for matching one column's value. To build
these functions, you should define a function, <CODE>column-matcher</CODE>,
that takes a <CODE>column</CODE> object and an unnormalized value you want
to match and returns a function that accepts a single row and returns
true when the value of the given column in the row matches the
normalized version of the given value.</P><PRE>(defun column-matcher (column value)
  (let ((name (name column))
        (predicate (equality-predicate column))
        (normalized (normalize-for-column value column)))
    #'(lambda (row) (funcall predicate (getf row name) normalized))))</PRE><P>You then build a list of column-matching functions for the names and
values you care about with the following function,
<CODE>column-matchers</CODE>:</P><PRE>(defun column-matchers (schema names-and-values)
  (loop for (name value) on names-and-values by #'cddr
     when value collect
       (column-matcher (find-column name schema) value)))</PRE><P>Now you can implement <CODE>matching</CODE>. Again, note that you do as
much work as possible outside the closure in order to do it only once
rather than once per row in the table.</P><PRE>(defun matching (table &amp;rest names-and-values)
  &quot;Build a where function that matches rows with the given column values.&quot;
  (let ((matchers (column-matchers (schema table) names-and-values)))
    #'(lambda (row)
        (every #'(lambda (matcher) (funcall matcher row)) matchers))))</PRE><P>This function is a bit of a twisty maze of closures, but it's worth
contemplating for a moment to get a flavor of the possibilities of
programming with functions as first-class objects.</P><P>The job of <CODE>matching</CODE> is to return a function that will be
invoked on each row in a table to determine whether it should be
included in the new table. So, <CODE>matching</CODE> returns a closure with
one parameter, <CODE>row</CODE>.</P><P>Now recall that the function <CODE><B>EVERY</B></CODE> takes a predicate function as
its first argument and returns true if, and only if, that function
returns true each time it's applied to an element of the list passed
as <CODE><B>EVERY</B></CODE>'s second argument. However, in this case, the list you
pass to <CODE><B>EVERY</B></CODE> is itself a list of functions, the column
matchers. What you want to know is that every column matcher, when
invoked on the row you're currently testing, returns true. So, as the
predicate argument to <CODE><B>EVERY</B></CODE>, you pass yet another closure that
<CODE><B>FUNCALL</B></CODE>s the column matcher, passing it the row.</P><P>Another matching function that you'll occasionally find useful is
<CODE>in</CODE>, which returns a function that matches rows where a
particular column is in a given set of values. You'll define
<CODE>in</CODE> to take two arguments: a column name and a table that
contains the values you want to match. For instance, suppose you
wanted to find all the songs in the MP3 database that have names the
same as a song performed by the Dixie Chicks. You can write that
where clause using <CODE>in</CODE> and a sub<CODE>select</CODE> like
this:<SUP>4</SUP></P><PRE>(select
  :columns '(:artist :song)
  :from *mp3s*
  :where (in :song 
             (select
               :columns :song
               :from *mp3s*
               :where (matching *mp3s* :artist &quot;Dixie Chicks&quot;))))</PRE><P>Although the queries are more complex, the definition of <CODE>in</CODE> is
much simpler than that of <CODE>matching</CODE>.</P><PRE>(defun in (column-name table)
  (let ((test (equality-predicate (find-column column-name (schema table))))
        (values (map 'list #'(lambda (r) (getf r column-name)) (rows table))))
    #'(lambda (row)
        (member (getf row column-name) values :test test))))</PRE><A NAME="getting-at-the-results"><H2>Getting at the Results</H2></A><P>Since <CODE>select</CODE> returns another <CODE>table</CODE>, you need to think a
bit about how you want to get at the individual row and column values
in a table. If you're sure you'll never want to change the way you
represent the data in a table, you can just make the structure of a
table part of the API--that <CODE>table</CODE> has a slot <CODE>rows</CODE>
that's a vector of plists--and use all the normal Common Lisp
functions for manipulating vectors and plists to get at the values in
the table. But that representation is really an internal detail that
you might want to change. Also, you don't necessarily want other code
manipulating the data structures directly--for instance, you don't
want anyone to use <CODE><B>SETF</B></CODE> to put an unnormalized column value into
a row. So it might be a good idea to define a few abstractions that
provide the operations you want to support. Then if you decide to
change the internal representation later, you'll need to change only
the implementation of these functions and macros. And while Common
Lisp doesn't enable you to absolutely prevent folks from getting at
&quot;internal&quot; data, by providing an official API you at least make it
clear where the boundary is.</P><P>Probably the most common thing you'll need to do with the results of a
query is to iterate over the individual rows and extract specific
column values. So you need to provide a way to do both those things
without touching the <CODE>rows</CODE> vector directly or using <CODE><B>GETF</B></CODE>
to get at the column values within a row.</P><P>For now these operations are trivial to implement; they're merely
wrappers around the code you'd write if you didn't have these
abstractions. You can provide two ways to iterate over the rows of a
table: a macro <CODE>do-rows</CODE>, which provides a basic looping
construct, and a function <CODE>map-rows</CODE>, which builds a list
containing the results of applying a function to each row in the
table.<SUP>5</SUP></P><PRE>(defmacro do-rows ((row table) &amp;body body)
  `(loop for ,row across (rows ,table) do ,@body))

(defun map-rows (fn table)
  (loop for row across (rows table) collect (funcall fn row)))</PRE><P>To get at individual column values within a row, you should provide a
function, <CODE>column-value</CODE>, that takes a row and a column name and
returns the appropriate value. Again, it's a trivial wrapper around
the code you'd write otherwise. But if you change the internal
representation of a table later, users of <CODE>column-value</CODE> needn't
be any the wiser.</P><PRE>(defun column-value (row column-name)
  (getf row column-name))</PRE><P>While <CODE>column-value</CODE> is a sufficient abstraction for getting at
column values, you'll often want to get at the values of multiple
columns at once. So you can provide a bit of syntactic sugar, a
macro, <CODE>with-column-values</CODE>, that binds a set of variables to
the values extracted from a row using the corresponding keyword
names. Thus, instead of writing this:</P><PRE> (do-rows (row table)
   (let ((song (column-value row :song))
         (artist (column-value row :artist))
         (album (column-value row :album)))
     (format t &quot;~a by ~a from ~a~%&quot; song artist album)))</PRE><P>you can simply write the following:</P><PRE>(do-rows (row table)
  (with-column-values (song artist album) row
    (format t &quot;~a by ~a from ~a~%&quot; song artist album)))</PRE><P>Again, the actual implementation isn't complicated if you use the
<CODE>once-only</CODE> macro from Chapter 8.</P><PRE>(defmacro with-column-values ((&amp;rest vars) row &amp;body body)
  (once-only (row)
    `(let ,(column-bindings vars row) ,@body)))

(defun column-bindings (vars row)
  (loop for v in vars collect `(,v (column-value ,row ,(as-keyword v)))))

(defun as-keyword (symbol)
  (intern (symbol-name symbol) :keyword))</PRE><P>Finally, you should provide abstractions for getting at the number of
rows in a table and for accessing a specific row by numeric index.</P><PRE>(defun table-size (table)
  (length (rows table)))

(defun nth-row (n table)
  (aref (rows table) n))</PRE><A NAME="other-database-operations"><H2>Other Database Operations</H2></A><P>Finally, you'll implement a few other database operations that you'll
need in Chapter 29. The first two are analogs of the SQL
<CODE>DELETE</CODE> statement. The function <CODE>delete-rows</CODE> is used to
delete rows from a table that match particular criteria. Like
<CODE>select</CODE>, it takes <CODE>:from</CODE> and <CODE>:where</CODE> keyword
arguments. Unlike <CODE>select</CODE>, it doesn't return a new table--it
actually modifies the table passed as the <CODE>:from</CODE> argument.</P><PRE>(defun delete-rows (&amp;key from where)
  (loop
     with rows = (rows from)
     with store-idx = 0
     for read-idx from 0
     for row across rows
     do (setf (aref rows read-idx) nil)
     unless (funcall where row) do
       (setf (aref rows store-idx) row)
       (incf store-idx)
     finally (setf (fill-pointer rows) store-idx)))</PRE><P>In the interest of efficiency, you might want to provide a separate
function for deleting all the rows from a table.</P><PRE>(defun delete-all-rows (table)
  (setf (rows table) (make-rows *default-table-size*)))</PRE><P>The remaining table operations don't really map to normal relational
database operations but will be useful in the MP3 browser
application. The first is a function to sort the rows of a table in
place.</P><PRE>(defun sort-rows (table &amp;rest column-names)
  (setf (rows table) (sort (rows table) (row-comparator column-names (schema table))))
  table)</PRE><P>On the flip side, in the MP3 browser application, you'll need a
function that shuffles a table's rows in place using the function
<CODE>nshuffle-vector</CODE> from Chapter 23.</P><PRE>(defun shuffle-table (table)
  (nshuffle-vector (rows table))
  table)</PRE><P>And finally, again for the purposes of the MP3 browser, you should
provide a function that selects <I>n</I> random rows, returning the
results as a new table. It also uses <CODE>nshuffle-vector</CODE> along
with a version of <CODE>random-sample</CODE> based on Algorithm S from
Donald Knuth's <I>The Art of Computer Programming, Volume 2:
Seminumerical Algorithms</I>, Third Edition (Addison-Wesley, 1998) that
I discussed in Chapter 20.</P><PRE>(defun random-selection (table n)
  (make-instance
   'table
   :schema (schema table)
   :rows (nshuffle-vector (random-sample (rows table) n))))

(defun random-sample (vector n)
  &quot;Based on Algorithm S from Knuth. TAOCP, vol. 2. p. 142&quot;
  (loop with selected = (make-array n :fill-pointer 0)
     for idx from 0
     do
       (loop
          with to-select = (- n (length selected))
          for remaining = (- (length vector) idx)
          while (&gt;= (* remaining (random 1.0)) to-select)
          do (incf idx))
       (vector-push (aref vector idx) selected)
     when (= (length selected) n) return selected))</PRE><P>With this code you'll be ready, in Chapter 29, to build a Web
interface for browsing a collection of MP3 files. But before you get
to that, you need to implement the part of the server that streams
MP3s using the Shoutcast protocol, which is the topic of the next
chapter.
</P><HR/><DIV CLASS="notes"><P><SUP>1</SUP>The general theory behind interning objects is
that if you're going to compare a particular value many times, it's
worth it to pay the cost of interning it. The <CODE>value-normalizer</CODE>
runs once when you insert a value into the table and, as you'll see,
once at the beginning of each query. Since a query can involve
invoking the <CODE>equality-predicate</CODE> once per row in the table, the
amortized cost of interning the values will quickly approach zero.
</P><P><SUP>2</SUP>As always, the first causality of concise exposition in
programming books is proper error handling; in production code you'd
probably want to define your own error type, such as the following,
and signal it instead:</P><PRE>(error 'illegal-column-value :value value :column column)</PRE><P>Then you'd want to think about where you can add restarts that might
be able to recover from this condition. And, finally, in any given
application you could establish condition handlers that would choose
from among those restarts.</P><P><SUP>3</SUP>If
any MP3 files have malformed data in the track and year frames,
<CODE><B>PARSE-INTEGER</B></CODE> could signal an error. One way to deal with that
is to pass <CODE><B>PARSE-INTEGER</B></CODE> the <CODE>:junk-allowed</CODE> argument of
<CODE><B>T</B></CODE>, which will cause it to ignore any non-numeric junk following
the number and to return <CODE><B>NIL</B></CODE> if no number can be found in the
string. Or, if you want practice at using the condition system, you
could define an error and signal it from these functions when the
data is malformed and also establish a few restarts to allow these
functions to recover.
</P><P><SUP>4</SUP>This query will also return all the songs performed by the
Dixie Chicks. If you want to limit it to songs by artists other than
the Dixie Chicks, you need a more complex <CODE>:where</CODE> function.
Since the <CODE>:where</CODE> argument can be any function, it's certainly
possible; you could remove the Dixie Chicks' own songs with this
query:</P><PRE>(let* ((dixie-chicks (matching *mp3s* :artist &quot;Dixie Chicks&quot;))
       (same-song (in :song (select :columns :song :from *mp3s* :where dixie-chicks)))
       (query #'(lambda (row) (and (not (funcall dixie-chicks row)) (funcall same-song row)))))
  (select :columns '(:artist :song) :from *mp3s* :where query))</PRE><P>This obviously isn't quite as convenient. If you were going to write
an application that needed to do lots of complex queries, you might
want to consider coming up with a more expressive query language.</P><P><SUP>5</SUP>The version of <CODE><B>LOOP</B></CODE> implemented at M.I.T. before
Common Lisp was standardized included a mechanism for extending the
<CODE><B>LOOP</B></CODE> grammar to support iteration over new data structures. Some
Common Lisp implementations that inherited their <CODE><B>LOOP</B></CODE>
implementation from that code base may still support that facility,
which would make <CODE>do-rows</CODE> and <CODE>map-rows</CODE> less necessary.
</P></DIV></BODY></HTML>