<HTML><HEAD><TITLE>Practical: Parsing Binary Files</TITLE><LINK REL="stylesheet" TYPE="text/css" HREF="style.css"/></HEAD><BODY><DIV CLASS="copyright">Copyright © 2003-2005, Peter Seibel</DIV><H1>24. Practical: Parsing Binary Files</H1><P>In this chapter I'll show you how to build a library that you can use
to write code for reading and writing binary files. You'll use this
library in Chapter 25 to write a parser for ID3 tags, the mechanism
used to store metadata such as artist and album names in MP3 files.
This library is also an example of how to use macros to extend the
language with new constructs, turning it into a special-purpose
language for solving a particular problem, in this case reading and
writing binary data. Because you'll develop the library a bit at a
time, including several partial versions, it may seem you're writing a
lot of code. But when all is said and done, the whole library is fewer
than 150 lines of code, and the longest macro is only 20 lines long.</P>
<A NAME="binary-files"><H2>Binary Files</H2></A>
<P>At a sufficiently low level of abstraction, all files are "binary" in
the sense that they just contain a bunch of numbers encoded in binary
form. However, it's customary to distinguish between <I>text files</I>,
where all the numbers can be interpreted as characters representing
human-readable text, and <I>binary files</I>, which contain data that,
if interpreted as characters, yields nonprintable characters.<SUP>1</SUP></P>
<P>Binary file formats are usually designed to be both compact and
efficient to parse--that's their main advantage over text-based
formats. To meet both those criteria, they're usually composed of
on-disk structures that are easily mapped to data structures that a
program might use to represent the same data in memory.<SUP>2</SUP></P>
<P>The library will give you an easy way to define the mapping between
the on-disk structures defined by a binary file format and in-memory
Lisp objects. Using the library, it should be easy to write a program
that can read a binary file, translating it into Lisp objects that
you can manipulate, and then write them back out to another properly
formatted binary file.</P>
<A NAME="binary-format-basics"><H2>Binary Format Basics</H2></A>
<P>The starting point for reading and writing binary files is to open the
file for reading or writing individual bytes. As I discussed in
Chapter 14, both <CODE><B>OPEN</B></CODE> and <CODE><B>WITH-OPEN-FILE</B></CODE> accept a keyword
argument, <CODE>:element-type</CODE>, that controls the basic unit of
transfer for the stream. When you're dealing with binary files,
you'll specify <CODE>(unsigned-byte 8)</CODE>. An input stream opened with
such an <CODE>:element-type</CODE> will return an integer between 0 and 255
each time it's passed to <CODE><B>READ-BYTE</B></CODE>. Conversely, you can write
bytes to an <CODE>(unsigned-byte 8)</CODE> output stream by passing numbers
between 0 and 255 to <CODE><B>WRITE-BYTE</B></CODE>.</P>
<P>Above the level of individual bytes, most binary formats use a
smallish number of primitive data types--numbers encoded in various
ways, textual strings, bit fields, and so on--which are then composed
into more complex structures. So your first task is to define a
framework for writing code to read and write the primitive data types
used by a given binary format.</P>
<P>To take a simple example, suppose you're dealing with a binary format
that uses an unsigned 16-bit integer as a primitive data type. To
read such an integer, you need to read the two bytes and then combine
them into a single number by multiplying one byte by 256, a.k.a. 2^8,
and adding it to the other byte. For instance, assuming the binary
format specifies that such 16-bit quantities are stored in
<I>big-endian</I><SUP>3</SUP> form,
with the most significant byte first, you can read such a number with
this function:</P>
<PRE>(defun read-u2 (in)
  (+ (* (read-byte in) 256) (read-byte in)))</PRE>
<P>However, Common Lisp provides a more convenient way to perform this
kind of bit twiddling. The function <CODE><B>LDB</B></CODE>, whose name stands for
load byte, can be used to extract and set (with <CODE><B>SETF</B></CODE>) any number
of contiguous bits from an integer.<SUP>4</SUP> The number of bits and their position within the
integer are specified with a <I>byte specifier</I> created with the
<CODE><B>BYTE</B></CODE> function. <CODE><B>BYTE</B></CODE> takes two arguments, the number of bits
to extract (or set) and the position of the rightmost bit where the
least significant bit is at position zero. <CODE><B>LDB</B></CODE> takes a byte
specifier and the integer from which to extract the bits and returns
the nonnegative integer represented by the extracted bits. Thus, you can
extract the least significant octet of an integer like this:</P>
<PRE>(ldb (byte 8 0) #xabcd) ==> 205 ; 205 is #xcd</PRE>
<P>To get the next octet, you'd use a byte specifier of <CODE>(byte 8 8)</CODE> like this:</P>
<PRE>(ldb (byte 8 8) #xabcd) ==> 171 ; 171 is #xab</PRE>
<P>You can use <CODE><B>LDB</B></CODE> with <CODE><B>SETF</B></CODE> to set the specified bits of an
integer stored in a <CODE><B>SETF</B></CODE>able place.</P>
<PRE>CL-USER> (defvar *num* 0)
*NUM*
CL-USER> (setf (ldb (byte 8 0) *num*) 128)
128
CL-USER> *num*
128
CL-USER> (setf (ldb (byte 8 8) *num*) 255)
255
CL-USER> *num*
65408</PRE>
<P>Thus, you can also write <CODE>read-u2</CODE> like this:<SUP>5</SUP></P>
<PRE>(defun read-u2 (in)
  (let ((u2 0))
    (setf (ldb (byte 8 8) u2) (read-byte in))
    (setf (ldb (byte 8 0) u2) (read-byte in))
    u2))</PRE>
<P>To write a number out as a 16-bit integer, you need to extract the
individual 8-bit bytes and write them one at a time. To extract the
individual bytes, you just need to use <CODE><B>LDB</B></CODE> with the same byte
specifiers.</P>
<PRE>(defun write-u2 (out value)
  (write-byte (ldb (byte 8 8) value) out)
  (write-byte (ldb (byte 8 0) value) out))</PRE>
<P>Of course, you can also encode integers in many other ways--with
different numbers of bytes, with different endianness, and in signed
and unsigned format.</P>
<A NAME="strings-in-binary-files"><H2>Strings in Binary Files</H2></A>
<P>Textual strings are another kind of primitive data type you'll find
in many binary formats. When you read files one byte at a time, you
can't read and write strings directly--you need to decode and encode
them one byte at a time, just as you do with binary-encoded numbers.
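</P>
<P>To make the comparison with numbers concrete, here is a minimal sketch, not part of the original library, of two of the other integer encodings mentioned at the end of the previous section: a little-endian 16-bit reader and a reinterpretation of an unsigned 16-bit value as a two's-complement signed one. The names <CODE>read-u2-le</CODE> and <CODE>u2-as-signed</CODE> are hypothetical, and the reader assumes a stream opened with an <CODE>:element-type</CODE> of <CODE>(unsigned-byte 8)</CODE>.</P>
<PRE>;; Hypothetical helper: like read-u2, but the least significant byte comes first.
(defun read-u2-le (in)
  (let ((u2 0))
    (setf (ldb (byte 8 0) u2) (read-byte in))
    (setf (ldb (byte 8 8) u2) (read-byte in))
    u2))

;; Hypothetical helper: treat bit 15 as the sign bit of a 16-bit value.
(defun u2-as-signed (u2)
  (if (logbitp 15 u2)
      (- u2 65536)
      u2))</PRE>
<P>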
And just as you can encode an integer in several ways, you can encode
a string in many ways. To start with, the binary format must specify
how individual characters are encoded.</P>
<P>To translate bytes to characters, you need to know both what
character <I>code</I> and what character <I>encoding</I> you're using. A
character code defines a mapping from nonnegative integers to
characters. Each number in the mapping is called a <I>code point</I>.
For instance, ASCII is a character code that maps the numbers from
0 to 127 to particular characters used in the Latin alphabet. A
character encoding, on the other hand, defines how the code points
are represented as a sequence of bytes in a byte-oriented medium such
as a file. For codes that use eight or fewer bits, such as ASCII and
ISO-8859-1, the encoding is trivial--each numeric value is encoded as
a single byte.</P>
<P>Nearly as straightforward are pure double-byte encodings, such as
UCS-2, which map between 16-bit values and characters. The only
reason double-byte encodings can be more complex than single-byte
encodings is that you may also need to know whether the 16-bit values
are supposed to be encoded in big-endian or little-endian format.</P>
<P>Variable-width encodings use different numbers of octets for
different numeric values, making them more complex but allowing them
to be more compact in many cases. For instance, UTF-8, an encoding
designed for use with the Unicode character code, uses a single octet
to encode the values 0-127 while using up to four octets to encode
values up to 1,114,111.<SUP>6</SUP></P>
<P>Since the code points from 0-127 map to the same characters in
Unicode as they do in ASCII, a UTF-8 encoding of text consisting only
of characters also in ASCII is the same as the ASCII encoding. On the
other hand, texts consisting mostly of characters requiring three
bytes in UTF-8 could be more compactly encoded in a straight
double-byte encoding.</P>
<P>Common Lisp provides two functions for translating between numeric
character codes and character objects: <CODE><B>CODE-CHAR</B></CODE>, which takes a
numeric code and returns the corresponding character, and <CODE><B>CHAR-CODE</B></CODE>, which
takes a character and returns its numeric code. The language standard
doesn't specify what character encoding an implementation must use,
so there's no guarantee you can represent every character that can
possibly be encoded in a given file format as a Lisp character.
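</P>
<P>As a small illustration of the two functions in use, the following sketch converts between a sequence of byte values and a string. The helper names <CODE>octets-to-string</CODE> and <CODE>string-to-octets</CODE> are hypothetical, and the sketch assumes each byte is the code of exactly one character in the Lisp's native character code.</P>
<PRE>;; Hypothetical helper: one character per byte value.
(defun octets-to-string (octets)
  (map 'string #'code-char octets))

;; Hypothetical helper: one numeric code per character.
(defun string-to-octets (string)
  (map 'list #'char-code string))

;; In an ASCII-compatible Lisp, (octets-to-string '(73 68 51)) returns "ID3".</PRE>
<P>How far such helpers can be trusted depends on the implementation's character code, which the standard leaves open.</P>
<P>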
However, almost all contemporary Common Lisp implementations use
ASCII, ISO-8859-1, or Unicode as their native character code. Because
Unicode is a superset of ISO-8859-1, which is in turn a superset of
ASCII, if you're using a Unicode Lisp, <CODE><B>CODE-CHAR</B></CODE> and
<CODE><B>CHAR-CODE</B></CODE> can be used directly for translating any of those
three character codes.<SUP>7</SUP></P>
<P>In addition to specifying a character encoding, a string encoding
must also specify how to encode the length of the string. Three
techniques are typically used in binary file formats.</P>
<P>The simplest is to not encode it but to let it be implicit in the
position of the string in some larger structure: a particular element
of a file may always be a string of a certain length, or a string may
be the last element of a variable-length data structure whose overall
size determines how many bytes are left to read as string data. Both
these techniques are used in ID3 tags, as you'll see in the next
chapter.</P>
<P>The other two techniques can be used to encode variable-length
strings without relying on context. One is to encode the length of
the string followed by the character data--the parser reads an
integer value (in some specified integer format) and then reads that
number of characters. Another is to write the character data followed
by a delimiter that can't appear in the string, such as a null
character.</P>
<P>The different representations have different advantages and
disadvantages, but when you're dealing with already specified binary
formats, you won't have any control over which encoding is used.
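</P>
<P>The length-prefixed technique, for instance, could be sketched as follows. This is only an illustration under stated assumptions: a one-byte unsigned length, one byte per character, and an ASCII-compatible Lisp. The name <CODE>read-length-prefixed-ascii</CODE> is hypothetical and doesn't appear in the library.</P>
<PRE>;; Hypothetical reader: a one-byte length N followed by N character bytes.
(defun read-length-prefixed-ascii (in)
  (let* ((length (read-byte in))
         (string (make-string length)))
    ;; Fill the preallocated string one decoded byte at a time.
    (dotimes (i length string)
      (setf (char string i) (code-char (read-byte in))))))</PRE>
<P>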
However, none of the encodings is particularly more difficult to read
and write than any other. Here, as an example, is a function that
reads a null-terminated ASCII string, assuming your Lisp
implementation uses ASCII or one of its supersets such as ISO-8859-1
or full Unicode as its native character encoding:</P>
<PRE>(defconstant +null+ (code-char 0))

(defun read-null-terminated-ascii (in)
  (with-output-to-string (s)
    (loop for char = (code-char (read-byte in))
          until (char= char +null+) do (write-char char s))))</PRE>
<P>The <CODE><B>WITH-OUTPUT-TO-STRING</B></CODE> macro, which I mentioned in Chapter 14,
is an easy way to build up a string when you don't know how long it'll
be. It creates a <CODE><B>STRING-STREAM</B></CODE> and binds it to the variable name
specified, <CODE>s</CODE> in this case. All characters written to the stream
are collected into a string, which is then returned as the value of
the <CODE><B>WITH-OUTPUT-TO-STRING</B></CODE> form.</P>
<P>To write a string back out, you just need to translate the characters
back to numeric values that can be written with <CODE><B>WRITE-BYTE</B></CODE> and
then write the null terminator after the string contents.</P>
<PRE>(defun write-null-terminated-ascii (string out)
  (loop for char across string
        do (write-byte (char-code char) out))
  (write-byte (char-code +null+) out))</PRE>
<P>As these examples show, the main intellectual challenge--such as it
is--of reading and writing primitive elements of binary files is
understanding how exactly to interpret the bytes that appear in a
file and to map them to Lisp data types. If a binary file format is
well specified, this should be a straightforward proposition.
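</P>
<P>One inexpensive way to check that a reader and a writer agree on an interpretation is a round trip through a file. The following is a sketch rather than part of the library: the name <CODE>round-trip</CODE> and the scratch pathname are assumptions, and the writer is expected to take the value first and the stream second, as <CODE><B>WRITE-BYTE</B></CODE> and <CODE>write-null-terminated-ascii</CODE> do.</P>
<PRE>;; Hypothetical helper: write VALUE with WRITER, then read it back with READER.
(defun round-trip (value writer reader pathname)
  (with-open-file (out pathname :direction :output
                                :element-type '(unsigned-byte 8)
                                :if-exists :supersede)
    (funcall writer value out))
  (with-open-file (in pathname :element-type '(unsigned-byte 8))
    (funcall reader in)))

;; The pathname is only an example; any writable location will do.
;; (round-trip 205 #'write-byte #'read-byte "/tmp/round-trip.bin") returns 205.</PRE>
<P>Passing <CODE>#'write-null-terminated-ascii</CODE> and <CODE>#'read-null-terminated-ascii</CODE> with an ASCII string as the value should hand back an equal string.</P>
<P>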
Actually writing functions to read and write a particular encoding
is, as they say, a simple matter of programming.</P>
<P>Now you can turn to the issue of reading and writing more complex
on-disk structures and how to map them to Lisp objects.</P>
<A NAME="composite-structures"><H2>Composite Structures</H2></A>
<P>Since binary formats are usually used to represent data in a way that
makes it easy to map to in-memory data structures, it should come as
no surprise that composite on-disk structures are usually defined in
ways similar to the way programming languages define in-memory
structures. Usually a composite on-disk structure will consist of a
number of named parts, each of which is itself either a primitive
type such as a number or a string, another composite structure, or
possibly a collection of such values.</P>
<P>For instance, an ID3 tag defined in the 2.2 version of the
specification consists of a header made up of a three-character
ISO-8859-1 string, which is always "ID3"; two one-byte unsigned
integers that specify the major version and revision of the
specification; eight bits worth of boolean flags; and four bytes that
encode the size of the tag in an encoding particular to the ID3
specification. Following the header is a list of <I>frames</I>, each of
which has its own internal structure. After the frames are as many
null bytes as are necessary to pad the tag out to the size specified
in the header.</P>
<P>If you look at the world through the lens of object orientation,
composite structures look a lot like classes. For instance, you could
write a class to represent an ID3 tag.</P>
<PRE>(defclass id3-tag ()
  ((identifier    :initarg :identifier    :accessor identifier)
   (major-version :initarg :major-version :accessor major-version)
   (revision      :initarg :revision      :accessor revision)
   (flags         :initarg :flags         :accessor flags)
   (size          :initarg :size          :accessor size)
   (frames        :initarg :frames        :accessor frames)))</PRE>
<P>An instance of this class would make a perfect repository to hold the
data needed to represent an ID3 tag. You could then write functions
to read and write instances of this class. For example, assuming the
existence of certain other functions for reading the appropriate
primitive data types, a <CODE>read-id3-tag</CODE> function might look like
this:</P>
<PRE>(defun read-id3-tag (in)
  (let ((tag (make-instance 'id3-tag)))
    (with-slots (identifier major-version revision flags size frames) tag
      (setf identifier (read-iso-8859-1-string in :length 3))
      (setf major-version (read-u1 in))
      (setf revision (read-u1 in))
      (setf flags (read-u1 in))
      (setf size (read-id3-encoded-size in))
      (setf frames (read-id3-frames in :tag-size size)))
    tag))</PRE>
<P>The <CODE>write-id3-tag</CODE> function would be structured similarly--you'd
use the appropriate <CODE>write-*</CODE> functions to write out the values
stored in the slots of the <CODE>id3-tag</CODE> object.</P>
<P>It's not hard to see how you could write the appropriate classes to
represent all the composite data structures in a specification along
with <CODE>read-foo</CODE> and <CODE>write-foo</CODE> functions for each class and
for necessary primitive types. But it's also easy to tell that all the
reading and writing functions are going to be pretty similar,
differing only in the specifics of what types they read and the names
of the slots they store them in. It's particularly irksome when you
consider that in the ID3 specification it takes about four lines of
text to specify the structure of an ID3 tag, while you've already
written eighteen lines of code and haven't even written
<CODE>write-id3-tag</CODE> yet.</P>
<P>What you'd really like is a way to describe the structure of
something like an ID3 tag in a form that's as compressed as the
specification's pseudocode yet that can also be expanded into code
that defines the <CODE>id3-tag</CODE> class <I>and</I> the functions that
translate between bytes on disk and instances of the class. Sounds
like a job for a macro.</P>
<A NAME="designing-the-macros"><H2>Designing the Macros</H2></A>
<P>Since you already have a rough idea what code your macros will need
to generate, the next step, according to the process for writing a
macro I outlined in Chapter 8, is to switch perspectives and think
about what a call to the macro should look like. Since the goal is to
be able to write something as compressed as the pseudocode in the ID3
specification, you can start there. The header of an ID3 tag is
specified like this:</P>
<PRE>ID3/file identifier      "ID3"
ID3 version              $02 00
ID3 flags                %xx000000
ID3 size             4 * %0xxxxxxx</PRE>
<P>In the notation of the specification, this means the "file
identifier" slot of an ID3 tag is the string "ID3" in ISO-8859-1
encoding. The version consists of two bytes, the first of which--for
this version of the specification--has the value 2 and the second of
which--again for this version of the specification--is 0. The flags
slot is eight bits, of which all but the first two are 0, and the
size consists of four bytes, each of which has a 0 in the most
significant bit.</P>
<P>Some information isn't captured by this pseudocode. For instance,
exactly how the four bytes that encode the size are to be interpreted
is described in a few lines of prose. Likewise, the spec describes in
prose how the frames and subsequent padding are stored after this
header. But most of what you need to know to be able to write code to
read and write an ID3 tag is specified by this pseudocode. Thus, you
ought to be able to write an s-expression version of this pseudocode
and have it expanded into the class and function definitions you'd
otherwise have to write by hand--something, perhaps, like this:</P>
<PRE>(define-binary-class id3-tag
  ((file-identifier (iso-8859-1-string :length 3))
   (major-version   u1)
   (revision        u1)
   (flags           u1)
   (size            id3-tag-size)
   (frames          (id3-frames :tag-size size))))</PRE>
<P>The basic idea is that this form defines a class <CODE>id3-tag</CODE>
similar to the way you could with <CODE><B>DEFCLASS</B></CODE>, but instead of
specifying things such as <CODE>:initarg</CODE> and <CODE>:accessor</CODE>, each
slot specification consists of the name of the
slot--<CODE>file-identifier</CODE>, <CODE>major-version</CODE>, and so on--and
information about how that slot is represented on disk. Since this is
just a bit of fantasizing, you don't have to worry about exactly how
the macro <CODE>define-binary-class</CODE> will know what to do with
expressions such as <CODE>(iso-8859-1-string :length 3)</CODE>, <CODE>u1</CODE>,
<CODE>id3-tag-size</CODE>, and <CODE>(id3-frames :tag-size size)</CODE>; as long
as each expression contains the information necessary to know how to
read and write a particular data encoding, you should be okay.</P>
<A NAME="making-the-dream-a-reality"><H2>Making the Dream a Reality</H2></A>
<P>Okay, enough fantasizing about good-looking code; now you need to get
to work writing <CODE>define-binary-class</CODE>--writing the code that
will turn that concise expression of what an ID3 tag looks like into
code that can represent one in memory, read one off disk, and write
it back out.</P>
<P>To start with, you should define a package for this library. Here's
the package file that comes with the version you can download from
the book's Web site:</P>
<PRE>(in-package :cl-user)

(defpackage :com.gigamonkeys.binary-data
  (:use :common-lisp :com.gigamonkeys.macro-utilities)
  (:export :define-binary-class
           :define-tagged-binary-class
           :define-binary-type
           :read-value
           :write-value
           :*in-progress-objects*
           :parent-of-type
           :current-binary-object
           :+null+))</PRE>
<P>The <CODE>COM.GIGAMONKEYS.MACRO-UTILITIES</CODE> package contains the
<CODE>with-gensyms</CODE> and <CODE>once-only</CODE> macros from Chapter 8.</P>
<P>Since you already have a handwritten version of the code you want to
generate, it shouldn't be too hard to write such a macro. Just take
it in small pieces, starting with a version of
<CODE>define-binary-class</CODE> that generates just the <CODE><B>DEFCLASS</B></CODE>
form.</P>
<P>If you look back at the <CODE>define-binary-class</CODE> form, you'll see
that it takes two arguments, the name <CODE>id3-tag</CODE> and a list of
slot specifiers, each of which is itself a two-item list. From those
pieces you need to build the appropriate <CODE><B>DEFCLASS</B></CODE> form. Clearly,
the biggest difference between the <CODE>define-binary-class</CODE> form
and a proper <CODE><B>DEFCLASS</B></CODE> form is in the slot specifiers. A single
slot specifier from <CODE>define-binary-class</CODE> looks something like
this:</P>
<PRE>(major-version u1)</PRE>
<P>But that's not a legal slot specifier for a <CODE><B>DEFCLASS</B></CODE>. Instead,
you need something like this:</P>
<PRE>(major-version :initarg :major-version :accessor major-version)</PRE>
<P>Easy enough. First define a simple function to translate a symbol to
the corresponding keyword symbol.</P>
<PRE>(defun as-keyword (sym) (intern (string sym) :keyword))</PRE>
<P>Now define a function that takes a <CODE>define-binary-class</CODE> slot
specifier and returns a <CODE><B>DEFCLASS</B></CODE> slot specifier.</P>
<PRE>(defun slot->defclass-slot (spec)
  (let ((name (first spec)))
    `(,name :initarg ,(as-keyword name) :accessor ,name)))</PRE>
<P>You can test this function at the REPL after switching to your new
package with a call to <CODE><B>IN-PACKAGE</B></CODE>.</P>
<PRE>BINARY-DATA> (slot->defclass-slot '(major-version u1))
(MAJOR-VERSION :INITARG :MAJOR-VERSION :ACCESSOR MAJOR-VERSION)</PRE>
<P>Looks good. Now the first version of <CODE>define-binary-class</CODE> is
trivial.</P>
<PRE>(defmacro define-binary-class (name slots)
  `(defclass ,name ()
     ,(mapcar #'slot->defclass-slot slots)))</PRE>
<P>This is a simple template-style macro--<CODE>define-binary-class</CODE>
generates a <CODE><B>DEFCLASS</B></CODE> form by interpolating the name of the class
and a list of slot specifiers constructed by applying
<CODE>slot->defclass-slot</CODE> to each element of the list of slot
specifiers from the <CODE>define-binary-class</CODE> form.</P>
<P>To see exactly what code this macro generates, you can evaluate this
expression at the REPL.</P>
<PRE>(macroexpand-1 '(define-binary-class id3-tag
  ((identifier (iso-8859-1-string :length 3))
   (major-version u1)
   (revision u1)
   (flags u1)
   (size id3-tag-size)
   (frames (id3-frames :tag-size size)))))</PRE>
<P>The result, slightly reformatted here for better readability, should
look familiar since it's exactly the class definition you wrote by
hand earlier:</P>
<PRE>(defclass id3-tag ()
  ((identifier    :initarg :identifier    :accessor identifier)
   (major-version :initarg :major-version :accessor major-version)
   (revision      :initarg :revision      :accessor revision)
   (flags         :initarg :flags         :accessor flags)
   (size          :initarg :size          :accessor size)
   (frames        :initarg :frames        :accessor frames)))</PRE>
<A NAME="reading-binary-objects"><H2>Reading Binary Objects</H2></A>
<P>Next you need to make <CODE>define-binary-class</CODE> also generate a
function that can read an instance of the new class. Looking back at
the <CODE>read-id3-tag</CODE> function you wrote before, this seems a bit
trickier, as <CODE>read-id3-tag</CODE> wasn't quite so regular--to read
each slot's value, you had to call a different function. Not to
mention, the name of the function, <CODE>read-id3-tag</CODE>, while derived
from the name of the class you're defining, isn't one of the
arguments to <CODE>define-binary-class</CODE> and thus isn't available to
be interpolated into a template the way the class name was.</P>
<P>You could deal with both of those problems by devising and following a
naming convention so the macro can figure out the name of the function
to call based on the name of the type in the slot specifier. However,
this would require <CODE>define-binary-class</CODE> to generate the name
<CODE>read-id3-tag</CODE>, which is possible but a bad idea. Macros that
create global definitions should generally use only names passed to
them by their callers; macros that generate names under the covers can
cause hard-to-predict--and hard-to-debug--name conflicts when the
generated names happen to be the same as names used
elsewhere.<SUP>8</SUP></P>
<P>You can avoid both these inconveniences by noticing that all the
functions that read a particular type of value have the same
fundamental purpose, to read a value of a specific type from a
stream. Speaking colloquially, you might say they're all instances of
a single generic operation. And the colloquial use of the word
<I>generic</I> should lead you directly to the solution to your problem:
instead of defining a bunch of independent functions, all with
different names, you can define a single generic function,
<CODE>read-value</CODE>, with methods specialized to read different types
of values.</P>
<P>That is, instead of defining functions <CODE>read-iso-8859-1-string</CODE>
and <CODE>read-u1</CODE>, you can define <CODE>read-value</CODE> as a generic
function taking two required arguments, a type and a stream, and
possibly some keyword arguments.</P>
<PRE>(defgeneric read-value (type stream &key)
  (:documentation "Read a value of the given type from the stream."))</PRE>
<P>By specifying <CODE><B>&key</B></CODE> without any actual keyword parameters, you
allow different methods to define their own <CODE><B>&key</B></CODE> parameters
without requiring them to do so. This does mean every method
specialized on <CODE>read-value</CODE> will have to include either
<CODE><B>&key</B></CODE> or an <CODE><B>&rest</B></CODE> parameter in its parameter list to be
compatible with the generic function.</P>
<P>Then you'll define methods that use <CODE><B>EQL</B></CODE> specializers to
specialize the type argument on the name of the type you want to
read.</P>
<PRE>(defmethod read-value ((type (eql 'iso-8859-1-string)) in &key length) ...)

(defmethod read-value ((type (eql 'u1)) in &key) ...)</PRE>
<P>Then you can make <CODE>define-binary-class</CODE> generate a
<CODE>read-value</CODE> method specialized on the type name <CODE>id3-tag</CODE>,
and that method can be implemented in terms of calls to
<CODE>read-value</CODE> with the appropriate slot types as the first
argument. The code you want to generate is going to look like this:</P>
<PRE>(defmethod read-value ((type (eql 'id3-tag)) in &key)
  (let ((object (make-instance 'id3-tag)))
    (with-slots (identifier major-version revision flags size frames) object
      (setf identifier (read-value 'iso-8859-1-string in :length 3))
      (setf major-version (read-value 'u1 in))
      (setf revision (read-value 'u1 in))
      (setf flags (read-value 'u1 in))
      (setf size (read-value 'id3-tag-size in))
      (setf frames (read-value 'id3-frames in :tag-size size)))
    object))</PRE>
<P>So, just as you needed a function to translate a
<CODE>define-binary-class</CODE> slot specifier to a <CODE><B>DEFCLASS</B></CODE> slot
specifier in order to generate the <CODE><B>DEFCLASS</B></CODE> form, now you need a
function that takes a <CODE>define-binary-class</CODE> slot specifier and
generates the appropriate <CODE><B>SETF</B></CODE> form, that is, something that
takes this:</P>
<PRE>(identifier (iso-8859-1-string :length 3))</PRE>
<P>and returns this:</P>
<PRE>(setf identifier (read-value 'iso-8859-1-string in :length 3))</PRE>
<P>However, there's a difference between this code and the <CODE><B>DEFCLASS</B></CODE>
slot specifier: it includes a reference to a variable <CODE>in</CODE>--the
method parameter from the <CODE>read-value</CODE> method--that wasn't
derived from the slot specifier. It doesn't have to be called
<CODE>in</CODE>, but whatever name you use has to be the same as the one
used in the method's parameter list and in the other calls to
<CODE>read-value</CODE>. For now you can dodge the issue of where that name
comes from by defining <CODE>slot->read-value</CODE> to take a second
argument of the name of the stream variable.</P>
<PRE>(defun slot->read-value (spec stream)
  (destructuring-bind (name (type &rest args)) (normalize-slot-spec spec)
    `(setf ,name (read-value ',type ,stream ,@args))))</PRE>
<P>The function <CODE>normalize-slot-spec</CODE> normalizes the second element
of the slot specifier, converting a symbol like <CODE>u1</CODE> to the list
<CODE>(u1)</CODE> so the <CODE><B>DESTRUCTURING-BIND</B></CODE> can parse it. It looks like
this:</P>
<PRE>(defun normalize-slot-spec (spec)
  (list (first spec) (mklist (second spec))))

(defun mklist (x) (if (listp x) x (list x)))</PRE>
<P>You can test <CODE>slot->read-value</CODE> with each type of slot
specifier.</P><PRE>BINARY-DATA> (slot->read-value '(major-version u1) 'stream)
|
||
|
(SETF MAJOR-VERSION (READ-VALUE 'U1 STREAM))
|
||
|
BINARY-DATA> (slot->read-value '(identifier (iso-8859-1-string :length 3)) 'stream)
|
||
|
(SETF IDENTIFIER (READ-VALUE 'ISO-8859-1-STRING STREAM :LENGTH 3))</PRE><P>With these functions you're ready to add <CODE>read-value</CODE> to
|
||
|
<CODE>define-binary-class</CODE>. If you take the handwritten
|
||
|
<CODE>read-value</CODE> method and strip out anything that's tied to a
|
||
|
particular class, you're left with this skeleton:</P><PRE>(defmethod read-value ((type (eql ...)) stream &key)
  (let ((object (make-instance ...)))
    (with-slots (...) object
      ...
    object)))</PRE><P>All you need to do is add this skeleton to the
<CODE>define-binary-class</CODE> template, replacing ellipses with code
that fills in the skeleton with the appropriate names and code.
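</P><P>As a rough sketch of what "filling in" means, the following standalone snippet rebuilds just the <CODE><B>WITH-SLOTS</B></CODE> part of the skeleton from a list of slot specifiers. It repeats the helper functions defined above so you can evaluate it on its own. The function name <CODE>sketch-with-slots</CODE>, the sample slot list, and the literal names <CODE>object</CODE> and <CODE>stream</CODE> are placeholders chosen for illustration and aren't part of the final macro.</P><PRE>;; Helpers repeated from earlier in the chapter.
(defun mklist (x) (if (listp x) x (list x)))

(defun normalize-slot-spec (spec)
  (list (first spec) (mklist (second spec))))

(defun slot->read-value (spec stream)
  (destructuring-bind (name (type &rest args)) (normalize-slot-spec spec)
    `(setf ,name (read-value ',type ,stream ,@args))))

;; Illustrative only: build the WITH-SLOTS form the skeleton needs.
(defun sketch-with-slots (slots)
  `(with-slots ,(mapcar #'first slots) object
     ,@(mapcar #'(lambda (x) (slot->read-value x 'stream)) slots)))

(sketch-with-slots '((major-version u1)
                     (identifier (iso-8859-1-string :length 3))))
;; Should return a list equivalent to:
;; (WITH-SLOTS (MAJOR-VERSION IDENTIFIER) OBJECT
;;   (SETF MAJOR-VERSION (READ-VALUE 'U1 STREAM))
;;   (SETF IDENTIFIER (READ-VALUE 'ISO-8859-1-STRING STREAM :LENGTH 3)))</PRE><P>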
You'll also want to replace the variables <CODE>type</CODE>, <CODE>stream</CODE>,
and <CODE>object</CODE> with gensymed names to avoid potential conflicts
with slot names,<SUP>9</SUP> which you can do with the
<CODE>with-gensyms</CODE> macro from Chapter 8.</P><P>Also, because a macro must expand into a single form, you need to wrap
some form around the <CODE><B>DEFCLASS</B></CODE> and <CODE><B>DEFMETHOD</B></CODE>. <CODE><B>PROGN</B></CODE> is
the customary form to use for macros that expand into multiple
definitions because of the special treatment it gets from the file
compiler when appearing at the top level of a file, as I discussed in
Chapter 20.</P><P>So, you can change <CODE>define-binary-class</CODE> as follows:</P><PRE>(defmacro define-binary-class (name slots)
  (with-gensyms (typevar objectvar streamvar)
    `(progn
       (defclass ,name ()
         ,(mapcar #'slot->defclass-slot slots))

       (defmethod read-value ((,typevar (eql ',name)) ,streamvar &key)
         (let ((,objectvar (make-instance ',name)))
           (with-slots ,(mapcar #'first slots) ,objectvar
             ,@(mapcar #'(lambda (x) (slot->read-value x streamvar)) slots))
           ,objectvar)))))</PRE><A NAME="writing-binary-objects"><H2>Writing Binary Objects</H2></A><P>Generating code to write out an instance of a binary class will
proceed similarly. First you can define a <CODE>write-value</CODE> generic
function.</P><PRE>(defgeneric write-value (type stream value &key)
  (:documentation "Write a value as the given type to the stream."))</PRE><P>Then you define a helper function that translates a
<CODE>define-binary-class</CODE> slot specifier into code that writes out
the slot using <CODE>write-value</CODE>. As with the
<CODE>slot->read-value</CODE> function, this helper function needs to take
the name of the stream variable as an argument.</P><PRE>(defun slot->write-value (spec stream)
  (destructuring-bind (name (type &rest args)) (normalize-slot-spec spec)
    `(write-value ',type ,stream ,name ,@args)))</PRE><P>Now you can add a <CODE>write-value</CODE> template to the
<CODE>define-binary-class</CODE> macro.</P><PRE>(defmacro define-binary-class (name slots)
  (with-gensyms (typevar objectvar streamvar)
    `(progn
       (defclass ,name ()
         ,(mapcar #'slot->defclass-slot slots))

       (defmethod read-value ((,typevar (eql ',name)) ,streamvar &key)
         (let ((,objectvar (make-instance ',name)))
           (with-slots ,(mapcar #'first slots) ,objectvar
             ,@(mapcar #'(lambda (x) (slot->read-value x streamvar)) slots))
           ,objectvar))

       (defmethod write-value ((,typevar (eql ',name)) ,streamvar ,objectvar &key)
         (with-slots ,(mapcar #'first slots) ,objectvar
           ,@(mapcar #'(lambda (x) (slot->write-value x streamvar)) slots))))))</PRE><A NAME="adding-inheritance-and-tagged-structures"><H2>Adding Inheritance and Tagged Structures</H2></A><P>While this version of <CODE>define-binary-class</CODE> will handle
stand-alone structures, binary file formats often define on-disk
structures that would be natural to model with subclasses and
superclasses. So you might want to extend <CODE>define-binary-class</CODE>
to support inheritance.</P><P>A related technique used in many binary formats is to have several
on-disk structures whose exact type can be determined only by reading
some data that indicates how to parse the following bytes. For
instance, the frames that make up the bulk of an ID3 tag all share a
common header structure consisting of a string identifier and a
length. To read a frame, you need to read the identifier and use its
value to determine what kind of frame you're looking at and thus how
to parse the body of the frame.</P><P>The current <CODE>define-binary-class</CODE> macro has no way to handle
this kind of reading--you could use <CODE>define-binary-class</CODE> to
define a class to represent each kind of frame, but you'd have no way
to know what type of frame to read without reading at least the
identifier. And if other code reads the identifier in order to
determine what type to pass to <CODE>read-value</CODE>, then that will
break <CODE>read-value</CODE> since it's expecting to read all the data
that makes up the instance of the class it instantiates.</P><P>You can solve this problem by adding inheritance to
<CODE>define-binary-class</CODE> and then writing another macro,
<CODE>define-tagged-binary-class</CODE>, for defining "abstract" classes
that aren't instantiated directly but that can be specialized on by
<CODE>read-value</CODE> methods that know how to read enough data to
determine what kind of class to create.</P><P>The first step to adding inheritance to <CODE>define-binary-class</CODE> is
to add a parameter to the macro to accept a list of superclasses.</P><PRE>(defmacro define-binary-class (name (&rest superclasses) slots) ...</PRE><P>Then, in the <CODE><B>DEFCLASS</B></CODE> template, interpolate that value instead
of the empty list.</P><PRE>(defclass ,name ,superclasses
  ...)</PRE><P>However, there's a bit more to it than that. You also need to change
the <CODE>read-value</CODE> and <CODE>write-value</CODE> methods so the methods
generated when defining a superclass can be used by the methods
generated as part of a subclass to read and write inherited slots.</P><P>The current way <CODE>read-value</CODE> works is particularly problematic
since it instantiates the object before filling it in--obviously, you
can't have the method responsible for reading the superclass's fields
instantiate one object while the subclass's method instantiates and
fills in a different object.</P><P>You can fix that problem by splitting <CODE>read-value</CODE> into two
parts--one responsible for instantiating the correct kind of object
and another responsible for filling slots in an existing object. On
the writing side it's a bit simpler, but you can use the same
technique.</P><P>So you'll define two new generic functions, <CODE>read-object</CODE> and
<CODE>write-object</CODE>, that will both take an existing object and a
stream. Methods on these generic functions will be responsible for
reading or writing the slots specific to the class of the object on
which they're specialized.</P><PRE>(defgeneric read-object (object stream)
  (:method-combination progn :most-specific-last)
  (:documentation "Fill in the slots of object from stream."))
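
;; Illustration only, not part of the library: a toy generic function that
;; uses the same method combination as READ-OBJECT. The TOY- names are made
;; up for this sketch. With :most-specific-last, the method specialized on
;; the superclass runs before the method specialized on the subclass.
(defclass toy-base () ())
(defclass toy-derived (toy-base) ())

(defvar *toy-trace* nil)

(defgeneric toy-visit (object)
  (:method-combination progn :most-specific-last))

(defmethod toy-visit progn ((object toy-base))
  (push 'toy-base *toy-trace*))

(defmethod toy-visit progn ((object toy-derived))
  (push 'toy-derived *toy-trace*))

(toy-visit (make-instance 'toy-derived))
(reverse *toy-trace*)                   ; should be (TOY-BASE TOY-DERIVED)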

(defgeneric write-object (object stream)
  (:method-combination progn :most-specific-last)
  (:documentation "Write out the slots of object to the stream."))</PRE><P>Defining these generic functions to use the <CODE><B>PROGN</B></CODE> method
combination with the option <CODE>:most-specific-last</CODE> allows you to
define methods that specialize <CODE>object</CODE> on each binary class and
have them deal only with the slots actually defined in that class;
the <CODE><B>PROGN</B></CODE> method combination will combine all the applicable
methods so the method specialized on the least specific class in the
hierarchy runs first, reading or writing the slots defined in that
class, then the method specialized on the next least specific subclass,
and so on. And since all the heavy lifting for a specific class is
now going to be done by <CODE>read-object</CODE> and <CODE>write-object</CODE>,
you don't even need to define specialized <CODE>read-value</CODE> and
<CODE>write-value</CODE> methods; you can define default methods that
assume the type argument is the name of a binary class.</P><PRE>(defmethod read-value ((type symbol) stream &key)
  (let ((object (make-instance type)))
    (read-object object stream)
    object))

(defmethod write-value ((type symbol) stream value &key)
  (assert (typep value type))
  (write-object value stream))</PRE><P>Note how you can use <CODE><B>MAKE-INSTANCE</B></CODE> as a generic object
factory--while you normally call <CODE><B>MAKE-INSTANCE</B></CODE> with a quoted
symbol as the first argument because you normally know exactly what
class you want to instantiate, you can use any expression that
evaluates to a class name such as, in this case, the <CODE>type</CODE>
parameter in the <CODE>read-value</CODE> method.</P><P>The actual changes to <CODE>define-binary-class</CODE> to define methods on
<CODE>read-object</CODE> and <CODE>write-object</CODE> rather than
<CODE>read-value</CODE> and <CODE>write-value</CODE> are fairly minor.</P><PRE>(defmacro define-binary-class (name (&rest superclasses) slots)
  (with-gensyms (objectvar streamvar)
    `(progn
       (defclass ,name ,superclasses
         ,(mapcar #'slot->defclass-slot slots))

       (defmethod read-object progn ((,objectvar ,name) ,streamvar)
         (with-slots ,(mapcar #'first slots) ,objectvar
           ,@(mapcar #'(lambda (x) (slot->read-value x streamvar)) slots)))

       (defmethod write-object progn ((,objectvar ,name) ,streamvar)
         (with-slots ,(mapcar #'first slots) ,objectvar
           ,@(mapcar #'(lambda (x) (slot->write-value x streamvar)) slots))))))</PRE><A NAME="keeping-track-of-inherited-slots"><H2>Keeping Track of Inherited Slots</H2></A><P>This definition will work for many purposes. However, it doesn't
handle one fairly common situation, namely, when you have a subclass
that needs to refer to inherited slots in its own slot
specifications. For instance, with the current definition of
<CODE>define-binary-class</CODE>, you can define a single class like this:</P><PRE>(define-binary-class generic-frame ()
  ((id (iso-8859-1-string :length 3))
   (size u3)
   (data (raw-bytes :bytes size))))</PRE><P>The reference to <CODE>size</CODE> in the specification of <CODE>data</CODE>
works the way you'd expect because the expressions that read and
write the <CODE>data</CODE> slot are wrapped in a <CODE><B>WITH-SLOTS</B></CODE> that
lists all the object's slots. However, if you try to split that class
into two classes like this:</P><PRE>(define-binary-class frame ()
  ((id (iso-8859-1-string :length 3))
   (size u3)))

(define-binary-class generic-frame (frame)
  ((data (raw-bytes :bytes size))))</PRE><P>you'll get a compile-time warning when you compile the
<CODE>generic-frame</CODE> definition and a runtime error when you try to
use it because there will be no lexically apparent variable
<CODE>size</CODE> in the <CODE>read-object</CODE> and <CODE>write-object</CODE> methods
specialized on <CODE>generic-frame</CODE>.</P><P>What you need to do is keep track of the slots defined by each binary
class and then include inherited slots in the <CODE><B>WITH-SLOTS</B></CODE> forms
in the <CODE>read-object</CODE> and <CODE>write-object</CODE> methods.</P><P>The easiest way to keep track of information like this is to hang it
off the symbol that names the class. As I discussed in Chapter 21,
every symbol object has an associated property list, which can be
accessed via the functions <CODE><B>SYMBOL-PLIST</B></CODE> and <CODE><B>GET</B></CODE>. You can
associate arbitrary key/value pairs with a symbol by adding them to
its property list with <CODE><B>SETF</B></CODE> of <CODE><B>GET</B></CODE>. For instance, if the
binary class <CODE>foo</CODE> defines three slots--<CODE>x</CODE>, <CODE>y</CODE>, and
<CODE>z</CODE>--you can keep track of that fact by adding a <CODE>slots</CODE>
key to the symbol <CODE>foo</CODE>'s property list with the value <CODE>(x
y z)</CODE> with this expression:</P><PRE>(setf (get 'foo 'slots) '(x y z))</PRE><P>You want this bookkeeping to happen as part of evaluating the
<CODE>define-binary-class</CODE> of <CODE>foo</CODE>. However, it's not clear
where to put the expression. If you evaluate it when you compute the
macro's expansion, it'll get evaluated when you compile the
<CODE>define-binary-class</CODE> form but not if you later load a file that
contains the resulting compiled code. On the other hand, if you
include the expression in the expansion, then it <I>won't</I> be
evaluated during compilation, which means if you compile a file with
several <CODE>define-binary-class</CODE> forms, none of the information
about what classes define what slots will be available until the
whole file is loaded, which is too late.</P><P>This is what the special operator <CODE><B>EVAL-WHEN</B></CODE> I discussed in
Chapter 20 is for. By wrapping a form in an <CODE><B>EVAL-WHEN</B></CODE>, you can
control whether it's evaluated at compile time, when the compiled
code is loaded, or both. For cases like this where you want to
squirrel away some information during the compilation of a macro form
that you also want to be available after the compiled form is loaded,
you should wrap it in an <CODE><B>EVAL-WHEN</B></CODE> like this:</P><PRE>(eval-when (:compile-toplevel :load-toplevel :execute)
  (setf (get 'foo 'slots) '(x y z)))</PRE><P>and include the <CODE><B>EVAL-WHEN</B></CODE> in the expansion generated by the
macro. Thus, you can save both the slots and the direct superclasses
of a binary class by adding this form to the expansion generated by
<CODE>define-binary-class</CODE>:</P><PRE>(eval-when (:compile-toplevel :load-toplevel :execute)
  (setf (get ',name 'slots) ',(mapcar #'first slots))
  (setf (get ',name 'superclasses) ',superclasses))</PRE><P>Now you can define three helper functions for accessing this
information. The first simply returns the slots directly defined by a
binary class. It's a good idea to return a copy of the list since you
don't want other code to modify the list of slots after the binary
class has been defined.</P><PRE>(defun direct-slots (name)
  (copy-list (get name 'slots)))</PRE><P>The next function returns the slots inherited from other binary
classes.</P><PRE>(defun inherited-slots (name)
  (loop for super in (get name 'superclasses)
        nconc (direct-slots super)
        nconc (inherited-slots super)))</PRE><P>Finally, you can define a function that returns a list containing the
names of all directly defined and inherited slots.</P><PRE>(defun all-slots (name)
  (nconc (direct-slots name) (inherited-slots name)))</PRE><P>When you're computing the expansion of a
<CODE>define-binary-class</CODE> form, you want to generate a
<CODE><B>WITH-SLOTS</B></CODE> form that contains the names of all the slots defined
in the new class and all its superclasses. However, you can't use
<CODE>all-slots</CODE> while you're generating the expansion since the
information won't be available until after the expansion is compiled.
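</P><P>To make the bookkeeping concrete, here's a small sketch that records slot and superclass information on two symbols by hand and then queries it with the three helper functions, which are repeated so the snippet stands alone. The names <CODE>toy-frame</CODE> and <CODE>toy-generic-frame</CODE> are invented for this example; in the library the property lists would be filled in by the <CODE><B>EVAL-WHEN</B></CODE> form in the macro expansion rather than by hand.</P><PRE>(defun direct-slots (name)
  (copy-list (get name 'slots)))

(defun inherited-slots (name)
  (loop for super in (get name 'superclasses)
        nconc (direct-slots super)
        nconc (inherited-slots super)))

(defun all-slots (name)
  (nconc (direct-slots name) (inherited-slots name)))

;; Record by hand what the macro expansion would normally record.
(setf (get 'toy-frame 'slots) '(id size))
(setf (get 'toy-frame 'superclasses) '())
(setf (get 'toy-generic-frame 'slots) '(data))
(setf (get 'toy-generic-frame 'superclasses) '(toy-frame))

(direct-slots 'toy-generic-frame)       ; should be (DATA)
(inherited-slots 'toy-generic-frame)    ; should be (ID SIZE)
(all-slots 'toy-generic-frame)          ; should be (DATA ID SIZE)</PRE><P>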
Instead, you should use the following function, which takes the list
of slot specifiers and superclasses passed to
<CODE>define-binary-class</CODE> and uses them to compute the list
of all the new class's slots:</P><PRE>(defun new-class-all-slots (slots superclasses)
  (nconc (mapcan #'all-slots superclasses) (mapcar #'first slots)))</PRE><P>With these functions defined, you can change
<CODE>define-binary-class</CODE> to store the information about the class
currently being defined and to use the already stored information
about the superclasses' slots to generate the <CODE><B>WITH-SLOTS</B></CODE> forms
you want like this:</P><PRE>(defmacro define-binary-class (name (&rest superclasses) slots)
  (with-gensyms (objectvar streamvar)
    `(progn
       (eval-when (:compile-toplevel :load-toplevel :execute)
         (setf (get ',name 'slots) ',(mapcar #'first slots))
         (setf (get ',name 'superclasses) ',superclasses))

       (defclass ,name ,superclasses
         ,(mapcar #'slot->defclass-slot slots))

       (defmethod read-object progn ((,objectvar ,name) ,streamvar)
         (with-slots ,(new-class-all-slots slots superclasses) ,objectvar
           ,@(mapcar #'(lambda (x) (slot->read-value x streamvar)) slots)))

       (defmethod write-object progn ((,objectvar ,name) ,streamvar)
         (with-slots ,(new-class-all-slots slots superclasses) ,objectvar
           ,@(mapcar #'(lambda (x) (slot->write-value x streamvar)) slots))))))</PRE><A NAME="tagged-structures"><H2>Tagged Structures</H2></A><P>With the ability to define binary classes that extend other binary
classes, you're ready to define a new macro for defining classes to
represent "tagged" structures. The strategy for reading tagged
structures will be to define a specialized <CODE>read-value</CODE> method
that knows how to read the values that make up the start of the
structure and then use those values to determine what subclass to
instantiate. It'll then make an instance of that class with
<CODE><B>MAKE-INSTANCE</B></CODE>, passing the already read values as initargs, and
pass the object to <CODE>read-object</CODE>, allowing the actual class of
the object to determine how the rest of the structure is read.</P><P>The new macro, <CODE>define-tagged-binary-class</CODE>, will look like
<CODE>define-binary-class</CODE> with the addition of a <CODE>:dispatch</CODE>
option used to specify a form that should evaluate to the name of a
binary class. The <CODE>:dispatch</CODE> form will be evaluated in a context
where the names of the slots defined by the tagged class are bound to
variables that hold the values read from the file. The class whose
name it returns must accept initargs corresponding to the slot names
defined by the tagged class. This is easily ensured if the
<CODE>:dispatch</CODE> form always evaluates to the name of a class that
subclasses the tagged class.</P><P>For instance, supposing you have a function, <CODE>find-frame-class</CODE>,
that will map a string identifier to a binary class representing a
particular kind of ID3 frame, you might define a tagged binary class,
<CODE>id3-frame</CODE>, like this:</P><PRE>(define-tagged-binary-class id3-frame ()
  ((id (iso-8859-1-string :length 3))
   (size u3))
  (:dispatch (find-frame-class id)))</PRE><P>The expansion of a <CODE>define-tagged-binary-class</CODE> will contain a
<CODE><B>DEFCLASS</B></CODE> and a <CODE>write-object</CODE> method just like the expansion
of <CODE>define-binary-class</CODE>, but instead of a <CODE>read-object</CODE>
method it'll contain a <CODE>read-value</CODE> method that looks like this:</P><PRE>(defmethod read-value ((type (eql 'id3-frame)) stream &key)
  (let ((id (read-value 'iso-8859-1-string stream :length 3))
        (size (read-value 'u3 stream)))
    (let ((object (make-instance (find-frame-class id) :id id :size size)))
      (read-object object stream)
      object)))</PRE><P>Since the expansions of <CODE>define-tagged-binary-class</CODE> and
<CODE>define-binary-class</CODE> are going to be identical except for the
read method, you can factor out the common bits into a helper macro,
<CODE>define-generic-binary-class</CODE>, that accepts the read method as a
parameter and interpolates it.</P><PRE>(defmacro define-generic-binary-class (name (&rest superclasses) slots read-method)
  (with-gensyms (objectvar streamvar)
    `(progn
       (eval-when (:compile-toplevel :load-toplevel :execute)
         (setf (get ',name 'slots) ',(mapcar #'first slots))
         (setf (get ',name 'superclasses) ',superclasses))

       (defclass ,name ,superclasses
         ,(mapcar #'slot->defclass-slot slots))

       ,read-method

       (defmethod write-object progn ((,objectvar ,name) ,streamvar)
         (declare (ignorable ,streamvar))
         (with-slots ,(new-class-all-slots slots superclasses) ,objectvar
           ,@(mapcar #'(lambda (x) (slot->write-value x streamvar)) slots))))))</PRE><P>Now you can define both <CODE>define-binary-class</CODE> and
<CODE>define-tagged-binary-class</CODE> to expand into a call to
<CODE>define-generic-binary-class</CODE>. Here's a new version of
<CODE>define-binary-class</CODE> that generates the same code as the earlier
version when it's fully expanded:</P><PRE>(defmacro define-binary-class (name (&rest superclasses) slots)
  (with-gensyms (objectvar streamvar)
    `(define-generic-binary-class ,name ,superclasses ,slots
       (defmethod read-object progn ((,objectvar ,name) ,streamvar)
         (declare (ignorable ,streamvar))
         (with-slots ,(new-class-all-slots slots superclasses) ,objectvar
           ,@(mapcar #'(lambda (x) (slot->read-value x streamvar)) slots))))))</PRE><P>And here's <CODE>define-tagged-binary-class</CODE> along with two new
helper functions it uses:</P><PRE>(defmacro define-tagged-binary-class (name (&rest superclasses) slots &rest options)
  (with-gensyms (typevar objectvar streamvar)
    `(define-generic-binary-class ,name ,superclasses ,slots
       (defmethod read-value ((,typevar (eql ',name)) ,streamvar &key)
         (let* ,(mapcar #'(lambda (x) (slot->binding x streamvar)) slots)
           (let ((,objectvar
                  (make-instance
                   ,@(or (cdr (assoc :dispatch options))
                         (error "Must supply :dispatch form."))
                   ,@(mapcan #'slot->keyword-arg slots))))
             (read-object ,objectvar ,streamvar)
             ,objectvar))))))

(defun slot->binding (spec stream)
  (destructuring-bind (name (type &rest args)) (normalize-slot-spec spec)
    `(,name (read-value ',type ,stream ,@args))))

(defun slot->keyword-arg (spec)
  (let ((name (first spec)))
    `(,(as-keyword name) ,name)))</PRE><A NAME="primitive-binary-types"><H2>Primitive Binary Types</H2></A><P>While <CODE>define-binary-class</CODE> and
<CODE>define-tagged-binary-class</CODE> make it easy to define composite
structures, you still have to write <CODE>read-value</CODE> and
<CODE>write-value</CODE> methods for primitive data types by hand. You
could decide to live with that, specifying that users of the library
need to write appropriate methods on <CODE>read-value</CODE> and
<CODE>write-value</CODE> to support the primitive types used by their
binary classes.</P><P>However, rather than having to document how to write a suitable
<CODE>read-value</CODE>/<CODE>write-value</CODE> pair, you can provide a macro to
do it automatically. This also has the advantage of making the
abstraction created by <CODE>define-binary-class</CODE> less leaky.
Currently, <CODE>define-binary-class</CODE> depends on having methods on
<CODE>read-value</CODE> and <CODE>write-value</CODE> defined in a particular way,
but that's really just an implementation detail. By defining a macro
that generates the <CODE>read-value</CODE> and <CODE>write-value</CODE> methods
for primitive types, you hide those details behind an abstraction you
control. If you decide later to change the implementation of
<CODE>define-binary-class</CODE>, you can change your
primitive-type-defining macro to meet the new requirements without
requiring any changes to code that uses the binary data library.</P><P>So you should define one last macro, <CODE>define-binary-type</CODE>, that
will generate <CODE>read-value</CODE> and <CODE>write-value</CODE> methods for
reading and writing values represented by instances of existing classes, rather
than by classes defined with <CODE>define-binary-class</CODE>.</P><P>For a concrete example, consider a type used in the <CODE>id3-tag</CODE>
class, a fixed-length string encoded in ISO-8859-1 characters. I'll
assume, as I did earlier, that the native character encoding of your
Lisp is ISO-8859-1 or a superset, so you can use <CODE><B>CODE-CHAR</B></CODE> and
<CODE><B>CHAR-CODE</B></CODE> to translate bytes to characters and back.</P><P>As always, your goal is to write a macro that allows you to express
only the essential information needed to generate the required code.
In this case, there are four pieces of essential information: the
name of the type, <CODE>iso-8859-1-string</CODE>; the <CODE><B>&key</B></CODE> parameters
that should be accepted by the <CODE>read-value</CODE> and
<CODE>write-value</CODE> methods, <CODE>length</CODE> in this case; the code for
reading from a stream; and the code for writing to a stream. Here's
an expression that contains those four pieces of information:</P><PRE>(define-binary-type iso-8859-1-string (length)
  (:reader (in)
    (let ((string (make-string length)))
      (dotimes (i length)
        (setf (char string i) (code-char (read-byte in))))
      string))
  (:writer (out string)
    (dotimes (i length)
      (write-byte (char-code (char string i)) out))))</PRE><P>Now you just need a macro that can take apart this form and put it
back together in the form of two <CODE><B>DEFMETHOD</B></CODE>s wrapped in a
<CODE><B>PROGN</B></CODE>. If you define the parameter list to
<CODE>define-binary-type</CODE> like this:</P><PRE>(defmacro define-binary-type (name (&rest args) &body spec) ...</PRE><P>then within the macro the parameter <CODE>spec</CODE> will be a list
containing the reader and writer definitions. You can then use
<CODE><B>ASSOC</B></CODE> to extract the elements of <CODE>spec</CODE> using the tags
<CODE>:reader</CODE> and <CODE>:writer</CODE> and then use
<CODE><B>DESTRUCTURING-BIND</B></CODE> to take apart the <CODE><B>REST</B></CODE> of each
element.<SUP>10</SUP></P><P>From there it's just a matter of interpolating the extracted values
into the backquoted templates of the <CODE>read-value</CODE> and
<CODE>write-value</CODE> methods.</P><PRE>(defmacro define-binary-type (name (&rest args) &body spec)
  (with-gensyms (type)
    `(progn
       ,(destructuring-bind ((in) &body body) (rest (assoc :reader spec))
          `(defmethod read-value ((,type (eql ',name)) ,in &key ,@args)
             ,@body))
       ,(destructuring-bind ((out value) &body body) (rest (assoc :writer spec))
          `(defmethod write-value ((,type (eql ',name)) ,out ,value &key ,@args)
             ,@body)))))</PRE><P>Note how the backquoted templates are nested: the outermost template
starts with the backquoted <CODE><B>PROGN</B></CODE> form. That template consists of
the symbol <CODE><B>PROGN</B></CODE> and two comma-unquoted <CODE><B>DESTRUCTURING-BIND</B></CODE>
expressions. Thus, the outer template is filled in by evaluating the
<CODE><B>DESTRUCTURING-BIND</B></CODE> expressions and interpolating their values.
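</P><P>You can see the same two-level pattern in miniature in the following sketch. The function <CODE>expand-pair</CODE> and the operator names <CODE>first-op</CODE> and <CODE>second-op</CODE> are hypothetical and exist only for this illustration; the point is that each comma-unquoted <CODE><B>DESTRUCTURING-BIND</B></CODE> returns a list built by its own inner template, and that list is dropped into the outer one.</P><PRE>(defun expand-pair (a b)
  `(progn
     ;; Each unquoted form below is evaluated and its value interpolated.
     ,(destructuring-bind (x &rest more) a
        `(first-op ,x ,@more))
     ,(destructuring-bind (y &rest more) b
        `(second-op ,y ,@more))))

(expand-pair '(1 2 3) '(4 5))
;; should be (PROGN (FIRST-OP 1 2 3) (SECOND-OP 4 5))</PRE><P>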
Each <CODE><B>DESTRUCTURING-BIND</B></CODE> expression in turn contains another
backquoted template, which is used to generate one of the method
definitions to be interpolated in the outer template.</P><P>With this macro defined, the <CODE>define-binary-type</CODE> form given
previously expands to this code:</P><PRE>(progn
  (defmethod read-value ((#:g1618 (eql 'iso-8859-1-string)) in &key length)
    (let ((string (make-string length)))
      (dotimes (i length)
        (setf (char string i) (code-char (read-byte in))))
      string))
  (defmethod write-value ((#:g1618 (eql 'iso-8859-1-string)) out string &key length)
    (dotimes (i length)
      (write-byte (char-code (char string i)) out))))</PRE><P>Of course, now that you've got this nice macro for defining binary
types, it's tempting to make it do a bit more work. For now you
should just make one small enhancement that will turn out to be
pretty handy when you start using this library to deal with actual
formats such as ID3 tags.</P><P>ID3 tags, like many other binary formats, use lots of primitive types
that are minor variations on a theme, such as unsigned integers in
one-, two-, three-, and four-byte varieties. You could certainly
define each of those types with <CODE>define-binary-type</CODE> as it
stands. Or you could factor out the common algorithm for reading and
writing <I>n</I>-byte unsigned integers into helper functions.</P><P>But suppose you had already defined a binary type,
<CODE>unsigned-integer</CODE>, that accepts a <CODE>:bytes</CODE> parameter to
specify how many bytes to read and write. Using that type, you could
specify a slot representing a one-byte unsigned integer with a type
specifier of <CODE>(unsigned-integer :bytes 1)</CODE>. But if a particular
binary format specifies lots of slots of that type, it'd be nice to
be able to easily define a new type--say, <CODE>u1</CODE>--that means the
same thing. As it turns out, it's easy to change
<CODE>define-binary-type</CODE> to support two forms, a long form
consisting of a <CODE>:reader</CODE> and <CODE>:writer</CODE> pair and a short
form that defines a new binary type in terms of an existing type.
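</P><P>No definition of <CODE>unsigned-integer</CODE> has been shown at this point. As a hedged sketch of the common algorithm such a type would need, the two functions below pack and unpack <I>n</I>-byte unsigned integers using lists of bytes instead of streams, so you can try them at the REPL. The function names are made up for this example, and the big-endian byte order (most significant byte first) is an assumption.</P><PRE>(defun bytes->unsigned (octets)
  "Combine a list of octets, most significant first, into one integer."
  (let ((value 0))
    (dolist (octet octets value)
      (setf value (+ (ash value 8) octet)))))

(defun unsigned->bytes (value n)
  "Split VALUE into a list of N octets, most significant first."
  (loop for low-bit downfrom (* 8 (1- n)) to 0 by 8
        collect (ldb (byte 8 low-bit) value)))

(bytes->unsigned '(1 0))                ; should be 256
(unsigned->bytes 256 2)                 ; should be (1 0)
(bytes->unsigned (unsigned->bytes #xABCDEF 3)) ; should round-trip to #xABCDEF</PRE><P>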
Using a short form <CODE>define-binary-type</CODE>, you can define
<CODE>u1</CODE> like this:</P><PRE>(define-binary-type u1 () (unsigned-integer :bytes 1))</PRE><P>which will expand to this:</P><PRE>(progn
  (defmethod read-value ((#:g161887 (eql 'u1)) #:g161888 &key)
    (read-value 'unsigned-integer #:g161888 :bytes 1))
  (defmethod write-value ((#:g161887 (eql 'u1)) #:g161888 #:g161889 &key)
    (write-value 'unsigned-integer #:g161888 #:g161889 :bytes 1)))</PRE><P>To support both long- and short-form <CODE>define-binary-type</CODE> calls,
you need to differentiate based on the value of the <CODE>spec</CODE>
argument. If <CODE>spec</CODE> is two items long, it represents a long-form
call, and the two items should be the <CODE>:reader</CODE> and
<CODE>:writer</CODE> specifications, which you extract as before. On the
other hand, if it's only one item long, the one item should be a type
specifier, which needs to be parsed differently. You can use
<CODE><B>ECASE</B></CODE> to switch on the <CODE><B>LENGTH</B></CODE> of <CODE>spec</CODE> and then parse
<CODE>spec</CODE> and generate an appropriate expansion for either the long
form or the short form.</P><PRE>(defmacro define-binary-type (name (&rest args) &body spec)
  (ecase (length spec)
    (1
     (with-gensyms (type stream value)
       (destructuring-bind (derived-from &rest derived-args) (mklist (first spec))
         `(progn
            (defmethod read-value ((,type (eql ',name)) ,stream &key ,@args)
              (read-value ',derived-from ,stream ,@derived-args))
            (defmethod write-value ((,type (eql ',name)) ,stream ,value &key ,@args)
              (write-value ',derived-from ,stream ,value ,@derived-args))))))
    (2
     (with-gensyms (type)
       `(progn
          ,(destructuring-bind ((in) &body body) (rest (assoc :reader spec))
             `(defmethod read-value ((,type (eql ',name)) ,in &key ,@args)
                ,@body))
          ,(destructuring-bind ((out value) &body body) (rest (assoc :writer spec))
             `(defmethod write-value ((,type (eql ',name)) ,out ,value &key ,@args)
                ,@body)))))))</PRE><A NAME="the-current-object-stack"><H2>The Current Object Stack</H2></A><P>One last bit of functionality you'll need in the next chapter is a
way to get at the binary object being read or written while reading
and writing. More generally, when reading or writing nested composite
objects, it's useful to be able to get at any of the objects
currently being read or written. Thanks to dynamic variables and
<CODE>:around</CODE> methods, you can add this enhancement with about a
dozen lines of code. To start, you should define a dynamic variable
that will hold a stack of objects currently being read or written.</P><PRE>(defvar *in-progress-objects* nil)</PRE><P>Then you can define <CODE>:around</CODE> methods on <CODE>read-object</CODE> and
<CODE>write-object</CODE> that push the object being read or written onto
this variable before invoking <CODE><B>CALL-NEXT-METHOD</B></CODE>.</P><PRE>(defmethod read-object :around (object stream)
  (declare (ignore stream))
  (let ((*in-progress-objects* (cons object *in-progress-objects*)))
    (call-next-method)))
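
;; Aside, not part of the library: a minimal sketch of why the method
;; above rebinds the variable with LET instead of assigning to it.
;; *demo-stack*, demo-visit, and demo-nesting are hypothetical names
;; used only for this illustration.
(defvar *demo-stack* nil)

(defun demo-visit (object thunk)
  ;; Same shape as the :around methods: rebind, then do the real work.
  (let ((*demo-stack* (cons object *demo-stack*)))
    (funcall thunk)))

(defun demo-nesting ()
  ;; Collects the stack as seen inside the inner call, after the inner
  ;; call returns, and after the outer call returns.
  (let (inner after-inner)
    (demo-visit :outer
                (lambda ()
                  (demo-visit :inner (lambda () (setf inner *demo-stack*)))
                  (setf after-inner *demo-stack*)))
    (list inner after-inner *demo-stack*)))

;; (demo-nesting) should return ((:INNER :OUTER) (:OUTER) NIL): each
;; binding disappears when its LET exits, even on a non-local exit.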

(defmethod write-object :around (object stream)
  (declare (ignore stream))
  (let ((*in-progress-objects* (cons object *in-progress-objects*)))
    (call-next-method)))</PRE><P>Note how you rebind <CODE>*in-progress-objects*</CODE> to a list with a new
item on the front rather than assigning it a new value. This way, at
the end of the <CODE><B>LET</B></CODE>, after <CODE><B>CALL-NEXT-METHOD</B></CODE> returns, the old
value of <CODE>*in-progress-objects*</CODE> will be restored, effectively
popping the object off the stack.</P><P>With those two methods defined, you can provide two convenience
functions for getting at specific objects in the in-progress stack.
The function <CODE>current-binary-object</CODE> will return the head of the
stack, the object whose <CODE>read-object</CODE> or <CODE>write-object</CODE>
method was invoked most recently. The other, <CODE>parent-of-type</CODE>,
takes an argument that should be the name of a binary object class
and returns the most recently pushed object of that type, using the
<CODE><B>TYPEP</B></CODE> function, which tests whether a given object is an instance
of a particular type.</P><PRE>(defun current-binary-object () (first *in-progress-objects*))
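
;; Aside, not part of the library: a self-contained sketch of looking up
;; an enclosing object by type, the technique parent-of-type relies on.
;; The classes demo-tag and demo-frame and every other demo- name are
;; hypothetical and exist only for this illustration.
(defclass demo-tag () ())
(defclass demo-frame () ())

(defvar *demo-objects* nil)

(defun demo-parent-of-type (type)
  ;; FIND-IF scans from the head, so the innermost match wins.
  (find-if #'(lambda (x) (typep x type)) *demo-objects*))

(defun demo-lookup ()
  (let* ((tag (make-instance 'demo-tag))
         (frame (make-instance 'demo-frame))
         ;; Simulate reading FRAME while TAG is still in progress.
         (*demo-objects* (list frame tag)))
    (list (eq (first *demo-objects*) frame)
          (eq (demo-parent-of-type 'demo-tag) tag)
          (demo-parent-of-type 'string))))

;; (demo-lookup) should return (T T NIL): the head of the stack is the
;; innermost object, and a type with no instance on the stack yields NIL.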

(defun parent-of-type (type)
  (find-if #'(lambda (x) (typep x type)) *in-progress-objects*))</PRE><P>These two functions can be used in any code that will be called
within the dynamic extent of a <CODE>read-object</CODE> or
<CODE>write-object</CODE> call. You'll see one example of how
<CODE>current-binary-object</CODE> can be used in the next
chapter.<SUP>11</SUP></P><P>Now you have all the tools you need to tackle an ID3 parsing library,
so you're ready to move on to the next chapter where you'll do just
that.
</P><HR/><DIV CLASS="notes"><P><SUP>1</SUP>In
ASCII, the first 32 characters are nonprinting <I>control characters</I>
originally used to control the behavior of a Teletype machine,
causing it to do such things as sound the bell, back up one
character, move to a new line, and move the carriage to the beginning
of the line. Of these 32 control characters, only three, the newline,
carriage return, and horizontal tab, are typically found in text
files.</P><P><SUP>2</SUP>Some
binary file formats <I>are</I> in-memory data structures--on many
operating systems it's possible to map a file into memory, and
low-level languages such as C can then treat the region of memory
containing the contents of the file just like any other memory; data
written to that area of memory is saved to the underlying file when
it's unmapped. However, these formats are platform-dependent since
the in-memory representation of even such simple data types as
integers depends on the hardware on which the program is running.
Thus, any file format that's intended to be portable must define a
canonical representation for all the data types it uses that can be
mapped to the actual in-memory data representation on a particular
kind of machine or in a particular language.</P><P><SUP>3</SUP>The term <I>big-endian</I> and its opposite,
<I>little-endian</I>, borrowed from Jonathan Swift's <I>Gulliver's
Travels</I>, refer to the way a multibyte number is represented in an
ordered sequence of bytes such as in memory or in a file. For
instance, the number 43981, or <CODE>abcd</CODE> in hex, represented as a
16-bit quantity, consists of two bytes, <CODE>ab</CODE> and <CODE>cd</CODE>. It
doesn't matter to a computer in what order these two bytes are stored
as long as everybody agrees. Of course, whenever there's an arbitrary
choice to be made between two equally good options, the one thing you
can be sure of is that everybody is not going to agree. For more than
you ever wanted to know about it, and to see where the terms
<I>big-endian</I> and <I>little-endian</I> were first applied in this
fashion, read "On Holy Wars and a Plea for Peace" by Danny Cohen,
available at
<CODE>http://khavrinen.lcs.mit.edu/wollman/ien-137.txt</CODE>.</P><P><SUP>4</SUP><CODE><B>LDB</B></CODE> and <CODE><B>DPB</B></CODE>, a
related function, were named after the DEC PDP-10 assembly instructions
that did essentially the same thing. Both functions operate on
integers as if they were represented using two's-complement format,
regardless of the internal representation used by a particular Common
Lisp implementation.</P><P><SUP>5</SUP>Common Lisp
also provides functions for shifting and masking the bits of integers
in a way that may be more familiar to C and Java programmers. For
instance, you could write <CODE>read-u2</CODE> yet a third way, using those
functions, like this:</P><PRE>(defun read-u2 (in)
  (logior (ash (read-byte in) 8) (read-byte in)))</PRE><P>which would be roughly equivalent to this Java method:</P><PRE>public int readU2 (InputStream in) throws IOException {
  return (in.read() << 8) | (in.read());
}</PRE><P>The names <CODE><B>LOGIOR</B></CODE> and <CODE><B>ASH</B></CODE> are short for <I>LOGical Inclusive
OR</I> and <I>Arithmetic SHift</I>. <CODE><B>ASH</B></CODE> shifts an integer a given
number of bits to the left when its second argument is positive or to
the right if the second argument is negative. <CODE><B>LOGIOR</B></CODE> combines
integers by logically <I>or</I>ing each bit. Another function,
<CODE><B>LOGAND</B></CODE>, performs a bitwise <I>and</I>, which can be used to mask off
certain bits. However, for the kinds of bit twiddling you'll need to
do in this chapter and the next, <CODE><B>LDB</B></CODE> and <CODE><B>BYTE</B></CODE> will be both
more convenient and more idiomatic Common Lisp style.</P><P><SUP>6</SUP>Originally, UTF-8 was designed to
represent a 31-bit character code and used up to six bytes per code
point. However, the maximum Unicode code point is <CODE>#x10ffff</CODE>, so
a UTF-8 encoding of Unicode requires at most four bytes per code
point.</P><P><SUP>7</SUP>If you need to parse a file format that
uses other character codes, or if you need to parse files containing
arbitrary Unicode strings using a non-Unicode-Common-Lisp
implementation, you can always represent such strings in memory as
vectors of integer code points. They won't be Lisp strings, so you
won't be able to manipulate or compare them with the string
functions, but you'll still be able to do anything with them that you
can with arbitrary vectors.</P><P><SUP>8</SUP>Unfortunately, the language itself doesn't always
provide a good model in this respect: the macro <CODE><B>DEFSTRUCT</B></CODE>, which
I don't discuss since it has largely been superseded by <CODE><B>DEFCLASS</B></CODE>,
generates functions whose names it derives from the name of
the structure it's given. <CODE><B>DEFSTRUCT</B></CODE>'s bad example leads many new
macro writers astray.</P><P><SUP>9</SUP>Technically there's no possibility of
<CODE>type</CODE> or <CODE>object</CODE> conflicting with slot names--at worst
they'd be shadowed within the <CODE><B>WITH-SLOTS</B></CODE> form. But it doesn't
hurt anything to simply <CODE><B>GENSYM</B></CODE> all local variable names used
within a macro template.</P><P><SUP>10</SUP>Using <CODE><B>ASSOC</B></CODE> to extract the <CODE>:reader</CODE> and
<CODE>:writer</CODE> elements of <CODE>spec</CODE> allows users of
<CODE>define-binary-type</CODE> to include the elements in either order; if
you required the <CODE>:reader</CODE> element to always be first, you
could then have used <CODE>(rest (first spec))</CODE> to extract the reader
and <CODE>(rest (second spec))</CODE> to extract the writer. However, as
long as you require the <CODE>:reader</CODE> and <CODE>:writer</CODE> keywords to
improve the readability of <CODE>define-binary-type</CODE> forms, you might
as well use them to extract the correct data.</P><P><SUP>11</SUP>The ID3 format doesn't require the
<CODE>parent-of-type</CODE> function since it's a relatively flat
structure. This function comes into its own when you need to parse a
format made up of many deeply nested structures whose parsing depends
on information stored in higher-level structures. For example, in the
Java class file format, the top-level class file structure contains a
<I>constant pool</I> that maps numeric values used in other
substructures within the class file to constant values that are
needed while parsing those substructures. If you were writing a class
file parser, you could use <CODE>parent-of-type</CODE> in the code that
reads and writes those substructures to get at the top-level class
file object and from there to the constant pool.</P></DIV></BODY></HTML>