
## Coleslaw: A Hacker's Guide

Here we'll provide an overview of key concepts and technical decisions
in *coleslaw* and a few suggestions about future directions. Please
keep in mind that *coleslaw* was written on a lark when 3 friends had
the idea to each complete their half-dreamed wordpress replacement in
a week. Though it has evolved considerably since its inception, like
any software some mess remains.

## Core Concepts

### Data and Deployment

**Coleslaw** is pretty fundamentally tied to the idea of git as both a
backing data store and a deployment method (via `git push`). The
consequence is that you need a bare repo somewhere with a post-receive
hook. That post-receive hook
([example](https://github.com/redline6561/coleslaw/blob/master/examples/example.post-receive))
will check out the repo to a **$TMPDIR** and call `(coleslaw:main $TMPDIR)`.

It is then coleslaw's job to load all of your content, your config and
templates, and render the content to disk. Deployment is done by
moving the files to a location specified in the config and updating a
symlink.  It is assumed a web server is set up to serve from that
symlink. However, there are plugins for deploying to Heroku, S3, and
Github Pages.

### Blogs vs Sites

**Coleslaw** is blogware. When I designed it, I only cared that it
could replace my server's wordpress install. As a result, the code
until very recently was structured in terms of POSTs and
INDEXes. Roughly speaking, a POST is a blog entry and an INDEX is a
collection of POSTs or other content. An INDEX really only serves to
group a set of content objects on a page; it isn't content itself.

This isn't ideal if you're looking for a full-on static site
generator.  Content Types were added in 0.8 as a step towards making
*coleslaw* suitable for more use cases but still have some
limitations. Any subclass of CONTENT that implements the *document
protocol* counts as a content type. However, only POSTs are currently
included on INDEXes since there isn't yet a formal relationship to
determine what content types should be included on which indexes.

### The Document Protocol

The *document protocol* was born during a giant refactoring in 0.9.3.
Any object that will be rendered to HTML should adhere to the protocol.
Subclasses of CONTENT (content types) that implement the protocol will
be seamlessly picked up by *coleslaw* and included on the rendered site.

All current Content Types and Indexes implement the protocol faithfully.
It consists of 2 "class" methods, 2 instance methods, and an invariant.


**Class Methods**:

Since Common Lisp doesn't have explicit support for class methods, we
implement them by eql-specializing on the class, e.g.
```lisp
(defmethod foo ((doc-type (eql (find-class 'bar))))
  ... )
```

- `discover`: Create instances for documents of the class and put them in
  the in-memory database with `add-document`. If your class is a subclass of
  CONTENT, there is a default method for this.
- `publish`: Iterate over all objects of the class, rendering each one and
  writing the result out to disk with `write-file`.
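
As a rough illustration of the "class method" idiom, here is a small self-contained sketch. The class `note`, the store `*toy-db*`, and the generics `toy-discover` and `toy-publish` are hypothetical stand-ins, not coleslaw's own definitions, and the toy version returns strings where the real protocol works with files on disk.

```lisp
;; Hypothetical stand-ins for illustration; not coleslaw's actual code.
(defclass note ()
  ((slug :initarg :slug :reader note-slug)))

(defvar *toy-db* (make-hash-table :test #'equal)
  "Toy in-memory store mapping slugs to objects.")

(defgeneric toy-discover (doc-type)
  (:documentation "Create instances of DOC-TYPE and add them to *TOY-DB*."))

(defgeneric toy-publish (doc-type)
  (:documentation "Return one rendered string per instance of DOC-TYPE."))

;; EQL-specializing on the class object makes these act like class methods.
(defmethod toy-discover ((doc-type (eql (find-class 'note))))
  (dolist (slug '("hello" "world"))
    (setf (gethash slug *toy-db*)
          (make-instance doc-type :slug slug))))

(defmethod toy-publish ((doc-type (eql (find-class 'note))))
  (let ((notes (loop for doc being the hash-values of *toy-db*
                     when (typep doc doc-type)
                       collect doc)))
    (mapcar (lambda (doc) (format nil "note: ~a" (note-slug doc)))
            (sort notes #'string< :key #'note-slug))))
```

Calling `(toy-discover (find-class 'note))` and then `(toy-publish (find-class 'note))` dispatches on the class itself rather than on an instance. Note that the class must already be defined when the `defmethod` forms are evaluated, since `find-class` runs at that point.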


**Instance Methods**:

- `page-url`: Generate a unique, relative path for the object on the site
  sans file extension. An :around method adds that later. The `slug` slot
  on the object is generally used to hold a portion of the unique
  identifier, e.g. `(format nil "posts/~a" (content-slug object))`.
- `render`: A method that calls the appropriate template with `theme-fn`,
  passing it any needed arguments and returning rendered HTML.
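
The split between a primary `page-url` method and an :around method that appends the extension can be sketched as follows. The names `toy-post` and `toy-page-url` are hypothetical, and the ".html" extension is an assumption made for the example.

```lisp
;; Hypothetical names for illustration; the ".html" suffix is an assumption.
(defclass toy-post ()
  ((slug :initarg :slug :reader toy-post-slug)))

(defgeneric toy-page-url (object)
  (:documentation "Return the relative output path for OBJECT."))

;; Primary method: a unique relative path, without a file extension.
(defmethod toy-page-url ((object toy-post))
  (format nil "posts/~a" (toy-post-slug object)))

;; The :around method wraps every primary method and appends the extension.
(defmethod toy-page-url :around ((object t))
  (concatenate 'string (call-next-method) ".html"))
```

With this arrangement, new content types only need to supply the primary method; the extension handling lives in one place.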


**Invariants**:

- Any Content Types (subclasses of CONTENT) are expected to be stored in
  the site's git repo with the lowercased class-name as a file extension,
  e.g. ".post" for POST files.
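
A minimal sketch of the naming convention, using only standard pathname functions. The helpers `type-extension` and `matching-files` are hypothetical and exist only to show how a class name could map to a file extension.

```lisp
;; Hypothetical helpers illustrating the naming convention only.
(defun type-extension (type-name)
  "Return the file extension expected for the symbol TYPE-NAME."
  (string-downcase (symbol-name type-name)))

(defun matching-files (type-name pathnames)
  "Return the members of PATHNAMES whose type matches TYPE-NAME's extension."
  (let ((extension (type-extension type-name)))
    (remove-if-not (lambda (path)
                     (equal (pathname-type path) extension))
                   pathnames)))
```

For example, `(matching-files 'post (list "hello.post" "about.page"))` keeps only the ".post" entry.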

### Current Content Types & Indexes

There are 5 INDEX subclasses at present: TAG-INDEX, MONTH-INDEX,
NUMERIC-INDEX, FEED, and TAG-FEED. The first three group content by
tags, publishing date, and reverse chronological order, respectively.
Feeds exist to special-case RSS and ATOM generation.
Currently, there is only 1 content type: POST, for blog entries.
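
To make the idea of an index as "a grouping of content" concrete, here is a toy sketch of grouping by tag. It represents each post as a plain list of a slug followed by its tags, which is an assumption for the example and not how coleslaw models POSTs; `group-by-tag` is a hypothetical name.

```lisp
;; Toy model: each post is (SLUG . TAGS). Not coleslaw's data model.
(defun group-by-tag (posts)
  "Return an alist of (TAG . SLUGS), sorted by tag, for the given POSTS."
  (let ((table (make-hash-table :test #'equal)))
    (dolist (post posts)
      (destructuring-bind (slug . tags) post
        (dolist (tag tags)
          (push slug (gethash tag table)))))
    (sort (loop for tag being the hash-keys of table
                  using (hash-value slugs)
                collect (cons tag (reverse slugs)))
          #'string< :key #'car)))
```

Each resulting entry corresponds roughly to one tag page: the tag plus the content that should be listed on it.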

### Templates and Theming

User configs are allowed to specify a theme; otherwise the default is
used. A theme consists of a directory under "themes/" containing css,
images, and at least 3 templates: Base, Index, and Post.

**Coleslaw** uses
[cl-closure-template](https://github.com/archimag/cl-closure-template)
exclusively for templating. **cl-closure-template** is a
well-documented CL implementation of Google's Closure Templates. Each
template file should contain a namespace like
`coleslaw.theme.theme-name`.

Each template creates a lisp function in the theme's package when
loaded. These functions take a property list (or plist) as an argument
and return rendered HTML.  **Coleslaw** defines a helper called
`theme-fn` for easy access to the template functions. Additionally,
there are RSS, ATOM, and sitemap templates *coleslaw* uses automatically.
No need for individual themes to reimplement a standard, after all!
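
The relationship between a theme package, its template functions, and a lookup helper can be sketched without any templating library. Everything below is a hypothetical stand-in: the package name `TOY.THEME.PLAIN`, the hand-written `POST` function, and the helper `toy-theme-fn` are assumptions for the example, and real template functions are generated by cl-closure-template rather than written as lambdas.

```lisp
;; Hypothetical stand-in for a theme package; real themes are compiled
;; from template files by cl-closure-template.
(defvar *toy-theme*
  (or (find-package "TOY.THEME.PLAIN")
      (make-package "TOY.THEME.PLAIN" :use '())))

;; Install a "template function" that takes a plist and returns HTML.
(setf (fdefinition (intern "POST" *toy-theme*))
      (lambda (data)
        (format nil "<h1>~a</h1>~a" (getf data :title) (getf data :body))))

(defun toy-theme-fn (name)
  "Look up the template function called NAME in the toy theme package."
  (let ((symbol (find-symbol (string-upcase (string name)) *toy-theme*)))
    (if (and symbol (fboundp symbol))
        (fdefinition symbol)
        (error "No template named ~a in ~a"
               name (package-name *toy-theme*)))))
```

A `render` method in this toy setup would then be little more than `(funcall (toy-theme-fn :post) (list :title "Hi" :body "<p>Hello</p>"))`.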

### Plugins

**Coleslaw** also encourages extending functionality via plugins. The Plugin
API is well-documented and flexible enough for many use cases. Do check the
[API docs](https://github.com/redline6561/coleslaw/blob/master/docs/plugin-api.md)
when contemplating a new feature and see if a plugin would be appropriate.

### The Lifecycle of a Page

- `(load-content)`

A page starts, obviously, with a file. When *coleslaw* loads your
content, it iterates over a list of content types (i.e. subclasses of
CONTENT).  For each content type, it iterates over all files in the
repo with a matching extension, e.g. ".post" for POSTs. Objects of the
appropriate class are created from each matching file and inserted
into an in-memory data store. Then the INDEXes are created by
iterating over the POSTs and inserted into the data store.

- `(compile-blog dir)`

Compilation starts by ensuring the staging directory (`/tmp/coleslaw/`
by default) exists, cd'ing there, and copying over any necessary theme
assets. Then *coleslaw* iterates over the content types and index
classes, calling the `publish` method on each one. Publish iterates
over the class instances, rendering each one and writing the result
out to disk with `write-file`. After this, an 'index.html' symlink is
created to point to the first index.

- `(deploy dir)`

Finally, we move the staging directory to a timestamped path under
the config's `:deploy-dir`, delete the directory pointed to by the old
'.prev' symlink, point '.prev' at the build '.curr' references, and
point '.curr' at our freshly built site.
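
The symlink rotation is easier to follow as a pure function. The sketch below models the two symlinks as a plist and touches no files. It assumes the intent is that '.prev' ends up naming the build that was live before the deploy; `rotate-builds` and the build names are hypothetical.

```lisp
;; A pure model of the symlink rotation; no filesystem access is performed.
(defun rotate-builds (links new-build)
  "LINKS is a plist with :CURR and :PREV naming build directories.
Return the updated plist and, as a second value, the build to delete."
  (values (list :prev (getf links :curr)
                :curr new-build)
          (getf links :prev)))
```

For instance, `(rotate-builds '(:prev "build-1" :curr "build-2") "build-3")` yields links where `:prev` is "build-2" and `:curr` is "build-3", with "build-1" returned as the directory that is no longer referenced.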

## Areas for Improvement

### Allow Arbitrary Repo Structure

Currently, *coleslaw* expects all posts to be in the top-level of the
blog repo. There is no technical reason that coleslaw should care.
The only change that needs to be made is to the `do-files` macro
used during content discovery. In particular, it should probably
use `cl-fad:walk-directory` instead of `cl-fad:list-directory`.

### Allow Tagless or Dateless Content

Several users have expected to be able to not supply tags or a date
for their content. This is a reasonable expectation and requires
changes to at least the post templates and the `read-content`
function. There may be other areas where it was assumed tags/dates
will always be present.

### Render Function Cleanup

There are currently 3 render-foo* functions and 3 implementations of the
render method. Only the render-foo* functions call `write-file` so there
should be some room for cleanup here. The render method implementations
are probably necessary unless we want to start storing their arguments
on the models. There may be a different way to abstract the data flow.

### User-Defined Routing

There is no reason *coleslaw* should be in charge of the site layout or
should care. If all objects only used the *slug* slot in their `page-url`
methods, there could be a :routing argument in the config containing
a plist of `(:class "~{format string~}")` pairs. A default method could
check the :class key under `(routing *config*)` if no specialized
`page-url` was defined. This would have the additional benefit of
localizing all the site routing in one place. New Content Types would
probably `pushnew` a plist onto the config key in their `enable` function.
This has been implemented on the branch `user-defined-routing`.
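
A minimal sketch of the routing idea, assuming routes are kept in a plist keyed by class keyword. The names `*toy-routing*` and `toy-route` are hypothetical, and this is not necessarily how the `user-defined-routing` branch does it.

```lisp
;; Hypothetical routing table: class keyword -> FORMAT control string.
(defvar *toy-routing* (list :post "posts/~a" :page "~a"))

(defun toy-route (class-key slug)
  "Build the relative URL for SLUG using the route registered for CLASS-KEY."
  (let ((control (getf *toy-routing* class-key)))
    (unless control
      (error "No route defined for ~s" class-key))
    (format nil control slug)))

;; A new content type could register its own route when it is enabled.
(setf (getf *toy-routing* :shout) "shouts/~a")
```

Here `(toy-route :post "hello")` produces "posts/hello", and changing the site layout means editing only the table.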

### New Content Type: Pages!

Many users have requested a content type PAGE, for static pages. It
should be a pretty straightforward subclass of CONTENT with the
necessary methods: `render`, `page-url` and `publish`. It could have a
`url` slot with `page-url` as a reader to allow arbitrary layout on
the site.  The big question is how to handle templating and how
indexes or other content should link to it.

### New Content Type: Shouts!

I've also toyed with the idea of a content type called a SHOUT, which
would be used primarily to reference or embed other content, sort of a
mix between a retweet and a del.icio.us bookmark. We encounter plenty
of great things on the web. Most of mine wind up forgotten in browser
tabs or stored on twitter's servers. It would be cool to see SHOUTs as
a plugin, probably with a dedicated SHOUT-INDEX, and some sort of
oEmbed/embed.ly/noembed support.

### Better Content Types

Creating a new content type is both straightforward and doable as a
plugin. All that is really required is a subclass of CONTENT with
any needed slots, a template, a `render` method to call the template
with any needed options, a `page-url` method for layout, and a
`publish` method.

Unfortunately, this does not solve:

1. The issue of compiling the template at load-time and making sure it
   was installed in the theme package. The plugin would need to do
   this itself or the template would need to be included in 'core'.
   Thankfully, this should be easy with *cl-closure-template*.
2. More seriously, there is no formal relationship between content
   types and indexes. Consequently, INDEXes include only POST
   objects at the moment. Whether the INDEX should specify what
   Content Types it includes or the CONTENT which indexes it appears
   on is not yet clear.

### Incremental Compilation

Incremental compilation is doable, even straightforward if you ignore
indexes. It is also preferable to building the site in parallel as
avoiding work is better than using more workers. Moreover, being
able to determine (and expose) what files just changed enables new
functionality such as plugins that cross-post to tumblr.

Git's post-receive hook gets a list of updated refs on standard input.
A brave soul could update our post-receive script to figure out the
original hash and pass that along to `coleslaw:main`. We could then
use it to run `git diff --name-status $HASH HEAD` to find changed
files and act accordingly.
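
As a sketch of the "act accordingly" step, the output of `git diff --name-status` could be parsed into status/path pairs. Each output line is a status letter, a tab, and a path. The functions `parse-name-status` and `paths-to-rebuild` are hypothetical, the sketch takes the command output as a string rather than running git, and renames (which carry two paths) are not handled specially.

```lisp
;; Hypothetical helpers; they parse text only and never invoke git.
(defun parse-name-status (output)
  "Return a list of (STATUS . PATH) pairs parsed from OUTPUT,
the text printed by `git diff --name-status`."
  (with-input-from-string (in output)
    (loop for line = (read-line in nil)
          while line
          when (position #\Tab line)
            collect (let ((tab (position #\Tab line)))
                      (cons (char line 0) (subseq line (1+ tab)))))))

(defun paths-to-rebuild (changes)
  "Return the paths in CHANGES that still exist, i.e. everything not deleted."
  (loop for (status . path) in changes
        unless (char= status #\D)
          collect path))
```

Deleted files would need separate handling, since their rendered output has to be removed rather than regenerated.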

This is a cool project and the effects are far-reaching. Among other
things the existing deployment model would not work as it involves
rebuilding the entire site. In all likelihood we would want to update
the site 'in-place'. Atomicity of filesystem operations would be a
reasonable concern. Also, every numbered INDEX would have to be
regenerated along with any tag or month indexes matching the
modified files. If incremental compilation is a goal, simply
disabling the indexes may be appropriate for certain users.