<h1id="title-non-xs"><ahref="index.html">The Common Lisp Cookbook</a> – Performance Tuning and Tips</h1>
<divid="content">
<p>Many Common Lisp implementations compile source code to native machine
code, so performance is usually very good compared with interpreted
languages.</p>
<p>However, sometimes we just want the program to be faster. This chapter
introduces some techniques for getting the most out of the CPU.</p>
<p>The macro <ahref="http://www.lispworks.com/documentation/lw51/CLHS/Body/m_time.htm"><code>time</code></a> is very useful for finding bottlenecks. It takes
a form, evaluates it, and prints timing information to
<ahref="http://www.lispworks.com/documentation/lw71/CLHS/Body/v_debug_.htm#STtrace-outputST"><code>*trace-output*</code></a>, as shown below:</p>
<pre><codeclass="language-lisp">0.000001 seconds of total run time (0.000001 user, 0.000000 system)
100.00% CPU
3,800 processor cycles
0 bytes consed
</code></pre>
<p>By using the <code>time</code> macro it is fairly easy to find out which part of your program
takes too much time.</p>
<p>Please note that the timing information provided here is not guaranteed to be
reliable enough for marketing comparisons. It should only be used for tuning
purposes, as demonstrated in this chapter.</p>
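<p>When a single call is too fast for <code>time</code> to report anything meaningful, repeating the call and reading the clock yourself can help. The following is a minimal sketch using only standard functions; <code>elapsed-seconds</code>, <code>sum-with-loop</code> and <code>sum-with-reduce</code> are hypothetical names chosen for this illustration, and the figures you get will vary between runs and machines.</p>

```lisp
;; Hypothetical helper, not part of the standard: call THUNK repeatedly
;; and return the elapsed run time in seconds.
(defun elapsed-seconds (thunk &key (repetitions 1000))
  "Call THUNK REPETITIONS times and return the elapsed run time in seconds."
  (let ((start (get-internal-run-time)))
    (dotimes (i repetitions)
      (funcall thunk))
    (/ (- (get-internal-run-time) start)
       internal-time-units-per-second)))

;; Two hypothetical candidates that compute the same sum.
(defun sum-with-loop ()
  (loop for i below 1000 sum i))

(defun sum-with-reduce ()
  ;; Builds an intermediate list first, so it also allocates memory.
  (reduce #'+ (loop for i below 1000 collect i)))

;; Compare the two candidates; the results are in seconds.
(list :loop   (float (elapsed-seconds #'sum-with-loop))
      :reduce (float (elapsed-seconds #'sum-with-reduce)))
```

<p>As with <code>time</code>, treat the numbers as a rough guide for tuning, not as a benchmark.</p>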
<h3id="know-your-lisps-statistical-profiler">Know your Lisp’s statistical profiler</h3>
<p>Implementations ship their own profilers. SBCL has
<ahref="http://www.sbcl.org/manual/#Deterministic-Profiler">sb-profile</a>, a
“classic, per-function-call” deterministic profiler and
<ahref="http://www.sbcl.org/manual/#Statistical-Profiler">sb-sprof</a>, a
statistical profiler. The latter works by taking samples of the
program execution at regular intervals, instead of instrumenting
functions like <code>sb-profile:profile</code> does.</p>
<blockquote>
<p>You might find sb-sprof more useful than the deterministic profiler when profiling functions in the common-lisp-package, SBCL internals, or code where the instrumenting overhead is excessive.</p>
</blockquote>
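<p>As a rough sketch of how the two SBCL profilers are typically invoked (SBCL only; <code>fib</code> and the two <code>profile-fib-…</code> wrappers are hypothetical names used for illustration, and the keyword values are arbitrary choices, so check the SBCL manual for the options your version supports):</p>

```lisp
;; A deliberately slow, hypothetical workload.
(defun fib (n)
  (if (< n 2)
      n
      (+ (fib (- n 1)) (fib (- n 2)))))

;; sb-sprof is a contrib module and must be loaded before use.
#+sbcl
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sb-sprof))

;; Statistical profiler: sample the running program, then print a flat report.
#+sbcl
(defun profile-fib-statistically (n)
  (sb-sprof:with-profiling (:max-samples 10000 :report :flat :loop nil)
    (fib n)))

;; Deterministic profiler: instrument FIB, run it, print the report,
;; and remove the instrumentation again.
#+sbcl
(defun profile-fib-deterministically (n)
  (sb-profile:profile fib)
  (unwind-protect
       (progn (fib n)
              (sb-profile:report))
    (sb-profile:unprofile fib)))
```

<p>Calling, for example, <code>(profile-fib-statistically 30)</code> at the REPL prints the report; the workload has to run long enough for the sampler to collect a useful number of samples.</p>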
<h3id="use-flamegraphs-and-other-tracing-profilers">Use flamegraphs and other tracing profilers</h3>
<p><ahref="https://github.com/40ants/cl-flamegraph">cl-flamegraph</a> is a wrapper around SBCL’s statistical profiler to generate FlameGraph charts. Flamegraphs are a very visual way to search for hotspots in your code:</p>
<p><imgsrc="assets/cl-flamegraph.png"alt=""/></p>
<p>See also <ahref="https://github.com/TeMPOraL/tracer">tracer</a>, a tracing
profiler for SBCL. Its output is suitable for display in
Chrome’s or Chromium’s Tracing Viewer (<code>chrome://tracing</code>).</p>
<p>The function <ahref="http://www.lispworks.com/documentation/lw60/CLHS/Body/f_disass.htm"><code>disassemble</code></a> takes a function (or a function name) and prints
its compiled code to <code>*standard-output*</code>. For example:</p>
<pre><codeclass="language-lisp">* (defun plus (a b)
    (+ a b))
PLUS
* (disassemble 'plus)
; disassembly for PLUS
; Size: 37 bytes. Origin: #x52B8063B
; 3B: 498B5D60 MOV RBX, [R13+96] ; no-arg-parsing entry point
</code></pre>
<p>The <ahref="http://www.lispworks.com/documentation/lw71/CLHS/Body/s_declar.htm"><em>declare expression</em></a> can be used to give the compiler hints
for various optimizations. Please note that these hints are
implementation-dependent. Some implementations, such as SBCL, support this
feature; refer to their documentation for detailed
information. Only some basic techniques mentioned in the CLHS are introduced here.</p>
<p>In general, declare expressions can occur only at the beginning of the bodies
of certain forms, or immediately after a documentation string if the context
allows. Also, the content of a declare expression is restricted to a limited
set of forms. Here we introduce those that are related to performance tuning.</p>
<p>Please keep in mind that the optimization techniques introduced in this section
depend strongly on the Lisp implementation you use. Always check its
documentation before using <code>declare</code>!</p>
<h3id="speed-and-safety">Speed and Safety</h3>
<p>Lisp allows you to specify several quality properties for the compiler using
the declaration <ahref="http://www.lispworks.com/documentation/lw71/CLHS/Body/d_optimi.htm"><code>optimize</code></a>. Each quality may be assigned a value
from 0 to 3, with 0 being “totally unimportant” and 3 being “extremely
important”.</p>
<p>The most significant qualities might be <code>safety</code> and <code>speed</code>.</p>
<p>By default, implementations tend to favor code safety over raw
speed, but you may adjust the weights for more aggressive optimization.</p>
<pre><codeclass="language-lisp">* (defun max-original (a b)
    (max a b))
MAX-ORIGINAL
* (disassemble 'max-original)
; disassembly for MAX-ORIGINAL
; Size: 144 bytes. Origin: #x52D450EF
; 7A7: 8D46F1 lea eax, [rsi-15] ; no-arg-parsing entry point
</code></pre>
<p>For the type-annotated version, <code>max-with-type</code>, the generated assembly
code shrank to about one third of the original size. What about speed?</p>
<pre><codeclass="language-lisp">* (time (dotimes (i 10000) (max-original 100 200)))
Evaluation took:
0.000 seconds of real time
0.000107 seconds of total run time (0.000088 user, 0.000019 system)
100.00% CPU
361,088 processor cycles
0 bytes consed
* (time (dotimes (i 10000) (max-with-type 100 200)))
Evaluation took:
0.000 seconds of real time
0.000044 seconds of total run time (0.000036 user, 0.000008 system)
100.00% CPU
146,960 processor cycles
0 bytes consed
</code></pre>
<p>You see, by specifying type hints, our code runs much faster: 146,960 processor cycles instead of 361,088 in this run.</p>
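<p>The definition of <code>max-with-type</code> is not shown in the transcript above. A plausible sketch, assuming it combines an <code>optimize</code> declaration with an <code>integer</code> type declaration for both arguments, would be:</p>

```lisp
;; Assumed definition of MAX-WITH-TYPE, reconstructed for illustration.
(defun max-with-type (a b)
  ;; Ask the compiler to favor speed and drop safety checks.
  (declare (optimize (speed 3) (safety 0)))
  ;; Promise the compiler that both arguments are integers.
  (declare (type integer a b))
  (max a b))

;; Baseline without any declarations, as in the transcript above.
(defun max-original (a b)
  (max a b))
```

<p>Both functions return the same value for integer arguments; only the generated code differs. A narrower type such as <code>fixnum</code> usually lets the compiler do even more, at the cost of accepting a smaller range of inputs.</p>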
<p>But wait… what happens if we declare the wrong types? The answer is: it depends.</p>
<p>For example, SBCL treats type declarations in a <ahref="http://sbcl.org/manual/index.html#Handling-of-Types">special way</a>. It
performs different levels of type checking according to the safety level. If
the safety level is set to 0, no type checking is performed at all, so a wrong
type specifier might cause a lot of damage.</p>
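<p>If an argument really must be checked whatever the optimization settings are, one portable option is an explicit <code>check-type</code> form, which is ordinary code and not a hint to the compiler. A minimal sketch follows; <code>checked-max</code> is a hypothetical name used for illustration.</p>

```lisp
;; Hypothetical function: validate the arguments explicitly instead of
;; relying on how the compiler treats type declarations.
(defun checked-max (a b)
  (check-type a integer)
  (check-type b integer)
  (max a b))

;; A wrong argument signals a TYPE-ERROR that can be handled normally.
(handler-case (checked-max 1 "two")
  (type-error (condition)
    (format t "Rejected ~S: expected ~S~%"
            (type-error-datum condition)
            (type-error-expected-type condition))))
```

<p>The explicit check costs a little time on every call, so it fits best at the boundaries of a program, where untrusted data comes in.</p>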
<h3id="more-on-type-declaration-with-declaim">More on Type Declaration with <code>declaim</code></h3>
<p>If you try to evaluate a <code>declare</code> form at the top level, you might get the
following error:</p>
<pre><codeclass="language-lisp">Execution of a form compiled with errors.
Form:
(DECLARE (SPEED 3))
Compile-time error:
There is no function named DECLARE. References to DECLARE in some contexts
(like starts of blocks) are unevaluated expressions, but here the expression is
being evaluated, which invokes undefined behaviour.
[Condition of type SB-INT:COMPILED-PROGRAM-ERROR]
</code></pre>
<p>This is because type declarations have <ahref="http://www.lispworks.com/documentation/lw71/CLHS/Body/03_cd.htm">scopes</a>. In the
examples above, we have seen type declarations applied to a function.</p>
<p>During development it is usually useful to raise the importance of safety in
order to find potential problems as early as possible. After deployment, on the
other hand, speed might matter more. However, specifying a declaration
expression in every single function would be too verbose.</p>
<p>The macro <ahref="http://www.lispworks.com/documentation/lw71/CLHS/Body/m_declai.htm"><code>declaim</code></a> makes this possible without the repetition. It can be used as a
top level form in a file, and the declarations are made at compile time.</p>
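<p>A minimal sketch of both uses, global optimization settings and a global function type, is shown here. The quality values are arbitrary development-time choices and <code>add-integers</code> is a hypothetical function; note that the standard leaves open whether such a proclamation persists after the file has been compiled.</p>

```lisp
;; Development settings for the code that follows (assumed values):
;; favor safety and debuggability over speed.
(declaim (optimize (speed 0) (safety 3) (debug 3)))

;; Declare the argument and return types of ADD-INTEGERS once, globally,
;; instead of writing DECLARE forms inside the function body.
(declaim (ftype (function (integer integer) integer) add-integers))

(defun add-integers (a b)
  (+ a b))
```

<p>For a release build, the first form could be changed in one place, for example to <code>(speed 3) (safety 1)</code>, without touching the function definitions.</p>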
<p>The declaration <ahref="http://www.lispworks.com/documentation/lw51/CLHS/Body/d_inline.htm"><code>inline</code></a> replaces function calls with the function body,
if the compiler supports it. This saves the cost of the function calls but can
increase the code size. The best candidates for <code>inline</code> are
small, frequently called functions. The following snippet shows how
to encourage and prohibit inlining.</p>
<pre><codeclass="language-lisp">;; The globally defined function DISPATCH should be open-coded,
;; if the implementation supports inlining, unless a NOTINLINE
<p>When this feature is present, all inlinable generic functions are inlined
unless they are declared <code>notinline</code>.</p>
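<p>Returning to ordinary functions, here is a small sketch of <code>inline</code> and <code>notinline</code> working together. All function names are hypothetical, and whether the calls are actually open-coded remains up to the implementation.</p>

```lisp
;; The INLINE proclamation must come before the definition so that the
;; compiler can record the function body for later inlining.
(declaim (inline square))
(defun square (x)
  (* x x))

;; Calls to SQUARE here are candidates for inlining.
(defun sum-of-squares (a b)
  (+ (square a) (square b)))

;; A local NOTINLINE declaration asks for ordinary calls in this
;; function only.
(defun sum-of-squares-full-call (a b)
  (declare (notinline square))
  (+ (square a) (square b)))
```

<p>Comparing the output of <code>disassemble</code> for the two callers is a simple way to see whether your implementation honored the request.</p>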
<h2id="block-compilation">Block compilation</h2>
<p>SBCL <ahref="https://mstmetent.blogspot.com/2020/02/block-compilation-fresh-in-sbcl-202.html">gained block compilation in version 2.0.2</a>, a feature that has existed in CMUCL since 1991 but had been somewhat forgotten since.</p>
<p>You can enable block compilation with a one-liner:</p>
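<p>As a sketch, and assuming SBCL 2.0.2 or later, the setting in question is most likely the special variable <code>sb-ext:*block-compile-default*</code>; check your version's manual before relying on it.</p>

```lisp
;; SBCL only, version 2.0.2 or later (assumed): make block compilation
;; the default for subsequent COMPILE-FILE calls.
#+sbcl
(setq sb-ext:*block-compile-default* t)
```

<p>Because this changes a global default, it affects every file compiled afterwards in the same image.</p>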
<p>Block compilation addresses a known aspect of dynamic languages: function calls to global, top-level functions are expensive.</p>
<blockquote>
<p>Much more expensive than in a statically compiled language. They’re slow because of the late-bound nature of top-level defined functions, allowing arbitrary redefinition while the program is running and forcing runtime checks on whether the function is being called with the right number or types of arguments. This type of call is known as a “full call” in Python (the compiler used in CMUCL and SBCL, not to be confused with the programming language), and their calling convention permits the most runtime flexibility.</p>
</blockquote>
<p>But local calls, the ones inside a top-level function (for example to <code>lambda</code>s, <code>labels</code> and <code>flet</code>s), are fast.</p>
<blockquote>
<p>These calls are more ‘static’ in the sense that they are treated more like function calls in static languages, being compiled “together” and at the same time as the local functions they reference, allowing them to be optimized at compile-time. For example, argument checking can be done at compile time because the number of arguments of the callee is known at compile time, unlike in the full call case where the function, and hence the number of arguments it takes, can change dynamically at runtime at any point. Additionally, the local call calling convention can allow for passing unboxed values like floats around, as they are put into unboxed registers never used in the full call convention, which must use boxed argument and return value registers.</p>
</blockquote>
<p>So enabling block compilation kind of turns your code into a giant <code>labels</code> form.</p>
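<p>To make the analogy concrete, here is a small sketch in portable Common Lisp contrasting the two kinds of call. The function names are hypothetical, and how much faster the local version is depends on the implementation.</p>

```lisp
;; Full call: SCALE is a global function, so each call goes through its
;; global name and SCALE may be redefined at any time.
(defun scale (x factor)
  (* x factor))

(defun scale-all (numbers factor)
  (loop for x in numbers collect (scale x factor)))

;; Local call: SCALE-ONE exists only inside SCALE-ALL-LOCAL, so the
;; compiler sees caller and callee together and can check the argument
;; count at compile time.
(defun scale-all-local (numbers factor)
  (labels ((scale-one (x)
             (* x factor)))
    (loop for x in numbers collect (scale-one x))))
```

<p>Block compilation aims to give the first style the calling convention of the second, without rewriting the source.</p>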
<p>One evident drawback, depending on your application, is that you can’t redefine functions at runtime anymore.</p>
<p>We can also enable block compilation with the <code>:block-compile</code> keyword to <code>compile-file</code>:</p>
<pre><codeclass="language-lisp">(defun foo (x y)
  (print (bar x y))
  (bar x y))

(defun bar (x y)
  (+ x y))

(defun fact (n)
  (if (zerop n)
      1
      (* n (fact (1- n)))))

> (compile-file "foo.lisp" :block-compile t :entry-points nil)
</code></pre>
<blockquote>
<p>you [will] see that FOO and BAR are now compiled into the same component (with local calls), and both have valid external entry points. This improves locality of code quite a bit and still allows calling both FOO and BAR externally from the file (e.g. in the REPL). […]</p>
</blockquote>
<p>But there is one more goody block compilation adds…</p>
<blockquote>
<p>Notice we specified <code>:entry-points</code> nil above. That’s telling the compiler to still create external entry points to every function in the file, since we’d like to be able to call them normally from outside the code component (i.e. the compiled compilation unit, here the entire file).</p>
</blockquote>
<p>For more explanations, I refer you to the blog post mentioned above, currently the de facto documentation for block compilation in SBCL, and to <ahref="https://cmucl.org/docs/cmu-user/html/Block-Compilation.html">CMUCL’s documentation</a> (note that the form-by-form granularity of CMUCL, <code>(declaim (start-block ...)) ... (declaim (end-block ...))</code>, is missing in SBCL at the time of writing).</p>
<p>Finally, be aware that “block compiling and inlining currently does not interact very well [in SBCL]”.</p>