779 lines
32 KiB
HTML
779 lines
32 KiB
HTML
<!DOCTYPE html>
|
||
<html lang="en">
|
||
<head>
|
||
<meta name="generator" content=
|
||
"HTML Tidy for HTML5 for Linux version 5.2.0">
|
||
<title>Performance Tuning and Tips</title>
|
||
<meta charset="utf-8">
|
||
<meta name="description" content="A collection of examples of using Common Lisp">
|
||
<meta name="viewport" content=
|
||
"width=device-width, initial-scale=1">
|
||
<link rel="icon" href=
|
||
"assets/cl-logo-blue.png"/>
|
||
<link rel="stylesheet" href=
|
||
"assets/style.css">
|
||
<script type="text/javascript" src=
|
||
"assets/highlight-lisp.js">
|
||
</script>
|
||
<script type="text/javascript" src=
|
||
"assets/jquery-3.2.1.min.js">
|
||
</script>
|
||
<script type="text/javascript" src=
|
||
"assets/jquery.toc/jquery.toc.min.js">
|
||
</script>
|
||
<script type="text/javascript" src=
|
||
"assets/toggle-toc.js">
|
||
</script>
|
||
|
||
<link rel="stylesheet" href=
|
||
"assets/github.css">
|
||
|
||
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
|
||
</head>
|
||
<body>
|
||
<h1 id="title-xs"><a href="index.html">The Common Lisp Cookbook</a> – Performance Tuning and Tips</h1>
|
||
<div id="logo-container">
|
||
<a href="index.html">
|
||
<img id="logo" src="assets/cl-logo-blue.png"/>
|
||
</a>
|
||
|
||
<div id="searchform-container">
|
||
<form onsubmit="duckSearch()" action="javascript:void(0)">
|
||
<input id="searchField" type="text" value="" placeholder="Search...">
|
||
</form>
|
||
</div>
|
||
|
||
<div id="toc-container" class="toc-close">
|
||
<div id="toc-title">Table of Contents</div>
|
||
<ul id="toc" class="list-unstyled"></ul>
|
||
</div>
|
||
</div>
|
||
|
||
<div id="content-container">
|
||
<h1 id="title-non-xs"><a href="index.html">The Common Lisp Cookbook</a> – Performance Tuning and Tips</h1>
|
||
|
||
<!-- Announcement we can keep for 1 month or more. I remove it and re-add it from time to time. -->
|
||
<p class="announce">
|
||
📹 💻
|
||
<a style="font-size: 120%" href="https://www.udemy.com/course/common-lisp-programming/?couponCode=LISPMACROSPOWER" title="This course is under a paywall on the Udemy platform. Several videos are freely available so you can judge before diving in. vindarel is (I am) the main contributor to this Cookbook."> Discover vindarel's Lisp course in videos with this September coupon.</a>
|
||
<strong>
|
||
Recently added: 18 videos on MACROS.
|
||
</strong>
|
||
<a style="font-size: 90%" href="https://github.com/vindarel/common-lisp-course-in-videos/">Learn more</a>.
|
||
</p>
|
||
<p class="announce-neutral">
|
||
📕 <a href="index.html#download-in-epub">Get the EPUB and PDF</a>
|
||
</p>
|
||
|
||
|
||
<div id="content"
|
||
<p>Many Common Lisp implementations translate the source code into assembly
|
||
language, so the performance is really good compared with some other
|
||
interpreted languages.</p>
|
||
|
||
<p>However, sometimes we just want the program to be faster. This chapter
|
||
introduces some techniques to squeeze the CPU power out.</p>
|
||
|
||
<h2 id="finding-bottlenecks">Finding Bottlenecks</h2>
|
||
|
||
<h3 id="acquiring-execution-time">Acquiring Execution Time</h3>
|
||
|
||
<p>The macro <a href="http://www.lispworks.com/documentation/lw51/CLHS/Body/m_time.htm"><code>time</code></a> is very useful for finding out bottlenecks. It takes
|
||
a form, evaluates it and prints timing information in
|
||
<a href="http://www.lispworks.com/documentation/lw71/CLHS/Body/v_debug_.htm#STtrace-outputST"><code>*trace-output*</code></a>, as shown below:</p>
|
||
|
||
<pre><code class="language-lisp">* (defun collect (start end)
|
||
"Collect numbers [start, end] as list."
|
||
(loop for i from start to end
|
||
collect i))
|
||
|
||
* (time (collect 1 10))
|
||
|
||
Evaluation took:
|
||
0.000 seconds of real time
|
||
0.000001 seconds of total run time (0.000001 user, 0.000000 system)
|
||
100.00% CPU
|
||
3,800 processor cycles
|
||
0 bytes consed
|
||
</code></pre>
|
||
|
||
<p>By using the <code>time</code> macro it is fairly easy to find out which part of your program
|
||
takes too much time.</p>
|
||
|
||
<p>Please note that the timing information provided here is not guaranteed to be
|
||
reliable enough for marketing comparisons. It should only be used for tuning
|
||
purpose, as demonstrated in this chapter.</p>
|
||
|
||
<h3 id="know-your-lisps-statistical-profiler">Know your Lisp’s statistical profiler</h3>
|
||
|
||
<p>Implementations ship their own profilers. SBCL has
|
||
<a href="http://www.sbcl.org/manual/#Deterministic-Profiler">sb-profile</a>, a
|
||
“classic, per-function-call” deterministic profiler and
|
||
<a href="http://www.sbcl.org/manual/#Statistical-Profiler">sb-sprof</a>, a
|
||
statistical profiler. The latter works by taking samples of the
|
||
program execution at regular intervals, instead of instrumenting
|
||
functions like <code>sb-profile:profile</code> does.</p>
|
||
|
||
<blockquote>
|
||
<p>You might find sb-sprof more useful than the deterministic profiler when profiling functions in the common-lisp-package, SBCL internals, or code where the instrumenting overhead is excessive.</p>
|
||
</blockquote>
|
||
|
||
<h3 id="use-flamegraphs-and-other-tracing-profilers">Use flamegraphs and other tracing profilers</h3>
|
||
|
||
<p><a href="https://github.com/40ants/cl-flamegraph">cl-flamegraph</a> is a wrapper around SBCL’s statistical profiler to generate FlameGraph charts. Flamegraphs are a very visual way to search for hotspots in your code:</p>
|
||
|
||
<p><img src="assets/cl-flamegraph.png" alt="" /></p>
|
||
|
||
<p>See also <a href="https://github.com/TeMPOraL/tracer">tracer</a>, a tracing
|
||
profiler for SBCL. Its output is suitable for display in
|
||
Chrome’s or Chromium’s Tracing Viewer (<code>chrome://tracing</code>).</p>
|
||
|
||
<h3 id="checking-assembly-code">Checking Assembly Code</h3>
|
||
|
||
<p>The function <a href="http://www.lispworks.com/documentation/lw60/CLHS/Body/f_disass.htm"><code>disassemble</code></a> takes a function and prints the
|
||
compiled code of it to <code>*standard-output*</code>. For example:</p>
|
||
|
||
<pre><code class="language-lisp">* (defun plus (a b)
|
||
(+ a b))
|
||
PLUS
|
||
|
||
* (disassemble 'plus)
|
||
; disassembly for PLUS
|
||
; Size: 37 bytes. Origin: #x52B8063B
|
||
; 3B: 498B5D60 MOV RBX, [R13+96] ; no-arg-parsing entry point
|
||
; thread.binding-stack-pointer
|
||
; 3F: 48895DF8 MOV [RBP-8], RBX
|
||
; 43: 498BD0 MOV RDX, R8
|
||
; 46: 488BFE MOV RDI, RSI
|
||
; 49: FF1425B0001052 CALL QWORD PTR [#x521000B0] ; GENERIC-+
|
||
; 50: 488B75E8 MOV RSI, [RBP-24]
|
||
; 54: 4C8B45F0 MOV R8, [RBP-16]
|
||
; 58: 488BE5 MOV RSP, RBP
|
||
; 5B: F8 CLC
|
||
; 5C: 5D POP RBP
|
||
; 5D: C3 RET
|
||
; 5E: CC0F BREAK 15 ; Invalid argument count trap
|
||
</code></pre>
|
||
|
||
<p>The code above was evaluated in SBCL. In some other implementations such as
|
||
CLISP, <code>disassembly</code> might return something different:</p>
|
||
|
||
<pre><code class="language-lisp">* (defun plus (a b)
|
||
(+ a b))
|
||
PLUS
|
||
|
||
* (disassemble 'plus)
|
||
Disassembly of function PLUS
|
||
2 required arguments
|
||
0 optional arguments
|
||
No rest parameter
|
||
No keyword parameters
|
||
4 byte-code instructions:
|
||
0 (LOAD&PUSH 2)
|
||
1 (LOAD&PUSH 2)
|
||
2 (CALLSR 2 55) ; +
|
||
5 (SKIP&RET 3)
|
||
NIL
|
||
</code></pre>
|
||
|
||
<p>It is because SBCL compiles the Lisp code into machine code, while CLISP does
|
||
not.</p>
|
||
|
||
<h2 id="using-declare-expression">Using Declare Expression</h2>
|
||
|
||
<p>The <a href="http://www.lispworks.com/documentation/lw71/CLHS/Body/s_declar.htm"><em>declare expression</em></a> can be used to provide hints for compilers
|
||
to perform various optimization. Please note that these hints are
|
||
implementation-dependent. Some implementations such as SBCL support this
|
||
feature, and you may refer to their own documentation for detailed
|
||
information. Here only some basic techniques mentioned in CLHS are introduced.</p>
|
||
|
||
<p>In general, declare expressions can occur only at the beginning of the bodies
|
||
of certain forms, or immediately after a documentation string if the context
|
||
allows. Also, the content of a declare expression is restricted to limited
|
||
forms. Here we introduce some of them that are related to performance tuning.</p>
|
||
|
||
<p>Please keep in mind that these optimization skills introduced in this section
|
||
are strongly connected to the Lisp implementation selected. Always check their
|
||
documentation before using <code>declare</code>!</p>
|
||
|
||
<h3 id="speed-and-safety">Speed and Safety</h3>
|
||
|
||
<p>Lisp allows you to specify several quality properties for the compiler using
|
||
the declaration <a href="http://www.lispworks.com/documentation/lw71/CLHS/Body/d_optimi.htm"><code>optimize</code></a>. Each quality may be assigned a value
|
||
from 0 to 3, with 0 being “totally unimportant” and 3 being “extremely
|
||
important”.</p>
|
||
|
||
<p>The most significant qualities might be <code>safety</code> and <code>speed</code>.</p>
|
||
|
||
<p>By default, Lisp considers code safety to be much more important than
|
||
speed. But you may adjust the weight for more aggressive optimization.</p>
|
||
|
||
<pre><code class="language-lisp">* (defun max-original (a b)
|
||
(max a b))
|
||
MAX-ORIGINAL
|
||
|
||
* (disassemble 'max-original)
|
||
; disassembly for MAX-ORIGINAL
|
||
; Size: 144 bytes. Origin: #x52D450EF
|
||
; 7A7: 8D46F1 lea eax, [rsi-15] ; no-arg-parsing entry point
|
||
; 7AA: A801 test al, 1
|
||
; 7AC: 750E jne L0
|
||
; 7AE: 3C0A cmp al, 10
|
||
; 7B0: 740A jeq L0
|
||
; 7B2: A80F test al, 15
|
||
; 7B4: 7576 jne L5
|
||
; 7B6: 807EF11D cmp byte ptr [rsi-15], 29
|
||
; 7BA: 7770 jnbe L5
|
||
; 7BC: L0: 8D43F1 lea eax, [rbx-15]
|
||
; 7BF: A801 test al, 1
|
||
; 7C1: 750E jne L1
|
||
; 7C3: 3C0A cmp al, 10
|
||
; 7C5: 740A jeq L1
|
||
; 7C7: A80F test al, 15
|
||
; 7C9: 755A jne L4
|
||
; 7CB: 807BF11D cmp byte ptr [rbx-15], 29
|
||
; 7CF: 7754 jnbe L4
|
||
; 7D1: L1: 488BD3 mov rdx, rbx
|
||
; 7D4: 488BFE mov rdi, rsi
|
||
; 7D7: B9C1030020 mov ecx, 536871873 ; generic->
|
||
; 7DC: FFD1 call rcx
|
||
; 7DE: 488B75F0 mov rsi, [rbp-16]
|
||
; 7E2: 488B5DF8 mov rbx, [rbp-8]
|
||
; 7E6: 7E09 jle L3
|
||
; 7E8: 488BD3 mov rdx, rbx
|
||
; 7EB: L2: 488BE5 mov rsp, rbp
|
||
; 7EE: F8 clc
|
||
; 7EF: 5D pop rbp
|
||
; 7F0: C3 ret
|
||
; 7F1: L3: 4C8BCB mov r9, rbx
|
||
; 7F4: 4C894DE8 mov [rbp-24], r9
|
||
; 7F8: 4C8BC6 mov r8, rsi
|
||
; 7FB: 4C8945E0 mov [rbp-32], r8
|
||
; 7FF: 488BD3 mov rdx, rbx
|
||
; 802: 488BFE mov rdi, rsi
|
||
; 805: B929040020 mov ecx, 536871977 ; generic-=
|
||
; 80A: FFD1 call rcx
|
||
; 80C: 4C8B45E0 mov r8, [rbp-32]
|
||
; 810: 4C8B4DE8 mov r9, [rbp-24]
|
||
; 814: 488B75F0 mov rsi, [rbp-16]
|
||
; 818: 488B5DF8 mov rbx, [rbp-8]
|
||
; 81C: 498BD0 mov rdx, r8
|
||
; 81F: 490F44D1 cmoveq rdx, r9
|
||
; 823: EBC6 jmp L2
|
||
; 825: L4: CC0A break 10 ; error trap
|
||
; 827: 04 byte #X04
|
||
; 828: 13 byte #X13 ; OBJECT-NOT-REAL-ERROR
|
||
; 829: FE9B01 byte #XFE, #X9B, #X01 ; RBX
|
||
; 82C: L5: CC0A break 10 ; error trap
|
||
; 82E: 04 byte #X04
|
||
; 82F: 13 byte #X13 ; OBJECT-NOT-REAL-ERROR
|
||
; 830: FE1B03 byte #XFE, #X1B, #X03 ; RSI
|
||
; 833: CC0A break 10 ; error trap
|
||
; 835: 02 byte #X02
|
||
; 836: 19 byte #X19 ; INVALID-ARG-COUNT-ERROR
|
||
; 837: 9A byte #X9A ; RCX
|
||
|
||
* (defun max-with-speed-3 (a b)
|
||
(declare (optimize (speed 3) (safety 0)))
|
||
(max a b))
|
||
MAX-WITH-SPEED-3
|
||
|
||
* (disassemble 'max-with-speed-3)
|
||
; disassembly for MAX-WITH-SPEED-3
|
||
; Size: 92 bytes. Origin: #x52D452C3
|
||
; 3B: 48895DE0 mov [rbp-32], rbx ; no-arg-parsing entry point
|
||
; 3F: 488945E8 mov [rbp-24], rax
|
||
; 43: 488BD0 mov rdx, rax
|
||
; 46: 488BFB mov rdi, rbx
|
||
; 49: B9C1030020 mov ecx, 536871873 ; generic->
|
||
; 4E: FFD1 call rcx
|
||
; 50: 488B45E8 mov rax, [rbp-24]
|
||
; 54: 488B5DE0 mov rbx, [rbp-32]
|
||
; 58: 7E0C jle L1
|
||
; 5A: 4C8BC0 mov r8, rax
|
||
; 5D: L0: 498BD0 mov rdx, r8
|
||
; 60: 488BE5 mov rsp, rbp
|
||
; 63: F8 clc
|
||
; 64: 5D pop rbp
|
||
; 65: C3 ret
|
||
; 66: L1: 488945E8 mov [rbp-24], rax
|
||
; 6A: 488BF0 mov rsi, rax
|
||
; 6D: 488975F0 mov [rbp-16], rsi
|
||
; 71: 4C8BC3 mov r8, rbx
|
||
; 74: 4C8945F8 mov [rbp-8], r8
|
||
; 78: 488BD0 mov rdx, rax
|
||
; 7B: 488BFB mov rdi, rbx
|
||
; 7E: B929040020 mov ecx, 536871977 ; generic-=
|
||
; 83: FFD1 call rcx
|
||
; 85: 488B45E8 mov rax, [rbp-24]
|
||
; 89: 488B75F0 mov rsi, [rbp-16]
|
||
; 8D: 4C8B45F8 mov r8, [rbp-8]
|
||
; 91: 4C0F44C6 cmoveq r8, rsi
|
||
; 95: EBC6 jmp L0
|
||
</code></pre>
|
||
|
||
<p>As you can see, the generated assembly code is much shorter (92 bytes VS 144).
|
||
The compiler was able to perform optimizations. Yet we can do better by
|
||
declaring types.</p>
|
||
|
||
<h3 id="type-hints">Type Hints</h3>
|
||
|
||
<p>As mentioned in the <a href="type.html"><em>Type System</em></a> chapter, Lisp has a relatively
|
||
powerful type system. You may provide type hints so that the compiler may
|
||
reduce the size of the generated code.</p>
|
||
|
||
<pre><code class="language-lisp">* (defun max-with-type (a b)
|
||
(declare (optimize (speed 3) (safety 0)))
|
||
(declare (type integer a b))
|
||
(max a b))
|
||
MAX-WITH-TYPE
|
||
|
||
* (disassemble 'max-with-type)
|
||
; disassembly for MAX-WITH-TYPE
|
||
; Size: 42 bytes. Origin: #x52D48A23
|
||
; 1B: 488BF7 mov rsi, rdi ; no-arg-parsing entry point
|
||
; 1E: 488975F0 mov [rbp-16], rsi
|
||
; 22: 488BD8 mov rbx, rax
|
||
; 25: 48895DF8 mov [rbp-8], rbx
|
||
; 29: 488BD0 mov rdx, rax
|
||
; 2C: B98C030020 mov ecx, 536871820 ; generic-<
|
||
; 31: FFD1 call rcx
|
||
; 33: 488B75F0 mov rsi, [rbp-16]
|
||
; 37: 488B5DF8 mov rbx, [rbp-8]
|
||
; 3B: 480F4CDE cmovl rbx, rsi
|
||
; 3F: 488BD3 mov rdx, rbx
|
||
; 42: 488BE5 mov rsp, rbp
|
||
; 45: F8 clc
|
||
; 46: 5D pop rbp
|
||
; 47: C3 ret
|
||
</code></pre>
|
||
|
||
<p>The size of generated assembly code shrunk to about 1/3 of the size. What
|
||
about speed?</p>
|
||
|
||
<pre><code class="language-lisp">* (time (dotimes (i 10000) (max-original 100 200)))
|
||
Evaluation took:
|
||
0.000 seconds of real time
|
||
0.000107 seconds of total run time (0.000088 user, 0.000019 system)
|
||
100.00% CPU
|
||
361,088 processor cycles
|
||
0 bytes consed
|
||
|
||
* (time (dotimes (i 10000) (max-with-type 100 200)))
|
||
Evaluation took:
|
||
0.000 seconds of real time
|
||
0.000044 seconds of total run time (0.000036 user, 0.000008 system)
|
||
100.00% CPU
|
||
146,960 processor cycles
|
||
0 bytes consed
|
||
</code></pre>
|
||
|
||
<p>You see, by specifying type hints, our code runs much faster!</p>
|
||
|
||
<p>But wait…What happens if we declare wrong types? The answer is: it depends.</p>
|
||
|
||
<p>For example, SBCL treats type declarations in a <a href="http://sbcl.org/manual/index.html#Handling-of-Types">special way</a>. It
|
||
performs different levels of type checking according to the safety level. If
|
||
safety level is set to 0, no type checking will be performed. Thus a wrong
|
||
type specifier might cause a lot of damage.</p>
|
||
|
||
<h3 id="more-on-type-declaration-with-declaim">More on Type Declaration with <code>declaim</code></h3>
|
||
|
||
<p>If you try to evaluate a <code>declare</code> form in the top level, you might get the
|
||
following error:</p>
|
||
|
||
<pre><code class="language-lisp">Execution of a form compiled with errors.
|
||
Form:
|
||
(DECLARE (SPEED 3))
|
||
Compile-time error:
|
||
There is no function named DECLARE. References to DECLARE in some contexts
|
||
(like starts of blocks) are unevaluated expressions, but here the expression is
|
||
being evaluated, which invokes undefined behaviour.
|
||
[Condition of type SB-INT:COMPILED-PROGRAM-ERROR]
|
||
</code></pre>
|
||
|
||
<p>This is because type declarations have <a href="http://www.lispworks.com/documentation/lw71/CLHS/Body/03_cd.htm">scopes</a>. In the
|
||
examples above, we have seen type declarations applied to a function.</p>
|
||
|
||
<p>During development it is usually useful to raise the importance of safety in
|
||
order to find out potential problems as soon as possible. On the contrary,
|
||
speed might be more important after deployment. However, it might be too
|
||
verbose to specify declaration expression for each single function.</p>
|
||
|
||
<p>The macro <a href="http://www.lispworks.com/documentation/lw71/CLHS/Body/m_declai.htm"><code>declaim</code></a> provides such possibility. It can be used as a
|
||
top level form in a file and the declarations will be made at compile-time.</p>
|
||
|
||
<pre><code class="language-lisp">* (declaim (optimize (speed 0) (safety 3)))
|
||
NIL
|
||
|
||
* (defun max-original (a b)
|
||
(max a b))
|
||
MAX-ORIGINAL
|
||
|
||
* (disassemble 'max-original)
|
||
; disassembly for MAX-ORIGINAL
|
||
; Size: 181 bytes. Origin: #x52D47D9C
|
||
...
|
||
|
||
* (declaim (optimize (speed 3) (safety 3)))
|
||
NIL
|
||
|
||
* (defun max-original (a b)
|
||
(max a b))
|
||
MAX-ORIGINAL
|
||
|
||
* (disassemble 'max-original)
|
||
; disassembly for MAX-ORIGINAL
|
||
; Size: 142 bytes. Origin: #x52D4815D
|
||
</code></pre>
|
||
|
||
<p>Please note that <code>declaim</code> works in <strong>compile-time</strong> of a file. It is mostly
|
||
used to make some declares local to that file. And it is unspecified whether
|
||
or not the compile-time side-effects of a declaim persist after the file has
|
||
been compiled.</p>
|
||
|
||
<h3 id="declaring-function-types">Declaring function types</h3>
|
||
<p>Another useful declaration is a <code>ftype</code> declaration which establishes
|
||
the relationship between the function argument types and the return value type.
|
||
If the type of passed arguments matches the declared types, the return value type
|
||
is expected to match the declared one. Because of that, a function can have more
|
||
than one <code>ftype</code> declaration associated with it. A <code>ftype</code> declaration restricts
|
||
the type of the argument every time the function is called. It has the following form:</p>
|
||
|
||
<pre><code class="language-lisp"> (declaim (ftype (function (arg1 arg2 ...) return-value) function-name1))
|
||
</code></pre>
|
||
|
||
<p>If the function returns <code>nil</code>, its return type is <code>null</code>.
|
||
This declaration does not put any restriction on the types of arguments by itself.
|
||
It only takes effect if the provided arguments have the specified types – otherwise
|
||
no error is signaled and declaration has no effect. For example,
|
||
the following declamation states that if the argument to the function <code>square</code>
|
||
is a <code>fixnum</code>, the value of the function will also be a <code>fixnum</code>:</p>
|
||
|
||
<pre><code class="language-lisp">(declaim (ftype (function (fixnum) fixnum) square))
|
||
(defun square (x) (* x x))
|
||
</code></pre>
|
||
<p>If we provide it with the argument which is not declared to be of type <code>fixnum</code>,
|
||
no optimization will take place:</p>
|
||
|
||
<pre><code class="language-lisp">(defun do-some-arithmetic (x)
|
||
(the fixnum (+ x (square x))))
|
||
</code></pre>
|
||
|
||
<p>Now let’s try to optimize the speed. The compiler will state that there is type uncertainty:</p>
|
||
|
||
<pre><code class="language-lisp">(defun do-some-arithmetic (x)
|
||
(declare (optimize (speed 3) (debug 0) (safety 0)))
|
||
(the fixnum (+ x (square x))))
|
||
|
||
; compiling (DEFUN DO-SOME-ARITHMETIC ...)
|
||
|
||
; file: /tmp/slimeRzDh1R
|
||
in: DEFUN DO-SOME-ARITHMETIC
|
||
; (+ TEST-FRAMEWORK::X (TEST-FRAMEWORK::SQUARE TEST-FRAMEWORK::X))
|
||
;
|
||
; note: forced to do GENERIC-+ (cost 10)
|
||
; unable to do inline fixnum arithmetic (cost 2) because:
|
||
; The first argument is a NUMBER, not a FIXNUM.
|
||
; unable to do inline (signed-byte 64) arithmetic (cost 5) because:
|
||
; The first argument is a NUMBER, not a (SIGNED-BYTE 64).
|
||
; etc.
|
||
;
|
||
; compilation unit finished
|
||
; printed 1 note
|
||
|
||
|
||
(disassemble 'do-some-arithmetic)
|
||
; disassembly for DO-SOME-ARITHMETIC
|
||
; Size: 53 bytes. Origin: #x52CD1D1A
|
||
; 1A: 488945F8 MOV [RBP-8], RAX ; no-arg-parsing entry point
|
||
; 1E: 488BD0 MOV RDX, RAX
|
||
; 21: 4883EC10 SUB RSP, 16
|
||
; 25: B902000000 MOV ECX, 2
|
||
; 2A: 48892C24 MOV [RSP], RBP
|
||
; 2E: 488BEC MOV RBP, RSP
|
||
; 31: E8C2737CFD CALL #x504990F8 ; #<FDEFN SQUARE>
|
||
; 36: 480F42E3 CMOVB RSP, RBX
|
||
; 3A: 488B45F8 MOV RAX, [RBP-8]
|
||
; 3E: 488BFA MOV RDI, RDX
|
||
; 41: 488BD0 MOV RDX, RAX
|
||
; 44: E807EE42FF CALL #x52100B50 ; GENERIC-+
|
||
; 49: 488BE5 MOV RSP, RBP
|
||
; 4C: F8 CLC
|
||
; 4D: 5D POP RBP
|
||
; 4E: C3 RET
|
||
NIL
|
||
</code></pre>
|
||
|
||
<p>Now we can add a type declaration for <code>x</code>, so the compiler can assume
|
||
that the expression <code>(square x)</code> is a <code>fixnum</code>, and use the fixnum-specific <code>+</code>:</p>
|
||
|
||
<pre><code class="language-lisp">(defun do-some-arithmetic (x)
|
||
(declare (optimize (speed 3) (debug 0) (safety 0)))
|
||
(declare (type fixnum x))
|
||
(the fixnum (+ x (square x))))
|
||
|
||
(disassemble 'do-some-arithmetic)
|
||
|
||
; disassembly for DO-SOME-ARITHMETIC
|
||
; Size: 48 bytes. Origin: #x52C084DA
|
||
; 4DA: 488945F8 MOV [RBP-8], RAX ; no-arg-parsing entry point
|
||
; 4DE: 4883EC10 SUB RSP, 16
|
||
; 4E2: 488BD0 MOV RDX, RAX
|
||
; 4E5: B902000000 MOV ECX, 2
|
||
; 4EA: 48892C24 MOV [RSP], RBP
|
||
; 4EE: 488BEC MOV RBP, RSP
|
||
; 4F1: E8020C89FD CALL #x504990F8 ; #<FDEFN SQUARE>
|
||
; 4F6: 480F42E3 CMOVB RSP, RBX
|
||
; 4FA: 488B45F8 MOV RAX, [RBP-8]
|
||
; 4FE: 4801D0 ADD RAX, RDX
|
||
; 501: 488BD0 MOV RDX, RAX
|
||
; 504: 488BE5 MOV RSP, RBP
|
||
; 507: F8 CLC
|
||
; 508: 5D POP RBP
|
||
; 509: C3 RET
|
||
NIL
|
||
</code></pre>
|
||
|
||
<h3 id="code-inline">Code Inline</h3>
|
||
|
||
<p>The declaration <a href="http://www.lispworks.com/documentation/lw51/CLHS/Body/d_inline.htm"><code>inline</code></a> replaces function calls with function body,
|
||
if the compiler supports it. It will save the cost of function calls but will
|
||
potentially increase the code size. The best situation to use <code>inline</code> might
|
||
be those small but frequently used functions. The following snippet shows how
|
||
to encourage and prohibit code inline.</p>
|
||
|
||
<pre><code class="language-lisp">;; The globally defined function DISPATCH should be open-coded,
|
||
;; if the implementation supports inlining, unless a NOTINLINE
|
||
;; declaration overrides this effect.
|
||
(declaim (inline dispatch))
|
||
(defun dispatch (x) (funcall (get (car x) 'dispatch) x))
|
||
|
||
;; Here is an example where inlining would be encouraged.
|
||
;; Because function DISPATCH was defined as INLINE, the code
|
||
;; inlining will be encouraged by default.
|
||
(defun use-dispatch-inline-by-default ()
|
||
(dispatch (read-command)))
|
||
|
||
;; Here is an example where inlining would be prohibited.
|
||
;; The NOTINLINE here only affects this function.
|
||
(defun use-dispatch-with-declare-notinline ()
|
||
(declare (notinline dispatch))
|
||
(dispatch (read-command)))
|
||
|
||
;; Here is an example where inlining would be prohibited.
|
||
;; The NOTINLINE here affects all following code.
|
||
(declaim (notinline dispatch))
|
||
(defun use-dispatch-with-declaim-noinline ()
|
||
(dispatch (read-command)))
|
||
|
||
;; Inlining would be encouraged because you specified it.
|
||
;; The INLINE here only affects this function.
|
||
(defun use-dispatch-with-inline ()
|
||
(declare (inline dispatch))
|
||
(dispatch (read-command)))
|
||
</code></pre>
|
||
|
||
<p>Please note that when the inlined functions change, all the callers must be
|
||
re-compiled.</p>
|
||
|
||
<h2 id="optimizing-generic-functions">Optimizing Generic Functions</h2>
|
||
|
||
<h3 id="using-static-dispatch">Using Static Dispatch</h3>
|
||
|
||
<p>Generic functions provide much convenience and flexibility during
|
||
development. However, the flexibility comes with cost: generic methods are
|
||
much slower than trivial functions. The performance cost becomes a burden
|
||
especially when the flexibility is not needed.</p>
|
||
|
||
<p>The package <a href="https://github.com/guicho271828/inlined-generic-function"><code>inlined-generic-function</code></a> provides
|
||
functions to convert generic functions to static dispatch, moving the dispatch
|
||
cost to compile-time. You just need to define generic function as a
|
||
<code>inlined-generic-function</code>.</p>
|
||
|
||
<p><strong>Caution</strong></p>
|
||
|
||
<p>This package is declared as experimental thus is not recommended to be used in
|
||
a serious software production. Use it at your own risk!</p>
|
||
|
||
<pre><code class="language-lisp">* (defgeneric plus (a b)
|
||
(:generic-function-class inlined-generic-function))
|
||
#<INLINED-GENERIC-FUNCTION HELLO::PLUS (2)>
|
||
|
||
* (defmethod plus ((a fixnum) (b fixnum))
|
||
(+ a b))
|
||
#<INLINED-METHOD HELLO::PLUS (FIXNUM FIXNUM) {10056D7513}>
|
||
|
||
* (defun func-using-plus (a b)
|
||
(plus a b))
|
||
FUNC-USING-PLUS
|
||
|
||
* (defun func-using-plus-inline (a b)
|
||
(declare (inline plus))
|
||
(plus a b))
|
||
FUNC-USING-PLUS-INLINE
|
||
|
||
* (time
|
||
(dotimes (i 100000)
|
||
(func-using-plus 100 200)))
|
||
Evaluation took:
|
||
0.018 seconds of real time
|
||
0.017819 seconds of total run time (0.017800 user, 0.000019 system)
|
||
100.00% CPU
|
||
3 lambdas converted
|
||
71,132,440 processor cycles
|
||
6,586,240 bytes consed
|
||
|
||
* (time
|
||
(dotimes (i 100000)
|
||
(func-using-plus-inline 100 200)))
|
||
Evaluation took:
|
||
0.001 seconds of real time
|
||
0.000326 seconds of total run time (0.000326 user, 0.000000 system)
|
||
0.00% CPU
|
||
1,301,040 processor cycles
|
||
0 bytes consed
|
||
</code></pre>
|
||
|
||
<p>The inlining is not enabled by default because once inlined, changes made to
|
||
methods will not be reflected.</p>
|
||
|
||
<p>It can be enabled globally by adding <code>:inline-generic-function</code> flag in
|
||
<a href="http://www.lispworks.com/documentation/lw71/CLHS/Body/v_featur.htm"><code>*features*</code></a>.</p>
|
||
|
||
<pre><code class="language-lisp">* (push :inline-generic-function *features*)
|
||
(:INLINE-GENERIC-FUNCTION :SLYNK :CLOSER-MOP :CL-FAD :BORDEAUX-THREADS
|
||
:THREAD-SUPPORT :CL-PPCRE ALEXANDRIA.0.DEV::SEQUENCE-EMPTYP :QUICKLISP
|
||
:QUICKLISP-SUPPORT-HTTPS :SB-BSD-SOCKETS-ADDRINFO :ASDF3.3 :ASDF3.2 :ASDF3.1
|
||
:ASDF3 :ASDF2 :ASDF :OS-UNIX :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE :ROS.INIT
|
||
:X86-64 :64-BIT :64-BIT-REGISTERS :ALIEN-CALLBACKS :ANSI-CL :AVX2
|
||
:C-STACK-IS-CONTROL-STACK :CALL-SYMBOL :COMMON-LISP :COMPACT-INSTANCE-HEADER
|
||
:COMPARE-AND-SWAP-VOPS :CYCLE-COUNTER :ELF :FP-AND-PC-STANDARD-SAVE ..)
|
||
</code></pre>
|
||
|
||
<p>When this feature is present, all inlinable generic functions are inlined
|
||
unless it is declared <code>notinline</code>.</p>
|
||
|
||
<h2 id="block-compilation">Block compilation</h2>
|
||
|
||
<p>SBCL <a href="https://mstmetent.blogspot.com/2020/02/block-compilation-fresh-in-sbcl-202.html">got block compilation on version 2.0.2</a>, which was in CMUCL since 1991 but a little forgotten since.</p>
|
||
|
||
<p>You can enable block compilation with a one-liner:</p>
|
||
|
||
<pre><code class="language-lisp">(setq *block-compile-default* t)
|
||
</code></pre>
|
||
|
||
<p>But what is it?</p>
|
||
|
||
<p>Block compilation addresses a known aspect of dynamic languages: function calls to global, top-level functions are expensive.</p>
|
||
|
||
<blockquote>
|
||
<p>Much more expensive than in a statically compiled language. They’re slow because of the late-bound nature of top-level defined functions, allowing arbitrary redefinition while the program is running and forcing runtime checks on whether the function is being called with the right number or types of arguments. This type of call is known as a “full call” in Python (the compiler used in CMUCL and SBCL, not to be confused with the programming language), and their calling convention permits the most runtime flexibility.</p>
|
||
</blockquote>
|
||
|
||
<p>But local calls, the ones inside a top-level functions (for example <code>lambda</code>s, <code>labels</code> and <code>flet</code>s) are fast.</p>
|
||
|
||
<blockquote>
|
||
<p>These calls are more ‘static’ in the sense that they are treated more like function calls in static languages, being compiled “together” and at the same time as the local functions they reference, allowing them to be optimized at compile-time. For example, argument checking can be done at compile time because the number of arguments of the callee is known at compile time, unlike in the full call case where the function, and hence the number of arguments it takes, can change dynamically at runtime at any point. Additionally, the local call calling convention can allow for passing unboxed values like floats around, as they are put into unboxed registers never used in the full call convention, which must use boxed argument and return value registers.</p>
|
||
</blockquote>
|
||
|
||
<p>So enabling block compilation kind of turns your code into a giant <code>labels</code> form.</p>
|
||
|
||
<p>One evident possible drawback, dependending on your application, is that you can’t redefine functions at runtime anymore.</p>
|
||
|
||
<p>We can also enable block compilation with the <code>:block-compile</code> keyword to <code>compile-file</code>:</p>
|
||
|
||
<pre><code class="language-lisp">(defun foo (x y)
|
||
(print (bar x y))
|
||
(bar x y))
|
||
|
||
(defun bar (x y)
|
||
(+ x y))
|
||
|
||
(defun fact (n)
|
||
(if (zerop n)
|
||
1
|
||
(* n (fact (1- n)))))
|
||
|
||
> (compile-file "foo.lisp" :block-compile t :entry-points nil)
|
||
> (load "foo.fasl")
|
||
|
||
> (sb-disassem:disassemble-code-component #'foo)
|
||
</code></pre>
|
||
|
||
<p>If you inspect the assembly,</p>
|
||
|
||
<blockquote>
|
||
<p>you [will] see that FOO and BAR are now compiled into the same component (with local calls), and both have valid external entry points. This improves locality of code quite a bit and still allows calling both FOO and BAR externally from the file (e.g. in the REPL). […]</p>
|
||
</blockquote>
|
||
|
||
<p>But there is one more goody block compilation adds…</p>
|
||
|
||
<blockquote>
|
||
<p>Notice we specified <code>:entry-points</code> nil above. That’s telling the compiler to still create external entry points to every function in the file, since we’d like to be able to call them normally from outside the code component (i.e. the compiled compilation unit, here the entire file).</p>
|
||
</blockquote>
|
||
|
||
<p>For more explanations, I refer you to the mentioned blog post, the current de-facto documentation for SBCL, in addition to <a href="https://cmucl.org/docs/cmu-user/html/Block-Compilation.html">CMUCL’s documentation</a> (note that the form-by-form level granularity in CMUCL (<code> (declaim (start-block ...)) ... (declaim (end-block ..))</code>) is missing in SBCL, at the time of writing).</p>
|
||
|
||
<p>Finally, be aware that “block compiling and inlining currently does not interact very well [in SBCL]”.</p>
|
||
|
||
|
||
|
||
<p class="page-source">
|
||
Page source: <a href="https://github.com/LispCookbook/cl-cookbook/blob/master/performance.md">performance.md</a>
|
||
</p>
|
||
</div>
|
||
|
||
<script type="text/javascript">
|
||
|
||
// Don't write the TOC on the index.
|
||
if (window.location.pathname != "/cl-cookbook/") {
|
||
$("#toc").toc({
|
||
content: "#content", // will ignore the first h1 with the site+page title.
|
||
headings: "h1,h2,h3,h4"});
|
||
}
|
||
|
||
$("#two-cols + ul").css({
|
||
"column-count": "2",
|
||
});
|
||
$("#contributors + ul").css({
|
||
"column-count": "4",
|
||
});
|
||
</script>
|
||
|
||
|
||
|
||
<div>
|
||
<footer class="footer">
|
||
<hr/>
|
||
© 2002–2023 the Common Lisp Cookbook Project
|
||
<div>
|
||
📹 Discover <a style="color: darkgrey; text-decoration: underline", href="https://www.udemy.com/course/common-lisp-programming/?referralCode=2F3D698BBC4326F94358">vindarel's Lisp course on Udemy</a>
|
||
</div>
|
||
</footer>
|
||
|
||
</div>
|
||
<div id="toc-btn">T<br>O<br>C</div>
|
||
</div>
|
||
|
||
<script text="javascript">
|
||
HighlightLisp.highlight_auto({className: null});
|
||
</script>
|
||
|
||
<script type="text/javascript">
|
||
function duckSearch() {
|
||
var searchField = document.getElementById("searchField");
|
||
if (searchField && searchField.value) {
|
||
var query = escape("site:lispcookbook.github.io/cl-cookbook/ " + searchField.value);
|
||
window.location.href = "https://duckduckgo.com/?kj=b2&kf=-1&ko=1&q=" + query;
|
||
// https://duckduckgo.com/params
|
||
// kj=b2: blue header in results page
|
||
// kf=-1: no favicons
|
||
}
|
||
}
|
||
</script>
|
||
|
||
<script async defer data-domain="lispcookbook.github.io/cl-cookbook" src="https://plausible.io/js/plausible.js"></script>
|
||
|
||
</body>
|
||
</html>
|