Add cloned websites
clones/abseil.io/cdn-cgi/scripts/5c5dd728/cloudflare-static/email-decode.min.js (vendored, new file, 1 line added)
@@ -0,0 +1 @@
!function(){"use strict";function e(e){try{if("undefined"==typeof console)return;"error"in console?console.error(e):console.log(e)}catch(e){}}function t(e){return d.innerHTML='<a href="'+e.replace(/"/g,""")+'"></a>',d.childNodes[0].getAttribute("href")||""}function r(e,t){var r=e.substr(t,2);return parseInt(r,16)}function n(n,c){for(var o="",a=r(n,c),i=c+2;i<n.length;i+=2){var l=r(n,i)^a;o+=String.fromCharCode(l)}try{o=decodeURIComponent(escape(o))}catch(u){e(u)}return t(o)}function c(t){for(var r=t.querySelectorAll("a"),c=0;c<r.length;c++)try{var o=r[c],a=o.href.indexOf(l);a>-1&&(o.href="mailto:"+n(o.href,a+l.length))}catch(i){e(i)}}function o(t){for(var r=t.querySelectorAll(u),c=0;c<r.length;c++)try{var o=r[c],a=o.parentNode,i=o.getAttribute(f);if(i){var l=n(i,0),d=document.createTextNode(l);a.replaceChild(d,o)}}catch(h){e(h)}}function a(t){for(var r=t.querySelectorAll("template"),n=0;n<r.length;n++)try{i(r[n].content)}catch(c){e(c)}}function i(t){try{c(t),o(t),a(t)}catch(r){e(r)}}var l="/cdn-cgi/l/email-protection#",u=".__cf_email__",f="data-cfemail",d=document.createElement("div");i(document),function(){var e=document.currentScript||document.scripts[document.scripts.length-1];e.parentNode.removeChild(e)}()}();
clones/abseil.io/resources/swe-book/html/ch01.html (new file, 394 lines added)
@@ -0,0 +1,394 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Software Engineering at Google</title>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
</head>
<body data-type="book">
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="what_is_software_engineeringquestion_ma">
<h1>What Is Software Engineering?</h1>
<p class="byline">Written by Titus Winters</p>
<p class="byline">Edited by Tom Manshreck</p>
<blockquote data-type="epigraph">
<p>Nothing is built on stone; all is built on sand, but we must build as if the sand were stone.</p>
<p data-type="attribution">Jorge Luis Borges</p>
</blockquote>
<p>We see three critical differences between programming and software engineering: time, scale, and the trade-offs at play.<a contenteditable="false" data-primary="software engineering" data-secondary="programming versus" data-type="indexterm" id="id-RYheu0fA"> </a><a contenteditable="false" data-primary="programming" data-secondary="software engineering versus" data-type="indexterm" id="id-oMh0Uefy"> </a> On a software engineering project, engineers need to be more concerned with the passage of time and the eventual need for change. In a software engineering organization, we need to be more concerned about scale and efficiency, both for the software we produce as well as for the organization that is producing it. Finally, as software engineers, we are asked to make more complex decisions with higher-stakes outcomes, often based on imprecise estimates of time and growth.</p>
<p>Within Google, we sometimes say, “Software engineering is programming integrated over time.” Programming<a contenteditable="false" data-primary="time and change in software projects" data-secondary="life span of programs and" data-type="indexterm" id="id-oMhyu6Sy"> </a> is certainly a significant part of software engineering: after all, programming is how you generate new software in the first place. If you accept this distinction, it also becomes clear that we might need to delineate between programming tasks (development) and software engineering tasks (development, modification, maintenance). The addition of time adds an important new dimension to programming. Cubes aren’t squares, distance isn’t velocity. Software engineering isn’t programming.</p>
<p>One way to see the impact of time on a program is to think about the question, “What is the expected life span<sup><a data-type="noteref" id="ch01fn1-marker" href="ch01.html#ch01fn1">1</a></sup> of your code?” Reasonable answers to this question vary by roughly a factor of 100,000. It is just as reasonable to think of code that needs to last for a few minutes as it is to imagine code that will live for decades. Generally, code on the short end of that spectrum is unaffected by time. It is unlikely that you need to adapt to a new version of your underlying libraries, operating system (OS), hardware, or language version for a program whose utility spans only an hour. These short-lived systems are effectively “just” a programming problem, in the same way that a cube compressed far enough in one dimension is a square. As we expand that time to allow for longer life spans, change becomes more important. Over a span of a decade or more, most program dependencies, whether implicit or explicit, will likely change. This recognition is at the root of our distinction between software engineering and programming.</p>
<p>This distinction is at the core of what we call <em>sustainability</em> for software.<a contenteditable="false" data-primary="sustainability" data-secondary="for software" data-secondary-sortas="software" data-type="indexterm" id="id-bvh1UAIN"> </a> Your project is <em>sustainable</em> if, for the expected life span of your software, you are capable of reacting to whatever valuable change comes along, for either technical or business reasons.<a contenteditable="false" data-primary="upgrades" data-type="indexterm" id="id-AWhYT1Iq"> </a> Importantly, we are looking only for capability—you might choose not to perform a given upgrade, either for lack of value or other priorities.<sup><a data-type="noteref" id="ch01fn2-marker" href="ch01.html#ch01fn2">2</a></sup> When you are fundamentally incapable of reacting to a change in underlying technology or product direction, you’re placing a high-risk bet on the hope that such a change never becomes critical. For short-term projects, that might be a safe bet. Over multiple decades, it probably isn’t.<sup><a data-type="noteref" id="ch01fn3-marker" href="ch01.html#ch01fn3">3</a></sup></p>
<p>Another way to look at software engineering is to consider scale. <a contenteditable="false" data-primary="scale" data-secondary="in software engineering" data-type="indexterm" id="id-bvhWu3CN"> </a>How many people are involved? What part do they play in the development and maintenance over time? A programming task is often an act of individual creation, but a software engineering task is a team effort. An early attempt to define software engineering produced a good definition for this viewpoint: “The multiperson development of multiversion programs.”<sup><a data-type="noteref" id="ch01fn4-marker" href="ch01.html#ch01fn4">4</a></sup> This suggests the difference between software engineering and programming is one of both time and people. Team collaboration presents new problems, but also provides more potential to produce valuable systems than any single programmer could.</p>
<p>Team organization, project composition, and the policies and practices of a software project all dominate this aspect of software engineering complexity. These problems are inherent to scale: as the organization grows and its projects expand, does it become more efficient at producing software? Does our development workflow become more efficient as we grow, or do our version control policies and testing strategies cost us proportionally more? Scale issues around communication and human scaling have been discussed since the early days of software engineering, going all the way back to the <em>Mythical Man Month</em>.<sup><a data-type="noteref" id="ch01fn5-marker" href="ch01.html#ch01fn5">5</a></sup> Such scale issues are often matters of policy and are fundamental to the question of software sustainability: how much will it cost to do the things that we need to do repeatedly?<a contenteditable="false" data-primary="scale" data-secondary="issues in software engineering" data-type="indexterm" id="id-Zwh7H0cz"> </a></p>
<p>We can also say that software engineering is different from programming in terms of the complexity of decisions that need to be made and their stakes. In software engineering, we are regularly forced to evaluate the trade-offs between several paths forward, sometimes with high stakes and often with imperfect value metrics. The job of a software engineer, or a software engineering leader, is to aim for sustainability and management of the scaling costs for the organization, the product, and the development workflow. With those inputs in mind, evaluate your trade-offs and make rational decisions. We might sometimes defer maintenance changes, or even embrace policies that don’t scale well, with the knowledge that we’ll need to revisit those decisions. Those choices should be explicit and clear about the deferred costs.</p>
<p>Rarely is there a one-size-fits-all solution in software engineering, and the same applies to this book. Given a factor of 100,000 for reasonable answers on “How long will this software live,” a range of perhaps a factor of 10,000 for “How many engineers are in your organization,” and who-knows-how-much for “How many compute resources are available for your project,” Google’s experience will probably not match yours. In this book, we aim to present what we’ve found that works for us in the construction and maintenance of software that we expect to last for decades, with tens of thousands of engineers, and world-spanning compute resources. Most of the practices that we find are necessary at that scale will also work well for smaller endeavors: consider this a report on one engineering ecosystem that we think could be good as you scale up. In a few places, super-large scale comes with its own costs, and we’d be happier to not be paying extra overhead. We call those out as a warning. Hopefully if your organization grows large enough to be worried about those costs, you can find a better answer.</p>
<p>Before we get to specifics about teamwork, culture, policies, and tools, let’s first elaborate on these primary themes of time, scale, and trade-offs.</p>
<section class="pagebreak-before" data-type="sect1" id="time_and_change">
<h1 class="less_space">Time and Change</h1>
<p>When a novice is learning to program, the life span of the resulting code is usually measured in hours or days.<a contenteditable="false" data-primary="software engineering" data-secondary="time and change" data-type="indexterm" id="ix_sftengtmch"> </a><a contenteditable="false" data-primary="time and change in software projects" data-type="indexterm" id="ix_tmchg"> </a> Programming assignments and exercises tend to be write-once, with little to no refactoring and certainly no long-term maintenance. These programs are often not rebuilt or executed ever again after their initial production. This isn’t surprising in a pedagogical setting. Perhaps in secondary or post-secondary education, we may find a team project course or hands-on thesis. If so, such projects are likely the only time student code will live longer than a month or so. Those developers might need to refactor some code, perhaps as a response to changing requirements, but it is unlikely they are being asked to deal with broader changes to their environment.</p>
<p>We also find developers of short-lived code in common industry settings. Mobile apps often have a fairly short life span,<sup><a data-type="noteref" id="ch01fn6-marker" href="ch01.html#ch01fn6">6</a></sup> and for better or worse, full rewrites are relatively common. Engineers at an early-stage startup might rightly choose to focus on immediate goals over long-term investments: the company might not live long enough to reap the benefits of an infrastructure investment that pays off slowly. A serial startup developer could very reasonably have 10 years of development experience and little or no experience maintaining any piece of software expected to exist for longer than a year or two.</p>
<p>On the other end of the spectrum, some successful projects have an effectively unbounded life span: we can’t reasonably predict an endpoint for Google Search, the Linux kernel, or the Apache HTTP Server project.<a contenteditable="false" data-primary="Google Search" data-type="indexterm" id="id-yLheuBTxi1"> </a> For most Google projects, we must assume that they will live indefinitely—we cannot predict when we won’t need to upgrade our dependencies, language versions, and so on. As their lifetimes grow, these long-lived projects <em>eventually</em> have a different feel to them than programming assignments or startup development.</p>
<p>Consider <a data-type="xref" href="ch01.html#life_span_and_the_importance_of_upgrade">Figure 1-1</a>, which demonstrates two software projects on opposite ends of this “expected life span” spectrum. <a contenteditable="false" data-primary="upgrades" data-secondary="life span of software projects and importance of" data-type="indexterm" id="id-6zhKUVf8im"> </a>For a programmer working on a task with an expected life span of hours, what types of maintenance are reasonable to expect? That is, if a new version of your OS comes out while you’re working on a Python script that will be executed one time, should you drop what you’re doing and upgrade? Of course not: the upgrade is not critical. But on the opposite end of the spectrum, Google Search being stuck on a version of our OS from the 1990s would be a clear <span class="keep-together">problem.</span></p>
<figure id="life_span_and_the_importance_of_upgrade"><img alt="Life span and the importance of upgrades" src="images/seag_0101.png">
<figcaption><span class="label">Figure 1-1. </span>Life span and the importance of upgrades</figcaption>
</figure>
<p>The low and high points on the expected life span spectrum suggest that there’s a transition somewhere. Somewhere along the line between a one-off program and a project that lasts for decades, a transition happens: a project must begin to react to changing externalities.<sup><a data-type="noteref" id="ch01fn7-marker" href="ch01.html#ch01fn7">7</a></sup> For any project that didn’t plan for upgrades from the start, that transition is likely very painful for three reasons, each of which compounds the others:</p>
<ul>
<li>
<p>You’re performing a task that hasn’t yet been done for this project; more hidden assumptions have been baked-in.</p>
</li>
<li>
<p>The engineers trying to do the upgrade are less likely to have experience in this sort of task.</p>
</li>
<li>
<p>The size of the upgrade is often larger than usual, doing several years’ worth of upgrades at once instead of a more incremental upgrade.</p>
</li>
</ul>
<p>And thus, after actually going through such an upgrade once (or giving up part way through), it’s pretty reasonable to overestimate the cost of doing a subsequent upgrade and decide “Never again.” Companies that come to this conclusion end up committing to just throwing things out and rewriting their code, or deciding to never upgrade again. Rather than take the natural approach by avoiding a painful task, sometimes the more responsible answer is to invest in making it less painful. It all depends on the cost of your upgrade, the value it provides, and the expected life span of the project in question.</p>
<p>Getting through not only that first big upgrade, but getting to the point at which you can reliably stay current going forward, is the essence of long-term sustainability for your project. Sustainability requires planning and managing the impact of required change. For many projects at Google, we believe we have achieved this sort of sustainability, largely through trial and error.</p>
<p>So, concretely, how does short-term programming differ from producing code with a much longer expected life span? Over time, we need to be much more aware of the difference between “happens to work” and “is maintainable.” There is no perfect solution for identifying these issues. That is unfortunate, because keeping software maintainable for the long-term is a constant battle.</p>
<section data-type="sect2" id="hyrumapostrophes_law">
<h2>Hyrum’s Law</h2>
<p>If you are maintaining a <a contenteditable="false" data-primary="Hyrum's Law" data-type="indexterm" id="id-73hDuYUZsoir"> </a>project that is used by <a contenteditable="false" data-primary="time and change in software projects" data-secondary="Hyrum's Law" data-type="indexterm" id="id-5gh6UzU8svi6"> </a>other engineers, the most important lesson about “it works” versus “it is maintainable” is what we’ve come to call <em>Hyrum’s Law</em>:</p>
<blockquote>
<p>With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by <span class="keep-together">somebody.</span></p>
</blockquote>
<p>In our experience, this axiom is a dominant factor in any discussion of changing software over time. It is conceptually akin to entropy: discussions of change and maintenance over time must be aware of Hyrum’s Law<sup><a data-type="noteref" id="ch01fn8-marker" href="ch01.html#ch01fn8">8</a></sup> just as discussions of efficiency or thermodynamics must be mindful of entropy. Just because entropy never decreases doesn’t mean we shouldn’t try to be efficient. Just because Hyrum’s Law will apply when maintaining software doesn’t mean we can’t plan for it or try to better understand it. We can mitigate it, but we know that it can never be eradicated.</p>
<p>Hyrum’s Law represents the practical knowledge that—even with the best of intentions, the best engineers, and solid practices for code review—we cannot assume perfect adherence to published contracts or best practices. As an API owner, you will gain <em>some</em> flexibility and freedom by being clear about interface promises, but in practice, the complexity and difficulty of a given change also depends on how useful a user finds some observable behavior of your API. If users cannot depend on such things, your API will be easy to change. Given enough time and enough users, even the most innocuous change <em>will</em> break something;<sup><a data-type="noteref" id="ch01fn9-marker" href="ch01.html#ch01fn9">9</a></sup> your analysis of the value of that change must incorporate the difficulty in investigating, identifying, and resolving those breakages.</p>
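<p>As a small, purely illustrative sketch of that dynamic (the library and caller here are invented), consider a function whose documented contract promises nothing about ordering, while its current implementation happens to return sorted results:</p>
<pre data-code-language="python" data-type="programlisting"># Hypothetical library code: the contract promises a list of user names,
# with no guarantee about ordering.
def active_users():
    """Returns the names of active users, in no particular order."""
    users = {"zoe", "amir", "chen"}
    return sorted(users)  # an implementation detail, not part of the contract


# Hypothetical caller: it "works" today only because the current
# implementation happens to sort its result.
def first_dashboard_row():
    return active_users()[0]  # silently assumes alphabetical order


if __name__ == "__main__":
    print(first_dashboard_row())  # prints "amir" today
    # If the library later returns list(users) instead of sorted(users),
    # this caller can break even though the documented contract never changed.</pre>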
</section>
<section data-type="sect2" id="example_hash_ordering">
<h2>Example: Hash Ordering</h2>
<p>Consider the example of hash iteration ordering.<a contenteditable="false" data-primary="time and change in software projects" data-secondary="hash ordering (example)" data-type="indexterm" id="id-5ghWuzUZFvi6"> </a><a contenteditable="false" data-primary="hash ordering (example)" data-type="indexterm" id="id-qlhOUKU5FAia"> </a> If we insert five elements into a hash-based set, in what order do we get them out?</p>
<pre data-code-language="pycon" data-type="programlisting">>>> for i in {"apple", "banana", "carrot", "durian", "eggplant"}: print(i)
...
durian
carrot
apple
eggplant
banana</pre>
<p>Most programmers know that hash tables are non-obviously ordered. Few know the specifics of whether the particular hash table they are using is <em>intending</em> to provide that particular ordering forever. This might seem unremarkable, but over the past decade or two, the computing industry’s experience using such types has evolved:</p>
<ul>
<li>
<p><em>Hash flooding</em><sup><a data-type="noteref" id="ch01fn10-marker" href="ch01.html#ch01fn10">10</a></sup> attacks provide an increased incentive for nondeterministic hash iteration.</p>
</li>
<li>
<p>Potential efficiency<a contenteditable="false" data-primary="hash flooding attacks" data-type="indexterm" id="id-Lmh9uvurUMfmFRid"> </a> gains from research into improved hash algorithms or hash containers require changes to hash iteration order.</p>
</li>
<li>
<p>Per Hyrum’s Law, programmers will write programs that depend on the order in which a hash table is traversed, if <a contenteditable="false" data-primary="Hyrum's Law" data-secondary="hash ordering (example)" data-type="indexterm" id="id-4whauGu2H2fVF7i9"> </a>they have the ability to do so.</p>
</li>
</ul>
<p>As a result, if you ask any expert “Can I assume a particular output sequence for my hash container?” that expert will presumably say “No.” By and large that is correct, but perhaps simplistic. A more nuanced answer is, “If your code is short-lived, with no changes to your hardware, language runtime, or choice of data structure, such an assumption is fine. If you don’t know how long your code will live, or you cannot promise that nothing you depend upon will ever change, such an assumption is incorrect.” Moreover, even if your own implementation does not depend on hash container order, it might be used by other code that implicitly creates such a dependency. For example, if your library serializes values into a Remote Procedure Call (RPC) response, the RPC caller might wind up depending on the order of those <span class="keep-together">values.</span></p>
<p>This is a very basic example of the difference between “it works” and “it is correct.” For a short-lived program, depending on the iteration order of your containers will not cause any technical problems. For a software engineering project, on the other hand, such reliance on a defined order is a risk—given enough time, something will make it valuable to change that iteration order. That value can manifest in a number of ways, be it efficiency, security, or merely future-proofing the data structure to allow for future changes. When that value becomes clear, you will need to weigh the trade-offs between that value and the pain of breaking your developers or customers.</p>
<p>Some languages specifically randomize hash ordering between library versions or even between execution of the same program in an attempt to prevent dependencies. But even this still allows for some Hyrum’s Law surprises: there is code that uses hash iteration ordering as an inefficient random-number generator. Removing such <span class="keep-together">randomness</span> now would break those users. Just as entropy increases in every <span class="keep-together">thermodynamic</span> system, Hyrum’s Law applies to every observable behavior.</p>
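<p>CPython, for example, has randomized string hashing on each interpreter start since version 3.3 (originally as a hash-flooding defense), which also means the same set of strings can iterate in a different order on every run. The following sketch (illustrative only) launches two fresh interpreters and compares the orders they observe; they will often differ unless the <code>PYTHONHASHSEED</code> environment variable is pinned:</p>
<pre data-code-language="python" data-type="programlisting"># Sketch: relies on CPython's per-process string-hash randomization
# (on by default since 3.3); the subprocess calls assume Python 3.7 or later.
import subprocess
import sys

SNIPPET = "print(list({'apple', 'banana', 'carrot', 'durian', 'eggplant'}))"

# Each fresh interpreter gets its own random hash seed, so the observed
# set iteration order may differ between the two runs.
first = subprocess.run([sys.executable, "-c", SNIPPET],
                       capture_output=True, text=True).stdout.strip()
second = subprocess.run([sys.executable, "-c", SNIPPET],
                        capture_output=True, text=True).stdout.strip()

print(first)
print(second)
print("same order:", first == second)  # frequently False</pre>
<p>Any test or consumer that memorizes one of those orderings is depending on exactly the kind of observable-but-unpromised behavior described above.</p>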
<p>Thinking over the differences between code written with a “works now” and a “works indefinitely” mentality, we can extract some clear relationships. Looking at code as an artifact with a (highly) variable lifetime requirement, we can begin to categorize programming styles: code that depends on brittle and unpublished features of its dependencies is <a contenteditable="false" data-primary="“clever” code" data-primary-sortas="clever" data-type="indexterm" id="id-Jxhru9CNFeiv"> </a><a contenteditable="false" data-primary="“hacky” or “clever” code" data-primary-sortas="hacky" data-type="indexterm" id="id-11hEUKCmFbi4"> </a>likely to be described as “hacky” or “clever,” whereas code <a contenteditable="false" data-primary="“clean” and “maintainable” code" data-primary-sortas="clean" data-type="indexterm" id="id-Eyh6HECNF3iA"> </a>that follows best practices and has planned for the future is more likely to be described as “clean” and “maintainable.” Both have their purposes, but which one you select depends crucially on the expected life span of the code in question. <a contenteditable="false" data-primary="programming" data-secondary="clever code and" data-type="indexterm" id="id-G6h0TlCwFGiW"> </a><a contenteditable="false" data-primary="software engineering" data-secondary="clever code and" data-type="indexterm" id="id-wBhMfgCOFKi6"> </a>We’ve taken to saying, “It’s <em>programming</em> if 'clever' is a compliment, but it’s <em>software engineering</em> if 'clever' is an <span class="keep-together">accusation.”</span></p>
</section>
<section data-type="sect2" id="why_not_just_aim_for_quotation_marknoth">
<h2>Why Not Just Aim for “Nothing Changes”?</h2>
<p>Implicit in all of this discussion of time and the need to react to change is the assumption that change might be necessary.<a contenteditable="false" data-primary="time and change in software projects" data-secondary="aiming for nothing changes" data-type="indexterm" id="id-qlhXuKUBiAia"> </a> Is it?</p>
<p>As with effectively everything else in this book, it depends. We’ll readily commit to “For most projects, over a long enough time period, everything underneath them might need to be changed.” If you have a <a contenteditable="false" data-primary="C language, projects written in, changes to" data-type="indexterm" id="id-BKhEuBH4iDiA"> </a>project written in pure C with no external dependencies (or only external dependencies that promise great long-term stability, like POSIX), you might well be able to avoid any form of refactoring or difficult upgrade. C does a great job of providing stability—in many respects, that is its primary purpose.</p>
<p>Most projects have far more exposure to shifting underlying technology. Most programming languages and runtimes change much more than C does. Even libraries implemented in pure C might change to support new features, which can affect downstream users. <a contenteditable="false" data-primary="security" data-secondary="reacting to threats and vulnerabilities" data-type="indexterm" id="id-YGh8uaTKiziy"> </a>Security problems are disclosed in all manner of technology, from processors to networking libraries to application code. <em>Every</em> piece of technology upon which your project depends has some (hopefully small) risk of containing critical bugs and security vulnerabilities that might come to light only after you’ve started relying on it. <a contenteditable="false" data-primary="Heartbleed" data-type="indexterm" id="id-4wh0H2TbiYiR"> </a>If you are incapable of deploying a patch for <a href="http://heartbleed.com">Heartbleed</a> or mitigating speculative execution problems like <a href="https://meltdownattack.com">Meltdown and Spectre</a> because you’ve assumed (or promised) that nothing will ever change, that is a significant gamble.<a contenteditable="false" data-primary="Meltdown and Spectre" data-type="indexterm" id="id-11hGS6Tqibi4"> </a></p>
<p>Efficiency improvements further complicate the picture.<a contenteditable="false" data-primary="efficiency improvements, changing code for" data-type="indexterm" id="id-Lmh9uNfOiNio"> </a> We want to outfit our datacenters with cost-effective computing equipment, especially enhancing CPU efficiency. However, algorithms and data structures from early-day Google are simply less efficient on modern equipment: a linked-list or a binary search tree will still work fine, but the ever-widening gap between CPU cycles versus memory latency impacts what “efficient” code looks like. Over time, the value in upgrading to newer hardware can be diminished without accompanying design changes to the software. <a contenteditable="false" data-primary="backward compatibility and reactions to efficiency improvement" data-type="indexterm" id="id-4whBUvfbiYiR"> </a>Backward compatibility ensures that older systems still function, but that is no guarantee that old optimizations are still helpful. Being unwilling or unable to take advantage of such opportunities risks incurring large costs. Efficiency concerns like this are particularly subtle: the original design might have been perfectly logical and following reasonable best practices. It’s only after an evolution of backward-compatible changes that a new, more efficient option becomes important. No mistakes were made, but the passage of time still made change valuable.</p>
<p>Concerns like those just mentioned are why there are large risks for long-term projects that haven’t invested in sustainability. We must be capable of responding to these sorts of issues and taking advantage of these opportunities, regardless of whether they directly affect us or manifest in only the transitive closure of technology we build upon. Change is not inherently good. We shouldn’t change just for the sake of change. But we do need to be capable of change. If we allow for that eventual necessity, we should also consider whether to invest in making that capability cheap. As every system administrator knows, it’s one thing to know in theory that you can recover from tape, and another to know in practice exactly how to do it and how much it will cost when it becomes necessary. Practice and expertise are great drivers of efficiency and reliability.<a contenteditable="false" data-primary="time and change in software projects" data-startref="ix_tmchg" data-type="indexterm" id="id-4whau0SbiYiR"> </a><a contenteditable="false" data-primary="software engineering" data-secondary="time and change" data-startref="ix_sftengtmch" data-type="indexterm" id="id-2Eh6UVSXi4i0"> </a></p>
</section>
</section>
<section data-type="sect1" id="scale_and_efficiency">
<h1>Scale and Efficiency</h1>
<p>As noted in the <em>Site Reliability Engineering</em> (SRE) book,<sup><a data-type="noteref" id="ch01fn11-marker" href="ch01.html#ch01fn11">11</a></sup> Google’s production system as a whole is among the most complex machines created by humankind.<a contenteditable="false" data-primary="scale and efficiency" data-type="indexterm" id="ix_sceff"> </a><a contenteditable="false" data-primary="software engineering" data-secondary="scale and efficiency" data-type="indexterm" id="ix_sftengscef"> </a> The complexity involved in building such a machine and keeping it running smoothly has required countless hours of thought, discussion, and redesign from experts across our organization and around the globe. So, we have already written a book about the complexity of keeping that machine running at that scale.</p>
<p>Much of <em>this</em> book focuses on the complexity of scale of the organization that produces such a machine, and the processes that we use to keep that machine running over time.<a contenteditable="false" data-primary="codebase" data-secondary="sustainability" data-type="indexterm" id="id-O5hmUgH0u4"> </a><a contenteditable="false" data-primary="sustainability" data-secondary="codebase" data-type="indexterm" id="id-6zhzHGHzum"> </a> Consider again the concept of codebase sustainability: “Your organization’s codebase is <em>sustainable</em> when you are <em>able</em> to change all of the things that you ought to change, safely, and can do so for the life of your codebase.” Hidden in the discussion of capability is also one of costs: if changing something comes at inordinate cost, it will likely be deferred.<a contenteditable="false" data-primary="costs" data-secondary="in software engineering" data-type="indexterm" id="id-xYhVS9HZuO"> </a> If costs grow superlinearly over time, the operation clearly is not scalable.<sup><a data-type="noteref" id="ch01fn12-marker" href="ch01.html#ch01fn12">12</a></sup> Eventually, time will take hold and something unexpected will arise that you absolutely must change. When your project doubles in scope and you need to perform that task again, will it be twice as labor intensive? Will you even have the human resources required to address the issue next time?</p>
<p>Human costs are not the only finite resource that needs to scale. Just as software itself needs to scale well with traditional resources such as compute, memory, storage, and bandwidth, the development of that software also needs to scale, both in terms of human time involvement and the compute resources that power your development workflow. If the compute cost for your test cluster grows superlinearly, consuming more compute resources per person each quarter, you’re on an unsustainable path and need to make changes soon.</p>
<p>Finally, the most precious asset of a software organization—the codebase itself—also needs to scale.<a contenteditable="false" data-primary="codebase" data-secondary="scalability" data-type="indexterm" id="id-6zhZuVfzum"> </a> If your build system or version control system scales superlinearly over time, perhaps as a result of growth and increasing changelog history, a point might come at which you simply cannot proceed. Many questions, such as “How long does it take to do a full build?”, “How long does it take to pull a fresh copy of the repository?”, or “How much will it cost to upgrade to a new language version?” aren’t actively monitored and change at a slow pace. They can easily become like the <a href="https://oreil.ly/clqZN">metaphorical boiled frog</a>; it is far too easy for problems to worsen slowly and never manifest as a singular moment of crisis. Only with an organization-wide awareness and commitment to scaling are you likely to keep on top of these issues.</p>
<p>Everything your organization relies upon to produce and maintain code should be scalable in terms of overall cost and resource consumption. In particular, everything your organization must do repeatedly should be scalable in terms of human effort. Many common policies don’t seem to be scalable in this sense.</p>
<section data-type="sect2" id="policies_that_donapostrophet_scale">
<h2>Policies That Don’t Scale</h2>
<p>With a little practice, it becomes easier to spot policies with bad <a contenteditable="false" data-primary="scale and efficiency" data-secondary="policies that don't scale" data-type="indexterm" id="id-xYh1u7URtwuK"> </a>scaling properties. Most commonly, these can be identified by considering the work imposed on a single engineer and imagining the organization scaling up by 10 or 100 times. When we are 10 times larger, will we add 10 times more work with which our sample engineer needs to keep up? Does the amount of work our engineer must perform grow as a function of the size of the organization? Does the work scale up with the size of the codebase? If either of these are true, do we have any mechanisms in place to automate or optimize that work? If not, we have scaling problems.</p>
<p>Consider a traditional approach to deprecation. <a contenteditable="false" data-primary="deprecation" data-secondary="as example of scaling problems" data-type="indexterm" id="id-lOhBu1H3twu9"> </a>We discuss deprecation much more in <a data-type="xref" href="ch15.html#deprecation">Deprecation</a>, but the common approach to deprecation serves as a great example of scaling problems. A new Widget has been developed. The decision is made that everyone should use the new one and stop using the old one. To motivate this, project leads say “We’ll delete the old Widget on August 15th; make sure you’ve converted to the new Widget.”</p>
<p>This type of approach might work in a small software setting but quickly fails as both the depth and breadth of the dependency graph increases. Teams depend on an ever-increasing number of Widgets, and a single build break can affect a growing percentage of the company. Solving these problems in a scalable way means changing the way we do deprecation: instead of pushing migration work to customers, teams can internalize it themselves, with all the economies of scale that provides.</p>
<p>In 2012, we tried to put a stop to this with rules mitigating churn: infrastructure teams must do the work to move their internal users to new versions themselves or do the update in place, in backward-compatible fashion.<a contenteditable="false" data-primary="Churn Rule" data-type="indexterm" id="id-eAh9uKfdtXuz"> </a> This policy, which we’ve called the “Churn Rule,” scales better: dependent projects are no longer spending progressively greater effort just to keep up. We’ve also learned that having a dedicated group of experts execute the change scales better than asking for more maintenance effort from every user: experts spend some time learning the whole problem in depth and then apply that expertise to every subproblem. Forcing users to respond to churn means that every affected team does a worse job ramping up, solves their immediate problem, and then throws away that now-useless knowledge. Expertise scales better.</p>
<p>The traditional use of development branches is another example of policy that has built-in scaling problems. An organization might identify that merging large features into trunk has destabilized the product and conclude, “We need tighter controls on when things merge. We should merge less frequently.” This leads quickly to every team or every feature having separate dev branches. Whenever any branch is decided to be “complete,” it is tested and merged into trunk, triggering some potentially expensive work for other engineers still working on their dev branch, in the form of resyncing and testing. Such branch management can be made to work for a small organization juggling 5 to 10 such branches. As the size of an organization (and the number of branches) increases, it quickly becomes apparent that we’re paying an ever-increasing amount of overhead to do the same task. We’ll need a different approach as we scale up, and we discuss that in <a data-type="xref" href="ch16.html#version_control_and_branch_management">Version Control and Branch Management</a>.</p>
</section>
<section data-type="sect2" id="policies_that_scale_well">
<h2>Policies That Scale Well</h2>
<p>What sorts of policies result in better costs as the organization grows? <a contenteditable="false" data-primary="scale and efficiency" data-secondary="policies that scale well" data-type="indexterm" id="id-lOhBuzUMIwu9"> </a>Or, better still, what sorts of policies can we put in place that provide superlinear value as the organization grows?</p>
<p>One of our favorite internal policies is a great enabler of infrastructure teams, protecting their ability to make infrastructure changes safely. “If a product experiences outages or other problems as a result of infrastructure changes, but the issue wasn’t surfaced<a contenteditable="false" data-primary="continuous integration (CI)" data-type="indexterm" id="id-m6hnuxHaIvuX"> </a> by tests in our Continuous Integration (CI) system, it is not the fault of the infrastructure change.” More colloquially, this is phrased as “If you liked it, you should have put a CI test on it,” which we call “The Beyoncé Rule."<sup><a data-type="noteref" id="ch01fn13-marker" href="ch01.html#ch01fn13">13</a></sup> From a scaling<a contenteditable="false" data-primary="Beyoncé Rule" data-type="indexterm" id="id-73hGHRHxIdur"> </a> perspective, the Beyoncé Rule implies that complicated, one-off bespoke tests that aren’t triggered by our common CI system do not count. Without this, an engineer on an infrastructure team could conceivably need to track down every team with any affected code and ask them how to run their tests. We could do that when there were a hundred engineers. We definitely cannot afford to do that anymore.</p>
<p>We’ve found that expertise and shared communication forums offer great value as an organization scales.<a contenteditable="false" data-primary="expertise" data-secondary="and shared communication forums" data-secondary-sortas="shared" data-type="indexterm" id="id-eAh9uZT6IXuz"> </a> As engineers discuss and answer questions in shared forums, knowledge tends to spread. New experts grow. If you have a hundred engineers writing Java, a single friendly and helpful Java expert willing to answer questions will soon produce a hundred engineers writing better Java code. Knowledge is viral, experts are carriers, and there’s a lot to be said for the value of clearing away the common stumbling blocks for your engineers. We cover this in greater detail in <span class="keep-together"><a data-type="xref" href="ch03.html#knowledge_sharing">Knowledge Sharing</a></span>.</p>
</section>
<section data-type="sect2" id="example_compiler_upgrade">
<h2>Example: Compiler Upgrade</h2>
<p>Consider the daunting task of upgrading your compiler.<a contenteditable="false" data-primary="upgrades" data-secondary="compiler upgrade example" data-type="indexterm" id="ix_upgcmp"> </a><a contenteditable="false" data-primary="compiler upgrage (example)" data-type="indexterm" id="ix_cmpup"> </a><a contenteditable="false" data-primary="scale and efficiency" data-secondary="compiler upgrade (example)" data-type="indexterm" id="ix_sceffcmp"> </a> Theoretically, a compiler upgrade should be cheap given how much effort languages take to be backward compatible, but how cheap of an operation is it in practice? If you’ve never done such an upgrade before, how would you evaluate whether your codebase is compatible with that change?</p>
<p class="pagebreak-before">In our experience, language and compiler upgrades are subtle and difficult tasks even when they are broadly expected to be backward compatible. A compiler upgrade will almost always result in minor changes to behavior: fixing miscompilations, tweaking optimizations, or potentially changing the results of anything that was previously undefined. How would you evaluate the correctness of your entire codebase against all of these potential outcomes?</p>
<p>The most storied compiler upgrade in Google’s history took place all the way back in 2006. At that point, we had been operating for a few years and had several thousand engineers on staff. We hadn’t updated compilers in about five years. Most of our engineers had no experience with a compiler change. Most of our code had been exposed to only a single compiler version. It was a difficult and painful task for a team of (mostly) volunteers, which eventually became a matter of finding shortcuts and simplifications in order to work around upstream compiler and language changes that we didn’t know how to adopt.<sup><a data-type="noteref" id="ch01fn14-marker" href="ch01.html#ch01fn14">14</a></sup> In the end, the 2006 compiler upgrade was extremely painful. Many Hyrum’s Law problems, big and small, had crept into the codebase and served to deepen our dependency on a particular compiler version. Breaking those implicit dependencies was painful. The engineers in question were taking a risk: we didn’t have the Beyoncé Rule yet, nor did we have a pervasive CI system, so it was difficult to know the impact of the change ahead of time or be sure they wouldn’t be blamed for regressions.</p>
<p>This story isn’t at all unusual. Engineers at many companies can tell a similar story about a painful upgrade. What is unusual is that we recognized after the fact that the task had been painful and began focusing on technology and organizational changes to overcome the scaling problems and turn scale to our advantage: automation (so that a single human can do more), consolidation/consistency (so that low-level changes have a limited problem scope), and expertise (so that a few humans can do more).</p>
<p>The more frequently you change your infrastructure, the easier it becomes to do so. We have found that most of the time, when code is updated as part of something like a compiler upgrade, it becomes less brittle and easier to upgrade in the future. In an ecosystem in which most code has gone through several upgrades, it stops depending on the nuances of the underlying implementation; instead, it depends on the actual abstraction guaranteed by the language or OS. Regardless of what exactly you are upgrading, expect the first upgrade for a codebase to be significantly more expensive than later upgrades, even controlling for other factors.</p>
<p>Through this and other experiences, we’ve discovered many factors that affect the flexibility<a contenteditable="false" data-primary="codebase" data-secondary="factors affecting flexibility of" data-type="indexterm" id="id-BKhEuEtmC0uA"> </a> of a codebase:</p>
<dl>
<dt>Expertise</dt>
<dd>We know how to do this; for some languages, we’ve now done hundreds of compiler upgrades across many platforms.</dd>
<dt>Stability</dt>
<dd>There is less change between releases because we adopt releases more regularly; for some languages, we’re now deploying compiler upgrades every week or two.</dd>
<dt>Conformity</dt>
<dd>There is less code that hasn’t been through an upgrade already, again because we are upgrading regularly.</dd>
<dt>Familiarity</dt>
<dd>Because we do this regularly enough, we can spot redundancies in the process of performing an upgrade and attempt to automate. This overlaps significantly with SRE views on toil.<sup><a data-type="noteref" id="ch01fn15-marker" href="ch01.html#ch01fn15">15</a></sup></dd>
<dt>Policy</dt>
<dd>We have processes and policies like the Beyoncé Rule. The net effect of these processes is that upgrades remain feasible because infrastructure teams do not need to worry about every unknown usage, only the ones that are visible in our CI <span class="keep-together">systems.</span></dd>
</dl>
<p>The underlying lesson is not about the frequency or difficulty of compiler upgrades, but that as soon as we became aware that compiler upgrade tasks were necessary, we found ways to make sure to perform those tasks with a constant number of engineers, even as the codebase grew.<sup><a data-type="noteref" id="ch01fn16-marker" href="ch01.html#ch01fn16">16</a></sup> If we had instead decided that the task was too expensive and should be avoided in the future, we might still be using a decade-old compiler version. We would be paying perhaps 25% extra for computational resources as a result of missed optimization opportunities. Our central infrastructure could be vulnerable to significant security risks given that a 2006-era compiler is certainly not helping to mitigate speculative execution vulnerabilities. Stagnation is an option, but often not a wise one.<a contenteditable="false" data-primary="upgrades" data-secondary="compiler upgrade example" data-startref="ix_upgcmp" data-type="indexterm" id="id-4whBUwC5CKuR"> </a><a contenteditable="false" data-primary="scale and efficiency" data-secondary="compiler upgrade (example)" data-startref="ix_sceffcmp" data-type="indexterm" id="id-2EhAHECNCYu0"> </a><a contenteditable="false" data-primary="compiler upgrage (example)" data-startref="ix_cmpup" data-type="indexterm" id="id-JxhXT9CnCVuv"> </a></p>
</section>
<section data-type="sect2" id="shifting_left">
<h2>Shifting Left</h2>
<p>One of the broad truths we’ve seen to be true is the idea that finding problems earlier in the developer workflow usually reduces costs.<a contenteditable="false" data-primary="shifting left" data-type="indexterm" id="id-eAh9ugUBcXuz"> </a><a contenteditable="false" data-primary="costs" data-secondary="reducing by finding problems earlier in development" data-type="indexterm" id="id-73h8UYUdcdur"> </a><a contenteditable="false" data-primary="scale and efficiency" data-secondary="finding problems earlier in developer workflow" data-type="indexterm" id="id-5ghMHzUEcZu6"> </a> Consider a timeline of the developer workflow for a feature that progresses from left to right, starting from conception and design, progressing through implementation, review, testing, commit, canary, and eventual production deployment. Shifting problem detection to the “left” earlier on this timeline makes it cheaper to fix than waiting longer, as shown in <a data-type="xref" href="ch01.html#timeline_of_the_developer_workflow">Figure 1-2</a>.</p>
<p>This term seems to have originated from arguments that security mustn’t be deferred until the end of the development process, with requisite calls to “shift left on security.” The argument in this case is relatively simple: if a security problem is discovered only after your product has gone to production, you have a very expensive problem. If it is caught before deploying to production, it may still take a lot of work to identify and remedy the problem, but it’s cheaper. If you can catch it before the original developer commits the flaw to version control, it’s even cheaper: they already have an understanding of the feature; revising according to new security constraints is cheaper than committing and forcing someone else to triage it and fix it.</p>
<figure id="timeline_of_the_developer_workflow"><img alt="Timeline of the developer workflow" src="images/seag_0102.png">
<figcaption><span class="label">Figure 1-2. </span>Timeline of the developer workflow</figcaption>
</figure>
<p>The same basic pattern emerges many times in this book. Bugs that are caught by static analysis and code review before they are committed are much cheaper than bugs that make it to production. Providing tools and practices that highlight quality, reliability, and security early in the development process is a common goal for many of our infrastructure teams. No single process or tool needs to be perfect, so we can assume a defense-in-depth approach, hopefully catching as many defects on the left side of the graph as possible. <a contenteditable="false" data-primary="scale and efficiency" data-startref="ix_sceff" data-type="indexterm" id="id-qlhXubfVcKua"> </a><a contenteditable="false" data-primary="software engineering" data-secondary="scale and efficiency" data-startref="ix_sftengscef" data-type="indexterm" id="id-BKh7URfLc0uA"> </a></p>
</section>
</section>
<section data-type="sect1" id="tradeoffs_and_costs">
<h1>Trade-offs and Costs</h1>
<p>If we understand how to program, understand the lifetime <a contenteditable="false" data-primary="software engineering" data-secondary="trade-offs and costs" data-type="indexterm" id="ix_sftengtrco"> </a>of the software we’re maintaining, and <a contenteditable="false" data-primary="trade-offs" data-secondary="cost/benefit" data-type="indexterm" id="ix_troff"> </a>understand how to maintain<a contenteditable="false" data-primary="costs" data-secondary="trade-offs and" data-type="indexterm" id="ix_costtrd"> </a> it as we scale up with more engineers producing and maintaining new features, all that is left is to make good decisions. This seems obvious: in software engineering, as in life, good choices lead to good outcomes. However, the ramifications of this observation are easily overlooked. Within Google, there is a strong distaste for “because I said so.” It is important for there to be a decider for any topic and clear escalation paths when decisions seem to be wrong, but the goal is consensus, not unanimity. It’s fine and expected to see some instances of “I don’t agree with your metrics/valuation, but I see how you can come to that conclusion.” Inherent in all of this is the idea that there needs to be a reason for everything; “just because,” “because I said so,” or “because everyone else does it this way” are places where bad decisions lurk. Whenever it is efficient to do so, we should be able to explain our work when deciding between the general costs for two engineering options.</p>
<p>What do we mean by cost? <a contenteditable="false" data-primary="costs" data-secondary="trade-offs and" data-tertiary="types of costs" data-type="indexterm" id="id-O5hxugHdU4"> </a>We are not only talking about dollars here. “Cost” roughly translates to effort and can involve any or all of these factors:</p>
<ul>
<li>
<p>Financial costs (e.g., money)</p>
</li>
<li>
<p>Resource costs (e.g., CPU time)</p>
</li>
<li>
<p>Personnel costs (e.g., engineering effort)</p>
</li>
<li>
<p>Transaction costs (e.g., what does it cost to take action?)</p>
</li>
<li>
<p>Opportunity costs (e.g., what does it cost to not take action?)</p>
</li>
<li>
<p>Societal costs (e.g., what impact will this choice have on society at large?)</p>
</li>
</ul>
<p>Historically, it’s been particularly easy to ignore the question of societal costs.<a contenteditable="false" data-primary="societal costs" data-type="indexterm" id="id-XEh0uqfyUM"> </a> However, Google and other large tech companies can now credibly deploy products with billions of users. In many cases, these products are a clear net benefit, but when we’re operating at such a scale, even small discrepancies in usability, accessibility, fairness, or potential for abuse are magnified, often to the detriment of groups that are already marginalized. Software pervades so many aspects of society and culture; therefore, it is wise for us to be aware of both the good and the bad that we enable when making product and technical decisions. We discuss this much more in <a data-type="xref" href="ch04.html#engineering_for_equity">Engineering for Equity</a>.</p>
<p>In addition to the aforementioned costs (or our estimate of them), there are biases: status quo bias, loss aversion, and others. <a contenteditable="false" data-primary="biases" data-type="indexterm" id="id-MehAuQSyU2"> </a>When we evaluate cost, we need to keep all of the previously listed costs in mind: the health of an organization isn’t just whether there is money in the bank, it’s also whether its members are feeling valued and productive.<a contenteditable="false" data-primary="personnel costs" data-type="indexterm" id="id-xYh6UxS2UO"> </a> In highly creative and lucrative fields like software engineering, financial cost is usually not the limiting factor—personnel cost usually is. Efficiency gains from keeping engineers happy, focused, and engaged can easily dominate other factors, simply because focus and productivity are so variable, and a 10-to-20% difference is easy to imagine.</p>
<section data-type="sect2" id="example_markers">
<h2>Example: Markers</h2>
<p>In many organizations, whiteboard markers are treated as precious goods. They are tightly controlled and always in short supply.<a contenteditable="false" data-primary="costs" data-secondary="trade-offs and" data-tertiary="whiteboard markers (example)" data-type="indexterm" id="id-lOhBuzU3tnU9"> </a><a contenteditable="false" data-primary="trade-offs" data-secondary="cost/benefit" data-tertiary="whiteboard markers (example)" data-type="indexterm" id="id-m6hBUMUOtNUX"> </a> Invariably, half of the markers at any given whiteboard are dry and unusable. How often have you been in a meeting that was disrupted by lack of a working marker? How often have you had your train of thought derailed by a marker running out? How often have all the markers just gone missing, presumably because some other team ran out of markers and had to abscond with yours? All for a product that costs less than a dollar.</p>
<p>Google tends to have unlocked closets full of office supplies, including whiteboard markers, in most work areas. With a moment’s notice it is easy to grab dozens of markers in a variety of colors. Somewhere along the line we made an explicit trade-off: it is far more important to optimize for obstacle-free brainstorming than to protect against someone wandering off with a bunch of markers.</p>
<p>We aim to have the same level of eyes-open and explicit weighing of the cost/benefit trade-offs involved for everything we do, from office supplies and employee perks through day-to-day experience for developers to how to provision and run global-scale services. <a contenteditable="false" data-primary="data-driven culture" data-secondary="about" data-type="indexterm" id="id-eAh9uZTdt0Uz"> </a><a contenteditable="false" data-primary="culture" data-secondary="data-driven" data-type="indexterm" id="id-73h8UgTYt8Ur"> </a>We often say, “Google is a data-driven culture.” In fact, that’s a simplification: even when there isn’t <em>data</em>, there might still be <em>evidence</em>, <em>precedent</em>, and <em>argument</em>. Making good engineering decisions is all about weighing all of the available inputs and making informed decisions about the trade-offs. Sometimes, those decisions are based on instinct or accepted best practice, but only after we have exhausted approaches that try to measure or estimate the true underlying costs.</p>
|
||||
|
||||
<p>In the end, decisions in an engineering group <a contenteditable="false" data-primary="decisions" data-secondary="in an engineering group, justifications for" data-type="indexterm" id="id-73hDuxfYt8Ur"> </a>should come down to very few things:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>We are doing this because we must (legal requirements, customer requirements).</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>We are doing this because it is the best option (as determined by some appropriate decider) we can see at the time, based on current evidence.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Decisions should not be “We are doing this because I said so.”<sup><a data-type="noteref" id="ch01fn17-marker" href="ch01.html#ch01fn17">17</a></sup></p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="inputs_to_decision_making">
|
||||
<h2>Inputs to Decision Making</h2>
|
||||
|
||||
<p>When we are <a contenteditable="false" data-primary="costs" data-secondary="trade-offs and" data-tertiary="inputs to decision making" data-type="indexterm" id="id-m6hnuMUaINUX"> </a>weighing data, we<a contenteditable="false" data-primary="decisions" data-secondary="inputs to decision making" data-type="indexterm" id="id-eAhXUgU6I0Uz"> </a> find two common<a contenteditable="false" data-primary="costs" data-secondary="trade-offs and" data-tertiary="inputs to decision making" data-type="indexterm" id="id-73hGHYUxI8Ur"> </a> scenarios:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>All of the quantities involved are measurable or can at least be estimated. This usually means that we’re evaluating trade-offs between CPU and network, or dollars and RAM, or considering whether to spend two weeks of engineer-time in order to save <em>N</em> CPUs across our datacenters.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Some of the quantities are subtle, or we don’t know how to measure them. Sometimes this manifests as “We don’t know how much engineer-time this will take.” Sometimes it is even more nebulous: how do you measure the engineering cost of a poorly designed API? Or the societal impact of a product choice?</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>There is little reason to be deficient on the first type of decision. Any software engineering organization can and should track the current cost for compute resources, engineer-hours, and other quantities you interact with regularly. Even if you don’t want to publicize to your organization the exact dollar amounts, you can still produce a conversion table: this many CPUs cost the same as this much RAM or this much network bandwidth.</p>
|
||||
|
||||
<p>With an agreed-upon conversion table in hand, every engineer can do their own analysis. “If I spend two weeks changing this linked list into a higher-performance structure, I’m going to use five gibibytes more production RAM but save two thousand CPUs. Should I do it?” This question depends not only on the relative cost of RAM and CPUs, but also on personnel costs (two weeks of support for a software engineer) and opportunity costs (what else could that engineer produce in two weeks?).</p>
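<p>To make the arithmetic concrete, here is a minimal sketch of that analysis in Python. Every unit cost below is an invented placeholder rather than a rate from any real conversion table (Google’s or anyone else’s), and the sketch deliberately leaves out opportunity cost; the point is only the shape of the calculation.</p>

<pre data-type="programlisting" data-code-language="python">
# Back-of-the-envelope version of the trade-off described above.
# All unit costs are hypothetical placeholders; substitute the values from
# your own organization's conversion table.

ENGINEER_WEEK_USD = 5_000    # assumed fully loaded cost of one SWE-week
CPU_MONTH_USD = 2.0          # assumed cost of running one CPU for one month
RAM_GIB_MONTH_USD = 1.0      # assumed cost of one GiB of RAM for one month


def months_to_break_even(engineer_weeks, cpus_saved, ram_gib_added):
    """How many months of resource savings repay the up-front engineering work."""
    upfront_cost = engineer_weeks * ENGINEER_WEEK_USD
    monthly_savings = cpus_saved * CPU_MONTH_USD - ram_gib_added * RAM_GIB_MONTH_USD
    if monthly_savings &lt;= 0:
        return float("inf")  # the change never pays for itself in resources alone
    return upfront_cost / monthly_savings


# Two weeks of work, 2,000 CPUs saved, 5 GiB of extra RAM used:
print(months_to_break_even(engineer_weeks=2, cpus_saved=2_000, ram_gib_added=5))
# prints roughly 2.5 (months) under these made-up rates
</pre>

<p>Under these made-up rates the change pays for itself in about two and a half months of resource savings alone; whether that beats what the engineer could have built instead is exactly the opportunity-cost question raised above.</p>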
|
||||
|
||||
<p>For the second type of decision, there is no easy answer. We rely on experience, leadership, and precedent to negotiate these issues.<a contenteditable="false" data-primary="measurements" data-secondary="in hard-to-quantify areas" data-type="indexterm" id="id-qlhXuoS8IBUa"> </a> We’re investing in research to help us quantify the hard-to-quantify (see <a data-type="xref" href="ch07.html#measuring_engineering_productivity">Measuring Engineering Productivity</a>). However, the best broad suggestion that we have is to be aware that not everything is measurable or predictable and to attempt to treat such decisions with the same priority and greater care. They are often just as important, but more difficult to manage.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="example_distributed_builds">
|
||||
<h2>Example: Distributed Builds</h2>
|
||||
|
||||
<p>Consider your build. According to completely unscientific Twitter polling, something like 60 to 70% of developers build locally, even with today’s large, complicated builds.<a contenteditable="false" data-primary="distributed builds" data-secondary="trade-offs and costs example" data-type="indexterm" id="id-eAh9ugUbC0Uz"> </a> <a contenteditable="false" data-primary="trade-offs" data-secondary="cost/benefit" data-tertiary="distributed builds (example)" data-type="indexterm" id="id-73h8UYUWC8Ur"> </a><a contenteditable="false" data-primary="costs" data-secondary="trade-offs and" data-tertiary="distributed builds (example)" data-type="indexterm" id="id-5ghMHzUoC9U6"> </a>This leads directly to nonjokes as illustrated by <a href="https://xkcd.com/303">this "Compiling" comic</a>—how much productive time in your organization is lost waiting for a build? Compare that to the cost to run something like <code>distcc</code> for a small group. Or, how much does it cost to run a small build farm for a large group? How many weeks/months does it take for those costs to be a net win?</p>
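<p>The same style of estimate answers the break-even question just posed. All of the figures below are assumptions invented for illustration (team size, build counts, the speedup, and every cost), not measurements of <code>distcc</code> or of any real build farm; the sketch only shows how quickly even rough numbers turn “weeks or months?” into something you can actually discuss.</p>

<pre data-type="programlisting" data-code-language="python">
# Entirely hypothetical break-even estimate for a small shared build farm.
# None of these figures are measurements; plug in numbers from your own team.

DEVELOPERS = 20
BUILDS_PER_DEV_PER_DAY = 10
MINUTES_SAVED_PER_BUILD = 4      # assumed speedup from distributing the build
WORKDAYS_PER_WEEK = 5
ENGINEER_HOUR_USD = 100          # assumed fully loaded hourly cost of an engineer
FARM_COST_PER_WEEK_USD = 2_000   # assumed hardware and maintenance, per week
SETUP_COST_USD = 20_000          # assumed one-time cost to stand the farm up

hours_saved_per_week = (
    DEVELOPERS * BUILDS_PER_DEV_PER_DAY * MINUTES_SAVED_PER_BUILD
    * WORKDAYS_PER_WEEK / 60
)
net_savings_per_week = hours_saved_per_week * ENGINEER_HOUR_USD - FARM_COST_PER_WEEK_USD
weeks_to_net_win = (
    SETUP_COST_USD / net_savings_per_week if net_savings_per_week &gt; 0 else float("inf")
)
print(f"net savings: ${net_savings_per_week:,.0f}/week; "
      f"net win after about {weeks_to_net_win:.1f} weeks")
</pre>

<p>With these particular assumptions the farm pays for itself in roughly a month; your own numbers may move that answer by an order of magnitude in either direction, which is precisely why the estimate is worth writing down before deciding.</p>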
|
||||
|
||||
<p>Back in the mid-2000s, Google relied purely on a local build system: you checked out code and you compiled it locally. We had massive local machines in some cases (you could build Maps on your desktop!), but compilation times became longer and longer as the codebase grew. Unsurprisingly, we incurred increasing overhead in personnel costs due to lost time, as well as increased resource costs for larger and more powerful local machines, and so on. These resource costs were particularly troublesome: of course we want people to have as fast a build as possible, but most of the time, a high-performance desktop development machine will sit idle. This doesn’t feel like the proper way to invest those resources.</p>
|
||||
|
||||
<p>Eventually, Google developed its own distributed build system. Development of this system incurred a cost, of course: it took engineers time to develop, it took more engineer time to change everyone’s habits and workflow and learn the new system, and of course it cost additional computational resources. But the overall savings were clearly worth it: builds became faster, engineer-time was recouped, and hardware investment could focus on managed shared infrastructure (in actuality, a subset of our production fleet) rather than ever-more-powerful desktop machines. <a data-type="xref" href="ch18.html#build_systems_and_build_philosophy">Build Systems and Build Philosophy</a> goes into more of the details on our approach to distributed builds and the relevant trade-offs.</p>
|
||||
|
||||
<p>So, we built a new system, deployed it to production, and sped up everyone’s build. Is that the happy ending to the story? Not quite: providing a distributed build system made massive improvements to engineer productivity, but as time went on, the distributed builds themselves became bloated. What was constrained in the previous case by individual engineers (because they had a vested interest in keeping their local builds as fast as possible) was unconstrained within a distributed build system. Bloated or unnecessary dependencies in the build graph became all too common. When everyone directly felt the pain of a nonoptimal build and was incentivized to be vigilant, incentives were better aligned. By removing those incentives and hiding bloated dependencies in a parallel distributed build, we created a situation in which consumption could run rampant, and almost nobody was incentivized to keep an eye on build bloat. <a contenteditable="false" data-primary="Jevons Paradox" data-type="indexterm" id="id-qlhXubfQCBUa"> </a>This is reminiscent of <a href="https://oreil.ly/HL0sl">Jevons Paradox</a>: consumption of a resource may <em>increase</em> as a response to greater efficiency in its use.</p>
|
||||
|
||||
<p>Overall, the saved costs associated with adding a distributed build system far, far outweighed the negative costs associated with its construction and maintenance. But, as we saw with increased consumption, we did not foresee all of these costs. Having blazed ahead, we found ourselves in a situation in which we needed to reconceptualize the goals and constraints of the system and our usage, identify best practices (small dependencies, machine-management of dependencies), and fund the tooling and maintenance for the new ecosystem. Even a relatively simple trade-off of the form “We’ll spend $$$s for compute resources to recoup engineer time” had unforeseen downstream effects.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="example_deciding_between_time_and_scale">
|
||||
<h2>Example: Deciding Between Time and Scale</h2>
|
||||
|
||||
<p>Much of the time, our major themes of time and scale overlap and work in conjunction.<a contenteditable="false" data-primary="trade-offs" data-secondary="cost/benefit" data-tertiary="deciding between time and scale (example)" data-type="indexterm" id="id-73hDuYUdc8Ur"> </a><a contenteditable="false" data-primary="costs" data-secondary="trade-offs and" data-tertiary="deciding between time and scale (example)" data-type="indexterm" id="id-5gh6UzUEc9U6"> </a><a contenteditable="false" data-primary="time" data-secondary="deciding between time and scale" data-type="indexterm" id="id-qlheHKUVcBUa"> </a><a contenteditable="false" data-primary="scale" data-secondary="deciding between time and" data-type="indexterm" id="id-BKh5T4ULcbUA"> </a> A policy like the Beyoncé Rule scales well and helps us maintain things over time. A change to an OS interface might require many small refactorings to adapt to, but most of those changes will scale well because they are of a similar form: the OS change doesn’t manifest differently for every caller and every project.</p>
|
||||
|
||||
<p>Occasionally time and scale come into conflict, and nowhere so clearly as in the basic question: should we add a dependency or fork/reimplement it to better suit our local needs?<a contenteditable="false" data-primary="reimplementing/forking versus adding a dependency" data-type="indexterm" id="id-5ghWuLHEc9U6"> </a><a contenteditable="false" data-primary="dependencies" data-secondary="forking/reimplementing versus adding a dependency" data-type="indexterm" id="id-qlhOU7HVcBUa"> </a><a contenteditable="false" data-primary="forking/reimplementing versus adding a dependency" data-type="indexterm" id="id-BKhYHBHLcbUA"> </a></p>
|
||||
|
||||
<p>This question can arise at many levels of the software stack because it is regularly the case that a bespoke solution customized for your narrow problem space may outperform the general utility solution that needs to handle all possibilities. By forking or reimplementing utility code and customizing it for your narrow domain, you can add new features with greater ease, or optimize with greater certainty, regardless of whether we are talking about a microservice, an in-memory cache, a compression routine, or anything else in our software ecosystem. Perhaps more important, the control you gain from such a fork isolates you from changes in your underlying dependencies: those changes aren’t dictated by another team or third-party provider. You are in control of how and when to react to the passage of time and necessity to change.</p>
|
||||
|
||||
<p>On the other hand, if every developer forks everything used in their software project instead of reusing <a contenteditable="false" data-primary="scalability" data-secondary="forking and" data-type="indexterm" id="id-BKhEuRfLcbUA"> </a>what exists, scalability suffers alongside sustainability.<a contenteditable="false" data-primary="sustainability" data-secondary="forking and" data-type="indexterm" id="id-YGhEURfGcGUy"> </a> Reacting to a security issue in an underlying library is no longer a matter of updating a single dependency and its users: it is now a matter of identifying every vulnerable fork of that dependency and the users of those forks.</p>
|
||||
|
||||
<p>As with most software engineering decisions, there isn’t a one-size-fits-all answer to this situation. If your project life span is short, forks are less risky. If the fork in question is provably limited in scope, that helps, as well—avoid forks for interfaces that could operate across time or project-time boundaries (data structures, serialization formats, networking protocols). Consistency has great value, but generality comes with its own costs, and you can often win by doing your own thing—if you do it <span class="keep-together">carefully.</span></p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="revisiting_decisionscomma_making_mistak">
|
||||
<h2>Revisiting Decisions, Making Mistakes</h2>
|
||||
|
||||
<p>One of the unsung<a contenteditable="false" data-primary="culture" data-secondary="data-driven" data-type="indexterm" id="id-5ghWuzU3h9U6"> </a> benefits of committing to a data-driven culture is the combined ability and necessity of admitting to mistakes.<a contenteditable="false" data-primary="costs" data-secondary="trade-offs and" data-tertiary="mistakes in decision making" data-type="indexterm" id="id-qlhOUKUNhBUa"> </a><a contenteditable="false" data-primary="trade-offs" data-secondary="cost/benefit" data-tertiary="mistakes in decision making" data-type="indexterm" id="id-BKhYH4UdhbUA"> </a><a contenteditable="false" data-primary="data-driven culture" data-secondary="admitting to mistakes" data-type="indexterm" id="id-YGhxT4U7hGUy"> </a><a contenteditable="false" data-primary="decisions" data-secondary="admitting to making mistakes" data-type="indexterm" id="id-LmhBfzUBhZUo"> </a> A decision will be made at some point, based on the available data—hopefully based on good data and only a few assumptions, but implicitly based on currently available data. As new data comes in, contexts change, or assumptions are dispelled, it might become clear that a decision was in error or that it made sense at the time but no longer does. This is particularly critical for a long-lived organization: time doesn’t only trigger changes in technical dependencies and software systems, but in data used to drive decisions.</p>
|
||||
|
||||
<p>We believe strongly in data informing decisions, but we recognize that the data will change over time, and new data may present itself. This means, inherently, that decisions will need to be revisited from time to time over the life span of the system in question. For long-lived projects, it’s often critical to have the ability to change directions after an initial decision is made. And, importantly, it means that the deciders need to have the right to admit mistakes. Contrary to some people’s instincts, leaders who admit mistakes are more respected, not less.</p>
|
||||
|
||||
<p>Be evidence driven, but also realize that things that can’t be measured may still have value. If you’re a leader, that’s what you’ve been asked to do: exercise judgement, assert that things are important. <a contenteditable="false" data-primary="costs" data-secondary="trade-offs and" data-startref="ix_costtrd" data-type="indexterm" id="id-BKhEurTdhbUA"> </a><a contenteditable="false" data-primary="software engineering" data-secondary="trade-offs and costs" data-startref="ix_sftengtrco" data-type="indexterm" id="id-YGhEUaT7hGUy"> </a><a contenteditable="false" data-primary="trade-offs" data-secondary="cost/benefit" data-startref="ix_troff" data-type="indexterm" id="id-LmhEH8TBhZUo"> </a>We’ll speak more on leadership in Chapters <a data-type="xref" data-xrefstyle="select:labelnumber" href="ch05.html#how_to_lead_a_team">How to Lead a Team</a> and <a data-type="xref" data-xrefstyle="select:labelnumber" href="ch06.html#leading_at_scale">Leading at Scale</a>.</p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="software_engineering_versus_programming">
|
||||
<h1>Software Engineering Versus Programming</h1>
|
||||
|
||||
<p>When presented<a contenteditable="false" data-primary="software engineering" data-secondary="programming versus" data-type="indexterm" id="id-O5hxuxUrH4"> </a> with <a contenteditable="false" data-primary="programming" data-secondary="software engineering versus" data-type="indexterm" id="id-6zhKUqUbHm"> </a>our distinction between software engineering and programming, you might ask whether there is an inherent value judgement in play. Is programming somehow worse than software engineering? Is a project that is expected to last a decade with a team of hundreds inherently more valuable than one that is useful for only a month and built by two people?</p>
|
||||
|
||||
<p>Of course not. Our point is not that software engineering is superior, merely that these represent two different problem domains with distinct constraints, values, and best practices. Rather, the value in pointing out this difference comes from recognizing that some tools are great in one domain but not in the other. You probably don’t need to rely on integration tests (see <a data-type="xref" href="ch14.html#larger_testing">Larger Testing</a>) and Continuous Deployment (CD) practices (see <a data-type="xref" href="ch24.html#continuous_delivery-id00035">Continuous Delivery</a>) for a project that will last only a few days. Similarly, all of our long-term concerns about semantic versioning (SemVer) and dependency management in software engineering projects (see <a data-type="xref" href="ch21.html#dependency_management">Dependency Management</a>) don’t really apply for short-term programming projects: use whatever is available to solve the task at hand.</p>
|
||||
|
||||
<p>We believe it is important to differentiate between the related-but-distinct terms “programming” and “software engineering.” Much of that difference stems from the management of code over time, the impact of time on scale, and decision making in the face of those ideas. Programming is the immediate act of producing code. Software engineering is the set of policies, practices, and tools that are necessary to make that code useful for as long as it needs to be used and to allow collaboration across a team.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="conclusion">
|
||||
<h1>Conclusion</h1>
|
||||
|
||||
<p>This book discusses all of these topics: policies for an organization and for a single programmer, how to evaluate and refine your best practices, and the tools and technologies that go into maintainable software. Google has worked hard to have a sustainable codebase and culture. We don’t necessarily think that our approach is the one true way to do things, but it does provide proof by example that it can be done. We hope it will provide a useful framework for thinking about the general problem: how do you maintain your code for as long as it needs to keep working?</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="tlsemicolondr">
|
||||
<h1>TL;DRs</h1>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>“Software engineering” differs from “programming” in dimensionality: programming is about producing code. Software engineering extends that to include the maintenance of that code for its useful life span.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>There is a factor of at least 100,000 between the life spans of short-lived code and long-lived code. It is silly to assume that the same best practices apply universally on both ends of that spectrum.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Software is sustainable when, for the expected life span of the code, we are capable of responding to changes in dependencies, technology, or product requirements. We may choose to not change things, but we need to be capable.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Hyrum’s Law: with a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Every task your organization has to do repeatedly should be scalable (linear or better) in terms of human input. Policies are a wonderful tool for making process scalable.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Process inefficiencies and other software-development tasks tend to scale up slowly. Be careful about boiled-frog problems.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Expertise pays off particularly well when combined with economies of scale.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>“Because I said so” is a terrible reason to do things.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Being data driven is a good start, but in reality, most decisions are based on a mix of data, assumption, precedent, and argument. It’s best when objective data makes up the majority of those inputs, but it can rarely be <em>all</em> of them.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Being data driven over time implies the need to change directions when the data changes (or when assumptions are dispelled). Mistakes or revised plans are inevitable.</p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
<div data-type="footnotes"><p data-type="footnote" id="ch01fn1"><sup><a href="ch01.html#ch01fn1-marker">1</a></sup>We don’t mean “execution lifetime,” we mean “maintenance lifetime”—how long will the code continue to be built, executed, and maintained? How long will this software provide value?</p><p data-type="footnote" id="ch01fn2"><sup><a href="ch01.html#ch01fn2-marker">2</a></sup>This is perhaps a reasonable hand-wavy definition of technical debt: things that “should” be done, but aren’t yet—the delta between our code and what we wish it was.</p><p data-type="footnote" id="ch01fn3"><sup><a href="ch01.html#ch01fn3-marker">3</a></sup>Also consider the issue of whether we know ahead of time that a project is going to be long lived.</p><p data-type="footnote" id="ch01fn4"><sup><a href="ch01.html#ch01fn4-marker">4</a></sup>There is some question as to the original attribution of this quote; consensus seems to be that it was originally phrased by Brian Randell or Margaret Hamilton, but it might have been wholly made up by Dave Parnas. The common citation for it is “Software Engineering Techniques: Report of a conference sponsored by the NATO Science Committee,” Rome, Italy, 27–31 Oct. 1969, Brussels, Scientific Affairs Division, NATO.</p><p data-type="footnote" id="ch01fn5"><sup><a href="ch01.html#ch01fn5-marker">5</a></sup>Frederick P. Brooks Jr. <em>The Mythical Man-Month: Essays on Software Engineering</em> (Boston: Addison-Wesley, 1995).</p><p data-type="footnote" id="ch01fn6"><sup><a href="ch01.html#ch01fn6-marker">6</a></sup>Appcelerator, “<a href="https://oreil.ly/pnT2_">Nothing is Certain Except Death, Taxes and a Short Mobile App Lifespan</a>,” Axway Developer blog, December 6, 2012.</p><p data-type="footnote" id="ch01fn7"><sup><a href="ch01.html#ch01fn7-marker">7</a></sup>Your own priorities and tastes will inform where exactly that transition happens. We’ve found that most projects seem to be willing to upgrade within five years. Somewhere between 5 and 10 years seems like a conservative estimate for this transition in general.</p><p data-type="footnote" id="ch01fn8"><sup><a href="ch01.html#ch01fn8-marker">8</a></sup>To his credit, Hyrum tried really hard to humbly call this “The Law of Implicit Dependencies,” but “Hyrum’s Law” is the shorthand that most people at Google have settled on.</p><p data-type="footnote" id="ch01fn9"><sup><a href="ch01.html#ch01fn9-marker">9</a></sup>See “<a href="https://xkcd.com/1172">Workflow</a>,” an <em>xkcd</em> comic.</p><p data-type="footnote" id="ch01fn10"><sup><a href="ch01.html#ch01fn10-marker">10</a></sup>A type of Denial-of-Service (DoS) attack in which an untrusted user knows the structure of a hash table and the hash function and provides data in such a way as to degrade the algorithmic performance of operations on the table.</p><p data-type="footnote" id="ch01fn11"><sup><a href="ch01.html#ch01fn11-marker">11</a></sup>Beyer, B. et al. <a class="orm:hideurl" href="http://shop.oreilly.com/product/0636920041528.do"><em>Site Reliability Engineering: How Google Runs Production Systems</em></a>. 
(Boston: O'Reilly Media, 2016).</p><p data-type="footnote" id="ch01fn12"><sup><a href="ch01.html#ch01fn12-marker">12</a></sup>Whenever we use “scalable” in an informal context in this chapter, we mean “sublinear scaling with regard to human interactions.”</p><p data-type="footnote" id="ch01fn13"><sup><a href="ch01.html#ch01fn13-marker">13</a></sup>This is a reference to the popular song "Single Ladies," which includes the refrain “If you liked it then you shoulda put a ring on it.”</p><p data-type="footnote" id="ch01fn14"><sup><a href="ch01.html#ch01fn14-marker">14</a></sup>Specifically, interfaces from the C++ standard library needed to be referred to in namespace std, and an optimization change for <code>std::string</code> turned out to be a significant pessimization for our usage, thus requiring some additional workarounds.</p><p data-type="footnote" id="ch01fn15"><sup><a href="ch01.html#ch01fn15-marker">15</a></sup>Beyer et al. <em>Site Reliability Engineering: How Google Runs Production Systems</em>, Chapter 5, "Eliminating Toil."</p><p data-type="footnote" id="ch01fn16"><sup><a href="ch01.html#ch01fn16-marker">16</a></sup>In our experience, an average software engineer (SWE) produces a pretty constant number of lines of code per unit time. For a fixed SWE population, a codebase grows linearly—proportional to the count of SWE-months over time. If your tasks require effort that scales with lines of code, that’s concerning.</p><p data-type="footnote" id="ch01fn17"><sup><a href="ch01.html#ch01fn17-marker">17</a></sup>This is not to say that decisions need to be made unanimously, or even with broad consensus; in the end, someone must be the decider. This is primarily a statement of how the decision-making process should flow for whoever is actually responsible for the decision.</p></div></section>
|
||||
|
||||
</body>
|
||||
</html>
|
382
clones/abseil.io/resources/swe-book/html/ch02.html
Normal file
|
@ -0,0 +1,382 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="how_to_work_well_on_teams">
|
||||
<h1>How to Work Well on Teams</h1>
|
||||
|
||||
<p class="byline">Written by Brian Fitzpatrick</p>
|
||||
|
||||
<p class="byline">Edited by Riona MacNamara</p>
|
||||
|
||||
<p>Because this chapter is about the cultural and social aspects of software engineering at Google, it makes sense to begin by focusing on the one variable over which you definitely have control: you.</p>
|
||||
|
||||
<p>People are inherently imperfect—we like to say that humans are mostly a collection of intermittent bugs. But before you can understand the bugs in your coworkers, you need to understand the bugs in yourself. We’re going to ask you to think about your own reactions, behaviors, and attitudes—and in return, we hope you gain some real insight into how to become a more efficient and successful software engineer who spends less energy dealing with people-related problems and more time writing great code.</p>
|
||||
|
||||
<p>The critical idea in this chapter is that software development is a team endeavor. And to succeed on an engineering team—or in any other creative collaboration—you need to reorganize your behaviors around the core principles of humility, respect, and trust.</p>
|
||||
|
||||
<p>Before we get ahead of ourselves, let’s begin by observing how software engineers tend to behave in general.</p>
|
||||
|
||||
<section data-type="sect1" id="help_me_hide_my_code">
|
||||
<h1>Help Me Hide My Code</h1>
|
||||
|
||||
<p>For the past 20 years, my colleague Ben<sup><a data-type="noteref" id="ch01fn18-marker" href="ch02.html#ch01fn18">1</a></sup> and I have spoken at many programming conferences. In 2006, we launched Google’s (now deprecated) open source Project Hosting service, and at first, we used to get lots of questions and requests about the product. But around mid-2008, we began to notice a trend in the sort of requests we were getting:</p>
|
||||
|
||||
<blockquote>
|
||||
<p>"Can you please give Subversion on Google Code the ability to hide specific branches?"</p>
|
||||
|
||||
<p>"Can you make it possible to create open source projects that start out hidden to the world and then are revealed when they’re ready?"</p>
|
||||
|
||||
<p>"Hi, I want to rewrite all my code from scratch, can you please wipe all the history?"</p>
|
||||
</blockquote>
|
||||
|
||||
<p>Can you spot a common theme to these requests?</p>
|
||||
|
||||
<p>The answer is <em>insecurity</em>. People<a contenteditable="false" data-primary="insecurity" data-type="indexterm" id="id-PdSzCPh4U8"> </a> are afraid of others seeing and judging their work in progress. In one sense, insecurity is just a part of human nature—nobody likes to be criticized, especially for things that aren’t finished. Recognizing this theme tipped us off to a more general trend within software development: insecurity is actually a symptom of a larger problem.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="the_genius_myth">
|
||||
<h1>The Genius Myth</h1>
|
||||
|
||||
<p>Many humans have the instinct to find and worship idols.<a contenteditable="false" data-primary="teams" data-secondary="Genius Myth and" data-type="indexterm" id="id-YdSEI8Clcl"> </a><a contenteditable="false" data-primary="hero worship" data-type="indexterm" id="id-WdSlCQCpcA"> </a> For software engineers, those might be Linus Torvalds, Guido Van Rossum, Bill Gates—all heroes who changed the world with heroic feats. Linus wrote Linux by himself, right?</p>
|
||||
|
||||
<p>Actually, what Linus<a contenteditable="false" data-primary="Torvalds, Linus" data-type="indexterm" id="id-WdS2I2HpcA"> </a> did was write just the beginnings of a proof-of-concept Unix-like kernel and show it to an email list. <a contenteditable="false" data-primary="Linux" data-secondary="developers of" data-type="indexterm" id="id-mGSMClHWcb"> </a>That was no small accomplishment, and it was definitely an impressive achievement, but it was just the tip of the iceberg. Linux is hundreds of times bigger than that initial kernel and was developed by <em>thousands</em> of smart people. Linus’ real achievement was to lead these people and coordinate their work; Linux is the shining result not of his original idea, but of the collective labor of the <a contenteditable="false" data-primary="Unix, developers of" data-type="indexterm" id="id-1JS8tvHKcx"> </a>community. (And Unix itself was not entirely written by Ken Thompson and Dennis Ritchie, but by a group of smart people at Bell Labs.)</p>
|
||||
|
||||
<p>On that same note, did Guido Van Rossum personally write all of Python?<a contenteditable="false" data-primary="Van Rossum, Guido" data-type="indexterm" id="id-mGSdIvtWcb"> </a> Certainly, he wrote the first version. <a contenteditable="false" data-primary="Python" data-type="indexterm" id="id-PdSzCetQc8"> </a>But hundreds of others were responsible for contributing to subsequent versions, including ideas, features, and bug fixes.<a contenteditable="false" data-primary="Jobs, Steve" data-type="indexterm" id="id-1JSNHXtKcx"> </a> Steve Jobs led an entire team that built the Macintosh, and although Bill Gates is known for writing a BASIC interpreter for early home computers, his bigger achievement was building a successful company around MS-DOS. <a contenteditable="false" data-primary="Gates, Bill" data-type="indexterm" id="id-ZdSrtAtNce"> </a>Yet they all became leaders and symbols of the collective achievements of their communities. The Genius Myth is the tendency that we as humans have to ascribe the success of a team to a single person/leader.<a contenteditable="false" data-primary="Genius Myth" data-type="indexterm" id="id-DdS0hetDco"> </a></p>
|
||||
|
||||
<p>And what about Michael Jordan?</p>
|
||||
|
||||
<p>It’s the same story.<a contenteditable="false" data-primary="Jordan, Michael" data-type="indexterm" id="id-1JSXI7uKcx"> </a> We idolized him, but the fact is that he didn’t win every basketball game by himself. His true genius was in the way he worked with his team. The team’s coach, Phil Jackson, was extremely clever, and his coaching techniques are legendary. He recognized that one player alone never wins a championship, and so he assembled an entire “dream team” around MJ. This team was a well-oiled machine—at least as impressive as Michael himself.</p>
|
||||
|
||||
<p>So, why do we repeatedly idolize the individual in these stories? Why do people buy products endorsed by celebrities? Why do we want to buy Michelle Obama’s dress or Michael Jordan’s shoes?</p>
|
||||
|
||||
<p>Celebrity is a big part of it.<a contenteditable="false" data-primary="celebrity" data-type="indexterm" id="id-DdS2IZUDco"> </a> Humans have a natural instinct to find leaders and role models, idolize them, and attempt to imitate them. We all need heroes for inspiration, and the programming world has its heroes, too.<a contenteditable="false" data-primary="techie-celebrity phenomenon" data-type="indexterm" id="id-y8SYCJU1cj"> </a> The phenomenon of “techie-celebrity” has almost spilled over into mythology. We all want to write something world-changing like Linux or design the next brilliant programming language.</p>
|
||||
|
||||
<p>Deep down, many engineers secretly wish to be seen as geniuses. <a contenteditable="false" data-primary="hiding your work" data-secondary="Genius Myth and" data-type="indexterm" id="id-y8SAIAc1cj"> </a>This fantasy goes something like this:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>You are struck by an awesome new concept.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>You vanish into your cave for weeks or months, slaving away at a perfect implementation of your idea.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>You then “unleash” your software on the world, shocking everyone with your genius.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Your peers are astonished by your cleverness.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>People line up to use your software.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Fame and fortune follow naturally.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>But hold on: time for a reality check. You’re probably not a genius.</p>
|
||||
|
||||
<p>No offense, of course—we’re sure that you’re a very intelligent person. But do you realize how rare actual geniuses really are? Sure, you write code, and that’s a tricky skill. But even if you are a genius, it turns out that that’s not enough. Geniuses still make mistakes, and having brilliant ideas and elite programming skills doesn’t guarantee that your software will be a hit. Worse, you might find yourself solving only analytical problems and not <em>human</em> problems.<a contenteditable="false" data-primary="human problems, solving" data-type="indexterm" id="id-lrSPCNs9cW"> </a> Being a genius is most definitely not an excuse for being a jerk: anyone—genius or not—with poor social skills tends to be a poor teammate.<a contenteditable="false" data-primary="social skills" data-type="indexterm" id="id-NdS9Hds8cg"> </a> The vast majority of the work at Google (and at most companies!) doesn’t require genius-level intellect, but 100% of the work requires a minimal level of social skills. What will make or break your career, especially at a company like <span class="keep-together">Google,</span> is how well you collaborate with others.</p>
|
||||
|
||||
<p>It turns out that this<a contenteditable="false" data-primary="insecurity" data-secondary="manifestation in Genius Myth" data-type="indexterm" id="id-lrSgILF9cW"> </a> Genius Myth is just another manifestation of our insecurity. Many programmers are afraid to share work they’ve only just started because it means peers will see their mistakes and know the author of the code is not a genius.</p>
|
||||
|
||||
<p>To quote a friend:</p>
|
||||
|
||||
<blockquote>
|
||||
<p>I know I get SERIOUSLY insecure about people looking before something is done. Like they are going to seriously judge me and think I’m an idiot.</p>
|
||||
</blockquote>
|
||||
|
||||
<p>This is an extremely common feeling among programmers, and the natural reaction is to hide in a cave, work, work, work, and then polish, polish, polish, sure that no one will see your goof-ups and that you’ll still have a chance to unveil your masterpiece when you’re done. Hide away until your code is perfect.</p>
|
||||
|
||||
<p>Another common motivation for hiding your work is the fear that another programmer might take your idea and run with it before you get around to working on it. By keeping it secret, you control the idea.</p>
|
||||
|
||||
<p>We know what you’re probably thinking now: so what? Shouldn’t people be allowed to work however they want?</p>
|
||||
|
||||
<p>Actually, no. In this case, we assert that you’re doing it wrong, and it <em>is</em> a big deal. Here’s why.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="hiding_considered_harmful">
|
||||
<h1>Hiding Considered Harmful</h1>
|
||||
|
||||
<p>If you spend all of your<a contenteditable="false" data-primary="risks" data-secondary="of working alone" data-secondary-sortas="working" data-type="indexterm" id="id-WdS2IQCGfA"> </a> time working alone, you’re increasing the risk of unnecessary failure and cheating your potential for growth.<a contenteditable="false" data-primary="hiding your work" data-secondary="harmful effects of" data-type="indexterm" id="ix_hideharm"> </a> Even though software development is deeply intellectual work that can require deep concentration and alone time, you must play that off against the value (and need!) for collaboration and review.</p>
|
||||
|
||||
<p>First of all, how do you even know whether you’re on the right track?</p>
|
||||
|
||||
<p>Imagine you’re a bicycle-design enthusiast, and one day you get a brilliant idea for a completely new way to design a gear shifter. You order parts and proceed to spend weeks holed up in your garage trying to build a prototype. When your neighbor—also a bike advocate—asks you what’s up, you decide not to talk about it. You don’t want anyone to know about your project until it’s absolutely perfect. Another few months go by and you’re having trouble making your prototype work correctly. But because you’re working in secrecy, it’s impossible to solicit advice from your mechanically inclined friends.</p>
|
||||
|
||||
<p>Then, one day your neighbor pulls his bike out of his garage with a radical new gear-shifting mechanism. Turns out he’s been building something very similar to your invention, but with the help of some friends down at the bike shop. At this point, you’re exasperated. You show him your work. He points out that your design had some simple flaws—ones that might have been fixed in the first week if you had shown him. There are a number of lessons to learn here.</p>
|
||||
|
||||
<section data-type="sect2" id="early_detection">
|
||||
<h2>Early Detection</h2>
|
||||
|
||||
<p>If you keep your great idea hidden<a contenteditable="false" data-primary="hiding your work" data-secondary="harmful effects of" data-tertiary="forgoing early detection of flaws or issues" data-type="indexterm" id="id-DdS2I7CWu3fw"> </a> from the world and refuse to show anyone anything until the implementation is polished, you’re taking a huge gamble. It’s easy to make fundamental design mistakes early on. You risk reinventing wheels.<sup><a data-type="noteref" id="ch01fn19-marker" href="ch02.html#ch01fn19">2</a></sup> And you forfeit the benefits of collaboration, too: notice how much faster your neighbor moved by working with others? This is why people dip their toes in the water before jumping in the deep end: you need to make sure that you’re working on the right thing, you’re doing it correctly, and it hasn’t been done before. The chances of an early misstep are high. The more feedback you solicit early on, the more you lower this risk.<sup><a data-type="noteref" id="ch01fn20-marker" href="ch02.html#ch01fn20">3</a></sup> Remember the tried-and-true mantra of “Fail early, fail fast, fail often.”</p>
|
||||
|
||||
<p>Early sharing isn’t just about preventing personal missteps and getting your ideas vetted. <a contenteditable="false" data-primary="“Fail early, fail fast, fail often”" data-primary-sortas="Fail early" data-type="indexterm" id="id-y8SAIxHJuYfL"> </a>It’s also important to strengthen what we call the bus factor of your project.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="the_bus_factor">
|
||||
<h2>The Bus Factor</h2>
|
||||
|
||||
<blockquote>
|
||||
<p>Bus factor (noun): the number of people that need to get hit by a bus before your project is completely doomed.</p>
|
||||
</blockquote>
|
||||
|
||||
<p>How dispersed is the knowledge and know-how in your project?<a contenteditable="false" data-primary="bus factor" data-type="indexterm" id="id-7eS9ILH1TJfJ"> </a><a contenteditable="false" data-primary="hiding your work" data-secondary="harmful effects of" data-tertiary="bus factor" data-type="indexterm" id="id-rMSDC2H0TmfW"> </a> If you’re the only person who understands how the prototype code works, you might enjoy good job security—but if you get hit by a bus, the project is toast. If you’re working with a colleague, however, you’ve doubled the bus factor. And if you have a small team designing and prototyping together, things are even better—the project won’t be marooned when a team member disappears. Remember: team members might not literally be hit by buses, but other unpredictable life events still happen. Someone might get married, move away, leave the company, or take leave to care for a sick relative. Ensuring that there is <em>at least</em> good documentation in addition to a primary and a secondary owner for each area of responsibility helps future-proof your project’s success and increases your project’s bus factor. Hopefully most engineers recognize that it is better to be one part of a successful project than the critical part of a failed project.</p>
|
||||
|
||||
<p>Beyond the bus factor, there’s the issue of overall pace of progress. It’s easy to forget that working alone is often a tough slog, much slower than people want to admit. How much do you learn when working alone? <a contenteditable="false" data-primary="knowledge sharing" data-secondary="increasing knowledge by working with others" data-type="indexterm" id="id-rMSxIMt0TmfW"> </a>How fast do you move? Google and Stack Overflow are great sources of opinions and information, but they’re no substitute for actual human experience. Working with other people directly increases the collective wisdom behind the effort. When you become stuck on something absurd, how much time do you waste pulling yourself out of the hole? Think about how <span class="keep-together">different</span> the experience would be if you had a couple of peers to look over your shoulder and tell you—instantly—how you goofed and how to get past the problem. This is exactly why teams sit together (or do pair programming) in software engineering companies. Programming is hard. Software engineering is even harder. You need that second pair of eyes.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="pace_of_progress">
|
||||
<h2>Pace of Progress</h2>
|
||||
|
||||
<p>Here’s another analogy. Think about how you work with your compiler. <a contenteditable="false" data-primary="hiding your work" data-secondary="harmful effects of" data-tertiary="pace of progress" data-type="indexterm" id="id-7eS9IWCDUJfJ"> </a>When you sit down to write a large piece of software, do you spend days writing 10,000 lines of code, and then, after writing that final, perfect line, press the “compile” button for the very first time? Of course you don’t. Can you imagine what sort of disaster would result?<a contenteditable="false" data-primary="feedback" data-secondary="accelerating pace of progress with" data-type="indexterm" id="id-rMSDCkC4UmfW"> </a> Programmers work best in tight feedback loops: write a new function, compile. Add a test, compile. Refactor some code, compile. This way, we discover and fix typos and bugs as soon as possible after generating code. We want the compiler at our side for every little step; some environments can even compile our code as we type. This is how we keep code quality high and make sure our software is evolving correctly, bit by bit. <a contenteditable="false" data-primary="DevOps" data-secondary="philosophy on tech productivity" data-type="indexterm" id="id-GdSMHgCdURfd"> </a>The current DevOps philosophy toward tech productivity is explicit about these sorts of goals: get feedback as early as possible, test as early as possible, and think about security and production environments as early as possible.<a contenteditable="false" data-primary="shifting left" data-type="indexterm" id="id-lrSZtjCmU7fA"> </a> This is all bundled into the idea of "shifting left" in the developer workflow; the earlier we find a problem, the cheaper it is to fix it.</p>
|
||||
|
||||
<p>The same sort of rapid feedback loop is needed not just at the code level, but at the whole-project level, too. Ambitious projects evolve quickly and must adapt to changing environments as they go. Projects run into unpredictable design obstacles or political hazards, or we simply discover that things aren’t working as planned. Requirements morph unexpectedly. How do you get that feedback loop so that you know the instant your plans or designs need to change? Answer: by working in a team. Most engineers know the quote, “Many eyes make all bugs shallow,” but a better version might be, “Many eyes make sure your project stays relevant and on track.” People working in caves awaken to discover that while their original vision might be complete, the world has changed and their project has become irrelevant.</p>
|
||||
|
||||
<aside data-type="sidebar" id="case-study-engineers-and-offices-AbIYtRUkfz">
|
||||
<h5>Case Study: Engineers and Offices</h5>
|
||||
|
||||
<p>Twenty-five years ago, conventional wisdom stated that for an engineer to be productive, they needed to have their own office with a door that closed. <a contenteditable="false" data-primary="software engineers" data-secondary="offices for" data-type="indexterm" id="id-lrSgIjCMtZUGf6"> </a><a contenteditable="false" data-primary="hiding your work" data-secondary="harmful effects of" data-tertiary="engineers and offices" data-type="indexterm" id="id-NdSaClCQtpU6fo"> </a>This was supposedly the only way they could have big, uninterrupted slabs of time to deeply concentrate on writing reams of code.<a contenteditable="false" data-primary="teams" data-secondary="engineers and offices, opinions on" data-type="indexterm" id="id-63SLHgCotmU0fx"> </a></p>
|
||||
|
||||
<p>I think that it’s not only unnecessary for most engineers<sup><a data-type="noteref" id="ch01fn21-marker" href="ch02.html#ch01fn21">4</a></sup> to be in a private office, it’s downright dangerous. Software today is written by teams, not individuals, and a high-bandwidth, readily available connection to the rest of your team is even more valuable than your internet connection. You can have all the uninterrupted time in the world, but if you’re using it to work on the wrong thing, you’re wasting your time.</p>
|
||||
|
||||
<p>Unfortunately, it seems that modern-day tech companies (including Google, in some cases) have swung the pendulum to the exact opposite extreme. Walk into their offices and you’ll often find engineers clustered together in massive rooms—a hundred or more people together—with no walls whatsoever. This “open floor plan” is now a topic of huge debate and, as a result, hostility toward open offices is on the rise. The tiniest conversation becomes public, and people end up not talking for risk of annoying dozens of neighbors. This is just as bad as private offices!</p>
|
||||
|
||||
<p>We think the middle ground is really the best solution. Group teams of four to eight people together in small rooms (or large offices) to make it easy (and non-embarrassing) for spontaneous conversation to happen.</p>
|
||||
|
||||
<p>Of course, in any situation, individual engineers still need a way to filter out noise and interruptions, which is why most teams I’ve seen have developed a way to communicate that they’re currently busy and that you should limit interruptions. Some of us used to work on a team with a vocal interrupt protocol: if you wanted to talk, you would say “Breakpoint Mary,” where Mary was the name of the person you wanted to talk to. If Mary was at a point where she could stop, she would swing her chair around and listen. If Mary was too busy, she’d just say “ack,” and you’d go on with other things until she finished with her current head state.</p>
|
||||
|
||||
<p>Other teams have tokens or stuffed animals that team members put on their monitor to signify that they should be interrupted only in case of emergency. Still other teams give out noise-canceling headphones to engineers to make it easier to deal with background noise—in fact, in many companies, the very act of wearing headphones is a common signal that means “don’t disturb me unless it’s really important.” Many engineers tend to go into headphones-only mode when coding, which may be useful for short spurts but, if used all the time, can be just as bad for collaboration as walling yourself off in an office.</p>
|
||||
|
||||
<p>Don’t misunderstand us—we still think engineers need uninterrupted time to focus on writing code, but we think they need a high-bandwidth, low-friction connection to their team just as much. If less-knowledgeable people on your team feel that there’s a barrier to asking you a question, it’s a problem: finding the right balance is an art.</p>
|
||||
</aside>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="in_shortcomma_donapostrophet_hide">
|
||||
<h2>In Short, Don’t Hide</h2>
|
||||
|
||||
<p>So, what “hiding” boils down to is this: working alone is inherently riskier than working with others. Even though you might be afraid of someone stealing your idea or thinking you’re not intelligent, you should be much more concerned about wasting huge swaths of your time toiling away on the wrong thing.</p>
|
||||
|
||||
<p>Don’t become another statistic.<a contenteditable="false" data-primary="hiding your work" data-secondary="harmful effects of" data-startref="ix_hideharm" data-type="indexterm" id="id-GdSxIXHDcRfd"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="itapostrophes_all_about_the_team">
|
||||
<h1>It’s All About the Team</h1>
|
||||
|
||||
<p>So, let’s back up now and put all of these ideas together.</p>
|
||||
|
||||
<p>The point <a contenteditable="false" data-primary="teams" data-secondary="software engineering as team endeavor" data-type="indexterm" id="ix_teamsfteng"> </a>we’ve been hammering away at is that, in the realm of programming, lone craftspeople are extremely rare—and even when they do exist, they don’t perform superhuman achievements in a vacuum; their world-changing accomplishment is almost always the result of a spark of inspiration followed by a heroic team effort.</p>
|
||||
|
||||
<p>A great team makes brilliant use of its superstars, but the whole is always greater than the sum of its parts. But creating a superstar team is fiendishly difficult.</p>
|
||||
|
||||
<p>Let’s put this idea into simpler words: <em>software engineering is a team endeavor</em>.</p>
|
||||
|
||||
<p>This concept directly contradicts the inner Genius Programmer fantasy so many of us hold, but it’s not enough to be brilliant when you’re alone in your hacker’s lair. You’re not going to change the world or delight millions of computer users by hiding and preparing your secret invention. You need to work with other people. Share your vision. Divide the labor. Learn from others. Create a brilliant team.</p>
|
||||
|
||||
<p>Consider this: how many pieces of widely used, successful software can you name that were truly written by a single person? (Some people might say “LaTeX,” but it’s hardly “widely used,” unless you consider the number of people writing scientific papers to be a statistically significant portion of all computer users!)</p>
|
||||
|
||||
<p>High-functioning teams are gold and the true key to success. You should be aiming for this experience however you can.</p>
|
||||
|
||||
<section data-type="sect2" id="the_three_pillars_of_social_interaction">
|
||||
<h2>The Three Pillars of Social Interaction</h2>
|
||||
|
||||
<p>So, if teamwork is the best route to producing great software, how does one build (or find) a great team?<a contenteditable="false" data-primary="teams" data-secondary="software engineering as team endeavor" data-tertiary="pillars of social interaction" data-type="indexterm" id="id-GdSxIgCDcQSd"> </a><a contenteditable="false" data-primary="social interaction" data-secondary="pillars of" data-type="indexterm" id="id-lrSPCjC9cvSA"> </a></p>
|
||||
|
||||
<p>To reach collaborative nirvana, you first need to learn and embrace what I call the “three pillars” of social skills. These three principles aren’t just about greasing the wheels of relationships; they’re the foundation on which all healthy interaction and collaboration are based:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Pillar 1: Humility</dt>
|
||||
<dd>You are not the center of the universe (nor is your code!). You’re neither omniscient nor infallible. <a contenteditable="false" data-primary="humility" data-type="indexterm" id="id-63SVIgCotlcZSx"> </a>You’re open to self-improvement.</dd>
|
||||
<dt>Pillar 2: Respect</dt>
|
||||
<dd>You genuinely care about others you work with. <a contenteditable="false" data-primary="respect" data-type="indexterm" id="id-MdSdIJt1tvczSZ"> </a>You treat them kindly and appreciate their abilities and accomplishments.</dd>
|
||||
<dt>Pillar 3: Trust</dt>
|
||||
<dd>You believe others are<a contenteditable="false" data-primary="trust" data-type="indexterm" id="id-xaSeIbu4tXc7SP"> </a> competent and will do the right thing, and you’re OK with letting them drive when appropriate.<sup><a data-type="noteref" id="ch01fn22-marker" href="ch02.html#ch01fn22">5</a></sup></dd>
|
||||
</dl>
|
||||
|
||||
<p>If you perform a root-cause analysis on almost any social conflict, you can ultimately trace it back to a lack of humility, respect, and/or trust. That might sound implausible at first, but give it a try. Think about some nasty or uncomfortable social situation currently in your life. At the basest level, is everyone being appropriately humble? Are people really respecting one another? Is there mutual trust?</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="why_do_these_pillars_matterquestion_mar">
|
||||
<h2>Why Do These Pillars Matter?</h2>
|
||||
|
||||
<p>When you began this chapter, you probably weren’t planning to sign up for some sort of weekly support group. <a contenteditable="false" data-primary="social interaction" data-secondary="why the pillars matter" data-type="indexterm" id="id-lrSgIjCbfvSA"> </a><a contenteditable="false" data-primary="teams" data-secondary="software engineering as team endeavor" data-tertiary="why social interaction pillars matter" data-type="indexterm" id="id-NdSaClClf2S9"> </a>We empathize. Dealing with social problems can be difficult: people are messy, unpredictable, and often annoying to interface with. Rather than putting energy into analyzing social situations and making strategic moves, it’s tempting to write off the whole effort. It’s much easier to hang out with a predictable compiler, isn’t it? Why bother with the social stuff at all?</p>
|
||||
|
||||
<p>Here’s a quote from a<a contenteditable="false" data-primary="Hamming, Richard" data-type="indexterm" id="id-NdSWIoHlf2S9"> </a> <a href="https://bit.ly/hamming_paper">famous lecture by Richard Hamming</a>:</p>
|
||||
|
||||
<blockquote>
|
||||
<p>By taking the trouble to tell jokes to the secretaries and being a little friendly, I got superb secretarial help. For instance, one time for some idiot reason all the reproducing services at Murray Hill were tied up. Don’t ask me how, but they were. I wanted something done. My secretary called up somebody at Holmdel, hopped [into] the company car, made the hour-long trip down and got it reproduced, and then came back. It was a payoff for the times I had made an effort to cheer her up, tell her jokes and be friendly; it was that little extra work that later paid off for me. By realizing you have to use the system and studying how to get the system to do your work, you learn how to adapt the system to your desires.</p>
|
||||
</blockquote>
|
||||
|
||||
<p>The moral is this: do not underestimate the power of playing the social game. It’s not about tricking or manipulating people; it’s about creating relationships to get things done. Relationships always outlast projects. When you’ve got richer relationships with your coworkers, they’ll be more willing to go the extra mile when you need them.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="humilitycomma_respectcomma_and_trust_in">
|
||||
<h2>Humility, Respect, and Trust in Practice</h2>
|
||||
|
||||
<p>All of this preaching about humility, respect, and trust sounds like a sermon.<a contenteditable="false" data-primary="teams" data-secondary="software engineering as team endeavor" data-tertiary="humility, respect, and trust in practice" data-type="indexterm" id="ix_teamsftenghrt"> </a><a contenteditable="false" data-primary="social interaction" data-secondary="humility, respect, and trust in practice" data-type="indexterm" id="ix_socinthrt"> </a> Let’s come out of the clouds and think about how to apply these ideas in real-life situations.<a contenteditable="false" data-primary="humility" data-secondary="practicing" data-type="indexterm" id="ix_humprac"> </a><a contenteditable="false" data-primary="respect" data-secondary="practicing" data-type="indexterm" id="ix_resp"> </a><a contenteditable="false" data-primary="trust" data-secondary="practicing" data-type="indexterm" id="ix_trst"> </a> We’re going to examine a list of specific behaviors and examples that you can start with. Many of them might sound obvious at first, but after you begin thinking about them, you’ll notice how often you (and your peers) are guilty of not following them—we’ve certainly noticed this about ourselves!</p>
|
||||
|
||||
<section data-type="sect3" id="lose_the_ego-id00070">
|
||||
<h3>Lose the ego</h3>
|
||||
|
||||
<p>OK, this is sort of a simpler way of telling someone without enough humility to lose their ’tude. <a contenteditable="false" data-primary="ego, losing" data-type="indexterm" id="id-p6SYIJC9HGS9S4"> </a>Nobody wants to work with someone who consistently behaves like they’re the most important person in the room. Even if you know you’re the wisest person in the discussion, don’t wave it in people’s faces. For example, do you always feel like you need to have the first or last word on every subject? Do you feel the need to comment on every detail in a proposal or discussion? Or do you know somebody who does these things?</p>
|
||||
|
||||
<p>Although it’s important to be humble, that doesn’t mean you need to be a doormat; there’s nothing wrong with self-confidence.<a contenteditable="false" data-primary="self-confidence" data-type="indexterm" id="id-MdSdI2H3HRSzSZ"> </a> Just don’t come off like a know-it-all. Even better, think about going for a “collective” ego, instead; rather than worrying about whether you’re personally awesome, try to build a sense of team accomplishment and group pride. For example, the Apache Software Foundation has a long history of creating communities around software projects. These communities have incredibly strong identities and reject people who are more concerned with self-promotion.</p>
|
||||
|
||||
<p>Ego manifests itself in many ways, and a lot of the time, it can get in the way of your productivity and slow you down.<a contenteditable="false" data-primary="Hamming, Richard" data-type="indexterm" id="id-2KSWIDtqH1SLS0"> </a> Here’s another great story from Hamming’s lecture that illustrates this point perfectly (emphasis ours):</p>
|
||||
|
||||
<blockquote>
|
||||
<p>John Tukey almost always dressed very casually. He would go into an important office and it would take a long time before the other fellow realized that this is a first-class man and he had better listen. For a long time, John has had to overcome this kind of hostility. It’s wasted effort! I didn’t say you should conform; I said, “The appearance of conforming gets you a long way.” If you chose to assert your ego in any number of ways, “I am going to do it my way,” you pay a small steady price throughout the whole of your professional career. And this, over a whole lifetime, adds up to an enormous amount of needless trouble. […] By realizing you have to use the system and studying how to get the system to do your work, you learn how to adapt the system to your desires. <em>Or you can fight it steadily, as a small, undeclared war, for the whole of your life.</em></p>
|
||||
</blockquote>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="learn_to_give_and_take_criticism">
|
||||
<h3>Learn to give <em>and</em> take criticism</h3>
|
||||
|
||||
<p>A few years ago, Joe started a new job as a programmer. <a contenteditable="false" data-primary="criticism, learning to give and take" data-type="indexterm" id="id-MdSdIoC1tRSzSZ"> </a>After his first week, he really began digging into the codebase. Because he cared about what was going on, he started gently questioning other teammates about their contributions. He sent simple code reviews by email, politely asking about design assumptions or pointing out places where logic could be improved. After a couple of weeks, he was summoned to his director’s office. “What’s the problem?” Joe asked. “Did I do something wrong?” The director looked concerned: “We’ve had a lot of complaints about your behavior, Joe. Apparently, you’ve been really harsh toward your teammates, criticizing them left and right. They’re upset. You need to tone it down.” Joe was utterly baffled. Surely, he thought, his code reviews should have been welcomed and appreciated by his peers. In this case, however, Joe should have been more sensitive to the team’s widespread insecurity and should have used a subtler means to introduce code reviews into the culture—perhaps even something as simple as discussing the idea with the team in advance and asking team members to try it out for a few weeks.</p>
|
||||
|
||||
<p>In a professional software engineering environment, criticism is almost never personal—it’s usually just part of the process of making a better project. <a contenteditable="false" data-primary="constructive criticism" data-type="indexterm" id="id-2KSWIlHdt1SLS0"> </a>The trick is to make sure you (and those around you) understand the difference between a constructive criticism of someone’s creative output and a flat-out assault against someone’s character. The latter is useless—it’s petty and nearly impossible to act on. The former can (and should!) be helpful and give guidance on how to improve. And, most important, it’s imbued with respect: the person giving the constructive criticism genuinely cares about the other person and wants them to improve themselves or their work. Learn to respect your peers and give constructive criticism politely. If you truly respect someone, you’ll be motivated to choose tactful, helpful phrasing—a skill acquired with much practice. We cover this much more in <a data-type="xref" href="ch09.html#code_review-id00002">Code Review</a>.</p>
|
||||
|
||||
<p>On the other side of the conversation, you need to learn to accept criticism as well. This means not just being humble about your skills, but trusting that the other person has your best interests (and those of your project!) at heart and doesn’t actually think you’re an idiot. Programming is a skill like anything else: it improves with practice. If a peer pointed out ways in which you could improve your juggling, would you take it as an attack on your character and value as a human being? We hope not. In the same way, your self-worth shouldn’t be connected to the code you write—or any creative project you build. To repeat ourselves: <em>you are not your code</em>. Say that over and over. You are not what you make. You need to not only believe it yourself, but get your coworkers to believe it, too.</p>
|
||||
|
||||
<p>For example, if you have an insecure collaborator, here’s what not to say: “Man, you totally got the control flow wrong on that method there. <a contenteditable="false" data-primary="insecurity" data-secondary="criticism and" data-type="indexterm" id="id-woSVImhEtyS7SR"> </a>You should be using the standard xyzzy code pattern like everyone else.” This feedback is full of antipatterns: you’re telling someone they’re “wrong” (as if the world were black and white), demanding they change something, and accusing them of creating something that goes against what everyone else is doing (making them feel stupid). Your coworker will immediately be put on the defensive, and their response is bound to be overly <span class="keep-together">emotional.</span></p>
|
||||
|
||||
<p>A better way to say the same thing might be, “Hey, I’m confused by the control flow in this section here. I wonder if the xyzzy code pattern might make this clearer and easier to maintain?” Notice how you’re using humility to make the question about you, not them. They’re not wrong; you’re just having trouble understanding the code. The suggestion is merely offered up as a way to clarify things for poor little you while possibly helping the project’s long-term sustainability goals. You’re also not demanding anything—you’re giving your collaborator the ability to peacefully reject the suggestion. The discussion stays focused on the code itself, not on anyone’s value or coding skills.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="fail_fast_and_iterate">
|
||||
<h3>Fail fast and iterate</h3>
|
||||
|
||||
<p>There’s a well-known urban legend in the business world about a manager who makes a mistake and loses an impressive $10 million.<a contenteditable="false" data-primary="failures" data-secondary="fail fast and iterate" data-type="indexterm" id="id-2KSWIwCwh1SLS0"> </a> He dejectedly goes into the office the next day and starts packing up his desk, and when he gets the inevitable “the CEO wants to see you in his office” call, he trudges into the CEO’s office and quietly slides a piece of paper across the desk.</p>
|
||||
|
||||
<blockquote>
|
||||
<p>“What’s this?” asks the CEO.</p>
|
||||
|
||||
<p>“My resignation,” says the executive. “I assume you called me in here to fire me.”</p>
|
||||
|
||||
<p>“Fire you?” responds the CEO, incredulously. “Why would I fire you? I just spent $10 million training you!”<sup><a data-type="noteref" id="ch01fn24-marker" href="ch02.html#ch01fn24">6</a></sup></p>
|
||||
</blockquote>
|
||||
|
||||
<p>It’s an extreme story, to be sure, but the CEO in this story understands that firing the executive wouldn’t undo the $10 million loss; it would only compound the loss by also losing a valuable executive who he can be very sure won’t make that kind of mistake again.</p>
|
||||
|
||||
<p>At Google, one of our favorite mottos is that “Failure is an option.” It’s widely recognized that if you’re not failing now and then, you’re not being innovative enough or taking enough risks. Failure is viewed as a golden opportunity to learn and improve for the next go-around.<sup><a data-type="noteref" id="ch01fn25-marker" href="ch02.html#ch01fn25">7</a></sup> In fact, Thomas Edison<a contenteditable="false" data-primary="Edison, Thomas" data-type="indexterm" id="id-k7SvC1hehYSPSJ"> </a> is often quoted as saying, “If I find 10,000 ways something won’t work, I haven’t failed. I am not discouraged, because every wrong attempt discarded is another step forward.”</p>
|
||||
|
||||
<p>Over in Google X—the division that works on “moonshots” like self-driving cars and internet access delivered by balloons—failure is deliberately built into its incentive system. People come up with outlandish ideas and coworkers are actively encouraged to shoot them down as fast as possible. Individuals are rewarded (and even compete) to see how many ideas they can disprove or invalidate in a fixed period of time. Only when a concept truly cannot be debunked at a whiteboard by all peers does it proceed to early prototype.<a contenteditable="false" data-primary="trust" data-secondary="practicing" data-startref="ix_trst" data-type="indexterm" id="id-k7SGINuehYSPSJ"> </a><a contenteditable="false" data-primary="respect" data-secondary="practicing" data-startref="ix_resp" data-type="indexterm" id="id-aESdC9ukhPSmSV"> </a><a contenteditable="false" data-primary="humility" data-secondary="practicing" data-startref="ix_humprac" data-type="indexterm" id="id-89SYHbuMh2SqS1"> </a><a contenteditable="false" data-primary="teams" data-secondary="software engineering as team endeavor" data-startref="ix_teamsftenghrt" data-tertiary="humility, respect, and trust in practice" data-type="indexterm" id="id-g1SZt9u3hDS4SJ"> </a><a contenteditable="false" data-primary="social interaction" data-secondary="humility, respect, and trust in practice" data-startref="ix_socinthrt" data-type="indexterm" id="id-LdSZhvuoheSLSA"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="blameless_postmortem_culture">
|
||||
<h2>Blameless Postmortem Culture</h2>
|
||||
|
||||
<p>The key to learning from your mistakes is to document your failures <a contenteditable="false" data-primary="teams" data-secondary="software engineering as team endeavor" data-tertiary="blameless postmortem culture" data-type="indexterm" id="ix_teamsftengpst"> </a>by performing a root-cause <a contenteditable="false" data-primary="postmortems, blameless" data-type="indexterm" id="ix_pstmrt"> </a>analysis <a contenteditable="false" data-primary="blameless postmortems" data-type="indexterm" id="ix_blapst"> </a>and writing up a “postmortem,” as it’s called at Google (and many other companies). Take extra care to make sure the postmortem document isn’t just a useless list of apologies or excuses or finger-pointing—that’s not its purpose. A proper postmortem should always contain an explanation of what was learned and what is going to change as a result of the learning experience. Then, make sure that the postmortem is readily accessible and that the team really follows through on the proposed changes. Properly documenting failures also makes it easier for other people (present and future) to know what happened and avoid repeating history. Don’t erase your tracks—light them up like a runway for those who follow you!</p>
|
||||
|
||||
<p>A good postmortem should include the following (a minimal, invented example outline appears after this list):</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>A brief summary of the event</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>A timeline of the event, from discovery through investigation to resolution</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>The primary cause of the event</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Impact and damage assessment</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>A set of action items (with owners) to fix the problem immediately</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>A set of action items to prevent the event from happening again</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Lessons learned</p>
|
||||
</li>
|
||||
</ul>
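<p>To make these elements concrete, here is a minimal sketch of what such a write-up might look like. Everything in it (the incident, times, systems, and owners) is invented purely for illustration; real postmortem templates vary from team to team, but they cover the same ground:</p>

<pre data-type="programlisting">Postmortem: Orders API outage (hypothetical example; all names, times, and systems invented)

Summary:    The Orders API returned errors for 35 minutes after a config push.
Impact:     Roughly 12% of checkout requests failed; no data was lost.
Timeline:
  14:02  Config change rolled out to all regions at once
  14:06  Error-rate alerts fired; on-call engineer paged
  14:21  Config change identified as the cause; rollback started
  14:37  Error rates returned to normal
Root cause: A malformed quota value passed validation because the config
            schema did not constrain that field.
Action items (immediate fix):
  - Roll back the change and restore the quota value      [owner: on-call]
Action items (prevent recurrence):
  - Add schema validation for quota fields                [owner: config team]
  - Stage config rollouts one region at a time            [owner: release team]
Lessons learned:
  - What went well: alerting caught the problem within minutes.
  - What went wrong: the change skipped canary analysis.
  - Where we got lucky: the rollback path had been exercised recently.</pre>

<p>Note that the write-up names systems and process gaps, not people, which is what keeps the postmortem blameless.</p>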
|
||||
|
||||
<section data-type="sect3" id="learn_patience">
|
||||
<h3>Learn patience</h3>
|
||||
|
||||
<p>Years ago, I was writing a tool to convert CVS repositories to Subversion (and later, Git). <a contenteditable="false" data-primary="patience, learning" data-type="indexterm" id="id-xaSeIDC8hgs7SP"> </a>Due to the vagaries of CVS, I kept unearthing bizarre bugs. Because my longtime friend and coworker Karl knew CVS quite intimately, we decided we should work together to fix these bugs.</p>
|
||||
|
||||
<p>A problem arose when we began pair programming: I’m a bottom-up engineer who is content to dive into the muck and dig my way out by trying a lot of things quickly and skimming over the details. Karl, however, is a top-down engineer who wants to get the full lay of the land and dive into the implementation of almost every method on the call stack before proceeding to tackle the bug. This resulted in some epic interpersonal conflicts, disagreements, and the occasional heated argument. It got to the point at which the two of us simply couldn’t pair-program together: it was too frustrating for us both.</p>
|
||||
|
||||
<p>That said, we had a longstanding history of trust and respect for each other. Combined with patience, this helped us work out a new method of collaborating. We would sit together at the computer, identify the bug, and then split up and attack the problem from two directions at once (top-down and bottom-up) before coming back together with our findings. Our patience and willingness to improvise new working styles not only saved the project, but also our friendship.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="be_open_to_influence">
|
||||
<h3>Be open to influence</h3>
|
||||
|
||||
<p>The more open you are to influence, the more you are able to influence; the more vulnerable you are, the stronger you appear.<a contenteditable="false" data-primary="influence, being open to" data-type="indexterm" id="id-woSVILCluZs7SR"> </a> These statements sound like bizarre contradictions. But everyone can think of someone they’ve worked with who is just maddeningly stubborn—no matter how much people try to persuade them, they dig their heels in even more. What eventually happens to such team members? In our experience, people stop listening to their opinions or objections; instead, they end up “routing around” them like an obstacle everyone takes for granted. You certainly don’t want to be that person, so keep this idea in your head: it’s OK for someone else to change your mind. In the opening chapter of this book, we said that engineering is inherently about trade-offs. It’s impossible for you to be right about everything all the time unless you have an unchanging environment and perfect knowledge, so of course you should change your mind when presented with new evidence. Choose your battles carefully: to be heard properly, you first need to listen to others. It’s better to do this listening <em>before</em> putting a stake in the ground or firmly announcing a decision—if you’re constantly changing your mind, people will think you’re wishy-washy.</p>
|
||||
|
||||
<p>The idea of vulnerability can seem strange, too. <a contenteditable="false" data-primary="vulnerability, showing" data-type="indexterm" id="id-4jSRI8HXuEsGS9"> </a>If someone admits ignorance of the topic at hand or the solution to a problem, what sort of credibility will they have in a group? Vulnerability is a show of weakness, and that destroys trust, right?<a contenteditable="false" data-primary="trust" data-secondary="vulnerability and" data-type="indexterm" id="id-k7SvCwHkulsPSJ"> </a></p>
|
||||
|
||||
<p>Not true. Admitting that you’ve made a mistake or you’re simply out of your league can increase your status over the long run. In fact, the willingness to express vulnerability is an outward show of humility, it demonstrates accountability and the willingness to take responsibility, and it’s a signal that you trust others’ opinions. In return, people end up respecting your honesty and strength. Sometimes, the best thing you can do is just say, “I don’t know.”</p>
|
||||
|
||||
<p>Professional politicians, for example, are notorious for never admitting error or ignorance, even when it’s patently obvious that they’re wrong or unknowledgeable about a subject. This behavior exists primarily because politicians are constantly under attack by their opponents, and it’s why most people don’t believe a word that politicians say. When you’re writing software, however, you don’t need to be continually on the defensive—your teammates are collaborators, not competitors. You all have the same goal.<a contenteditable="false" data-primary="teams" data-secondary="software engineering as team endeavor" data-startref="ix_teamsftengpst" data-tertiary="blameless postmortem culture" data-type="indexterm" id="id-aESNIDhNuEsmSV"> </a><a contenteditable="false" data-primary="postmortems, blameless" data-startref="ix_pstmrt" data-type="indexterm" id="id-89SJCqhmursqS1"> </a><a contenteditable="false" data-primary="blameless postmortems" data-startref="ix_blapst" data-type="indexterm" id="id-g1S4Hdhyu6s4SJ"> </a> </p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="being_googley">
|
||||
<h2>Being Googley</h2>
|
||||
|
||||
<p>At Google, we have our own internal version of the principles of “humility, respect, and trust” when it comes to behavior and human interactions.<a contenteditable="false" data-primary="“Googley”, being" data-primary-sortas="Googley" data-type="indexterm" id="id-p6SYIJC2FGSN"> </a><a contenteditable="false" data-primary="social interaction" data-secondary="being “Googley”" data-type="indexterm" id="id-MdSJCoCAFRSg"> </a><a contenteditable="false" data-primary="teams" data-secondary="software engineering as team endeavor" data-tertiary="being “Googley”" data-type="indexterm" id="id-2KSaHwCJF1Sr"> </a></p>
|
||||
|
||||
<p>From the earliest days of our culture, we often referred to actions as being “Googley” or “not Googley.” The word was never explicitly defined; rather, everyone just sort of took it to mean “don’t be evil” or “do the right thing” or “be good to each other.” Over time, people also started using the term “Googley” as an informal test for culture-fit whenever we would interview a candidate for an engineering job, or when writing internal performance reviews of one another. People would often express opinions about others using the term; for example, “the person coded well, but didn’t seem to have a very Googley attitude.”</p>
|
||||
|
||||
<p>Of course, we eventually realized that the term “Googley” was being overloaded with meaning; worse yet, it could become a source of unconscious bias in hiring or evaluations. If “Googley” means something different to every employee, we run the risk of the term starting to mean “<em>is just like me.</em>” Obviously, that’s not a good test for hiring—we don’t want to hire people “just like me,” but people from a diverse set of backgrounds and with different opinions and experiences. An interviewer’s personal desire to have a beer with a candidate (or coworker) should <em>never</em> be considered a valid signal about somebody else’s performance or ability to thrive at Google.<a contenteditable="false" data-primary="trust" data-secondary="being “Googley”" data-type="indexterm" id="id-woSNH7t9FySD"> </a><a contenteditable="false" data-primary="respect" data-secondary="being “Googley”" data-type="indexterm" id="id-4jS2t3tjFPS1"> </a><a contenteditable="false" data-primary="humility" data-secondary="being “Googley”" data-type="indexterm" id="id-k7SNhXtZFYSo"> </a></p>
|
||||
|
||||
<p>Google eventually fixed the problem by explicitly defining a rubric for what we mean by “Googleyness”—a set of attributes and behaviors that we look for that represent strong leadership and exemplify “humility, respect, and trust”:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Thrives in ambiguity</dt>
|
||||
<dd>Can deal with conflicting messages or directions, build consensus, and make progress against a problem, even when the environment is constantly shifting.</dd>
|
||||
<dt>Values feedback</dt>
|
||||
<dd>Has humility to both receive and give feedback gracefully and understands how valuable feedback is for personal (and team) development.</dd>
|
||||
<dt>Challenges status quo</dt>
|
||||
<dd>Is able to set ambitious goals and pursue them even when there might be resistance or inertia from others.</dd>
|
||||
<dt>Puts the user first</dt>
|
||||
<dd>Has empathy and respect for users of Google’s products and pursues actions that are in their best interests.</dd>
|
||||
<dt>Cares about the team</dt>
|
||||
<dd>Has empathy and respect for coworkers and actively works to help them without being asked, improving team cohesion.</dd>
|
||||
<dt>Does the right thing</dt>
|
||||
<dd>Has a strong sense of ethics about everything they do; willing to make difficult or inconvenient decisions to protect the integrity of the team and product.</dd>
|
||||
</dl>
|
||||
|
||||
<p>Now that we have these best-practice behaviors better defined, we’ve begun to shy away from using the term “Googley.” It’s always better to be specific about expectations!</p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="conclusion-id00006">
|
||||
<h1>Conclusion</h1>
|
||||
|
||||
<p>The foundation for almost any software endeavor—of almost any size—is a well-functioning team. Although the Genius Myth of the solo software developer still persists, the truth is that no one really goes it alone. For a software organization to stand the test of time, it must have a healthy culture, rooted in humility, trust, and respect that revolves around the team, rather than the individual. Further, the creative nature of software development <em>requires</em> that people take risks and occasionally fail; for people to accept that failure, a healthy team environment must exist.<a contenteditable="false" data-primary="teams" data-secondary="software engineering as team endeavor" data-startref="ix_teamsfteng" data-type="indexterm" id="id-1JSyCeC4sx"> </a></p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="tlsemicolondr-id00171">
|
||||
<h1>TL;DRs</h1>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Be aware of the trade-offs of working in isolation.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Acknowledge the amount of time that you and your team spend communicating and in interpersonal conflict. A small investment in understanding personalities and working styles of yourself and others can go a long way toward improving productivity.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>If you want to work effectively with a team or a large organization, be aware of your preferred working style and that of others.</p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
<div data-type="footnotes"><p data-type="footnote" id="ch01fn18"><sup><a href="ch02.html#ch01fn18-marker">1</a></sup>Ben Collins-Sussman, also an author within this book.</p><p data-type="footnote" id="ch01fn19"><sup><a href="ch02.html#ch01fn19-marker">2</a></sup>Literally, if you are, in fact, a bike designer.</p><p data-type="footnote" id="ch01fn20"><sup><a href="ch02.html#ch01fn20-marker">3</a></sup>I should note that sometimes it’s dangerous to get too much feedback too early in the process if you’re still unsure of your general direction or goal.</p><p data-type="footnote" id="ch01fn21"><sup><a href="ch02.html#ch01fn21-marker">4</a></sup>I do, however, acknowledge that serious introverts likely need more peace, quiet, and alone time than most people and might benefit from a quieter environment, if not their own office.</p><p data-type="footnote" id="ch01fn22"><sup><a href="ch02.html#ch01fn22-marker">5</a></sup>This is incredibly difficult if you’ve been burned in the past by delegating to incompetent people.</p><p data-type="footnote" id="ch01fn24"><sup><a href="ch02.html#ch01fn24-marker">6</a></sup>You can find a dozen variants of this legend on the web, attributed to different famous managers.</p><p data-type="footnote" id="ch01fn25"><sup><a href="ch02.html#ch01fn25-marker">7</a></sup>By the same token, if you do the same thing over and over and keep failing, it’s not failure, it’s incompetence.</p></div></section>
|
||||
|
||||
</body>
|
||||
</html>
|
542
clones/abseil.io/resources/swe-book/html/ch03.html
Normal file
206
clones/abseil.io/resources/swe-book/html/ch04.html
Normal file
|
@ -0,0 +1,206 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="engineering_for_equity">
|
||||
<h1>Engineering for Equity</h1>
|
||||
|
||||
<p class="byline">Written by Demma Rodriguez</p>
|
||||
|
||||
<p class="byline">Edited by Riona MacNamara</p>
|
||||
|
||||
<p>In earlier chapters, we’ve explored the <a contenteditable="false" data-primary="equitable and inclusive engineering" data-type="indexterm" id="ix_equi"> </a>contrast between programming as the production of code that addresses the problem of the moment, and software engineering as the broader application of code, tools, policies, and processes to a dynamic and ambiguous problem that can span decades or even lifetimes. In this chapter, we’ll discuss the unique responsibilities of an engineer when designing products for a broad base of users. Further, we evaluate how an organization, by embracing diversity, can design systems that work for everyone, and avoid perpetuating harm against our users.</p>
|
||||
|
||||
<p>As new as the field of software engineering is, we’re newer still at understanding the impact it has on underrepresented people and diverse societies. We did not write this chapter because we know all the answers. We do not. In fact, understanding how to engineer products that empower and respect all our users is still something Google is learning to do. We have had many public failures in protecting our most vulnerable users, and so we are writing this chapter because the path forward to more equitable products begins with evaluating our own failures and encouraging growth.</p>
|
||||
|
||||
<p>We are also writing this chapter because of the increasing imbalance of power between those who make development decisions that impact the world and those who simply must accept and live with those decisions that sometimes disadvantage already marginalized communities globally. It is important to share and reflect on what we’ve learned so far with the next generation of software engineers. It is even more important that we help influence the next generation of engineers to be better than we are today.</p>
|
||||
|
||||
<p class="pagebreak-before">Just picking up this book means that you likely aspire to be an exceptional engineer. You want to solve problems. You aspire to build products that drive positive outcomes for the broadest base of people, including people who are the most difficult to reach. To do this, you will need to consider how the tools you build will be leveraged to change the trajectory of humanity, hopefully for the better.</p>
|
||||
|
||||
<section data-type="sect1" id="bias_is_the_default">
|
||||
<h1>Bias Is the Default</h1>
|
||||
|
||||
<p>When engineers do not focus on users of different nationalities, ethnicities, races, genders, ages, socioeconomic statuses, abilities, and belief systems, even the most talented staff will inadvertently fail their users.<a contenteditable="false" data-primary="equitable and inclusive engineering" data-secondary="bias and" data-type="indexterm" id="id-VOTKHLSRtQ"> </a><a contenteditable="false" data-primary="biases" data-secondary="universal presence of" data-type="indexterm" id="id-1kTdSXS6tP"> </a> Such failures are often unintentional; all people have certain biases, and social scientists have recognized over the past several decades that most people exhibit unconscious bias, enforcing and promulgating existing stereotypes. Unconscious bias is insidious and often more difficult to mitigate than intentional acts of exclusion. Even when we want to do the right thing, we might not recognize our own biases. By the same token, our organizations must also recognize that such bias exists and work to address it in their workforces, product development, and user outreach.</p>
|
||||
|
||||
<p>Because of bias, Google has at times failed to represent users equitably within their products, with launches over the past several years that did not focus enough on underrepresented groups. Many users attribute our lack of awareness in these cases to the fact that our engineering population is mostly male, mostly White or Asian, and certainly not representative of all the communities that use our products. The lack of representation of such users in our workforce<sup><a data-type="noteref" id="ch01fn51-marker" href="ch04.html#ch01fn51">1</a></sup> means that we often do not have the requisite diversity to understand how the use of our products can affect underrepresented or vulnerable users.</p>
|
||||
|
||||
<aside data-type="sidebar" id="case-study-google-misses-the-mark-on-racial-inclusion-gaH3C9tZ">
|
||||
<h5>Case Study: Google Misses the Mark on Racial Inclusion</h5>
|
||||
|
||||
<p>In 2015, software engineer Jacky Alciné pointed out<sup><a data-type="noteref" id="ch01fn52-marker" href="ch04.html#ch01fn52">2</a></sup> that the image recognition algorithms<a contenteditable="false" data-primary="image recognition, racial inclusion and" data-type="indexterm" id="id-ddTLSMSpCEt5"> </a> in <a contenteditable="false" data-primary="racial inclusion" data-type="indexterm" id="id-GLTJfRSZCKtN"> </a>Google Photos were classifying his <a contenteditable="false" data-primary="equitable and inclusive engineering" data-secondary="racial inclusion" data-type="indexterm" id="id-8wT7CDS5CMtX"> </a>black friends as “gorillas.” Google was slow to respond to these mistakes and incomplete in addressing them.</p>
|
||||
|
||||
<p>What caused such a monumental failure? Several things:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Image recognition algorithms depend on being supplied a “proper” (often meaning “complete”) dataset. The photo data fed into Google’s image recognition algorithm was clearly incomplete. In short, the data did not represent the population.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Google itself (and the tech industry in general) did not (and does not) have much black representation,<sup><a data-type="noteref" id="ch01fn53-marker" href="ch04.html#ch01fn53">3</a></sup> and that affects subjective decisions in the design of such algorithms and the collection of such datasets. The unconscious bias of the organization itself likely led to a more representative product being left on the table.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Google’s target market for image recognition did not adequately include such underrepresented groups. Google’s tests did not catch these mistakes; as a result, our users did, which both embarrassed Google and harmed our users.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>As late as 2018, Google still had not adequately addressed the underlying problem.<sup><a data-type="noteref" id="ch01fn54-marker" href="ch04.html#ch01fn54">4</a></sup></p>
|
||||
</aside>
|
||||
|
||||
<p>In this example, our product was inadequately designed and executed, failing to properly consider all racial groups, and as a result, failed our users and caused Google bad press. Other technology suffers from similar failures: autocomplete can return offensive or racist results. Google’s Ad system could be manipulated to show racist or offensive ads. YouTube might not catch hate speech, though it is technically outlawed on that platform.</p>
|
||||
|
||||
<p>In all of these cases, the technology itself is not really to blame. Autocomplete, for example, was not designed to target users or to discriminate. But it was also not resilient enough in its design to exclude discriminatory language that is considered hate speech. As a result, the algorithm returned results that caused harm to our users. The harm to Google itself should also be obvious: reduced user trust and engagement with the company. For example, Black, Latinx, and Jewish applicants could lose faith in Google as a platform or even as an inclusive environment itself, therefore undermining Google’s goal of improving representation in hiring.</p>
|
||||
|
||||
<p>How could this happen? After all, Google hires technologists with impeccable education and/or professional experience—exceptional programmers who write the best code and test their work. "Build for everyone" is a Google brand statement, but the truth is that we still have a long way to go before we can claim that we do. One way to address these problems is to help the software engineering organization itself look like the populations for whom we build products.</p>
|
||||
</section>
|
||||
|
||||
<section class="pagebreak-before" data-type="sect1" id="understanding_the_need_for_diversity">
|
||||
<h1 class="less_space">Understanding the Need for Diversity</h1>
|
||||
|
||||
<p>At Google, we<a contenteditable="false" data-primary="diversity" data-secondary="understanding the need for" data-type="indexterm" id="id-1kTMHXSDhP"> </a> believe that being an exceptional<a contenteditable="false" data-primary="equitable and inclusive engineering" data-secondary="need for diversity" data-type="indexterm" id="id-KnTOSAS8hy"> </a> engineer requires that you also focus on bringing diverse perspectives into product design and implementation. It also means that Googlers responsible for hiring or interviewing other engineers must contribute to building a more representative workforce. For example, if you interview other engineers for positions at your company, it is important to learn how biased outcomes happen in hiring. There are significant prerequisites for understanding how to anticipate harm and prevent it. To get to the point where we can build for everyone, we first must understand our representative populations. We need to encourage engineers to have a wider scope of educational training.<a contenteditable="false" data-primary="education of software engineers" data-type="indexterm" id="id-YDTnfRSph6"> </a></p>
|
||||
|
||||
<p>The first order of business is to disrupt the notion that as a person with a computer science degree and/or work experience, you have all the skills you need to become an exceptional engineer. A computer science degree is often a necessary foundation. However, the degree alone (even when coupled with work experience) will not make you an engineer. It is also important to disrupt the idea that only people with computer science degrees can design and build products. Today, <a href="https://oreil.ly/2Bu0H">most programmers do have a computer science degree</a>; they are successful at building code, establishing theories of change, and applying methodologies for problem solving. However, as the aforementioned examples demonstrate, <em>this approach is insufficient for inclusive and equitable engineering</em>.</p>
|
||||
|
||||
<p>Engineers should begin by focusing all work within the framing of the complete ecosystem they seek to influence.<a contenteditable="false" data-primary="users" data-secondary="engineers building software for all users" data-type="indexterm" id="id-YDTVHmCph6"> </a> At minimum, they need to understand the population demographics of their users. Engineers should focus on people who are different than themselves, especially people who might attempt to use their products to cause harm. The most difficult users to consider are those who are disenfranchised by the processes and the environment in which they access technology. To address this challenge, engineering teams need to be representative of their existing and future users. In the absence of diverse representation on engineering teams, individual engineers need to learn how to build for all users.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="building_multicultural_capacity">
|
||||
<h1>Building Multicultural Capacity</h1>
|
||||
|
||||
<p>One mark of an exceptional engineer is the ability to understand how products can advantage and disadvantage different<a contenteditable="false" data-primary="multicultural capacity, building" data-type="indexterm" id="ix_mlticu"> </a> groups of human beings.<a contenteditable="false" data-primary="equitable and inclusive engineering" data-secondary="building multicultural capacity" data-type="indexterm" id="ix_equimltic"> </a> Engineers are expected to have technical aptitude, but they should also have the <em>discernment</em> to know when to build something and when not to. Discernment includes building the capacity to identify and reject features or products that drive adverse outcomes. This is a lofty and difficult goal, because there is an enormous amount of individualism that goes into being a high-performing engineer. Yet to succeed, we must extend our focus beyond our own communities to the next billion users or to current users who might be disenfranchised or left behind by our products.</p>
|
||||
|
||||
<p>Over time, you might build tools that billions of people use daily—tools that influence how people think about the value of human lives, tools that monitor human activity, and tools that capture and persist sensitive data, such as images of their children and loved ones, as well as other types of sensitive data. As an engineer, you might wield more power than you realize: the power to literally change society. It’s critical that on your journey to becoming an exceptional engineer, you understand the innate responsibility needed to exercise power without causing harm. The first step is to recognize the default state of your bias caused by many societal and educational factors. After you recognize this, you’ll be able to consider the often-forgotten use cases or users who can benefit or be harmed by the products you build.</p>
|
||||
|
||||
<p>The industry continues to move forward, building new use cases for artificial intelligence (AI) and machine learning at an ever-increasing speed. To stay competitive, we drive toward scale and efficacy in building a high-talent engineering and technology workforce. Yet we need to pause and consider the fact that today, some people have the ability to design the future of technology and others do not. We need to understand whether the software systems we build will eliminate the potential for entire populations to experience shared prosperity and provide equal access to technology.</p>
|
||||
|
||||
<p>Historically, companies faced with a decision between completing a strategic objective that drives market dominance and revenue and one that potentially slows momentum toward that goal have opted for speed and shareholder value. This tendency is exacerbated by the fact that many companies value individual performance and excellence, yet often fail to effectively drive accountability on product equity across all areas. Focusing on underrepresented users is a clear opportunity to promote equity. To continue to be competitive in the technology sector, we need to learn to engineer for global equity.</p>
|
||||
|
||||
<p>Today, we worry when companies design technology to scan, capture, and identify people walking down the street. We worry about privacy and how governments might use this information now and in the future. Yet most technologists do not have the requisite perspective of underrepresented groups to understand the impact of racial variance in facial recognition or to understand how applying AI can drive harmful and inaccurate results.</p>
|
||||
|
||||
<p>Currently, AI-driven facial-recognition software continues to disadvantage people of color or ethnic minorities.<a contenteditable="false" data-primary="AI (artificial intelligence)" data-secondary="facial-recognition software, disadvantaging some populations" data-type="indexterm" id="id-AAT0HwI5UV"> </a> Our research is not comprehensive enough and does not include a wide enough range of different skin tones. We cannot expect the output to be valid if both the training data and those creating the software represent only a small subsection of people. In those cases, we should be willing to delay development in favor of trying to get more complete and accurate data, and a more comprehensive and inclusive product.</p>
|
||||
|
||||
<p>Data science itself is challenging for humans to evaluate, however. <a contenteditable="false" data-primary="racial bias in facial recognition databases" data-type="indexterm" id="id-p6TqHqtpUl"> </a>Even when we do have representation, a training set can still be biased and produce invalid results. A study completed in<a contenteditable="false" data-primary="law enforcement facial recognition databases, racial bias in" data-type="indexterm" id="id-MZT2SXtxUZ"> </a> 2016 found that more than 117 million American adults are in a law enforcement facial recognition database.<sup><a data-type="noteref" id="ch01fn55-marker" href="ch04.html#ch01fn55">5</a></sup> Due to the disproportionate policing of Black communities and disparate outcomes in arrests, there could be racially biased error rates in utilizing such a database in facial recognition. Although the software is being developed and deployed at ever-increasing rates, the independent testing is not. To correct for this egregious misstep, we need to have the integrity to slow down and ensure that our inputs contain as little bias as possible. Google now offers statistical training within the context of AI to help ensure that datasets are not intrinsically biased.</p>
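<p>What does checking for biased error rates look like in practice? The following is a minimal, hypothetical sketch: the group names and records are invented, and it assumes you already have labeled evaluation results for your system. Its only point is that error rates should be reported per group rather than as a single aggregate number:</p>

<pre data-type="programlisting" data-code-language="python">from collections import defaultdict

# Hypothetical evaluation records: (group, true_label, predicted_label).
# These rows are invented for illustration; in practice, load the labels
# and predictions from your own evaluation set.
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 1, 0), ("group_b", 0, 0),
]

# Count positives, negatives, and errors separately for each group.
counts = defaultdict(lambda: {"fp": 0, "fn": 0, "pos": 0, "neg": 0})
for group, truth, pred in records:
    c = counts[group]
    if truth == 1:
        c["pos"] += 1
        if pred == 0:
            c["fn"] += 1
    else:
        c["neg"] += 1
        if pred == 1:
            c["fp"] += 1

# Report error rates per group instead of one aggregate accuracy number.
for group, c in sorted(counts.items()):
    fpr = c["fp"] / c["neg"] if c["neg"] else 0.0
    fnr = c["fn"] / c["pos"] if c["pos"] else 0.0
    print(f"{group}: false-positive rate {fpr:.2f}, false-negative rate {fnr:.2f}")</pre>

<p>Even a simple disaggregated report like this can show that a system that looks accurate overall is failing badly for a particular population, which is exactly the kind of result that should prompt a team to slow down and improve its data before shipping.</p>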
|
||||
|
||||
<p>Therefore, shifting the focus of<a contenteditable="false" data-primary="education of software engineers" data-secondary="more inclusive education needed" data-type="indexterm" id="id-MZTRHohxUZ"> </a> your industry experience to include more comprehensive, multicultural, race and gender studies education is not only <em>your</em> responsibility, but also the <em>responsibility of your employer.</em> Technology companies must ensure that their employees are continually receiving professional development and that this development is comprehensive and multidisciplinary. The requirement is not that one individual take it upon themselves to learn about other cultures or other demographics alone. Change requires that each of us, individually or as leaders of teams, invest in continuous professional development that builds not just our software development and leadership skills, but also our capacity to understand the diverse experiences throughout humanity.<a contenteditable="false" data-primary="multicultural capacity, building" data-startref="ix_mlticu" data-type="indexterm" id="id-rETdCQh9U7"> </a><a contenteditable="false" data-primary="equitable and inclusive engineering" data-secondary="building multicultural capacity" data-startref="ix_equimltic" data-type="indexterm" id="id-LETOcbhyUa"> </a></p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="making_diversity_actionable">
|
||||
<h1>Making Diversity Actionable</h1>
|
||||
|
||||
<p>Systemic equity and fairness are attainable if we are willing to accept that we are all accountable for the<a contenteditable="false" data-primary="diversity" data-secondary="making it actionable" data-type="indexterm" id="id-YDTVHRS3T6"> </a> systemic <a contenteditable="false" data-primary="equitable and inclusive engineering" data-secondary="making diversity actionable" data-type="indexterm" id="id-ddTLSMS6TG"> </a>discrimination we see in the technology sector. We are accountable for the failures in the system. Deferring or abstracting away personal accountability is ineffective, and depending on your role, it could be irresponsible. It is also irresponsible to fully attribute dynamics at your specific company or within your team to the larger societal issues that contribute to inequity. A favorite line among diversity proponents and detractors alike goes something like this: “We are working hard to fix (insert systemic discrimination topic), but accountability is hard. How do we combat (insert hundreds of years) of historical discrimination?” This line of inquiry is a detour to a more philosophical or academic conversation and away from focused efforts to improve work conditions or outcomes. <a contenteditable="false" data-primary="multicultural capacity, building" data-secondary="how inequalities in society impact workplaces" data-type="indexterm" id="id-GLTJfRSmTm"> </a>Part of building multicultural capacity requires a more comprehensive understanding of how systems of inequality in society impact the workplace, especially in the technology sector.</p>
|
||||
|
||||
<p>If you are an engineering manager working on hiring more people from underrepresented groups, deferring to the historical impact of discrimination in the world is a useful academic exercise.<a contenteditable="false" data-primary="hiring of software engineers" data-secondary="making diversity actionable" data-type="indexterm" id="id-ddTnHXf6TG"> </a> However, it is critical to move beyond the academic conversation to a focus on quantifiable and actionable steps that you can take to drive equity and fairness. For example, as a software engineering manager who is hiring, you’re accountable for ensuring that your candidate slates are balanced. Are there women or other underrepresented groups in the pool of candidates under review? After you hire someone, what opportunities for growth have you provided, and is the distribution of opportunities equitable? Every technology lead or software engineering manager has the means to augment equity on their teams. It is important that we acknowledge that, although there are significant systemic challenges, we are all part of the system. It is our problem to fix.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="reject_singular_approaches">
|
||||
<h1>Reject Singular Approaches</h1>
|
||||
|
||||
<p>We cannot perpetuate solutions that present a single philosophy or methodology for fixing inequity in the technology sector. <a contenteditable="false" data-primary="equitable and inclusive engineering" data-secondary="rejecting singular approaches" data-type="indexterm" id="id-ddTnHMSJsG"> </a>Our problems are complex and multifactorial. Therefore, we must disrupt singular approaches to advancing representation in the workplace, even if they are promoted by people we admire or who have institutional power.</p>
|
||||
|
||||
<p>One singular narrative held dear in the technology industry is that lack of representation in the workforce can be addressed solely by fixing the hiring pipelines. Yes, that is a fundamental step, but that is not the immediate issue we need to fix. We need to recognize systemic inequity in progression and retention while simultaneously focusing on more representative hiring and educational disparities across lines of race, gender, and socioeconomic and immigration status, for example.</p>
|
||||
|
||||
<p>In the technology industry, many people from underrepresented groups are passed over daily for opportunities and advancement. Attrition among Black+ Google employees <a href="https://oreil.ly/JFbTR">outpaces attrition from all other groups</a> and confounds progress on representation goals. If we want to drive change and increase representation, we need to evaluate whether we’re creating an ecosystem in which all aspiring engineers and other technology professionals can thrive.</p>
|
||||
|
||||
<p>Fully understanding an entire problem space is critical to determining how to fix it. This holds true for everything from a critical data migration to the hiring of a representative workforce. For example, if you are an engineering manager who wants to hire more women, don’t just focus on building a pipeline. Focus on other aspects of the hiring, retention, and progression ecosystem and how inclusive it might or might not be to women. Consider whether your recruiters are demonstrating the ability to identify strong candidates who are women as well as men. If you manage a diverse engineering team, focus on psychological safety and invest in increasing multicultural capacity on the team so that new team members feel welcome.</p>
|
||||
|
||||
<p>A common methodology today is to build for the majority use case first, leaving improvements and features that address edge cases for later. But this approach is flawed; it gives users who are already advantaged in access to technology a head start, which increases inequity. <a contenteditable="false" data-primary="users" data-secondary="relegating consideration of user groups to late in development" data-type="indexterm" id="id-p6TqHludsl"> </a>Relegating the consideration of all user groups to the point when design has been nearly completed is to lower the bar of what it means to be an excellent engineer. Instead, by building in inclusive design from the start and raising development standards to make tools delightful and accessible for people who struggle to access technology, we enhance the experience for <em>all</em> users.</p>
|
||||
|
||||
<p>Designing for the user who is least like you is not just wise, it’s a best practice. There are pragmatic and immediate next steps that all technologists, regardless of domain, should consider when developing products that avoid disadvantaging or underrepresenting users. It begins with more comprehensive user-experience research. This research should be done with user groups that are multilingual and multicultural and that span multiple countries, socioeconomic class, abilities, and age ranges. Focus on the most difficult or least represented use case first.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="challenge_established_processes">
|
||||
<h1>Challenge Established Processes</h1>
|
||||
|
||||
<p>Challenging yourself to build more equitable systems goes beyond designing more inclusive product specifications.<a contenteditable="false" data-primary="equitable and inclusive engineering" data-secondary="challenging established processes" data-type="indexterm" id="id-GLTmHRSYFm"> </a> Building equitable systems sometimes means challenging established processes that drive invalid results.</p>
|
||||
|
||||
<p>Consider a recent case evaluated for equity implications. At Google, several engineering teams worked to build a global hiring requisition system. The system supports both external hiring and internal mobility. The engineers and product managers involved did a great job of listening to the requests of what they considered to be their core user group: recruiters.<a contenteditable="false" data-primary="performance of software engineers" data-secondary="flaws in performance ratings" data-type="indexterm" id="id-8wTbHnfPF1"> </a> The recruiters were focused on minimizing wasted time for hiring managers and applicants, and they presented the development team with use cases focused on scale and efficiency for those people. To drive efficiency, the recruiters asked the engineering team to include a feature that would highlight performance ratings—specifically lower ratings—to the hiring manager and recruiter as soon as an internal transfer expressed interest in a job.</p>
|
||||
|
||||
<p>On its face, expediting the evaluation process and helping jobseekers save time is a great goal. So where is the potential equity concern? The following equity questions were raised:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Are developmental assessments a predictive measure of performance?</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Are the performance assessments being presented to prospective managers free of individual bias?</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Are performance assessment scores standardized across organizations?</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>If the answer to any of these questions is “no,” presenting performance ratings could still drive inequitable, and therefore invalid, results.</p>
|
||||
|
||||
<p>When an exceptional engineer questioned whether past performance was in fact predictive of future performance, the reviewing team decided to conduct a thorough review. In the end, it was determined that candidates who had received a poor performance rating were likely to overcome the poor rating if they found a new team. In fact, they were just as likely to receive a satisfactory or exemplary performance rating as candidates who had never received a poor rating. In short, performance ratings are indicative only of how a person is performing in their given role <em>at the time they are being evaluated</em>. Ratings, although an important way to measure performance during a specific period, are not predictive of future performance and should not be used to gauge readiness for a future role or qualify an internal candidate for a different team. (They can, however, be used to evaluate whether an employee is properly or improperly slotted on their current team; therefore, they can provide an opportunity to evaluate how to better support an internal candidate moving forward.)</p>
|
||||
|
||||
<p>This analysis definitely took up significant project time, but the positive trade-off was a more equitable internal mobility process.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="values_versus_outcomes">
|
||||
<h1>Values Versus Outcomes</h1>
|
||||
|
||||
<p>Google has a strong track record of investing in hiring. <a contenteditable="false" data-primary="equitable and inclusive engineering" data-secondary="values versus outcomes" data-type="indexterm" id="id-8wTbHDSki1"> </a>As the previous example illustrates, we also continually evaluate our processes in order to improve equity and inclusion.<a contenteditable="false" data-primary="values versus outcomes in equitable engineering" data-type="indexterm" id="id-AATYSKS0iV"> </a> More broadly, our core values are based on respect and an unwavering commitment to a diverse and inclusive workforce. Yet, year after year, we have also missed our mark on hiring a representative workforce that reflects our users around the globe. The struggle to improve our equitable outcomes persists despite the policies and programs in place to help support inclusion initiatives and promote excellence in hiring and progression. The failure point is not in the values, intentions, or investments of the company, but rather in the application of those policies at the <em>implementation</em> level.</p>
<p>Old habits are hard to break. The users you might be used to designing for today—the ones you are used to getting feedback from—might not be representative of all the users you need to reach. We see this play out frequently across all kinds of products, from wearables that do not work for women’s bodies to video-conferencing software that does not work well for people with darker skin tones.</p>
<p>So, what’s the way out?</p>
<ol>
<li>
<p><strong>Take a hard look in the mirror.</strong> At Google, we have the brand slogan, “Build For Everyone.” How can we <a contenteditable="false" data-primary="building for everyone" data-type="indexterm" id="id-9aTXSGHmHWcbix"> </a>build for everyone when we do not have a representative workforce or engagement model that centralizes community feedback first? We can’t. The truth is that we have at times very publicly failed to protect our most vulnerable users from racist, antisemitic, and homophobic content.</p>
</li>
<li>
<p><strong>Don’t build for everyone. Build <em>with</em> everyone.</strong> We are not building for everyone yet. That work does not happen in a vacuum, and it certainly doesn’t happen when the technology is still not representative of the population as a whole. That said, we can’t pack up and go home. So how do we build for everyone? We build with our users. We need to engage our users across the spectrum of humanity and be intentional about putting the most vulnerable communities at the center of our design. They should not be an afterthought.</p>
</li>
<li>
<p><strong>Design for the user who will have the most difficulty using your product.</strong> Building for those with additional challenges will make the product better for everyone. Another way of thinking about this is: don’t trade equity for short-term velocity.</p>
</li>
<li>
<p><strong>Don’t assume equity; measure equity throughout your systems.</strong> Recognize that decision makers are also subject to bias and might be undereducated about the causes of inequity. You might not have the expertise to identify or measure the scope of an equity issue. Catering to a single userbase might mean disenfranchising another; these trade-offs can be difficult to spot and impossible to reverse. Partner with individuals or teams that are subject matter experts in diversity, equity, and inclusion.</p>
</li>
<li>
<p><strong>Change is possible.</strong> The problems we’re facing with technology today, from surveillance to disinformation to online harassment, are genuinely overwhelming. We can’t solve these with the failed approaches of the past or with just the skills we already have. We need to change.</p>
</li>
</ol>
</section>
<section data-type="sect1" id="stay_curiouscomma_push_forward">
<h1>Stay Curious, Push Forward</h1>
<p>The path to equity is long and complex. <a contenteditable="false" data-primary="equitable and inclusive engineering" data-secondary="staying curious, and pushing forward" data-type="indexterm" id="id-AAT0HKSmHV"> </a>However, we can and should transition from simply building tools and services to growing our understanding of how the products we engineer impact humanity. Challenging our education, influencing our teams and managers, and doing more comprehensive user research are all ways to make progress. Although change is uncomfortable and the path to high performance can be painful, it is possible through collaboration and creativity.</p>
<p>Lastly, as future exceptional engineers, we should focus first on the users most impacted by bias and discrimination.<a contenteditable="false" data-primary="users" data-secondary="focusing first on users most impacted by bias and discrimination" data-type="indexterm" id="id-p6TqH8fWHl"> </a> Together, we can work to accelerate progress by focusing on Continuous Improvement and owning our failures. Becoming an engineer is an involved and continual process. The goal is to make changes that push humanity forward without further disenfranchising the disadvantaged. As future exceptional engineers, we have faith that we can prevent future failures in the system.</p>
</section>
<section data-type="sect1" id="conclusion-id00008">
<h1>Conclusion</h1>
<p>Developing software, and developing a software organization, is a team effort. As a software organization scales, it must respond and adequately design for its user base, which in the interconnected world of computing today involves everyone, locally and around the world. More effort must be made to make both the development teams that design software and the products that they produce reflect the values of such a diverse and encompassing set of users. And, if an engineering organization wants to scale, it cannot ignore underrepresented groups; not only do such engineers from these groups augment the organization itself, they provide unique and necessary perspectives for the design and implementation of software that is truly useful to the world at large.</p>
</section>
<section data-type="sect1" id="tlsemicolondrs-id00103">
<h1>TL;DRs</h1>
<ul>
<li>
<p>Bias is the default.</p>
</li>
<li>
<p>Diversity is necessary to design properly for a comprehensive user base.</p>
</li>
<li>
<p>Inclusivity is critical not just to improving the hiring pipeline for underrepresented groups, but to providing a truly supportive work environment for all people.</p>
</li>
<li>
<p>Product velocity must be evaluated against providing a product that is truly useful to all users. It’s better to slow down than to release a product that might cause harm to some users.<a contenteditable="false" data-primary="equitable and inclusive engineering" data-startref="ix_equi" data-type="indexterm" id="id-rEToHzHECmS2f6"> </a></p>
</li>
</ul>
</section>
<div data-type="footnotes"><p data-type="footnote" id="ch01fn51"><sup><a href="ch04.html#ch01fn51-marker">1</a></sup><a href="https://diversity.google/annual-report">Google’s 2019 Diversity Report</a>.</p><p data-type="footnote" id="ch01fn52"><sup><a href="ch04.html#ch01fn52-marker">2</a></sup>@jackyalcine. 2015. “Google Photos, Y’all Fucked up. My Friend’s Not a Gorilla.” Twitter, June 29, 2015. <a href="https://twitter.com/jackyalcine/status/615329515909156865"><em>https://twitter.com/jackyalcine/status/615329515909156865</em></a>.</p><p data-type="footnote" id="ch01fn53"><sup><a href="ch04.html#ch01fn53-marker">3</a></sup>Many reports in 2018–2019 pointed to a lack of diversity across tech. Some notables include <a href="https://oreil.ly/P9ocC">the National Center for Women & Information Technology</a>, and <a href="https://oreil.ly/Y1pUW">Diversity in Tech</a>.</p><p data-type="footnote" id="ch01fn54"><sup><a href="ch04.html#ch01fn54-marker">4</a></sup>Tom Simonite, “When It Comes to Gorillas, Google Photos Remains Blind,” <em>Wired</em>, January 11, 2018.</p><p data-type="footnote" id="ch01fn55"><sup><a href="ch04.html#ch01fn55-marker">5</a></sup>Stephen Gaines and Sara Williams. “The Perpetual Lineup: Unregulated Police Face Recognition in America.” <em>Center on Privacy & Technology at Georgetown Law</em>, October 18, 2016.</p></div></section>
</body>
</html>
397
clones/abseil.io/resources/swe-book/html/ch05.html
Normal file
@@ -0,0 +1,397 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Software Engineering at Google</title>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
</head>
<body data-type="book">
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="how_to_lead_a_team">
<h1>How to Lead a Team</h1>
<p class="byline">Written by Brian Fitzpatrick</p>
<p class="byline">Edited by Riona MacNamara</p>
<p>We’ve covered a lot of ground so far on the <a contenteditable="false" data-primary="leading a team" data-type="indexterm" id="ix_lead"> </a>culture and composition of teams writing software, and in this chapter, we’ll take a look at the person ultimately responsible for making it all work.<a contenteditable="false" data-primary="teams" data-secondary="leading" data-seealso="leading a team" data-type="indexterm" id="id-Rbc3h5I8"> </a></p>
<p>No team can function well without a leader, especially at Google, where engineering is almost exclusively a team endeavor. At Google, we recognize two different leadership roles. A <em>Manager</em> is a leader of people, whereas a <em>Tech Lead</em> leads technology efforts. Although the responsibilities of these two roles require similar planning skills, they require quite different people skills.</p>
<p>A boat without a captain is nothing more than a floating waiting room: unless someone grabs the rudder and starts the engine, it’s just going to drift along aimlessly with the current. A piece of software is just like that boat: if no one pilots it, you’re left with a group of engineers burning up valuable time, just sitting around waiting for something to happen (or worse, still writing code that you don’t need). Although this chapter is about people management and technical leadership, it is still worth a read if you’re an individual contributor because it will likely help you understand your own leaders a bit better.</p>
<section data-type="sect1" id="managers_and_tech_leads_left_parenthesi">
<h1>Managers and Tech Leads (and Both)</h1>
<p>Whereas every engineering team generally has a leader, they acquire those leaders in different ways.<a contenteditable="false" data-primary="managers and tech leads" data-type="indexterm" id="ix_mgrTL"> </a><a contenteditable="false" data-primary="leading a team" data-secondary="managers and tech leads" data-type="indexterm" id="ix_leadmgrTL"> </a> This is certainly true at Google; sometimes an experienced manager comes in to run a team, and sometimes an individual contributor is promoted into a leadership position (usually of a smaller team).</p>
<p>In nascent teams, both roles will sometimes be filled by the same person: a <em>Tech Lead Manager</em> (TLM). On larger teams, an experienced people manager will step in to take on the management role while a senior engineer with extensive experience will step into the tech lead role. Even though manager and tech lead each play an important part in the growth and productivity of an engineering team, the people skills required to succeed in each role are wildly different.</p>
<section data-type="sect2" id="the_engineering_manager">
<h2>The Engineering Manager</h2>
<p>Many companies bring in trained people managers who might know little to nothing about software engineering to run their engineering teams.<a contenteditable="false" data-primary="leading a team" data-secondary="managers and tech leads" data-tertiary="engineering manager" data-type="indexterm" id="id-g5cPHbhxIbTE"> </a><a contenteditable="false" data-primary="engineering managers" data-type="indexterm" id="id-p1c0hehVIrTD"> </a><a contenteditable="false" data-primary="managers and tech leads" data-secondary="engineering manager" data-type="indexterm" id="id-Nqc7t0hrIRT6"> </a> Google decided early on, however, that its software engineering managers should have an engineering background. This meant hiring experienced managers who used to be software engineers, or training software engineers to be managers (more on this later).</p>
<p>At the highest level, an engineering manager is responsible for the performance, productivity, and happiness of every person on their team—including their tech lead—while still making sure that the needs of the business are met by the product for which they are responsible. Because the needs of the business and the needs of individual team members don’t always align, this can often place a manager in a difficult position.</p>
</section>
<section data-type="sect2" id="the_tech_lead">
<h2>The Tech Lead</h2>
<p>The tech lead (TL) of a team—who<a contenteditable="false" data-primary="managers and tech leads" data-secondary="tech lead" data-type="indexterm" id="id-p1cOHehefrTD"> </a> will often report <a contenteditable="false" data-primary="tech lead (TL)" data-type="indexterm" id="id-NqcAh0h6fRT6"> </a>to the manager <a contenteditable="false" data-primary="leading a team" data-secondary="managers and tech leads" data-tertiary="tech lead" data-type="indexterm" id="id-ddcktMhOfdTP"> </a>of that team—is responsible for (surprise!) the technical aspects of the product, including technology decisions and choices, architecture, priorities, velocity, and general project management (although on larger teams they might have program managers helping out with this). The TL will usually work hand in hand with the engineering manager to ensure that the team is adequately staffed for their product and that engineers are set to work on tasks that best match their skill sets and skill levels. Most TLs are also individual contributors, which often forces them to choose between doing something quickly themselves or delegating it to a team member to do (sometimes) more slowly. The latter is most often the correct decision for the TL as they grow the size and capability of their team.<a contenteditable="false" data-primary="TL" data-see="tech lead" data-type="indexterm" id="id-agcGImhPf0T1"> </a></p>
</section>
<section data-type="sect2" id="the_tech_lead_manager">
<h2>The Tech Lead Manager</h2>
<p>On small and nascent teams for which engineering managers need a strong technical skill set, the <a contenteditable="false" data-primary="managers and tech leads" data-secondary="tech lead manager (TLM)" data-type="indexterm" id="id-NqcvH0hyCRT6"> </a>default <a contenteditable="false" data-primary="tech lead manager (TLM)" data-type="indexterm" id="id-ddc7hMhLCdTP"> </a>is often <a contenteditable="false" data-primary="leading a team" data-secondary="managers and tech leads" data-tertiary="tech lead manager" data-type="indexterm" id="id-agc0tmhYC0T1"> </a>to have a TLM: a single person who can handle both the people and technical needs of their team. Sometimes, a TLM is a more senior person, but more often than not, the role is taken on by someone who was, until recently, an individual contributor.<a contenteditable="false" data-primary="TLM" data-see="tech lead manager" data-type="indexterm" id="id-LJcLIQhJCNTe"> </a></p>
<p>At Google, it’s customary for larger, well-established teams to have a pair of leaders—one TL and one engineering manager—working together as partners. The theory is that it’s really difficult to do both jobs at the same time (well) without completely burning out, so it’s better to have two specialists crushing each role with dedicated focus.</p>
<p>The job of TLM is a tricky one and often requires the TLM to learn how to balance individual work, delegation, and people management. As such, it usually requires a high degree of mentoring and assistance from more experienced TLMs. (In fact, we recommend that in addition to taking a number of classes that Google offers on this subject, a newly minted TLM seek out a senior mentor who can advise them regularly as they grow into the role.)</p>
<aside data-type="sidebar" id="case-study-influencing-without-authority-P1HyfDC6Tk">
<h5>Case Study: Influencing Without Authority</h5>
<p>It’s generally accepted that you can get folks who report to you to do the work that you need<a contenteditable="false" data-primary="leading a team" data-secondary="managers and tech leads" data-tertiary="case study, influencing without authority" data-type="indexterm" id="id-5XcnHahEfRC9TJ"> </a> done for<a contenteditable="false" data-primary="managers and tech leads" data-secondary="case study, influencing without authority" data-type="indexterm" id="id-vecxh3hWfNCgTx"> </a> your products, but it’s<a contenteditable="false" data-primary="influencing without authority (case study)" data-type="indexterm" id="id-DKcWt7h0fWCYT8"> </a> different when you need to get people outside of your organization—or heck, even outside of your product area sometimes—to do something that you think needs to be done. This “influence without authority” is one of the most powerful leadership traits that you can develop.</p>
<p>For example, for years, Jeff Dean, senior engineering fellow and possibly the most well-known Googler <em>inside</em> of Google, led only a fraction of Google’s engineering team, but his influence on technical decisions and direction reaches to the ends of the entire engineering organization and beyond (thanks to his writing and speaking outside of the company).</p>
<p>Another example is a team that I started called The Data Liberation Front: with a team of less than a half-dozen engineers, we managed to get more than 50 Google products to export their data through a product that we launched called Google Takeout. At the time, there was no formal directive from the executive level at Google for all products to be a part of Takeout, so how did we get hundreds of engineers to contribute to this effort? By identifying a strategic need for the company, showing how it linked to the mission and existing priorities of the company, and working with a small group of engineers to develop a tool that allowed teams to quickly and easily integrate with Takeout.<a contenteditable="false" data-primary="leading a team" data-secondary="managers and tech leads" data-startref="ix_leadmgrTL" data-type="indexterm" id="id-DKc0HGI0fWCYT8"> </a><a contenteditable="false" data-primary="managers and tech leads" data-startref="ix_mgrTL" data-type="indexterm" id="id-KWcJh9IAfgCYTJ"> </a></p>
</aside>
</section>
</section>
<section data-type="sect1" id="moving_from_an_individual_contributor_r">
<h1>Moving from an Individual Contributor Role <span class="keep-together">to a Leadership Role</span></h1>
<p>Whether or not they’re officially<a contenteditable="false" data-primary="leading a team" data-secondary="moving from individual contributor to leadership role" data-type="indexterm" id="ix_leadmv"> </a> appointed, someone needs to <a contenteditable="false" data-primary="managers and tech leads" data-secondary="moving from individual contributor to leadership role" data-type="indexterm" id="ix_mgrTLmv"> </a>get into the driver’s seat if your product is ever going to go anywhere, and if you’re the motivated, impatient type, that person might be you. You might find yourself sucked into helping your team resolve conflicts, make decisions, and coordinate people. It happens all the time, and often by accident. Maybe you never intended to become a “leader,” but somehow it happened anyway.<a contenteditable="false" data-primary="“manageritis”" data-primary-sortas="manageritis" data-type="indexterm" id="id-g5c0tbhque"> </a> Some people refer to this affliction as “manageritis.”</p>
<p>Even if you’ve sworn to yourself that you’ll never become a manager, at some point in your career, you’re likely to find yourself in a leadership position, especially if you’ve been successful in your role. The rest of this chapter is intended to help you understand what to do when this happens.</p>
<p>We’re not here to attempt to convince you to become a manager, but rather to help show why the best leaders work to serve their team using the principles of humility, respect, and trust. Understanding the ins and outs of leadership is a vital skill for influencing the direction of your work. If you want to steer the boat for your project and not just go along for the ride, you need to know how to navigate, or you’ll run yourself (and your project) onto a sandbar.</p>
<section data-type="sect2" id="the_only_thing_to_fear_issemicolonwellc">
<h2>The Only Thing to Fear Is…Well, Everything</h2>
<p>Aside from the general sense of malaise that most people feel when they hear the word "manager," there are a number of reasons that most people don’t want to become managers.<a contenteditable="false" data-primary="leading a team" data-secondary="moving from individual contributor to leadership role" data-tertiary="reasons people don't want to be managers" data-type="indexterm" id="id-NqcvH0h6fgu6"> </a><a contenteditable="false" data-primary="managers and tech leads" data-secondary="moving from individual contributor to leadership role" data-tertiary="reasons people don't want to be managers" data-type="indexterm" id="id-ddc7hMhOfkuP"> </a> The biggest reason you’ll hear in the software development world is that you spend much less time writing code. This is true whether you become a TL or an engineering manager, and I’ll talk more about this later in the chapter, but first, let’s cover some more reasons why most of us avoid becoming managers.</p>
<p>If you’ve spent the majority of your career writing code, you typically end a day with something you can point to—whether it’s code, a design document, or a pile of bugs you just closed—and say, “That’s what I did today.” But at the end of a busy day of “management,” you’ll usually find yourself thinking, “I didn’t do a damned thing today.” It’s the equivalent of spending years counting the number of apples you picked each day and changing to a job growing bananas, only to say to yourself at the end of each day, “I didn’t pick any apples,” happily ignoring the flourishing banana trees sitting next to you. Quantifying management work is more difficult than counting widgets you turned out, but just making it possible for your team to be happy and productive is a big measure of your job. Just don’t fall into the trap of counting apples when you’re growing bananas.<sup><a data-type="noteref" id="ch01fn57-marker" href="ch05.html#ch01fn57">1</a></sup></p>
<p>Another big reason for not becoming a manager is often unspoken<a contenteditable="false" data-primary="“Peter Principle”" data-primary-sortas="Peter" data-type="indexterm" id="id-agcvH1IPfKu1"> </a> but rooted in the famous “Peter Principle,” which states that “In a hierarchy every employee tends to rise to his level of incompetence.” Google generally avoids this by requiring that a person perform the job <em>above</em> their current level for a period of time (i.e., to “exceeds expectations” at their current level) before being promoted to that level. Most people have had a manager who was incapable of doing their job or was just really bad at managing people,<sup><a data-type="noteref" id="ch01fn58-marker" href="ch05.html#ch01fn58">2</a></sup> and we know some people who have worked only for bad managers. If you’ve been exposed only to crappy managers for your entire career, why would you <em>ever</em> want to be a manager? Why would you want to be promoted to a role that you don’t feel able to do?</p>
<p>There are, however, great reasons to consider becoming a TL or manager. First, it’s a way to scale yourself. Even if you’re great at writing code, there’s still an upper limit to the amount of code you can write. Imagine how much code a team of great engineers could write under your leadership! Second, you might just be really good at it—many people who find themselves sucked into the leadership vacuum of a project discover that they’re exceptionally skilled at providing the kind of guidance, help, and air cover a team or a company needs. Someone has to lead, so why not you?</p>
</section>
<section data-type="sect2" id="servant_leadership">
<h2>Servant Leadership</h2>
<p>There seems to be a sort of disease that strikes managers in which they forget about all the<a contenteditable="false" data-primary="managers and tech leads" data-secondary="moving from individual contributor to leadership role" data-tertiary="servant leadership" data-type="indexterm" id="id-ddcwHMhLCkuP"> </a> awful things their managers <a contenteditable="false" data-primary="servant leadership" data-type="indexterm" id="id-agcBhmhYCKu1"> </a>did to them and suddenly <a contenteditable="false" data-primary="leading a team" data-secondary="moving from individual contributor to leadership role" data-tertiary="servant leadership" data-type="indexterm" id="id-LJc5tQhJCJue"> </a>begin doing these same things to “manage” the people that report to them. The symptoms of this disease include, but are by no means limited to, micromanaging, ignoring low performers, and hiring pushovers. Without prompt treatment, this disease can kill an entire team. The best advice I received when I first became a manager at Google was from Steve Vinter, an engineering director at the time. He said, “Above all, resist the urge to manage.” One of the greatest urges of the newly minted manager is to actively “manage” their employees because that’s what a manager does, right? This typically has disastrous consequences.</p>
<p>The cure for the “management” disease is a liberal application of “servant leadership,” which is a nice way of saying the most important thing you can do as a leader is to serve your team, much like a butler or majordomo tends to the health and well-being of a household. As a servant leader, you should strive to create an atmosphere of humility, respect, and trust. This might mean removing bureaucratic obstacles that a team member can’t remove by themselves, helping a team achieve consensus, or even buying dinner for the team when they’re working late at the office. The servant leader fills in the cracks to smooth the way for their team and advises them when necessary, but still isn’t afraid of getting their hands dirty. The only managing that a servant leader does is to manage both the technical and social health of the team; as tempting as it might be to focus on purely the technical health of the team, the social health of the team is just as important (but often infinitely more difficult to manage).<a contenteditable="false" data-primary="managers and tech leads" data-secondary="moving from individual contributor to leadership role" data-startref="ix_mgrTLmv" data-type="indexterm" id="id-agcvHKtYCKu1"> </a><a contenteditable="false" data-primary="leading a team" data-secondary="moving from individual contributor to leadership role" data-startref="ix_leadmv" data-type="indexterm" id="id-LJcRhptJCJue"> </a></p>
</section>
</section>
<section data-type="sect1" id="the_engineering_manage">
<h1>The Engineering Manager</h1>
<p>So, what is<a contenteditable="false" data-primary="engineering managers" data-type="indexterm" id="ix_engmgr"> </a> actually expected of a manager at <a contenteditable="false" data-primary="managers and tech leads" data-secondary="engineering manager" data-type="indexterm" id="ix_mgrTLmgr"> </a>a modern <a contenteditable="false" data-primary="leading a team" data-secondary="engineering manager" data-type="indexterm" id="ix_leadengmgr"> </a>software company? Before the computing age, “management” and “labor” might have taken on almost antagonistic roles, with the manager wielding all of the power and labor requiring collective action to achieve its own ends. But that isn’t how modern software companies work.</p>
<section data-type="sect2" id="manager_is_a_four-letter_word">
<h2>Manager Is a Four-Letter Word</h2>
<p>Before talking about the core responsibilities of an engineering manager at Google, let’s review the history of managers.<a contenteditable="false" data-primary="leading a team" data-secondary="engineering manager" data-tertiary="history of managers" data-type="indexterm" id="id-p1cOHehlt3UD"> </a><a contenteditable="false" data-primary="managers and tech leads" data-secondary="engineering manager" data-tertiary="history of managers" data-type="indexterm" id="id-NqcAh0h9tdU6"> </a><a contenteditable="false" data-primary="engineering managers" data-secondary="manager as four-letter word" data-type="indexterm" id="id-ddcktMhAtgUP"> </a> The present-day concept of the pointy-haired manager is partially a carryover, first from military hierarchy and later adopted by the Industrial Revolution—more than a hundred years ago! Factories began popping up everywhere, and they required (usually unskilled) workers to keep the machines going. Consequently, these workers required supervisors to manage them, and because it was easy to replace these workers with other people who were desperate for a job, the managers had little motivation to treat their employees well or improve conditions for them. Whether humane or not, this method worked well for many years when the employees had nothing more to do than perform rote tasks.</p>
<p>Managers frequently treated employees in the same way that cart drivers would treat their mules: they motivated them by alternately<a contenteditable="false" data-primary="carrot-and-stick method of management" data-type="indexterm" id="id-NqcvHQt9tdU6"> </a> leading them forward with a carrot, and, when that didn’t work, whipping them with a stick. This carrot-and-stick method of management survived the transition from the factory<sup><a data-type="noteref" id="ch01fn59-marker" href="ch05.html#ch01fn59">3</a></sup> to the modern office, where the stereotype of the tough-as-nails manager-as-mule-driver flourished in the middle part of the twentieth century when employees would work at the same job for years and years.</p>
<p>This continues today in some industries—even in industries that require creative thinking and problem solving—despite numerous studies suggesting that the anachronistic carrot and stick is ineffective and harmful to the productivity of creative people. Whereas the assembly-line worker of years past could be trained in days and replaced at will, software engineers working on large codebases can take months to get up to speed on a new team. Unlike the replaceable assembly-line worker, these people need nurturing, time, and space to think and create.</p>
</section>
<section data-type="sect2" id="todayapostrophes_engineering_manager">
<h2>Today’s Engineering Manager</h2>
<p>Most people still use the title “manager” despite the fact that it’s often an anachronism.<a contenteditable="false" data-primary="leading a team" data-secondary="engineering manager" data-tertiary="today's manager" data-type="indexterm" id="id-NqcvH0hrIdU6"> </a><a contenteditable="false" data-primary="managers and tech leads" data-secondary="engineering manager" data-tertiary="contemporary" data-type="indexterm" id="id-ddc7hMhgIgUP"> </a><a contenteditable="false" data-primary="engineering managers" data-secondary="contemporary managers" data-type="indexterm" id="id-agc0tmhWI9U1"> </a> The title itself often encourages new managers to <em>manage</em> their reports. Managers can wind up acting like parents,<sup><a data-type="noteref" id="ch01fn60-marker" href="ch05.html#ch01fn60">4</a></sup> and consequently employees react like children. To frame this in the context of humility, respect, and trust: if a manager makes it obvious that they trust their employee, the employee feels positive pressure to live up to that trust. It’s that simple. A good manager forges the way for a team, looking out for their safety and well-being, all while making sure their needs are met. If there’s one thing you remember from this chapter, make it this:</p>
<blockquote>
<p>Traditional managers worry about how to get things done, whereas great managers worry about what things get done (and trust their team to figure out how to do it).</p>
</blockquote>
<p>A new engineer, Jerry, joined my team a few years ago. Jerry’s last manager (at a different company) was adamant that he be at his desk from 9:00 to 5:00 every day, and assumed that if he wasn’t there, he wasn’t working enough (which is, of course, a ridiculous assumption). On his first day working with me, Jerry came to me at 4:40 p.m. and stammered out an apology that he had to leave 15 minutes early because he had an appointment that he had been unable to reschedule. I looked at him, smiled, and told him flat out, “Look, as long as you get your job done, I don’t care what time you leave the office.” Jerry stared blankly at me for a few seconds, nodded, and went on his way. I treated Jerry like an adult; he always got his work done, and I never had to worry about him being at his desk, because he didn’t need a babysitter to get his work done. If your employees are so uninterested in their job that they actually need traditional-manager babysitting to be convinced to work, <em>that</em> is your real problem.</p>
<aside data-type="sidebar" id="failure_is_an_option">
<h5>Failure Is an Option</h5>
<p>Another way to catalyze your team is to make them feel safe and secure so that they can take <a contenteditable="false" data-primary="failures" data-secondary="failure is an option" data-type="indexterm" id="id-5XcnHahEfVI0UJ"> </a>greater risks<a contenteditable="false" data-primary="managers and tech leads" data-secondary="engineering manager" data-tertiary="failure as an option" data-type="indexterm" id="id-vecxh3hWfEI9Ux"> </a> by building<a contenteditable="false" data-primary="engineering managers" data-secondary="letting the team know failure is an option" data-type="indexterm" id="id-DKcWt7h0feIlU8"> </a> psychological safety—meaning <a contenteditable="false" data-primary="leading a team" data-secondary="engineering manager" data-tertiary="failure as an option" data-type="indexterm" id="id-KWc7IkhAfQIoUJ"> </a>that your team members feel like they can be themselves without fear of negative repercussions from you or their team members.<a contenteditable="false" data-primary="psychological safety" data-secondary="catalyzing your team by building" data-type="indexterm" id="id-xbc9fAhofmIYUm"> </a> Risk is a fascinating thing; most humans <a contenteditable="false" data-primary="risks" data-secondary="making failure an option" data-type="indexterm" id="id-nBc1C5hxf9IlUb"> </a>are terrible at evaluating risk, and most companies try to avoid risk at all costs. As a result, the usual modus operandi is to work conservatively and focus on smaller successes, even when taking a bigger risk might mean exponentially greater success. A common saying at Google is that if you try to achieve an impossible goal, there’s a good chance you’ll fail, but if you fail trying to achieve the impossible, you’ll most likely accomplish far more than you would have accomplished had you merely attempted something you knew you could complete. A good way to build a culture in which risk taking is accepted is to let your team know that it’s OK to fail.</p>
<p>So, let’s get that out of the way: it’s OK to fail. In fact, we like to think of failure as a way of learning a lot really quickly (provided that you’re not repeatedly failing at the same thing). In addition, it’s important to see failure as an opportunity to learn and not to point fingers or assign blame. Failing fast is good because there’s not a lot at stake. Failing slowly can also teach a valuable lesson, but it is more painful because more is at risk and more can be lost (usually engineering time). Failing in a manner that affects customers is probably the least desirable failure that we encounter, but it’s also one in which we have the greatest amount of structure in place to learn from failures. As mentioned earlier, every time there is a major production failure at Google, we perform a postmortem.<a contenteditable="false" data-primary="blameless postmortems" data-type="indexterm" id="id-vec3HbtWfEI9Ux"> </a><a contenteditable="false" data-primary="postmortems, blameless" data-type="indexterm" id="id-DKclhxt0feIlU8"> </a> This procedure is a way to document the events that led to the actual failure and to develop a series of steps that will prevent it from happening in the future. This is neither an opportunity to point fingers, nor is it intended to introduce unnecessary bureaucratic checks; rather, the goal is to strongly focus on the core of the problem and fix it once and for all. It’s very difficult, but quite effective (and cathartic).</p>
<p>Individual successes and failures are a bit different. It’s one thing to laud individual successes, but looking to assign individual blame in the case of failure is a great way to divide a team and discourage risk taking across the board. It’s alright to fail, but fail as a team and learn from your failures. If an individual succeeds, praise them in front of the team. If an individual fails, give constructive criticism in private.<sup><a data-type="noteref" id="ch01fn61-marker" href="ch05.html#ch01fn61">5</a></sup> Whatever the case, take advantage of the opportunity and apply a liberal helping of humility, respect, and trust to help your team learn from its failures.<a contenteditable="false" data-primary="managers and tech leads" data-secondary="engineering manager" data-startref="ix_mgrTLmgr" data-type="indexterm" id="id-KWcJh9IAfQIoUJ"> </a><a contenteditable="false" data-primary="leading a team" data-secondary="engineering manager" data-startref="ix_leadengmgr" data-type="indexterm" id="id-xbc3t2IofmIYUm"> </a><a contenteditable="false" data-primary="engineering managers" data-startref="ix_engmgr" data-type="indexterm" id="id-nBcvIXIxf9IlUb"> </a></p>
</aside>
</section>
</section>
<section data-type="sect1" id="antipatterns">
<h1>Antipatterns</h1>
<p>Before we <a contenteditable="false" data-primary="managers and tech leads" data-secondary="antipatterns" data-type="indexterm" id="ix_mgrTLAP"> </a>go over a litany <a contenteditable="false" data-primary="leading a team" data-secondary="antipatterns" data-type="indexterm" id="ix_leadAP"> </a>of “design patterns” for successful TLs and engineering managers, we’re going to review a collection of the patterns that you <em>don’t</em> want to follow if you want to be a successful manager.<a contenteditable="false" data-primary="engineering managers" data-seealso="leading a team; managers and tech leads" data-type="indexterm" id="id-ddcvIMhQSQ"> </a> We’ve observed these destructive patterns in a handful of bad managers that we’ve encountered in our careers, and in more than a few cases, ourselves.</p>
<section class="pagebreak-before" data-type="sect2" id="antipattern_hire_pushovers">
<h2 class="less_space">Antipattern: Hire Pushovers</h2>
<p>If you’re a manager and you’re feeling insecure in your role (for whatever reason), one way to make sure no one questions your authority or threatens your job is to hire people you can push around.<a contenteditable="false" data-primary="hiring of software engineers" data-secondary="hiring pushovers (antipattern)" data-type="indexterm" id="id-NqcvH0h9tOS6"> </a><a contenteditable="false" data-primary="managers and tech leads" data-secondary="antipatterns" data-tertiary="hiring pushovers" data-type="indexterm" id="id-ddc7hMhAtYSP"> </a><a contenteditable="false" data-primary="leading a team" data-secondary="antipatterns" data-tertiary="hiring pushovers" data-type="indexterm" id="id-agc0tmhqtWS1"> </a> You can achieve this by hiring people who aren’t as smart or ambitious as you are, or just people who are more insecure than you. Even though this will cement your position as the team leader and decision maker, it will mean a lot more work for you. Your team won’t be able to make a move without you leading them like dogs on a leash. If you build a team of pushovers, you probably can’t take a vacation; the moment you leave the room, productivity comes to a screeching halt. But surely this is a small price to pay for feeling secure in your job, right?</p>
<p>Instead, you should strive to hire people who are smarter than you and can replace you. This can be difficult because these very same people will challenge you on a regular basis (in addition to letting you know when you make a mistake). These very same people will also consistently impress you and make great things happen. They’ll be able to direct themselves to a much greater extent, and some will be eager to lead the team, as well. You shouldn’t see this as an attempt to usurp your power; instead, look at it as an opportunity for you to lead an additional team, investigate new opportunities, or even take a vacation without worrying about checking in on the team every day to make sure it’s getting its work done. It’s also a great chance to learn and grow—it’s a lot easier to expand your expertise when surrounded by people who are smarter than you.</p>
</section>
<section data-type="sect2" id="antipattern_ignore_low_performers">
<h2>Antipattern: Ignore Low Performers</h2>
<p>Early in <a contenteditable="false" data-primary="performance of software engineers" data-secondary="ignoring low performers" data-type="indexterm" id="id-ddcwHMhgIYSP"> </a>my career<a contenteditable="false" data-primary="leading a team" data-secondary="antipatterns" data-tertiary="ignoring low performers" data-type="indexterm" id="id-agcBhmhWIWS1"> </a> as a manager at Google, the time <a contenteditable="false" data-primary="managers and tech leads" data-secondary="antipatterns" data-tertiary="ignoring low performers" data-type="indexterm" id="id-LJc5tQhoIQSe"> </a>came for me to hand out bonus letters to my team, and I grinned as I told my manager, “I love being a manager!” Without missing a beat, my manager, a long-time industry veteran, replied, “Sometimes you get to be the tooth fairy, other times you have to be the dentist.”</p>
<p>It’s never any fun to pull teeth. We’ve seen team leaders do all the right things to build incredibly strong teams only to have these teams fail to excel (and eventually fall apart) because of just one or two low performers. We understand that the human aspect is the most challenging part of writing software, but the most difficult part of dealing with humans is handling someone who isn’t meeting expectations. Sometimes, people miss expectations because they’re not working long enough or hard enough, but the most difficult cases are when someone just isn’t capable of doing their job no matter how long or hard they work.</p>
<p>Google’s Site Reliability Engineering (SRE) team has a motto: “Hope is not a strategy.” And<a contenteditable="false" data-primary="“Hope is not a strategy”" data-primary-sortas="Hope" data-type="indexterm" id="id-LJcgHwIoIQSe"> </a> nowhere is hope more overused as a strategy than in dealing with a low performer. Most team leaders grit their teeth, avert their eyes, and just <em>hope</em> that the low performer either magically improves or just goes away. Yet it is extremely rare that this person does either.</p>
<p>While the leader is hoping and the low performer isn’t improving (or leaving), high performers on the team waste valuable time pulling the low performer along, and team morale leaks away into the ether. You can be sure that the team knows the low performer is there even if you’re ignoring them—in fact, the team is <em>acutely</em> aware of who the low performers are, because they have to carry them.</p>
<p>Ignoring low performers is not only a way to keep new high performers from joining your team, but it’s also a way to encourage existing high performers to leave. You eventually wind up with an entire team of low performers because they’re the only ones who can’t leave of their own volition. Lastly, you aren’t even doing the low performer any favors by keeping them on the team; often, someone who wouldn’t do well on your team could actually have plenty of impact somewhere else.</p>
<p>The benefit of dealing with a low performer as quickly as possible is that you can put yourself in the position of helping them up or out. If you immediately deal with a low performer, you’ll often find that they merely need some encouragement or direction to slip into a higher state of productivity. If you wait too long to deal with a low performer, their relationship with the team is going to be so sour and you’re going to be so frustrated that you’re not going to be able to help them.</p>
<p>How do you effectively coach a low performer? <a contenteditable="false" data-primary="coaching a low performer" data-type="indexterm" id="id-KWcaH3uxI1Sr"> </a>The best analogy is to imagine that you’re helping a limping person learn to walk again, then jog, then run alongside the rest of the team. <a contenteditable="false" data-primary="social interaction" data-secondary="coaching a low performer" data-type="indexterm" id="id-xbcKhXuMIYSv"> </a>It almost always requires temporary micromanagement, but still a whole lot of humility, respect, and trust—particularly respect. Set up a specific time frame (say, two months) and some very specific goals you expect them to achieve in that period. Make the goals small, incremental, and measurable so that there’s an opportunity for lots of small successes. Meet with the team member every week to check on progress, and be sure you set really explicit expectations around each upcoming milestone so that it’s easy to measure success or failure. If the low performer can’t keep up, it will become quite obvious to both of you early in the process. At this point, the person will often acknowledge that things aren’t going well and decide to quit; in other cases, determination will kick in and they’ll “up their game” to meet expectations. Either way, by working directly with the low performer, you’re catalyzing important and necessary changes.</p>
</section>
<section data-type="sect2" id="antipattern_ignore_human_issues">
<h2>Antipattern: Ignore Human Issues</h2>
<p>A manager has two major areas of focus for their team: the social and the technical. It’s rather<a contenteditable="false" data-primary="leading a team" data-secondary="antipatterns" data-tertiary="ignoring human issues" data-type="indexterm" id="id-agcvHmhPfWS1"> </a> common<a contenteditable="false" data-primary="human issues, ignoring in a team" data-type="indexterm" id="id-LJcRhQh2fQSe"> </a> for managers to <a contenteditable="false" data-primary="managers and tech leads" data-secondary="antipatterns" data-tertiary="ignoring human issues" data-type="indexterm" id="id-5Xc7tahEfASM"> </a>be stronger in the technical side at Google, and because most managers are promoted from a technical job (for which the primary goal of their job was to solve technical problems), they can tend to ignore human issues. It’s tempting to focus all of your energy on the technical side of your team because, as an individual contributor, you spend the vast majority of your time solving technical problems. When you were a student, your classes were all about learning the technical ins and outs of your work. Now that you’re a manager, however, you ignore the human element of your team at your own peril.</p>
<p>Let’s begin with an example of a leader ignoring the human element in his team. Years ago, Jake had his first child. Jake and Katie had worked together for years, both remotely and in the same office, so in the weeks following the arrival of the new baby, Jake worked from home. This worked out great for the couple, and Katie was totally fine with it because she was already used to working remotely with Jake. They were their usual productive selves until their manager, Pablo (who worked in a different office), found out that Jake was working from home for most of the week. Pablo was upset that Jake wasn’t going into the office to work with Katie, despite the fact that Jake was just as productive as always and that Katie was fine with the situation. Jake attempted to explain to Pablo that he was just as productive as he would be if he came into the office and that it was much easier on him and his wife for him to mostly work from home for a few weeks. Pablo’s response: “Dude, people have kids all the time. You need to go into the office.” Needless to say, Jake (normally a mild-mannered engineer) was enraged and lost a lot of respect for Pablo.</p>
<p>There are numerous ways in which Pablo could have handled this differently: he could have showed some understanding that Jake wanted to spend more time at home with his wife and, if his productivity and team weren’t being affected, just let him continue to do so for a while. He could have negotiated that Jake go into the office for one or two days a week until things settled down. Regardless of the end result, a little bit of empathy would have gone a long way toward keeping Jake happy in this situation.</p>
</section>
<section data-type="sect2" id="antipattern_be_everyoneapostrophes_frie">
<h2>Antipattern: Be Everyone’s Friend</h2>
<p>The first foray that most people have into leadership of any sort is when they become the <a contenteditable="false" data-primary="leading a team" data-secondary="antipatterns" data-tertiary="being everyone's friend" data-type="indexterm" id="id-LJcgHQhJCQSe"> </a>manager <a contenteditable="false" data-primary="managers and tech leads" data-secondary="antipatterns" data-tertiary="being everyone's friend" data-type="indexterm" id="id-5Xc1hahrCASM"> </a>or TL of a team of which they were formerly members. Many leads don’t want to lose the friendships they’ve cultivated with their teams, so they will sometimes work extra hard to maintain friendships with their team members after becoming a team lead. This can be a recipe for disaster and for a lot of broken friendships. Don’t confuse friendship with leading with a soft touch: when you hold power over someone’s career, they might feel pressure to artificially reciprocate gestures of <span class="keep-together">friendship.</span></p>
<p>Remember that you can lead a team and build consensus without being a close friend of your team (or a monumental hard-ass). Likewise, you can be a tough leader without tossing your existing friendships to the wind. We’ve found that having lunch with your team can be an effective way to stay socially connected to them without making them uncomfortable—this gives you a chance to have informal conversations outside the normal work environment.</p>
<p>Sometimes, it can be tricky to move into a management role over someone who has been a good friend and a peer. If the friend who is being managed is not self-managing and is not a hard worker, it can be stressful for everyone. We recommend that you avoid getting into this situation whenever possible, but if you can’t, pay extra attention to your relationship with those folks.</p>
</section>
<section data-type="sect2" id="antipattern_compromise_the_hiring_bar">
<h2>Antipattern: Compromise the Hiring Bar</h2>
<p>Steve Jobs once said: “A people hire other A people; B people hire C people.” It’s incredibly<a contenteditable="false" data-primary="leading a team" data-secondary="antipatterns" data-tertiary="compromising the hiring bar" data-type="indexterm" id="id-5XcnHahkTASM"> </a> easy to<a contenteditable="false" data-primary="hiring of software engineers" data-secondary="compromising the hiring bar (antipattern)" data-type="indexterm" id="id-vecxh3hMTRS5"> </a> fall <a contenteditable="false" data-primary="managers and tech leads" data-secondary="antipatterns" data-tertiary="compromising the hiring bar" data-type="indexterm" id="id-DKcWt7hoTNSG"> </a>victim to this adage, and even more so when you’re trying to hire quickly.<a contenteditable="false" data-primary="Jobs, Steve" data-type="indexterm" id="id-KWc7IkhdT1Sr"> </a> A common approach I’ve seen outside of Google is that a team needs to hire 5 engineers, so it sifts through a pile of applications, interviews 40 or 50 people, and picks the best 5 candidates regardless of whether they meet the hiring bar.</p>
<p>This is one of the fastest ways to build a mediocre team.</p>
<p>The cost of finding the appropriate person—whether by paying recruiters, paying for advertising, or pounding the pavement for references—pales in comparison to the cost of dealing with an employee who you never should have hired in the first place. This “cost” manifests itself in lost team productivity, team stress, time spent managing the employee up or out, and the paperwork and stress involved in firing the employee. That’s assuming, of course, that you try to avoid the monumental cost of just leaving them on the team. If you’re managing a team for which you don’t have a say over hiring and you’re unhappy with the hires being made for your team, you need to fight tooth and nail for higher-quality engineers. If you’re still handed substandard engineers, maybe it’s time to look for another job. Without the raw materials for a great team, you’re doomed.</p>
</section>
<section data-type="sect2" id="antipattern_treat_your_team_like_childr">
<h2>Antipattern: Treat Your Team Like Children</h2>
<p>The best way to show<a contenteditable="false" data-primary="managers and tech leads" data-secondary="antipatterns" data-tertiary="treating your team like children" data-type="indexterm" id="id-vec3H3hPuRS5"> </a> your team that<a contenteditable="false" data-primary="leading a team" data-secondary="antipatterns" data-tertiary="treating your team like children" data-type="indexterm" id="id-DKclh7h9uNSG"> </a> you don’t trust it is to treat team members like kids—people tend to act the way you treat them, so <a contenteditable="false" data-primary="trust" data-secondary="treating your team like children (antipattern)" data-type="indexterm" id="id-KWcVtkhqu1Sr"> </a>if you treat them like children or prisoners, don’t be surprised when that’s how they behave. You can manifest this behavior by micromanaging them or simply by being disrespectful of their abilities and giving them no opportunity to be responsible for their work. If it’s permanently necessary to micromanage people because you don’t trust them, you have a hiring failure on your hands. Well, it’s a failure unless your goal was to build a team that you can spend the rest of your life babysitting. If you hire people worthy of trust and show these people you trust them, they’ll usually rise to the occasion (sticking with the basic premise, as we mentioned earlier, that you’ve hired good people).</p>
<p>The results of this level of trust go all the way to more mundane things like office and computer supplies. As another example, Google provides employees with cabinets stocked with various and sundry office supplies (e.g., pens, notebooks, and other “legacy” implements of creation) that are free to take as employees need them. The IT department runs numerous “Tech Stops” that provide self-service areas that are like a mini electronics store. These contain lots of computer accessories and doodads (power supplies, cables, mice, USB drives, etc.) that would be easy to just grab and walk off with en masse, but because Google employees are being trusted to check these items out, they feel a responsibility to Do The Right Thing. Many people from typical corporations react in horror to hearing this, exclaiming that surely Google is hemorrhaging money due to people “stealing” these items. That’s certainly possible, but what about the costs of having a workforce that behaves like children or that has to waste valuable time formally requesting cheap office supplies? Surely<a contenteditable="false" data-primary="managers and tech leads" data-secondary="antipatterns" data-startref="ix_mgrTLAP" data-type="indexterm" id="id-DKc0Hxt9uNSG"> </a> that’s more <a contenteditable="false" data-primary="leading a team" data-secondary="antipatterns" data-startref="ix_leadAP" data-type="indexterm" id="id-KWcJhGtqu1Sr"> </a>expensive than the price of a few pens and USB cables. </p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="positive_patterns">
|
||||
<h1>Positive Patterns</h1>
|
||||
|
||||
<p>Now <a contenteditable="false" data-primary="managers and tech leads" data-secondary="positive patterns" data-type="indexterm" id="ix_mgrTLPP"> </a>that we’ve covered antipatterns, let’s turn to <a contenteditable="false" data-primary="leading a team" data-secondary="positive patterns" data-type="indexterm" id="ix_leadPP"> </a>positive patterns for successful leadership and management that we’ve learned from our experiences at Google, from watching other successful leaders and, most of all, from our own leadership mentors. These patterns are not only those that we’ve had great success implementing, but the patterns that we’ve always respected the most in the leaders we follow.</p>
|
||||
|
||||
<section data-type="sect2" id="lose_the_ego-id00072">
|
||||
<h2>Lose the Ego</h2>
|
||||
|
||||
<p>We talked about “losing the ego” a few chapters ago when we first examined humility, respect, and<a contenteditable="false" data-primary="ego, losing" data-type="indexterm" id="id-ddcwHMhAtycP"> </a> trust, but it’s <a contenteditable="false" data-primary="leading a team" data-secondary="positive patterns" data-tertiary="losing the ego" data-type="indexterm" id="id-agcBhmhqtxc1"> </a>especially important<a contenteditable="false" data-primary="managers and tech leads" data-secondary="positive patterns" data-tertiary="losing the ego" data-type="indexterm" id="id-LJc5tQhXtqce"> </a> when you’re a team leader. This pattern is frequently misunderstood as encouraging people to be doormats and let others walk all over them, but that’s not the case at all. Of course, there’s a fine line between being humble and letting others take advantage of you, but humility is not the same as lacking confidence. You can still have self-confidence and opinions without being an egomaniac. Big personal egos are difficult to handle on any team, especially in the team’s leader. Instead, you should work to cultivate a strong collective team ego and identity.</p>
|
||||
|
||||
<p>Part of “losing the ego” is trust: you need to trust your team. <a contenteditable="false" data-primary="trust" data-secondary="trusting your team and losing the ego" data-type="indexterm" id="id-agcvHKtqtxc1"> </a>That means respecting the abilities and prior accomplishments of the team members, even if they’re new to your team.</p>
|
||||
|
||||
<p>If you’re not micromanaging your team, you can be pretty certain the folks working in the trenches know the details of their work better than you do. This means that although you might be the one driving the team to consensus and helping to set the direction, the nuts and bolts of how to accomplish your goals are best decided by the people who are putting the product together. This gives them not only a greater sense of ownership, but also a greater sense of accountability and responsibility for the success (or failure) of their product. If you have a good team and you let it set the bar for the quality and rate of its work, it will accomplish more than by you standing over team members with a carrot and a stick.</p>
|
||||
|
||||
<p>Most people new to a leadership role feel an enormous responsibility to get everything right, to know everything, and to have all the answers. We can assure you that you will not get everything right, nor will you have all the answers, and if you act like you do, you’ll quickly lose the respect of your team. A lot of this comes down to having a basic sense of security in your role. Think back to when you were an individual contributor; you could smell insecurity a mile away. Try to appreciate inquiry: when someone questions a decision or statement you made, remember that this person is usually just trying to better understand you. If you encourage inquiry, you’re much more likely to get the kind of constructive criticism that will make you a better leader of a better team. Finding people who will give you good constructive criticism is incredibly difficult, and it’s even more difficult to get this kind of criticism from people who “work for you.” Think about the big picture of what you’re trying to accomplish as a team, and accept feedback and criticism openly; avoid the urge to be territorial.</p>
|
||||
|
||||
<p>The last part <a contenteditable="false" data-primary="apologizing for mistakes" data-type="indexterm" id="id-vec3HoC8tgc5"> </a>of losing the ego is a simple one, but many engineers would rather be boiled in oil than do it: apologize when you make a mistake. And we don’t mean you should just sprinkle “I’m sorry” throughout your conversation like salt on popcorn—you need to sincerely mean it. You are absolutely going to make mistakes, and whether or not you admit it, your team is going to know you’ve made a mistake. Your team members will know regardless of whether they talk to you (and one thing is guaranteed: they <em>will</em> talk about it with one another). Apologizing doesn’t cost money. People have enormous respect for leaders who apologize when they screw up, and contrary to popular belief, apologizing doesn’t make you vulnerable. In fact, you’ll usually gain respect from people when you apologize, because apologizing tells people that you are level headed, good at assessing situations, and—coming back to humility, respect, and trust—humble.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="be_a_zen_master">
|
||||
<h2>Be a Zen Master</h2>
|
||||
|
||||
<p>As an engineer, you’ve likely developed an excellent sense of skepticism and cynicism, but <a contenteditable="false" data-primary="Zen master, being" data-type="indexterm" id="id-agcvHmhWIxc1"> </a>this can be <a contenteditable="false" data-primary="managers and tech leads" data-secondary="positive patterns" data-tertiary="being a Zen master" data-type="indexterm" id="id-LJcRhQhoIqce"> </a>a liability when you’re trying to lead a team.<a contenteditable="false" data-primary="leading a team" data-secondary="positive patterns" data-tertiary="being a Zen master" data-type="indexterm" id="id-5Xc7tahGIdcM"> </a> This is not to say that you should be naively optimistic at every turn, but you would do well to be less vocally skeptical while still letting your team know you’re aware of the intricacies and obstacles involved in your work. Mediating your reactions and maintaining your calm is more important as you lead more people, because your team will (both unconsciously and consciously) look to you for clues on how to act and react to whatever is going on around you.</p>
|
||||
|
||||
<p>A simple way to visualize this effect is to see your company’s organization chart as a chain of gears, with the individual contributor as a tiny gear with just a few teeth all the way at one end, and each successive manager above them as another gear, ending with the CEO as the largest gear with many hundreds of teeth. This means that every time that individual’s “manager gear” (with maybe a few dozen teeth) makes a single revolution, the “individual’s gear” makes two or three revolutions. And the CEO can make a small movement and send the hapless employee, at the end of a chain of six or seven gears, spinning wildly! The farther you move up the chain, the faster you can set the gears below you spinning, whether or not you intend to.</p>
|
||||
|
||||
<p>Another way of thinking about this is the maxim that the leader is always on stage. This means that if you’re in an overt leadership position, you are always being watched: not just when you run a meeting or give a talk, but even when you’re just sitting at your desk answering emails. Your peers are watching you for subtle clues in your body language, your reactions to small talk, and your signals as you eat lunch. Do they read confidence or fear? As a leader, your job is to inspire, but inspiration is a 24/7 job. Your visible attitude about absolutely everything—no matter how trivial—is unconsciously noticed and spreads infectiously to your team.</p>
|
||||
|
||||
<p>One of the early managers at Google, Bill Coughran, a VP of engineering, had truly mastered the ability to maintain calm at all times. No matter what blew up, no matter what crazy thing happened, no matter how big the firestorm, Bill would never panic. Most of the time he’d place one arm across his chest, rest his chin in his hand, and ask questions about the problem, usually to a completely panicked engineer. This had the effect of calming them and helping them to focus on solving the problem instead of running around like a chicken with its head cut off. Some of us used to joke that if someone came in and told Bill that 19 of the company’s offices had been attacked by space aliens, Bill’s response would be, “Any idea why they didn’t make it an even 20?”</p>
|
||||
|
||||
<p>This brings us to another Zen management trick: asking questions. <a contenteditable="false" data-primary="asking questions" data-secondary="Zen management technique" data-type="indexterm" id="id-DKc0HOCLIxcG"> </a>When a team member asks you for advice, it’s usually pretty exciting because you’re finally getting the chance to fix something. That’s exactly what you did for years before moving into a leadership position, so you usually go leaping into solution mode, but that is the last place you should be. The person asking for advice typically doesn’t want <em>you</em> to solve their problem, but rather to help them solve it, and the easiest way to do this is to ask this person questions. This isn’t to say that you should replace yourself with a Magic 8 Ball, which would be maddening and unhelpful. Instead, you can apply some humility, respect, and trust and try to help the person solve the problem on their own by trying to refine and explore the problem. This will usually lead the employee to the answer,<sup><a data-type="noteref" id="ch01fn62-marker" href="ch05.html#ch01fn62">6</a></sup> and it will be that person’s answer, which leads back to the ownership and responsibility we went over earlier in this chapter. Whether or not you have the answer, using this technique will almost always leave the employee with the impression that you did. Tricky, eh? Socrates would be proud of you.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="be_a_catalyst">
|
||||
<h2>Be a Catalyst</h2>
|
||||
|
||||
<p>In chemistry, a catalyst is something that accelerates a chemical reaction, but which itself is not consumed in the reaction.<a contenteditable="false" data-primary="leading a team" data-secondary="positive patterns" data-tertiary="being a catalyst" data-type="indexterm" id="id-LJcgHQh2fqce"> </a><a contenteditable="false" data-primary="managers and tech leads" data-secondary="positive patterns" data-tertiary="being a catalyst" data-type="indexterm" id="id-5Xc1hahEfdcM"> </a><a contenteditable="false" data-primary="catalyst, being" data-type="indexterm" id="id-vecPt3hWfgc5"> </a> One of the ways in which catalysts (e.g., enzymes) work is to bring reactants into close proximity: instead of bouncing around randomly in a solution, the reactants are much more likely to favorably interact with one another when the catalyst helps bring them together. This is a role you’ll often need to play as a leader, and there are a number of ways you can go about it.</p>
|
||||
|
||||
<p>One of the most common things a team leader does is to build consensus.<a contenteditable="false" data-primary="consensus, building" data-type="indexterm" id="id-5XcnH9tEfdcM"> </a> This might mean that you drive the process from start to finish, or you just give it a gentle push in the right direction to speed it up. Working to build team consensus is a leadership skill that is often used by unofficial leaders because it’s one way you can lead without any actual authority. If you have the authority, you can direct and dictate direction, but that’s less effective overall than building consensus.<sup><a data-type="noteref" id="ch01fn63-marker" href="ch05.html#ch01fn63">7</a></sup> If your team is looking to move quickly, sometimes it will voluntarily concede authority and direction to one or more team leads. Even though this might look like a dictatorship or oligarchy, when it’s done voluntarily, it’s a form of consensus.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="remove_roadblocks">
|
||||
<h2>Remove Roadblocks</h2>
|
||||
|
||||
<p>Sometimes, your team already has consensus about what you need to do, but it hit a roadblock and became stuck.<a contenteditable="false" data-primary="managers and tech leads" data-secondary="positive patterns" data-tertiary="removing roadblocks" data-type="indexterm" id="id-5XcnHahrCdcM"> </a><a contenteditable="false" data-primary="roadblocks, removing" data-type="indexterm" id="id-vecxh3hACgc5"> </a><a contenteditable="false" data-primary="leading a team" data-secondary="positive patterns" data-tertiary="removing roadblocks" data-type="indexterm" id="id-DKcWt7h2CxcG"> </a> This could be a technical or organizational roadblock, but jumping in to help the team get moving again is a common leadership technique. There are some roadblocks that, although virtually impossible for your team members to get past, will be easy for you to handle, and helping your team understand that you’re glad (and able) to help out with these roadblocks is valuable.</p>
|
||||
|
||||
<p>One time, a team spent several weeks trying to work past an obstacle with Google’s legal department. When the team finally reached its collective wits’ end and went to its manager with the problem, the manager had it solved in less than two hours simply because he knew the right person to contact to discuss the matter. Another time, a team needed some server resources and just couldn’t get them allocated. Fortunately, the team’s manager was in communication with other teams across the company and managed to get the team exactly what it needed that very afternoon. Yet another time, one of the engineers was having trouble with an arcane bit of Java code. Although the team’s manager was not a Java expert, she was able to connect the engineer to another engineer who knew exactly what the problem was. You don’t need to know all the answers to help remove roadblocks, but it usually helps to know the people who do. In many cases, knowing the right person is more valuable than knowing the right answer.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="be_a_teacher_and_a_mentor">
|
||||
<h2>Be a Teacher and a Mentor</h2>
|
||||
|
||||
<p>One of the most difficult things to do as a TL is to watch a more junior team member spend 3 hours working on something that you know you can knock out in 20 minutes.<a contenteditable="false" data-primary="teacher and mentor, being" data-type="indexterm" id="id-vec3H3hMTgc5"> </a><a contenteditable="false" data-primary="managers and tech leads" data-secondary="positive patterns" data-tertiary="being a teacher and mentor" data-type="indexterm" id="id-DKclh7hoTxcG"> </a><a contenteditable="false" data-primary="leading a team" data-secondary="positive patterns" data-tertiary="being a teacher and mentor" data-type="indexterm" id="id-KWcVtkhdTXcr"> </a> Teaching people and giving them a chance to learn on their own can be incredibly difficult at first, but it’s a vital component of effective leadership.<a contenteditable="false" data-primary="mentorship" data-secondary="being a teacher and mentor for your team" data-type="indexterm" id="id-xbcXIAh2Tdcv"> </a> This is especially important for new hires who, in addition to learning your team’s technology and codebase, are learning your team’s culture and the appropriate level of responsibility to assume. A good mentor must balance the trade-offs of a mentee’s time learning versus their time contributing to their product as part of an effective effort to scale the team as it grows.</p>
|
||||
|
||||
<p>Much like the role of manager, most people don’t apply for the role of mentor—they usually become one when a leader is looking for someone to mentor a new team member. It doesn’t take a lot of formal education or preparation to be a mentor. Primarily, you need three things: experience with your team’s processes and systems, the ability to explain things to someone else, and the ability to gauge how much help your mentee needs. The last thing is probably the most important—giving your mentee enough information is what you’re supposed to be doing, but if you overexplain things or ramble on endlessly, your mentee will probably tune you out rather than politely tell you they got it.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="set_clear_goals">
|
||||
<h2>Set Clear Goals</h2>
|
||||
|
||||
<p>This is one of those patterns that, as obvious as it sounds, is solidly ignored by an enormous number of leaders. <a contenteditable="false" data-primary="managers and tech leads" data-secondary="positive patterns" data-tertiary="setting clear goals" data-type="indexterm" id="id-DKc0H7h9uxcG"> </a><a contenteditable="false" data-primary="leading a team" data-secondary="positive patterns" data-tertiary="setting clear goals" data-type="indexterm" id="id-KWcJhkhquXcr"> </a><a contenteditable="false" data-primary="goals" data-secondary="team leader setting clear goals" data-type="indexterm" id="id-xbc3tAhqudcv"> </a>If you’re going to get your team moving rapidly in one direction, you need to make sure that every team member understands and agrees on what the direction is. Imagine your product is a big truck (and not a series of tubes). Each team member has in their hand a rope tied to the front of the truck, and as they work on the product, they’ll pull the truck in their own direction. If your intention is to pull the truck (or product) northbound as quickly as possible, you can’t have team members pulling every which way—you want them all pulling the truck north. If you’re going to have clear goals, you need to set clear priorities and help your team decide how it should make trade-offs when the time comes.</p>
|
||||
|
||||
<p>The easiest way to set a clear goal and get your team pulling the product in the same direction is to create a concise mission statement for the team. After you’ve helped the team define its direction and goals, you can step back and give it more autonomy, periodically checking in to make sure everyone is still on the right track. This not only frees up your time to handle other leadership tasks, it also drastically increases the efficiency of your team. Teams can (and do) succeed without clear goals, but they typically waste a great deal of energy as each team member pulls the product in a slightly different direction. This frustrates you, slows progress for the team, and forces you to use more and more of your own energy to correct the course.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="be_honest">
|
||||
<h2>Be Honest</h2>
|
||||
|
||||
<p>This doesn’t mean that we’re assuming you are lying to your team, but it merits a mention because <a contenteditable="false" data-primary="honesty, being honest with your team" data-type="indexterm" id="id-KWcaHkhDUXcr"> </a>you’ll inevitably <a contenteditable="false" data-primary="leading a team" data-secondary="positive patterns" data-tertiary="being honest" data-type="indexterm" id="id-xbcKhAheUdcv"> </a>find yourself in a <a contenteditable="false" data-primary="managers and tech leads" data-secondary="positive patterns" data-tertiary="being honest" data-type="indexterm" id="id-nBcet5hdUpcB"> </a>position in which you can’t tell your team something or, even worse, you need to tell everyone something they don’t want to hear. One manager we know tells new team members, “I won’t lie to you, but I will tell you when I can’t tell you something or if I just don’t know.”</p>
|
||||
|
||||
<p>If a team member approaches you about something you can’t share, it’s OK to just tell them you know the answer but are not at liberty to say anything. Even more common is when a team member asks you something you don’t know the answer to: you can tell that person you don’t know. This is another one of those things that seems blindingly obvious when you read it, but many people in a manager role feel that if they don’t know the answer to something, it proves that they’re weak or out of touch. In reality, the only thing it proves is that they’re human.</p>
|
||||
|
||||
<p>Giving hard feedback is…well, hard.<a contenteditable="false" data-primary="feedback" data-secondary="giving hard feedback to team members" data-type="indexterm" id="id-nBcdHXIdUpcB"> </a> The first time you need to tell one of your reports that they made a mistake or didn’t do their job as well as expected can be incredibly stressful. Most management texts advise that you use the “compliment sandwich” to soften the blow when delivering hard feedback. A compliment sandwich looks something like this:</p>
|
||||
|
||||
<blockquote>
|
||||
<p>You’re a solid member of the team and one of our smartest engineers. That being said, your code is convoluted and almost impossible for anyone else on the team to understand. But you’ve got great potential and a wicked cool T-shirt collection.</p>
|
||||
</blockquote>
|
||||
|
||||
<p>Sure, this softens the blow, but with this sort of beating around the bush, most people will walk out of this meeting thinking only, “Sweet! I’ve got cool T-shirts!” We <em>strongly</em> advise against using the compliment sandwich, not because we think you should be unnecessarily cruel or harsh, but because most people won’t hear the critical message, which is that something needs to change. It’s possible to employ respect here: be kind and empathetic when delivering constructive criticism without resorting to the compliment sandwich. In fact, kindness and empathy are critical if you want the recipient to hear the criticism and not immediately go on the defensive.</p>
|
||||
|
||||
<p>Years ago, a colleague picked up a team member, Tim, from another manager who insisted that Tim was impossible to work with. He said that Tim never responded to feedback or criticism and instead just kept doing the same things he’d been told he shouldn’t do. Our colleague sat in on a few of the manager’s meetings with Tim to watch the interaction between the manager and Tim, and noticed that the manager made extensive use of the compliment sandwich so as not to hurt Tim’s feelings. When they brought Tim onto their team, they sat down with him and very clearly explained that Tim needed to make some changes to work more effectively with the team:</p>
|
||||
|
||||
<blockquote>
|
||||
<p>We’re quite sure that you’re not aware of this, but the way that you’re interacting with the team is alienating and angering them, and if you want to be effective, you need to refine your communication skills, and we’re committed to helping you do that.</p>
|
||||
</blockquote>
|
||||
|
||||
<p>They didn’t give Tim any compliments or candy-coat the issue, but just as important, they weren’t mean—they just laid out the facts as they saw them based on Tim’s performance with the previous team. Lo and behold, within a matter of weeks (and after a few more “refresher” meetings), Tim’s performance improved dramatically. Tim just needed very clear feedback and direction.</p>
|
||||
|
||||
<p>When you’re providing direct feedback or criticism, your delivery is key to making sure that your message is heard and not deflected. If you put the recipient on the defensive, they’re not going to be thinking of how they can change, but rather how they can argue with you to show you that you’re wrong. Our colleague Ben once managed an engineer who we’ll call Dean. Dean had extremely strong opinions and would argue with the rest of the team about anything. It could be something as big as the team’s mission or as small as the placement of a widget on a web page; Dean would argue with the same conviction and vehemence either way, and he refused to let anything slide. After months of this behavior, Ben met with Dean to explain to him that he was being too combative. Now, if Ben had just said, “Dean, stop being such a jerk,” you can be pretty sure Dean would have disregarded it entirely. Ben thought hard about how he could get Dean to understand how his actions were adversely affecting the team, and he came up with the following metaphor:</p>
|
||||
|
||||
<blockquote>
|
||||
<p>Every time a decision is made, it’s like a train coming through town—when you jump in front of the train to stop it, you slow the train down and potentially annoy the engineer driving the train. A new train comes by every 15 minutes, and if you jump in front of every train, not only do you spend a lot of your time stopping trains, but eventually one of the engineers driving the train is going to get mad enough to run right over you. So, although it’s OK to jump in front of some trains, pick and choose the ones you want to stop to make sure you’re stopping only the trains that really matter.</p>
|
||||
</blockquote>
|
||||
|
||||
<p>This anecdote not only injected a bit of humor into the situation, but also made it easier for Ben and Dean to discuss the effect that Dean’s “train stopping” was having on the team in addition to the energy Dean was spending on it.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="track_happiness">
|
||||
<h2>Track Happiness</h2>
|
||||
|
||||
<p>As a leader, one way you can make your team more productive (and less likely to leave) in the <a contenteditable="false" data-primary="managers and tech leads" data-secondary="positive patterns" data-tertiary="tracking happiness" data-type="indexterm" id="id-xbc1HAhDSdcv"> </a>long term is to take <a contenteditable="false" data-primary="leading a team" data-secondary="positive patterns" data-tertiary="tracking happiness" data-type="indexterm" id="id-nBc2h5hJSpcB"> </a>some time to gauge their happiness.<a contenteditable="false" data-primary="happiness, tracking for your team" data-type="indexterm" id="id-A5c1t2hxSWcg"> </a> The best leaders we’ve worked with have all been amateur psychologists, looking in on their team members’ welfare from time to time, making sure they get recognition for what they do, and trying to make certain they are happy with their work. One TLM we know makes a spreadsheet of all the grungy, thankless tasks that need to be done and makes certain these tasks are evenly spread across the team. Another TLM watches the hours his team is working and uses comp time and fun team outings to avoid burnout and exhaustion. Yet another starts one-on-one sessions with his team members by dealing with their technical issues as a way to break the ice, and then takes some time to make sure each engineer has everything they need to get their work done. After they’ve warmed up, he talks to the engineer for a bit about how they’re enjoying the work and what they’re looking forward to next.</p>
|
||||
|
||||
<p>A good simple way to track your team’s happiness<sup><a data-type="noteref" id="ch01fn64-marker" href="ch05.html#ch01fn64">8</a></sup> is to ask the team member at the end of each one-on-one meeting, “What do you need?” This simple question is a great way to wrap up and make sure each team member has what they need to be productive and happy, although you might need to carefully probe a bit to get details. If you ask this every time you have a one-on-one, you’ll find that eventually your team will remember this and sometimes even come to you with a laundry list of things it needs to make everyone’s job better.<a contenteditable="false" data-primary="leading a team" data-secondary="positive patterns" data-startref="ix_leadPP" data-type="indexterm" id="id-A5ckhMtxSWcg"> </a><a contenteditable="false" data-primary="managers and tech leads" data-secondary="positive patterns" data-startref="ix_mgrTLPP" data-type="indexterm" id="id-Bvc2tytgS3c3"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="the_unexpected_question">
|
||||
<h1>The Unexpected Question</h1>
|
||||
|
||||
<p>Shortly after I started at Google, I had my first meeting<a contenteditable="false" data-primary="leading a team" data-secondary="asking team members if they need anything" data-type="indexterm" id="id-NqcvH0hpsr"> </a> with then-CEO Eric Schmidt, and at<a contenteditable="false" data-primary="asking questions" data-secondary="asking team members if they need anything" data-type="indexterm" id="id-ddc7hMh0sQ"> </a> the end Eric asked me, “Is there anything you need?” I had prepared a million defensive responses to difficult questions or challenges but was completely unprepared for this. So I sat there, dumbstruck and staring. You can be sure I had something ready the next time I was asked that question!</p>
|
||||
|
||||
<p>It can also be worthwhile as a leader to pay some attention to your team’s happiness outside the office.<a contenteditable="false" data-primary="happiness, tracking for your team" data-secondary="outside the office and in their careers" data-type="indexterm" id="id-ddcwH2t0sQ"> </a> Our colleague Mekka starts his one-on-ones by asking his reports to rate their happiness on a scale of 1 to 10, and oftentimes his reports will use this as a way to discuss happiness in <em>and</em> outside of the office. Be wary of assuming that people have no life outside of work—having unrealistic expectations about the amount of time people can put into their work will cause people to lose respect for you, or worse, to burn out. We’re not advocating that you pry into your team members’ personal lives, but being sensitive to personal situations that your team members are going through can give you a lot of insight as to why they might be more or less productive at any given time. Giving a little extra slack to a team member who is currently having a tough time at home can make them a lot more willing to put in longer hours when your team has a tight deadline to hit later.</p>
|
||||
|
||||
<p>A big part of tracking your team members’ happiness is tracking their careers.<a contenteditable="false" data-primary="careers, tracking for team members" data-type="indexterm" id="id-agcvH1INsG"> </a> If you ask a team member where they see their career in five years, most of the time you’ll get a shrug and a blank look. When put on the spot, most people won’t say much about this, but there are usually a few things that everyone would like to do in the next five years: be promoted, learn something new, launch something important, and work with smart people. Regardless of whether they verbalize this, most people are thinking about it. If you’re going to be an effective leader, you should be thinking about how you can help make all those things happen and let your team know you’re thinking about this. The most important part of this is to take these implicit goals and make them explicit so that when you’re giving career advice you have a real set of metrics with which to evaluate situations and opportunities.</p>
|
||||
|
||||
<p>Tracking happiness comes down to not just monitoring careers, but also giving your team members opportunities to improve themselves, be recognized for the work they do, and have a little fun along the way.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="other_tips_and_tricks">
|
||||
<h1>Other Tips and Tricks</h1>
|
||||
|
||||
<p>Following are other miscellaneous tips and tricks that we at Google recommend<a contenteditable="false" data-primary="leading a team" data-secondary="other tips and tricks for" data-type="indexterm" id="id-ddcwHMhBFQ"> </a> when you’re in a leadership position:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Delegate, but get your hands dirty</dt>
|
||||
<dd>When moving from an individual contributor role to a leadership role, achieving a balance is one of the most difficult things to do. Initially, you’re inclined to do all of the work yourself, and after being in a leadership role for a long time, it’s easy to get into the habit of doing none of the work yourself. If you’re new to a leadership role, you probably need to work hard to delegate work to other engineers on your team, even if it will take them a lot longer than you to accomplish that work. Not only is this one way for you to maintain your sanity, but also it’s how the rest of your team will learn. If you’ve been leading teams for a while or if you pick up a new team, one of the easiest ways to gain the team’s respect and get up to speed on what they’re doing is to get your hands dirty—usually by taking on a grungy task that no one else wants to do. You can have a resume and a list of achievements a mile long, but nothing lets a team know how skillful and dedicated (and humble) you are like jumping in and actually doing some hard work.</dd>
|
||||
<dt>Seek to replace yourself</dt>
|
||||
<dd>Unless you want to keep doing the exact same job for the rest of your career, seek to replace yourself. This starts, as we mentioned earlier, with the hiring process: if you want a member of your team to replace you, you need to hire people capable of replacing you, which we usually sum up by saying that you need to “hire people smarter than you.” After you have team members capable of doing your job, you need to give them opportunities to take on more responsibilities or occasionally lead the team. If you do this, you’ll quickly see who has the most aptitude to lead as well as who wants to lead the team. Remember that some people prefer to just be high-performing individual contributors, and that’s OK. We’ve always been amazed at companies that take their best engineers and—against their wishes—throw these engineers into management roles. This usually subtracts a great engineer from your team and adds a subpar manager.</dd>
|
||||
<dt>Know when to make waves</dt>
|
||||
<dd>You will (inevitably and frequently) have difficult situations crop up in which every cell in your body is screaming at you to do nothing about it. It might be the engineer on your team whose technical chops aren’t up to par. It might be the person who jumps in front of every train. It might be the unmotivated employee who is working 30 hours a week. “Just wait a bit and it will get better,” you’ll tell yourself. “It will work itself out,” you’ll rationalize. Don’t fall into this trap—these are the situations for which you need to make the biggest waves and you need to make them now. Rarely will these problems work themselves out, and the longer you wait to address them, the more they’ll adversely affect the rest of the team and the more they’ll keep you up at night thinking about them. By waiting, you’re only delaying the inevitable and causing untold damage in the process. So act, and act quickly.</dd>
|
||||
<dt>Shield your team from chaos</dt>
|
||||
<dd>When you step into a leadership role, the first thing you’ll usually discover is that outside your team is <a contenteditable="false" data-primary="chaos and uncertainty, shielding your team from" data-type="indexterm" id="id-nBcdH0uptoFB"> </a>a world of chaos and uncertainty (or even insanity) that you never saw when you were an individual contributor. When I first became a manager back in the 1990s (before going back to being an individual contributor), I was taken aback by the sheer volume of uncertainty and organizational chaos that was happening in my company. I asked another manager what had caused this sudden rockiness in the otherwise calm company, and the other manager laughed hysterically at my naivete: the chaos had always been present, but my previous manager had shielded me and the rest of my team from it.</dd>
|
||||
<dt>Give your team air cover</dt>
|
||||
<dd>Whereas it’s important that you keep your team informed about what’s going on “above” them in the company, it’s just as important that you defend them from a lot of the uncertainty and frivolous demands that can be imposed upon you from outside your team. Share as much information as you can with your team, but don’t distract them with organizational craziness that is extremely unlikely to ever actually affect them.</dd>
|
||||
<dt>Let your team know when they’re doing well</dt>
|
||||
<dd>Many new team leads can get so caught up in dealing with the shortcomings of their team members that they neglect to provide positive feedback often enough. Just as you let someone know when they screw up, be sure to let them know when they do well, and be sure to let them (and the rest of the team) know when they knock one out of the park.</dd>
|
||||
</dl>
|
||||
|
||||
<p>Lastly, here’s something the best leaders know and use often when they have adventurous team members who want to try new things:</p>
|
||||
|
||||
<dl>
|
||||
<dt>It’s easy to say “yes” to something that’s easy to undo</dt>
|
||||
<dd>If you have a team member who wants to take a day or two to try using a new tool or library<sup><a data-type="noteref" id="ch01fn65-marker" href="ch05.html#ch01fn65">9</a></sup> that could speed up your product (and you’re not on a tight deadline), it’s easy to say, “Sure, give it a shot.” If, on the other hand, they want to do something like launch a product that you’re going to have to support for the next 10 years, you’ll likely want to give it a bit more thought. Really good leaders have a good sense for when something can be undone, but more things are undoable than you think (and this applies to both technical and nontechnical decisions).</dd>
|
||||
</dl>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="people_are_like_plants">
|
||||
<h1>People Are Like Plants</h1>
|
||||
|
||||
<p>My wife is the youngest of six children, and her mother was faced with the difficult task of figuring out how to raise six very different children, each of whom needed different things.<a contenteditable="false" data-primary="leading a team" data-secondary="fulfilling different needs of team members" data-type="indexterm" id="id-agcvHmhRiG"> </a> I asked my mother-in-law how she managed this (see what I did there?), and she responded that kids are like plants: some are like cacti and need little water but lots of sunshine, others are like African violets and need diffuse light and moist soil, and still others are like tomatoes and will truly excel if you give them a little fertilizer. If you have six kids and give each one the same amount of water, light, and fertilizer, they’ll all get equal treatment, but the odds are good that <em>none</em> of them will get what they actually need.</p>
|
||||
|
||||
<p>And so your team members are also like plants: some need more light, and some need more water (and some need more…fertilizer). It’s your job as their leader to determine who needs what and then give it to them—except instead of light, water, and fertilizer, your team needs varying amounts of motivation and direction.</p>
|
||||
|
||||
<p>To get all of your team members what they need, you need to motivate the ones who are in a rut and provide stronger direction to those who are distracted or uncertain of what to do. <a contenteditable="false" data-primary="motivating your team" data-type="indexterm" id="id-5XcnHoIliN"> </a>Of course, there are those who are “adrift” and need both motivation and direction. So, with this combination of motivation and direction, you can make your team happy and productive. And you don’t want to give them too much of either—because if they don’t need motivation or direction and you try giving it to them, you’re just going to annoy them.</p>
|
||||
|
||||
<p class="pagebreak-before">Giving<a contenteditable="false" data-primary="direction, giving to team members" data-type="indexterm" id="id-vec3HgfNi2"> </a> direction is fairly straightforward—it requires a basic understanding of what needs to be done, some simple organizational skills, and enough coordination to break it down into manageable tasks. With these tools in hand, you can provide sufficient guidance for an engineer in need of directional help. Motivation, however, is a bit more sophisticated and merits some explanation.</p>
|
||||
|
||||
<section data-type="sect2" id="intrinsic_versus_extrinsic_motivation">
|
||||
<h2>Intrinsic Versus Extrinsic Motivation</h2>
|
||||
|
||||
<p>There are two types<a contenteditable="false" data-primary="extrinsic versus intrinsic motivation" data-type="indexterm" id="id-KWcaHkhnCVir"> </a> of motivation: <em>extrinsic</em>, which originates<a contenteditable="false" data-primary="intrinsic versus extrinsic motivation" data-type="indexterm" id="id-nBcet5hyCWiB"> </a> from outside forces (such as monetary compensation), and <em>intrinsic</em>, which<a contenteditable="false" data-primary="leading a team" data-secondary="fulfilling different needs of team members" data-tertiary="motivation" data-type="indexterm" id="id-BvcbfYhwCGi3"> </a> comes from within.<a contenteditable="false" data-primary="motivating your team" data-secondary="intrinsic vs. extrinsic motivation" data-type="indexterm" id="id-w0c0C3hECXiv"> </a> In his book <em>Drive</em>,<sup><a data-type="noteref" id="ch01fn66-marker" href="ch05.html#ch01fn66">10</a></sup> Dan Pink explains that the way to make people the happiest and most productive isn’t to motivate them extrinsically (e.g., throw piles of cash at them); rather, you need to work to increase their intrinsic motivation. Dan claims you can increase intrinsic motivation by giving people three things: autonomy, mastery, and purpose.<sup><a data-type="noteref" id="ch01fn67-marker" href="ch05.html#ch01fn67">11</a></sup></p>
|
||||
|
||||
<p>A person<a contenteditable="false" data-primary="autonomy for team members" data-type="indexterm" id="id-xbc1HgtyCbiv"> </a> has autonomy when they have the ability to act on their own without someone micromanaging them.<sup><a data-type="noteref" id="ch01fn68-marker" href="ch05.html#ch01fn68">12</a></sup> With autonomous employees (and Google strives to hire mostly autonomous engineers), you might give them the general direction in which they need to take the product but leave it up to them to decide how to get there. This helps with motivation not only because they have a closer relationship with the product (and likely know better than you how to build it), but also because it gives them a much greater sense of ownership of the product. The bigger their stake is in the success of the product, the greater their interest is in seeing it succeed.</p>
|
||||
|
||||
<p>Mastery in its basest form simply means that you need to give someone the opportunity <a contenteditable="false" data-primary="mastery for team members" data-type="indexterm" id="id-nBcdHXIyCWiB"> </a>to improve existing skills and learn new ones. Giving ample opportunities for mastery not only helps to motivate people, it also makes them better over time, which makes for stronger teams.<sup><a data-type="noteref" id="ch01fn69-marker" href="ch05.html#ch01fn69">13</a></sup> An employee’s skills are like the blade of a knife: you can spend tens of thousands of dollars to find people with the sharpest skills for your team, but if you use that knife for years without sharpening it, you will wind up with a dull knife that is inefficient, and in some cases useless. Google gives ample opportunities for engineers to learn new things and master their craft so as to keep them sharp, efficient, and effective.</p>
|
||||
|
||||
<p class="pagebreak-before">Of course, all the autonomy and mastery in the world isn’t going to help motivate someone if they’re doing work for no reason at all, which is why you need to give their work purpose.<a contenteditable="false" data-primary="purpose for team members" data-type="indexterm" id="id-A5cdHqf0Caig"> </a> Many people work on products that have great significance, but they’re kept at arm’s length from the positive effects their products might have on their company, their customers, or even the world. Even for cases in which the product might have a much smaller impact, you can motivate your team by seeking the reason for their efforts and making this reason clear to them. If you can help them to see this purpose in their work, you’ll see a tremendous increase in their motivation and productivity.<sup><a data-type="noteref" id="ch01fn70-marker" href="ch05.html#ch01fn70">14</a></sup> One manager we know keeps a close eye on the email feedback that Google gets for its product (one of the “smaller-impact” products), and whenever she sees a message from a customer talking about how the company’s product has helped them personally or helped their business, she immediately forwards it to the engineering team. This not only motivates the team, but also frequently inspires team members to think about ways in which they can make their product even better.</p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="conclusion-id00009">
|
||||
<h1>Conclusion</h1>
|
||||
|
||||
<p>Leading a team is a different task than that of being a software engineer. As a result, good software engineers do not always make good managers, and that’s OK—<span class="keep-together">effective</span> organizations allow productive career paths for both individual contributors and people managers. Although Google has found that software engineering experience itself is invaluable for managers, the most important skills an effective manager brings to the table are social ones. Good managers enable their engineering teams by helping them work well, keeping them focused on proper goals, and insulating them from problems outside the group, all while following the three pillars of humility, trust, and respect.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="tlsemicolondrs">
|
||||
<h1>TL;DRs</h1>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Don’t “manage” in the traditional sense; focus on leadership, influence, and serving your team.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Delegate where possible; don’t DIY (Do It Yourself).</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Pay particular attention to the focus, direction, and velocity of <a contenteditable="false" data-primary="leading a team" data-startref="ix_lead" data-type="indexterm" id="id-DKc0HQHptOh6h8"> </a>your team.</p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
<div data-type="footnotes"><p data-type="footnote" id="ch01fn57"><sup><a href="ch05.html#ch01fn57-marker">1</a></sup>Another difference that takes getting used to is that the things we do as managers typically pay off over a longer timeline.</p><p data-type="footnote" id="ch01fn58"><sup><a href="ch05.html#ch01fn58-marker">2</a></sup>Yet another reason companies shouldn’t force people into management as part of a career path: if an engineer is able to write reams of great code and has no desire at all to manage people or lead a team, by forcing them into a management or TL role, you’re losing a great engineer and gaining a crappy manager. This is not only a bad idea, but it’s actively harmful.</p><p data-type="footnote" id="ch01fn59"><sup><a href="ch05.html#ch01fn59-marker">3</a></sup>For more fascinating information on optimizing the movements of factory workers, read up on Scientific Management or Taylorism, especially its effects on worker morale.</p><p data-type="footnote" id="ch01fn60"><sup><a href="ch05.html#ch01fn60-marker">4</a></sup>If you have kids, the odds are good that you can remember with startling clarity the first time you said something to your child that made you stop and exclaim (perhaps even aloud), “Holy crap, I’ve become my mother.”</p><p data-type="footnote" id="ch01fn61"><sup><a href="ch05.html#ch01fn61-marker">5</a></sup>Public criticism of an individual is not only ineffective (it puts people on the defense), but rarely necessary, and most often is just mean or cruel. You can be sure the rest of the team already knows when an individual has failed, so there’s no need to rub it in.</p><p data-type="footnote" id="ch01fn62"><sup><a href="ch05.html#ch01fn62-marker">6</a></sup>See also “<a href="https://oreil.ly/BKkvk">Rubber duck debugging</a>.”</p><p data-type="footnote" id="ch01fn63"><sup><a href="ch05.html#ch01fn63-marker">7</a></sup>Attempting to achieve 100% consensus can also be harmful. You need to be able to decide to proceed even if not everyone is on the same page or there is still some uncertainty.</p><p data-type="footnote" id="ch01fn64"><sup><a href="ch05.html#ch01fn64-marker">8</a></sup>Google also runs an annual employee survey called “Googlegeist” that rates employee happiness across many dimensions. This provides good feedback but isn’t what we would call “simple.”</p><p data-type="footnote" id="ch01fn65"><sup><a href="ch05.html#ch01fn65-marker">9</a></sup>To gain a better understanding of just how “undoable” technical changes can be, see <a data-type="xref" href="ch22.html#large-scale_changes">Large-Scale Changes</a>.</p><p data-type="footnote" id="ch01fn66"><sup><a href="ch05.html#ch01fn66-marker">10</a></sup>See <a href="https://oreil.ly/5SDQS">Dan’s fantastic TED talk</a> on this subject.</p><p data-type="footnote" id="ch01fn67"><sup><a href="ch05.html#ch01fn67-marker">11</a></sup>This assumes that the people in question are being paid well enough that income is not a source of stress.</p><p data-type="footnote" id="ch01fn68"><sup><a href="ch05.html#ch01fn68-marker">12</a></sup>This assumes that you have people on your team who don’t need micromanagement.</p><p data-type="footnote" id="ch01fn69"><sup><a href="ch05.html#ch01fn69-marker">13</a></sup>Of course, it also means they’re more valuable and marketable employees, so it’s easier for them to pick up and leave you if they’re not enjoying their work. 
See the pattern in <a data-type="xref" href="ch05.html#track_happiness">Track Happiness</a>.</p><p data-type="footnote" id="ch01fn70"><sup><a href="ch05.html#ch01fn70-marker">14</a></sup>Adam M. Grant, “The Significance of Task Significance: Job Performance Effects, Relational Mechanisms, and Boundary Conditions,” <em>Journal of Applied Psychology</em>, 93, No. 1 (2008), <a href="https://bit.ly/task_significance"><em class="hyperlink">http://bit.ly/task_significance</em></a>.</p></div></section>
|
||||
|
||||
</body>
|
||||
</html>
|
308
clones/abseil.io/resources/swe-book/html/ch06.html
Normal file
|
@@ -0,0 +1,308 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="leading_at_scale">
|
||||
<h1>Leading at Scale</h1>
|
||||
|
||||
<p class="byline">Written by Ben Collins-Sussman</p>
|
||||
|
||||
<p class="byline">Edited by Riona MacNamara</p>
|
||||
|
||||
<p>In <a class="xref" data-type="xref" href="ch05.html#how_to_lead_a_team">How to Lead a Team</a>, we talked about what it means to go from being an “individual contributor” to being an explicit leader of a team. <a contenteditable="false" data-primary="leadership, scaling into a really good leader" data-type="indexterm" id="ix_leadsc"> </a>It’s a natural progression to go from leading one team to leading a set of related teams, and this chapter talks about how to be effective as you continue along the path of engineering leadership.</p>
|
||||
|
||||
<p>As your role evolves, all the best practices still apply. You’re still a “servant leader”; you’re just serving a larger group. That said, the scope of problems you’re solving becomes larger and more abstract. You’re gradually forced to become “higher level.” That is, you’re less and less able to get into the technical or engineering details of things, and you’re being pushed to go “broad” rather than “deep.” At every step, this process is frustrating: you mourn the loss of these details, and you come to realize that your prior engineering expertise is becoming less and less relevant to your job. Instead, your effectiveness depends more than ever on your <em>general</em> technical intuition and ability to galvanize engineers to move in good directions.</p>
|
||||
|
||||
<p>The process is often demoralizing—until one day you notice that you’re actually having much more impact as a leader than you ever had as an individual contributor. It’s a satisfying but bittersweet realization.</p>
|
||||
|
||||
<p>So, assuming that we understand the basics of leadership, what does it take to scale yourself into a <em>really good</em> leader? That’s what we talk about here, using what we call “the three Always of leadership”: Always Be Deciding, Always Be Leaving, Always Be Scaling.</p>
|
||||
|
||||
<section data-type="sect1" id="always_be_deciding">
|
||||
<h1>Always Be Deciding</h1>
|
||||
|
||||
<p>Managing a team of teams means making ever more <a contenteditable="false" data-primary="leadership, scaling into a really good leader" data-secondary="Always be deciding" data-type="indexterm" id="id-NmhaHGCjcq"> </a>decisions at ever-higher levels. Your job <a contenteditable="false" data-primary=" “Always of leadership”" data-primary-sortas="Always" data-secondary="Always be deciding" data-type="indexterm" id="id-A8hOC9C1cR"> </a>becomes <a contenteditable="false" data-primary="decisions" data-secondary="making at higher levels of leadership" data-type="indexterm" id="id-BrhQUnCNco"> </a>more about high-level strategy rather than how to solve any specific engineering task. At this level, most of the decisions you’ll make are about finding the correct set of trade-offs.</p>
|
||||
|
||||
<section data-type="sect2" id="the_parable_of_the_airplane">
|
||||
<h2>The Parable of the Airplane</h2>
|
||||
|
||||
<p><a href="http://lindsayjones.com">Lindsay Jones</a> is a friend of ours who is a professional theatrical sound designer and composer. <a contenteditable="false" data-primary="airplane, parable of" data-type="indexterm" id="id-pghDCRCDUGcE"> </a>He spends his life flying around the United States, hopping from production to production, and he’s full of crazy (and true) stories about air travel. Here’s one of our favorite stories:</p>
|
||||
|
||||
<blockquote>
|
||||
<p>It’s 6 a.m., we’re all boarded on the plane and ready to go. The captain comes on the PA system and explains to us that, somehow, someone has overfilled the fuel tank by 10,000 gallons. Now, I’ve flown on planes for a long time, and I didn’t know that such a thing was possible. I mean, if I overfill my car by a gallon, I’m gonna have gas all over my shoes, right?</p>
|
||||
|
||||
<p>Well, so anyway, the captain then says that we have two options: we can either wait for the truck to come suck the fuel back out of the plane, which is going to take over an hour, or twenty people have to get off the plane right now to even out the weight.</p>
|
||||
|
||||
<p>No one moves.</p>
|
||||
|
||||
<p>Now, there’s this guy across the aisle from me in first class, and he is absolutely livid. He reminds me of Frank Burns on <em>M*A*S*H;</em> he’s just super indignant and sputtering everywhere, demanding to know who’s responsible. It’s an amazing showcase, it’s like he’s Margaret Dumont in the Marx Brothers movies.</p>
|
||||
|
||||
<p>So, he grabs his wallet and pulls out this massive wad of cash! And he’s like “I cannot be late for this meeting!! I will give $40 to any person who gets off this plane right now!”</p>
|
||||
|
||||
<p>Sure enough, people take him up on it. He gives out $40 to 20 people (which is $800 in cash, by the way!) and they all leave.</p>
|
||||
|
||||
<p>So, now we’re all set and we head out to the runway, and the captain comes back on the PA again. The plane’s computer has stopped working. No one knows why. Now we gotta get towed back to the gate.</p>
|
||||
|
||||
<p>Frank Burns is apoplectic. I mean, seriously, I thought he was gonna have a stroke. He’s cursing and screaming. Everyone else is just looking at each other.</p>
|
||||
|
||||
<p>We get back to the gate and this guy is demanding another flight. They offer to book him on the 9:30, which is too late. He’s like, “Isn’t there another flight before 9:30?”</p>
|
||||
|
||||
<p>The gate agent is like, “Well, there was another flight at 8, but it’s all full now. They’re closing the doors now.”</p>
|
||||
|
||||
<p>And he’s like, “Full?! Whaddya mean it’s full? There’s not one open seat on that plane?!?!?!”</p>
|
||||
|
||||
<p>The gate agent is like, “No sir, that plane was wide open until 20 passengers showed up out of nowhere and took all the seats. They were the happiest passengers I’ve ever seen, they were laughing all the way down the jet bridge.”</p>
|
||||
|
||||
<p>It was a very quiet ride on the 9:30 flight.</p>
|
||||
</blockquote>
|
||||
|
||||
<p>This story is, of course, about trade-offs.<a contenteditable="false" data-primary="trade-offs" data-secondary="for leaders" data-type="indexterm" id="id-aphrHKfPUbcV"> </a> Although most of this book focuses on various technical trade-offs in engineering systems, it turns out that trade-offs also apply to human behaviors. As a leader, you need to make decisions about what your teams should do each week. Sometimes the trade-offs are obvious (“if we work on this project, it delays that other one…”); sometimes the trade-offs have unforeseeable consequences that can come back to bite you, as in the preceding story.</p>
|
||||
|
||||
<p>At the highest level, your job as a leader—either of a single team or a larger organization—is to guide people toward solving difficult, ambiguous problems. By <em>ambiguous</em>, we mean that the problem has no obvious solution and might even be unsolvable. Either way, the problem needs to be explored, navigated, and (hopefully) wrestled into a state in which it’s under control. If writing code is analogous to chopping down trees, your job as a leader is to “see the forest through the trees” and find a workable path through that forest, directing engineers toward the important trees. There are three main steps to this process. First, you need to identify the <em>blinders</em>; next, you need to identify the <em>trade-offs</em>; and then you need to <em>decide</em> and iterate on a solution.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="identify_the_blinders">
|
||||
<h2>Identify the Blinders</h2>
|
||||
|
||||
<p>When you first approach a problem, you’ll often discover that a group of people has already been<a contenteditable="false" data-primary="leadership, scaling into a really good leader" data-secondary="identifying the blinders" data-type="indexterm" id="id-pghQHRCQfGcE"> </a> wrestling with it for years. <a contenteditable="false" data-primary="blinders, identifying" data-type="indexterm" id="id-aphNCbCbfbcV"> </a>These folks have been steeped in the problem for so long that they’re wearing “blinders”—that is, they’re no longer able to see the forest. They make a bunch of assumptions about the problem (or solution) without realizing it. “This is how we’ve always done it,” they’ll say, having lost the ability to consider the status quo critically. Sometimes, you’ll discover bizarre coping mechanisms or rationalizations that have evolved to justify the status quo. This is where you—with fresh eyes—have a great advantage. You can see these blinders, ask questions, and then consider new strategies. (Of course, being unfamiliar with the problem isn’t a requirement for good leadership, but it’s often an advantage.)</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="identify_the_key_trade-offs">
|
||||
<h2>Identify the Key Trade-Offs</h2>
|
||||
|
||||
<p>By definition, important and ambiguous problems do <em>not</em> have magic "silver bullet" solutions. <a contenteditable="false" data-primary="leadership, scaling into a really good leader" data-secondary="identifying key trade-offs" data-type="indexterm" id="id-y9hXCPCPI8cr"> </a><a contenteditable="false" data-primary="trade-offs" data-secondary="key, identifying" data-type="indexterm" id="id-JghAU3CXIpcM"> </a>There’s no answer that works forever in all situations. There is only the <em>best answer for the moment</em>, and it almost certainly involves making trade-offs in one direction or another. It’s your job to call out the trade-offs, explain them to everyone, and then help decide how to balance them.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="decidecomma_then_iterate">
|
||||
<h2>Decide, Then Iterate</h2>
|
||||
|
||||
<p>After you understand the trade-offs<a contenteditable="false" data-primary="leadership, scaling into a really good leader" data-secondary="deciding, then iterating" data-type="indexterm" id="id-y9hBHPCAu8cr"> </a> and how they work, you’re empowered. You can use this information to make the best decision for this particular month.<a contenteditable="false" data-primary="decisions" data-secondary="deciding, then iterating" data-type="indexterm" id="id-JghwC3CDupcM"> </a> Next month, you might need to reevaluate and rebalance the trade-offs again; it’s an iterative process.<a contenteditable="false" data-primary="“Always of leadership”" data-primary-sortas="Always" data-secondary="Always be deciding" data-tertiary="decide, then iterate" data-type="indexterm" id="id-EXhQU8CGupcG"> </a> This is what we mean when we say <em>Always Be Deciding</em>.</p>
|
||||
|
||||
<p>There’s a risk here. If you don’t frame your process as continuous rebalancing of trade-offs, your teams are likely to fall into the trap of searching for the perfect solution, which can then lead to what some call “analysis paralysis.” You need to make your teams comfortable with iteration.<a contenteditable="false" data-primary="iteration, making your teams comfortable with" data-type="indexterm" id="id-JghDHDUDupcM"> </a> One way of doing this is to lower the stakes and calm nerves by explaining: “We’re going to try this decision and see how it goes. Next month, we can undo the change or make a different decision.” This keeps folks flexible and in a state of learning from their choices.</p>
|
||||
|
||||
<aside data-type="sidebar" id="case_study_addressing_the_quotation_mar">
|
||||
<h5>Case Study: Addressing the "Latency" of Web Search</h5>
|
||||
|
||||
<p>In managing a team of teams, there’s a natural tendency to move away from a single product and to instead own a whole “class” of products, or perhaps a broader problem that crosses products.<a contenteditable="false" data-primary="leadership, scaling into a really good leader" data-secondary="Addressing Web Search latency (case study)" data-type="indexterm" id="ix_leadscWScs"> </a><a contenteditable="false" data-primary="Web Search latency case study" data-type="indexterm" id="ix_WbSrch"> </a> A good example of this at Google has to do with our oldest product, Web Search.</p>
|
||||
|
||||
<p>For years, thousands of Google engineers have worked on the general problem of making search results better—improving the “quality” of the results page. But it turns out that this quest for quality has a side effect: it gradually makes the product slower. Once upon a time, Google’s search results were not much more than a page of 10 blue links, each representing a relevant website. Over the past decade, however, thousands of tiny changes to improve “quality” have resulted in ever-richer results: images, videos, boxes with Wikipedia facts, even interactive UI elements. This means the servers need to do much more work to generate information: more bytes are being sent over the wire; the client (usually a phone) is being asked to render ever-more-complex HTML and data. Even though the speed of networks and computers has markedly increased over a decade, the speed of the search page has become slower and slower: its <em>latency</em> has increased. This might not seem like a big deal, but the latency of a product has a direct effect (in aggregate) on users’ engagement and how often they use it. Even increases in rendering time as small as 10 ms matter. Latency creeps up slowly. This is not the fault of a specific engineering team, but rather represents a long, collective poisoning of the commons. At some point, the overall latency of Web Search grows until its effect begins to cancel out the improvements in user engagement that came from the improvements to the "quality" of the results.</p>
|
||||
|
||||
<p>A number of leaders struggled with this issue over the years but failed to address the problem systematically.<a contenteditable="false" data-primary="blinders, identifying" data-secondary="in Web Search latency case study" data-type="indexterm" id="id-rMhMHAfVfAuocL"> </a> The blinders everyone wore assumed that the only way to deal with latency was to declare a latency “code yellow”<sup><a data-type="noteref" id="ch01fn71-marker" href="ch06.html#ch01fn71">1</a></sup> every two or three years, during which everyone dropped everything to optimize code and speed up the product. Although this strategy would work temporarily, the latency would begin creeping up again just a month or two later, and soon return to its prior levels.</p>
|
||||
|
||||
<p>So what changed? At some point, we took a step back, identified the blinders, and did a full reevaluation of the trade-offs.<a contenteditable="false" data-primary="trade-offs" data-secondary="in Web Search latency case study" data-type="indexterm" id="id-kXh0HNIZf7uqcR"> </a> It turns out that the pursuit of "quality" has not one, but <em>two</em> different costs. The first cost is to the user: more quality usually means more data being sent out, which means more latency. The second cost is to Google: more quality means doing more work to generate the data, which costs more CPU time in our servers—what we call "serving capacity." Although leadership had often trodden carefully around the trade-off between quality and capacity, it had never treated latency as a full citizen in the calculus. As the old joke goes, “Good, Fast, Cheap—pick two.” A simple way to depict the trade-offs is to draw a triangle of tension between Good (Quality), Fast (Latency), and Cheap (Capacity), as illustrated in <a data-type="xref" href="ch06.html#trade-offs_within_web_searchsemicolon_p">Figure 6-1</a>.</p>
|
||||
|
||||
<figure id="trade-offs_within_web_searchsemicolon_p"><img alt="Trade-offs within Web Search; pick two!" src="images/seag_0601.png">
|
||||
<figcaption><span class="label">Figure 6-1. </span>Trade-offs within Web Search; pick two!</figcaption>
|
||||
</figure>
|
||||
|
||||
<p>That’s exactly what was happening here. It’s easy to improve any one of these traits by deliberately harming at least one of the other two. For example, you can improve quality by putting more data on the search results page—but doing so will hurt capacity and latency. You can also do a direct trade-off between latency and capacity by changing the traffic load on your serving cluster. If you send more queries to the cluster, you get increased capacity in the sense that you get better utilization of the CPUs—more bang for your hardware buck. But higher load increases resource contention within a computer, making the average latency of a query worse. If you deliberately decrease a cluster’s traffic (run it “cooler”), you have less serving capacity overall, but each query becomes faster.</p>
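<p>To make the latency-versus-capacity lever a little more concrete, here is a minimal, purely illustrative sketch of that relationship. It uses the textbook M/M/1 queueing result, in which the average time a query spends in the system is 1/(service rate - arrival rate). This is not Google’s serving system or any real model of it; the service rate below is an assumed constant. The point is only the shape of the trade-off: running a server “hotter” improves utilization at the cost of per-query latency.</p>

<pre data-type="programlisting" data-code-language="python"># Illustrative only: a single-server M/M/1 queueing model, not a real serving system.
def average_latency_ms(arrival_rate_qps: float, service_rate_qps: float) -&gt; float:
    """Average time in system (queueing plus service), in milliseconds."""
    if arrival_rate_qps &gt;= service_rate_qps:
        raise ValueError("utilization must stay below 100%")
    return 1000.0 / (service_rate_qps - arrival_rate_qps)

SERVICE_RATE_QPS = 1000.0  # assumed single-server capacity; invented for illustration
for utilization in (0.5, 0.7, 0.9, 0.95, 0.99):
    qps = utilization * SERVICE_RATE_QPS
    print(f"{utilization:.0%} utilization: {average_latency_ms(qps, SERVICE_RATE_QPS):6.1f} ms per query")</pre>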
|
||||
|
||||
<p>The main point here is that this insight—a better understanding of <em>all</em> the trade-offs—allowed us to start experimenting with new ways of balancing. Instead of treating latency as an unavoidable and accidental side effect, we could now treat it as a first-class goal along with our other goals. This led to new strategies for us. For example, our data scientists were able to measure exactly how much latency hurt user engagement. This allowed them to construct a metric that pitted quality-driven improvements to short-term user engagement against latency-driven damage to long-term user engagement. This approach allows us to make more data-driven decisions about product changes. For example, if a small change improves quality but also hurts latency, we can quantitatively decide whether the change is worth launching or not. We are <em>always deciding</em> whether our quality, latency, and capacity changes are in balance, and iterating on our decisions every month.<a contenteditable="false" data-primary="leadership, scaling into a really good leader" data-secondary="Addressing Web Search latency (case study)" data-startref="ix_leadscWScs" data-type="indexterm" id="id-5ZhYUJcGfbuNco"> </a><a contenteditable="false" data-primary="Web Search latency case study" data-startref="ix_WbSrch" data-type="indexterm" id="id-wehefPcAfzu8cB"> </a></p>
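<p>The text above describes that metric only at a high level, and the actual formula and constants are not given here. As a purely hypothetical sketch, a launch decision that pits a quality-driven engagement gain against a latency-driven engagement loss might look like the following, where the exchange rate (engagement lost per added millisecond) is an invented constant used only for illustration.</p>

<pre data-type="programlisting" data-code-language="python"># Hypothetical sketch only: the real metric, units, and constants are not published here.
# Assumption: each added millisecond of latency costs a fixed slice of long-term
# engagement (ENGAGEMENT_LOSS_PER_MS is an invented constant).
ENGAGEMENT_LOSS_PER_MS = 0.0004  # fraction of engagement lost per +1 ms of latency

def net_engagement_change(quality_gain: float, latency_delta_ms: float) -&gt; float:
    """Net effect of a candidate change: quality-driven gain minus latency-driven loss."""
    return quality_gain - latency_delta_ms * ENGAGEMENT_LOSS_PER_MS

def worth_launching(quality_gain: float, latency_delta_ms: float) -&gt; bool:
    """Launch only if the change is a net win for user engagement."""
    return net_engagement_change(quality_gain, latency_delta_ms) &gt; 0.0

# A change that adds 10 ms must buy more than 0.4% engagement to be worth launching.
print(worth_launching(quality_gain=0.006, latency_delta_ms=10))  # True
print(worth_launching(quality_gain=0.002, latency_delta_ms=10))  # False</pre>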
|
||||
</aside>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="always_be_leaving">
|
||||
<h1>Always Be Leaving</h1>
|
||||
|
||||
<p>At face value, <em>Always Be Leaving</em> sounds like terrible advice. Why would a good leader be trying to leave?<a contenteditable="false" data-primary="“Always of leadership”" data-primary-sortas="Always" data-secondary="Always be leaving" data-type="indexterm" id="id-BrhjCnCpSo"> </a><a contenteditable="false" data-primary="leadership, scaling into a really good leader" data-secondary="Always be leaving" data-type="indexterm" id="id-pgh1URCOSQ"> </a> In fact, this is a famous quote from Bharat Mediratta, a former Google engineering director. What he meant was that it’s not just your job to solve an ambiguous problem, but to get your organization to solve it <em>by itself</em>, without you present. If you can do that, it frees you up to move to a new problem (or new organization), leaving a trail of self-sufficient success in your wake.</p>
|
||||
|
||||
<p>The antipattern here, of course, is a situation in which you’ve set yourself up to be a single point of failure (SPOF). <a contenteditable="false" data-primary="single point of failure (SPOF)" data-secondary="leader as" data-type="indexterm" id="id-BrhMHdUpSo"> </a>As we <a contenteditable="false" data-primary="bus factor" data-type="indexterm" id="id-pghDC6UOSQ"> </a>noted earlier in this book, Googlers have a term for that, the bus factor: <em>the number of people that need to get hit by a bus before your project is completely doomed.</em></p>
|
||||
|
||||
<p>Of course, the "bus" here is just a metaphor. People become sick; they switch teams or companies; they move away. As a litmus test, think about a difficult problem that your team is making good progress on. Now imagine that you, the leader, disappear. Does your team keep going? Does it continue to be successful? Here’s an even simpler test: think about the last vacation you took that was at least a week long. Did you keep checking your work email? (Most leaders do.) Ask yourself <em>why</em>. Will things fall apart if you don’t pay attention? If so, you have very likely made yourself an SPOF. You need to fix that.</p>
|
||||
|
||||
<section data-type="sect2" id="your_mission_build_a_quotation_markself">
|
||||
<h2>Your Mission: Build a “Self-Driving” Team</h2>
|
||||
|
||||
<p>Coming back to Bharat’s quote: being a successful leader means<a contenteditable="false" data-primary="self-driving team, building" data-type="indexterm" id="ix_selfdr"> </a> building an organization that is able to solve the difficult problem by itself. That organization needs to have a strong set of leaders, healthy engineering processes, and a positive, self-perpetuating culture that persists over time. Yes, this is difficult; but it gets back to the fact that leading a team of teams is often more about organizing <em>people</em> rather than being a technical wizard. Again, there are three main parts to constructing this sort of self-sufficient group: dividing the problem space, delegating subproblems, and iterating as needed.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="dividing_the_problem_space">
|
||||
<h2>Dividing the Problem Space</h2>
|
||||
|
||||
<p>Challenging problems are usually composed of difficult subproblems.<a contenteditable="false" data-primary="problems" data-secondary="dividing the problem space" data-type="indexterm" id="ix_probdiv"> </a> If you’re leading a team of teams, an obvious choice is to put a team in charge of each subproblem. The risk, however, is that the subproblems can change over time, and rigid team boundaries won’t be able to notice or adapt to this fact. If you’re able, consider an organizational structure that is looser—one in which subteams can change size, individuals can migrate between subteams, and the problems assigned to subteams can morph over time. This involves walking a fine line between “too rigid” and “too vague.” On the one hand, you want your subteams to have a clear sense of problem, purpose, and steady accomplishment; on the other hand, people need the freedom to change direction and try new things in response to a changing environment.</p>
|
||||
|
||||
<section data-type="sect3" id="example_subdividing_the_quotation_markl">
|
||||
<h3>Example: Subdividing the “latency problem” of Google Search</h3>
|
||||
|
||||
<p>When approaching the problem of<a contenteditable="false" data-primary="Google Search" data-secondary="subdividing latency problem of" data-type="indexterm" id="id-OjhKHmCvUNu7SY"> </a> Search latency, we realized that the problem could, at a minimum, be subdivided into two general spaces: work that addressed the <em>symptoms</em> of latency, and different work that addressed the <em>causes</em> of latency. It was obvious that we needed to staff many projects to optimize our codebase for speed, but focusing <em>only</em> on speed wouldn’t be enough. There were still thousands of engineers increasing the complexity and "quality" of search results, undoing the speed improvements as quickly as they landed, so we also needed people to focus on a parallel problem space of preventing latency in the first place. We discovered gaps in our metrics, in our latency analysis tools, and in our developer education and documentation. By assigning different teams to work on latency causes and symptoms at the same time, we were able to systematically control latency over the long term. (Also, notice how these teams owned the <em>problems</em>, not specific solutions!)</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="delegating_subproblems_to_leaders">
|
||||
<h3>Delegating subproblems to leaders</h3>
|
||||
|
||||
<p>It’s essentially a cliché for management books to talk about "delegation," but there’s a reason for that: delegation is <em>really difficult</em> to learn. <a contenteditable="false" data-primary="delegation of subproblems to team leaders" data-type="indexterm" id="id-rMh5CLCVfAuASL"> </a>It goes against all our instincts for efficiency and achievement. That difficulty is the reason for the adage, “If you want something done right, do it yourself.”</p>
|
||||
|
||||
<p>That said, if you agree that your mission is to build a self-driving organization, the main mechanism of teaching is through delegation. You must build a set of self-sufficient leaders, and delegation is absolutely the most effective way to train them. You give them an assignment, let them fail, and then try again and try again. Silicon Valley has well-known mantras about “failing fast and iterating.” That philosophy doesn’t just apply to engineering design, but to human learning as well.</p>
|
||||
|
||||
<p>As a leader, your plate is constantly filling up with important tasks that need to be done. Most of these tasks are things that are fairly easy for you to do. Suppose that you’re working diligently through your inbox, responding to problems, and then you decide to put 20 minutes aside to fix a longstanding and nagging issue. But before you carry out the task, be mindful and stop yourself. Ask this critical question: <em>Am I really the only one who can do this work?</em></p>
|
||||
|
||||
<p>Sure, it might be most <em>efficient</em> for you to do it, but then you’re failing to train your leaders. You’re not building a self-sufficient organization. Unless the task is truly time sensitive and on fire, bite the bullet and assign the work to someone else—presumably someone who you know can do it but will probably take much longer to finish. Coach them on the work if need be. You need to create opportunities for your leaders to grow; they need to learn to "level up" and do this work themselves so that you’re no longer in the critical path.</p>
|
||||
|
||||
<p>The corollary here is that you need to be mindful of your own purpose as a leader of leaders. If you find yourself deep in the weeds, you’re doing a disservice to your organization. When you get to work each day, ask yourself a different critical question: <em>What can I do that</em> nobody <em>else on my team can do?</em></p>
|
||||
|
||||
<p>There are a number of good answers. For example, you can protect your teams from organizational politics; you can give them encouragement; you can make sure everyone is treating one another well, creating a culture of humility, trust, and respect. It’s also important to “manage up,” making sure your management chain understands what your group is doing and staying connected to the company at large. But often the most common and important answer to this question is: “I can see the forest through the trees.” In other words, you can <em>define a high-level strategy.</em> Your strategy needs to cover not just overall technical direction, but an organizational strategy as well. You’re building a blueprint for how the ambiguous problem is solved and how your organization can manage the problem over time. You’re continuously mapping out the forest, and then assigning the tree-cutting to others.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="adjusting_and_iterating">
|
||||
<h3>Adjusting and iterating</h3>
|
||||
|
||||
<p>Let’s assume that you’ve now reached the point at which you’ve built a self-sustaining machine. You’re no longer an SPOF. Congratulations! What do you do now?</p>
|
||||
|
||||
<p>Before answering, note that you have actually liberated yourself—you now have the freedom to "Always Be Leaving." This could be the freedom to tackle a new, adjacent problem, or perhaps you could even move yourself to a whole new department and problem space, making room for the careers of the leaders you’ve trained. This is a great way of avoiding personal burnout.</p>
|
||||
|
||||
<p>The simple answer to “what now?” is to <em>direct</em> this machine and keep it healthy. But unless there’s a crisis, you should use a gentle touch. The book <em>Debugging Teams</em><sup><a data-type="noteref" id="ch01fn72-marker" href="ch06.html#ch01fn72">2</a></sup> has a parable about making mindful adjustments:</p>
|
||||
|
||||
<blockquote>
|
||||
<p>There’s a story about a Master of all things mechanical who had long since retired. His former company was having a problem that no one could fix, so they called in the Master to see if he could help find the problem. The Master examined the machine, listened to it, and eventually pulled out a worn piece of chalk and made a small X on the side of the machine. He informed the technician that there was a loose wire that needed repair at that very spot. The technician opened the machine and tightened the loose wire, thus fixing the problem. When the Master’s invoice arrived for $10,000, the irate CEO wrote back demanding a breakdown for this ridiculously high charge for a simple chalk mark! The Master responded with another invoice, showing a $1 cost for the chalk to make the mark, and $9,999 for knowing where to put it.</p>
|
||||
|
||||
<p>To us, this is a story about wisdom: that a single, carefully considered adjustment can have gigantic effects. We use this technique when managing people. We imagine our team as flying around in a great blimp, headed slowly and surely in a certain direction. Instead of micromanaging and trying to make continuous course corrections, we spend most of the week carefully watching and listening. At the end of the week we make a small chalk mark in a precise location on the blimp, then give a small but critical "tap" to adjust the course.</p>
|
||||
</blockquote>
|
||||
|
||||
<p>This is what good management is about: 95% observation and listening, and 5% making critical adjustments in just the right place. Listen to your leaders and skip-reports. Talk to your customers, and remember that often (especially if your team builds engineering infrastructure), your “customers” are not end users out in the world, but your coworkers. Customers’ happiness requires just as much intense listening as your reports’ happiness. What’s working and what isn’t? Is this self-driving blimp headed in the proper direction? Your direction should be iterative, but thoughtful and minimal, making the minimum adjustments necessary to correct course. If you regress into micromanagement, you risk becoming an SPOF again! “Always Be Leaving” is a call to <em>macro</em>management.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="take_care_in_anchoring_a_teamapostrophe">
|
||||
<h3>Take care in anchoring a team’s identity</h3>
|
||||
|
||||
<p>A common mistake is to put a team in charge of a specific product rather than a general problem. <a contenteditable="false" data-primary="teams" data-secondary="anchoring a team's identity" data-type="indexterm" id="id-kXh0HoCgu7ukSR"> </a>A product is a <em>solution</em> to a problem. The life expectancy of solutions can be short, and products can be replaced by better solutions. However, a <em>problem</em>—if chosen well—can be evergreen. Anchoring a team identity to a specific solution (“We are the team that manages the Git repositories”) can lead to all sorts of angst over time. What if a large percentage of your engineers want to switch to a new version control system? The team is likely to "dig in," defend its solution, and resist change, even if this is not the best path for the organization. The team clings to its blinders, because the solution has become part of the team’s identity and self-worth. If the team instead owns the <em>problem</em> (e.g., “We are the team that provides version control to the company”), it is freed up to experiment with different solutions over time.<a contenteditable="false" data-primary="problems" data-secondary="dividing the problem space" data-startref="ix_probdiv" data-type="indexterm" id="id-KohrIDCMuNuzSG"> </a><a contenteditable="false" data-primary="self-driving team, building" data-startref="ix_selfdr" data-type="indexterm" id="id-5ZhDu1CyubuMSo"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="always_be_scaling">
|
||||
<h1>Always Be Scaling</h1>
|
||||
|
||||
<p>A lot of leadership books talk about “scaling” in the context of learning to “maximize your impact”—strategies to grow your team and influence.<a contenteditable="false" data-primary="leadership, scaling into a really good leader" data-secondary="Always be scaling" data-type="indexterm" id="id-BrhMHnC9to"> </a><a contenteditable="false" data-primary="“Always of leadership”" data-primary-sortas="Always" data-secondary="Always be scaling" data-type="indexterm" id="id-pghDCRC8tQ"> </a> We’re not going to discuss those things here beyond what we’ve already mentioned. It’s probably obvious that building a self-driving organization with strong leaders is already a great recipe for growth and success.</p>
|
||||
|
||||
<p>Instead, we’re going to discuss team scaling from a <em>defensive</em> and personal point of view rather than an offensive one. As a leader, <em>your most precious resource is your limited pool of time, attention, and energy.</em> If you aggressively build out your teams’ responsibilities and power without learning to protect your personal sanity in the process, the scaling is doomed to fail. And so we’re going to talk about how to effectively scale <em>yourself</em> through this process.</p>
|
||||
|
||||
<section data-type="sect2" id="the_cycle_of_success">
|
||||
<h2>The Cycle of Success</h2>
|
||||
|
||||
<p>When a team tackles a difficult problem, there’s a standard pattern that emerges, a particular cycle. <a contenteditable="false" data-primary="success, cycle of" data-type="indexterm" id="id-y9hBHPCQf1tr"> </a>It looks like this:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Analysis</dt>
|
||||
<dd>First, you receive the problem and start to wrestle with it. You identify the blinders, find all the trade-offs, and build consensus about how to manage them.</dd>
|
||||
<dt>Struggle</dt>
|
||||
<dd>You start moving on the work, whether or not your team thinks it’s ready. You prepare for failures, retries, and iteration. At this point, your job is mostly about herding cats. Encourage your leaders and experts on the ground to form opinions and then listen carefully and devise an overall strategy, even if you have to "fake it" at first.<sup><a data-type="noteref" id="ch01fn73-marker" href="ch06.html#ch01fn73">3</a></sup></dd>
|
||||
<dt>Traction</dt>
|
||||
<dd>Eventually your team begins to figure things out. You’re making smarter decisions, and real progress is made. Morale improves. You’re iterating on trade-offs, and the organization is beginning to drive itself around the problem. Nice job!</dd>
|
||||
<dt>Reward</dt>
|
||||
<dd>Something unexpected happens. Your manager takes you aside and congratulates you on your success. You discover your reward isn’t just a pat on the back, but a <em>whole new problem</em> to tackle. That’s right: the reward for success is more work…and more responsibility! Often, it’s a problem that is similar or adjacent to the first one, but equally difficult.</dd>
|
||||
</dl>
|
||||
|
||||
<p>So now you’re in a pickle. You’ve been given a new problem, but (usually) not more people. Somehow you need to solve <em>both</em> problems now, which likely means that the original problem still needs to be managed with <em>half</em> as many people in <em>half</em> the time. You need the other half of your people to tackle the new work! We refer to this final step as the <em>compression stage</em>: you’re taking everything you’ve been doing and compressing it down to half the size.</p>
|
||||
|
||||
<p>So really, the cycle of success is more of a spiral (see <a data-type="xref" href="ch06.html#the_spiral_of_success">Figure 6-2</a>). Over months and years, your organization is scaling by tackling new problems and then figuring out how to compress them so that it can take on new, parallel struggles. If you’re lucky, you’re allowed to hire more people as you go. More often than not, though, your hiring doesn’t keep pace with the scaling. Larry Page, one of Google’s founders, would probably refer to this spiral as “uncomfortably exciting.”</p>
|
||||
|
||||
<figure id="the_spiral_of_success"><img alt="The spiral of success" src="images/seag_0602.png">
|
||||
<figcaption><span class="label">Figure 6-2. </span>The spiral of success</figcaption>
|
||||
</figure>
|
||||
|
||||
<p>The spiral of success is a conundrum—it’s something that’s difficult to manage, and yet it’s the main paradigm for scaling a team of teams. The act of compressing a problem isn’t just about figuring out how to maximize your team’s efficiency, but also about learning to scale your <em>own</em> time and attention to match the new breadth of responsibility.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="important_versus_urgent">
|
||||
<h2>Important Versus Urgent</h2>
|
||||
|
||||
<p>Think back to a time when you weren’t yet a leader, but still a carefree individual contributor. <a contenteditable="false" data-primary="problems" data-secondary="important vs. urgent" data-type="indexterm" id="id-JghDH3CXIytM"> </a><a contenteditable="false" data-primary="important versus urgent problems" data-type="indexterm" id="id-EXheC8CeI3tG"> </a><a contenteditable="false" data-primary="leadership, scaling into a really good leader" data-secondary="important vs. urgent problems" data-type="indexterm" id="ix_leadscimpur"> </a>If you used to be a programmer, your life was likely calmer and more panic-free. You had a list of work to do, and each day you’d methodically work down your list, writing code and debugging problems. Prioritizing, planning, and executing your work was straightforward.</p>
|
||||
|
||||
<p>As you moved into leadership, though, you might have noticed that your main mode of work became less predictable and more about firefighting. That is, your job became less <em>proactive</em> and more <em>reactive.</em> The higher up in leadership you go, the more escalations you receive. You are the "finally" clause in a long list of code blocks! All of your means of communication—email, chat rooms, meetings—begin to feel like a Denial-of-Service attack against your time and attention. In fact, if you’re not mindful, you end up spending 100% of your time in reactive mode. People are throwing balls at you, and you’re frantically jumping from one ball to the next, trying not to let any of them hit the ground.</p>
|
||||
|
||||
<p>A lot of books have discussed this problem.<a contenteditable="false" data-primary="Eisenhower, Dwight D." data-type="indexterm" id="id-OjhKHEfMIKt6"> </a> The management author Stephen Covey is famous for talking about the idea of distinguishing between things that are <em>important</em> versus things that are <em>urgent.</em> In fact, it was US President Dwight D. Eisenhower who popularized this idea in a famous 1954 quote:</p>
|
||||
|
||||
<blockquote>
|
||||
<p>I have two kinds of problems, the urgent and the important. The urgent are not important, and the important are never urgent.</p>
|
||||
</blockquote>
|
||||
|
||||
<p>This tension is one of the biggest dangers to your effectiveness as a leader. If you let yourself slip into pure reactive mode (which happens almost automatically), you spend every moment of your life on <em>urgent</em> things, but almost none of those things are <em>important</em> in the big picture. Remember that your job as a leader is to do things that <em>only you can do</em>, like mapping a path through the forest. Building that meta-strategy is incredibly important, but almost never urgent. It’s always easier to respond to that next urgent email.</p>
|
||||
|
||||
<p>So how can you force yourself to work mostly on important things, rather than urgent things? Here are a few key techniques:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Delegate</dt>
|
||||
<dd>Many of the urgent things you see can be delegated back to other leaders in your organization. You might feel guilty if it’s a trivial task; or you might worry that handing off an issue is inefficient because it might take those other leaders longer to fix. But it’s good training for them, and it frees up your time to work on important things that only you can do.</dd>
|
||||
<dt>Schedule dedicated time</dt>
|
||||
<dd>Regularly block out two hours or more to sit quietly and work <em>only</em> on important-but-not-urgent things—things like team strategy, career paths for your leaders, or how you plan to collaborate with neighboring teams.</dd>
|
||||
<dt>Find a tracking system that works</dt>
|
||||
<dd>There are dozens of systems for tracking and prioritizing work. <a contenteditable="false" data-primary="tracking systems for work" data-type="indexterm" id="id-wehMHduwcVIBtB"> </a>Some are software based (e.g., specific “to-do” tools), some are pen-and-paper based (the “<a href="http://www.bulletjournal.com">Bullet Journal</a>” method), and some systems are agnostic to implementation. In this last category, David Allen’s book, <em>Getting Things Done,</em> is quite popular among engineering managers; it’s an abstract algorithm for working through tasks and maintaining a prized "inbox zero." The point here is to <em>try</em> these different systems and determine what works for you. Some of them will click with you and some will not, but you definitely need to find something more effective than tiny Post-It notes decorating your computer screen.<a contenteditable="false" data-primary="leadership, scaling into a really good leader" data-secondary="important vs. urgent problems" data-startref="ix_leadscimpur" data-type="indexterm" id="id-6LhqI8uocRIlt7"> </a></dd>
|
||||
</dl>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="learn_to_drop_balls">
|
||||
<h2>Learn to Drop Balls</h2>
|
||||
|
||||
<p>There’s one more key technique for managing your time, and on the surface it sounds radical.<a contenteditable="false" data-primary="leadership, scaling into a really good leader" data-secondary="learning to drop balls" data-type="indexterm" id="id-EXhjH8CGu3tG"> </a> For many, it contradicts years of engineering instinct. As an engineer, you pay attention to detail; you make lists, you check things off lists, you’re precise, and you finish what you start. That’s why it feels so good to close bugs in a bug tracker, or whittle your email down to inbox zero. But as a leader of leaders, your time and attention are under constant attack. No matter how much you try to avoid it, you end up dropping balls on the floor—there are just too many of them being thrown at you. It’s overwhelming, and you probably feel guilty about this all the time.</p>
|
||||
|
||||
<p>So, at this point, let’s step back and take a frank look at the situation. If dropping some number of balls is inevitable, isn’t it better to drop certain balls <em>deliberately</em> rather than <em>accidentally</em>? At least then you have some semblance of control.</p>
|
||||
|
||||
<p>Here’s a great way to do that.</p>
|
||||
|
||||
<p>Marie Kondo is an organizational consultant and the author<a contenteditable="false" data-primary="Kondo, Marie" data-type="indexterm" id="id-rMhMHOIYuEtj"> </a> of the extremely popular book <em>The Life-Changing Magic of Tidying Up</em>. Her philosophy is about effectively decluttering all of the junk from your house, but it works for abstract clutter as well.</p>
|
||||
|
||||
<p>Think of your physical possessions as living in three piles. About 20% of your things are just useless—things that you literally never touch anymore, and all very easy to throw away. About 60% of your things are somewhat interesting; they vary in importance to you, and you sometimes use them, sometimes not. And then about 20% of your possessions are exceedingly important: these are the things you use <em>all</em> the time, that have deep emotional meaning, or, in Ms. Kondo’s words, spark deep “joy” just holding them. The thesis of her book is that most people declutter their lives incorrectly: they spend time tossing the bottom 20% in the garbage, but the remaining 80% still feels too cluttered. She argues that the <em>true</em> work of decluttering is about identifying the top 20%, not the bottom 20%. If you can identify only the critical things, you should then toss out the other 80%. It sounds extreme, but it’s quite effective. It is greatly freeing to declutter so radically.</p>
|
||||
|
||||
<p>It turns out that you can also apply this philosophy to your inbox or task list—the barrage of balls being thrown at you. Divide your pile of balls into three groups: the bottom 20% are probably neither urgent nor important and very easy to delete or ignore. There’s a middle 60%, which might contain some bits of urgency or importance, but it’s a mixed bag. At the top, there’s 20% of things that are absolutely, critically important.</p>
|
||||
|
||||
<p>And so now, as you work through your tasks, do <em>not</em> try to tackle the top 80%—you’ll still end up overwhelmed and mostly working on urgent-but-not-important tasks. Instead, mindfully identify the balls that strictly fall in the top 20%—critical things that <em>only you can do</em>—and focus strictly on them. Give yourself explicit permission to drop the other 80%.</p>
|
||||
|
||||
<p>It might feel terrible to do so at first, but as you deliberately drop so many balls, you’ll discover two amazing things. First, even if you don’t delegate that middle 60% of tasks, your subleaders often notice and pick them up automatically. Second, if something in that middle bucket is truly critical, it ends up coming back to you anyway, eventually migrating up into the top 20%. You simply need to <em>trust</em> that things below your top-20% threshold will either be taken care of or evolve appropriately. Meanwhile, because you’re focusing only on the critically important things, you’re able to scale your time and attention to cover your group’s ever-growing responsibilities.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="protecting_your_energy">
|
||||
<h2>Protecting Your Energy</h2>
|
||||
|
||||
<p>We’ve talked about protecting your time and attention—but your personal energy is the other piece of the equation.<a contenteditable="false" data-primary="leadership, scaling into a really good leader" data-secondary="protecting your energy" data-type="indexterm" id="id-OjhKHmClTKt6"> </a> All of this scaling is simply exhausting. In an environment like this, how do you stay charged and optimistic?</p>
|
||||
|
||||
<p>Part of the answer is that over time, as you grow older, your overall stamina builds up. Early in your career, working eight hours a day in an office can feel like a shock; you come home tired and dazed. But just like training for a marathon, your brain and body build up larger reserves of stamina over time.</p>
|
||||
|
||||
<p>The other key part of the answer is that leaders gradually learn to <em>manage</em> their energy more intelligently. It’s something they learn to pay constant attention to. Typically, this means being aware of how much energy you have at any given moment, and making deliberate choices to “recharge” yourself at specific moments, in specific ways. Here are some great examples of mindful energy management:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Take <span class="plain">real</span> vacations</dt>
|
||||
<dd>A weekend is not a vacation. It takes at least three days to "forget" about your work; it takes at least a week to actually feel refreshed. But if you check your work email or chats, you <em>ruin</em> the recharge. A flood of worry comes back into your mind, and all of the benefit of psychological distancing dissipates. The vacation recharges only if you are truly disciplined about disconnecting.<sup><a data-type="noteref" id="ch01fn74-marker" href="ch06.html#ch01fn74">4</a></sup> And, of course, this is possible only if you’ve built a self-driving organization.</dd>
|
||||
<dt>Make it trivial to disconnect</dt>
|
||||
<dd>When you disconnect, leave your work laptop at the office. If you have work communications on your phone, remove them. For example, if your company uses G Suite (Gmail, Google Calendar, etc.), a great trick is to install these apps in a "work profile" on your phone. This causes a second set of work-badged apps to appear on your phone. For example, you’ll now have two Gmail apps: one for personal email, one for work email. On an Android phone, you can then press a single button to disable the entire work profile at once. All the work apps gray out, as if they were uninstalled, and you can’t “accidentally” check work messages until you re-enable the work profile.</dd>
|
||||
<dt>Take <span class="plain">real</span> weekends, too</dt>
|
||||
<dd>A weekend isn’t as effective as a vacation, but it still has some rejuvenating power. Again, this recharge works only if you disconnect from work communications. Try truly signing out on Friday night, spend the weekend doing things you love, and then sign in again on Monday morning when you’re back in the office.</dd>
|
||||
<dt>Take breaks during the day</dt>
|
||||
<dd>Your brain operates in natural 90-minute cycles.<sup><a data-type="noteref" id="ch01fn75-marker" href="ch06.html#ch01fn75">5</a></sup> Use the opportunity to get up and walk around the office, or spend 10 minutes walking outside. Tiny breaks like this are only tiny recharges, but they can make a tremendous difference in your stress levels and how you feel over the next two hours of work.</dd>
|
||||
<dt>Give yourself permission to take a mental health day</dt>
|
||||
<dd>Sometimes, for no reason, you just have a bad day. You might have slept well, eaten well, exercised—and yet you are still in a terrible mood anyway. If you’re a leader, this is an awful thing. Your bad mood sets the tone for everyone around you, and it can lead to terrible decisions (emails you shouldn’t have sent, overly harsh judgements, etc.). If you find yourself in this situation, just turn around and go home, declaring a sick day. Better to get nothing done that day than to do active damage.</dd>
|
||||
</dl>
|
||||
|
||||
<p>In the end, managing your energy is just as important as managing your time. If you learn to master these things, you’ll be ready to tackle the broader cycle of scaling responsibility and building a self-sufficient team.</p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="conclusion-id00010">
|
||||
<h1>Conclusion</h1>
|
||||
|
||||
<p>Successful leaders naturally take on more responsibility as they progress (and that’s a good and natural thing). Unless they develop effective techniques for making decisions quickly, delegating when needed, and managing their increased responsibility, they might end up feeling overwhelmed. Being an effective leader doesn’t mean that you need to make perfect decisions, do everything yourself, or work twice as hard. Instead, strive to always be deciding, always be leaving, and always be scaling.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="tlsemicolondrs-id00106">
|
||||
<h1>TL;DRs</h1>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Always Be Deciding: Ambiguous problems have no magic answer; they’re all about finding the right <em>trade-offs</em> of the moment, and iterating.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Always Be Leaving: Your job, as a leader, is to build an organization that automatically solves a class of ambiguous problems—over <em>time</em>—without you needing to be present.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Always Be Scaling: Success generates more responsibility over time, and you must proactively manage the <em>scaling</em> of this work in order to protect your scarce resources of personal time, attention, and energy.<a contenteditable="false" data-primary="leadership, scaling into a really good leader" data-startref="ix_leadsc" data-type="indexterm" id="id-EXheClHdUkCPsE"> </a></p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
<div data-type="footnotes"><p data-type="footnote" id="ch01fn71"><sup><a href="ch06.html#ch01fn71-marker">1</a></sup>“Code yellow” is Google’s term for “emergency hackathon to fix a critical problem.” Affected teams are expected to suspend all work and focus 100% attention on the problem until the state of emergency is declared over.</p><p data-type="footnote" id="ch01fn72"><sup><a href="ch06.html#ch01fn72-marker">2</a></sup>Brian W. Fitzpatrick and Ben Collins-Sussman, <a class="orm:hideurl" href="http://shop.oreilly.com/product/0636920042372.do"><em>Debugging Teams: Better Productivity through Collaboration</em></a> (Boston: O’Reilly, 2016).</p><p data-type="footnote" id="ch01fn73"><sup><a href="ch06.html#ch01fn73-marker">3</a></sup>It’s easy for imposter syndrome to kick in at this point. One technique for fighting the feeling that you don’t know what you’re doing is to simply pretend that <em>some</em> expert out there knows exactly what to do, and that they’re simply on vacation and you’re temporarily subbing in for them. It’s a great way to remove the personal stakes and give yourself permission to fail and learn.</p><p data-type="footnote" id="ch01fn74"><sup><a href="ch06.html#ch01fn74-marker">4</a></sup>You need to plan ahead and build around the assumption that your work simply won’t get done during vacation. Working hard (or smart) just before and after your vacation mitigates this issue.</p><p data-type="footnote" id="ch01fn75"><sup><a href="ch06.html#ch01fn75-marker">5</a></sup>You can read more about BRAC at <a href="https://en.wikipedia.org/wiki/Basic_rest-activity_cycle"><em class="hyperlink">https://en.wikipedia.org/wiki/Basic_rest-activity_cycle</em></a>.</p></div></section>
|
||||
|
||||
</body>
|
||||
</html>
|
421
clones/abseil.io/resources/swe-book/html/ch07.html
Normal file
|
@ -0,0 +1,421 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="measuring_engineering_productivity">
|
||||
<h1>Measuring Engineering Productivity</h1>
|
||||
|
||||
<p class="byline">Written by Ciera Jaspen</p>
|
||||
|
||||
<p class="byline">Edited by Riona Macnamara</p>
|
||||
|
||||
<p>Google is a data-driven company.<a contenteditable="false" data-primary="engineering productivity, measuring" data-type="indexterm" id="ix_engprd"> </a> We back up most of our products and design decisions with hard data. The culture of data-driven decision making, using appropriate metrics, has some drawbacks, but overall, relying on data tends to make most decisions objective rather than subjective, which is often a good thing. Collecting and analyzing data on the human side of things, however, has its own challenges.<a contenteditable="false" data-primary="measurements" data-seealso="engineering productivity, measuring" data-type="indexterm" id="id-RRS9ILhO"> </a> Specifically, within software engineering, Google has found that having a team of specialists focus on engineering productivity itself is very valuable and important as the company scales and can leverage insights from such a team.</p>
|
||||
|
||||
<section data-type="sect1" id="why_should_we_measure_engineering_produ">
|
||||
<h1>Why Should We Measure Engineering Productivity?</h1>
|
||||
|
||||
<p>Let’s presume that you <a contenteditable="false" data-primary="engineering productivity, measuring" data-secondary="reasons for" data-type="indexterm" id="ix_engprdrsn"> </a>have a thriving business (e.g., you run an online search engine), and you want to increase your business’s scope (enter into the enterprise application market, or the cloud market, or the mobile market). Presumably, to increase the scope of your business, you’ll need to also increase the size of your engineering organization. However, as organizations grow in size linearly, communication costs grow quadratically.<sup><a data-type="noteref" id="ch01fn76-marker" href="ch07.html#ch01fn76">1</a></sup> Adding more people will be necessary to increase the scope of your business, but the communication overhead costs will not scale linearly as you add additional personnel. As a result, you won’t be able to scale the scope of your business linearly to the size of your engineering organization.</p>
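<p>The quadratic growth mentioned above is easy to see with a little arithmetic: if every engineer may need to coordinate with every other engineer, the number of pairwise communication paths is n(n - 1)/2. The sketch below is only an illustration of that arithmetic (it is not a Google tool or metric); it contrasts linear headcount growth with the growth in coordination paths.</p>

<pre data-type="programlisting" data-code-language="python">def communication_paths(team_size: int) -&gt; int:
    """Pairwise communication paths in a fully connected team: n * (n - 1) / 2."""
    return team_size * (team_size - 1) // 2

# Headcount grows linearly; potential coordination overhead grows quadratically.
for team_size in (10, 20, 40, 80, 160):
    print(f"{team_size:4d} engineers: {communication_paths(team_size):6d} pairwise paths")</pre>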
|
||||
|
||||
<p>There is another way to address our scaling problem, though: <em>we could make each individual more productive</em>. If we can increase the <a contenteditable="false" data-primary="individual engineers, increasing productivity of" data-type="indexterm" id="id-7RSpI2HDU4"> </a>productivity of individual engineers in the organization, we can increase the scope of our business without the commensurate increase in communication overhead.</p>
|
||||
|
||||
<p>Google has had to grow quickly into new businesses, which has meant learning how to make our engineers more productive. To do this, we needed to understand what makes them productive, identify inefficiencies in our engineering processes, and fix the identified problems. Then, we would repeat the cycle as needed in a continuous improvement loop. By doing this, we would be able to scale our engineering organization with the increased demand on it.</p>
|
||||
|
||||
<p>However, this improvement cycle <em>also</em> takes human resources. It would not be worthwhile to improve the productivity of your engineering organization by the equivalent of 10 engineers per year if it took 50 engineers per year to understand and fix productivity blockers. <em>Therefore, our goal is to not only improve software engineering productivity, but to do so efficiently.</em></p>
|
||||
|
||||
<p>At Google, we addressed these trade-offs by creating a team of researchers dedicated to understanding engineering productivity. Our research team includes people from the software engineering research field and generalist software engineers, but we also include social scientists from a variety of fields, including cognitive psychology and behavioral economics. The addition of people from the social sciences allows us to not only study the software artifacts that engineers produce, but to also understand the human side of software development, including personal motivations, incentive structures, and strategies for managing complex tasks. The goal of the team is to take a data-driven approach to measuring and improving engineering productivity.</p>
|
||||
|
||||
<p>In this chapter, we walk through how our research team achieves this goal. This begins with the triage process: there are many parts of software development that we <em>can</em> measure, but what <em>should</em> we measure? After a project is selected, we walk through how the research team identifies meaningful metrics that will identify the problematic parts of the process. Finally, we look at how Google uses these metrics to track improvements to productivity.</p>
|
||||
|
||||
<p>For this chapter, we follow one concrete example posed by the C++ and Java language teams at Google: readability. For most of Google’s existence, these teams have managed the readability process at Google. (For more on readability, see <a data-type="xref" href="ch03.html#knowledge_sharing">Knowledge Sharing</a>.) The readability process was put in place in the early days of Google, before automatic formatters (<a data-type="xref" href="ch08.html#style_guides_and_rules">Style Guides and Rules</a>) and linters that block submission were commonplace (<a data-type="xref" href="ch09.html#code_review-id00002">Code Review</a>). The process itself is expensive to run because it requires hundreds of engineers performing readability reviews for other engineers in order to grant readability to them. Some engineers viewed it as an archaic hazing process that no longer held utility, and it was a favorite topic to argue about around the lunch table. The concrete question from the language teams was this: is the time spent on the readability process worthwhile?<a contenteditable="false" data-primary="engineering productivity, measuring" data-secondary="reasons for" data-startref="ix_engprdrsn" data-type="indexterm" id="id-2mSnhwf9UD"> </a></p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="triage_is_it_even_worth_measuringquesti">
|
||||
<h1>Triage: Is It Even Worth Measuring?</h1>
|
||||
|
||||
<p>Before we decide how to measure the productivity of engineers, we need to know when a metric is even worth measuring.<a contenteditable="false" data-primary="engineering productivity, measuring" data-secondary="assessing worth of measuring" data-type="indexterm" id="ix_engprdmeas"> </a><a contenteditable="false" data-primary="metrics" data-secondary="assessing worth of measuring" data-type="indexterm" id="ix_mtrmeas"> </a> The measurement itself is expensive: it takes people to measure the process, analyze the results, and disseminate them to the rest of the company. Furthermore, the measurement process itself might be onerous and slow down the rest of the engineering organization. Even if it is not slow, tracking progress might change engineers' behavior, possibly in ways that mask the underlying issues. We need to measure and estimate smartly; although we don’t want to guess, we shouldn’t waste time and resources measuring unnecessarily.</p>
|
||||
|
||||
<p>At Google, we’ve come up with a series of questions to help teams determine whether it’s even worth measuring productivity in the first place. We first ask people to describe what they want to measure in the form of a concrete question; we find that the more concrete people can make this question, the more likely they are to derive benefit from the process. When the readability team approached us, its question was simple: are the costs of an engineer going through the readability process worth the benefits they might be deriving for the company?</p>
|
||||
|
||||
<p>We then ask them to consider the following aspects of their question:</p>
|
||||
|
||||
<dl>
|
||||
<dt>What result are you expecting, and why?</dt>
|
||||
<dd>
|
||||
<p>Even though we might like to pretend that we are neutral investigators, we are not. We do have preconceived notions about what ought to happen. By acknowledging this at the outset, we can try to address these biases and prevent post hoc explanations of the results.</p>
|
||||
|
||||
<p>When this question was posed to the readability team, it noted that it was not sure. People were certain the costs had been worth the benefits at one point in time, but with the advent of autoformatters and static analysis tools, no one was entirely certain. There was a growing belief that the process now served as a hazing ritual. Although it might still provide engineers with benefits (and they had survey data showing that people did claim these benefits), it was not clear whether it was worth the time commitment of the authors or the reviewers of the code.</p>
|
||||
</dd>
|
||||
<dt>If the data supports your expected result, what action will be taken?</dt>
|
||||
<dd>
|
||||
<p>We ask this because if no action will be taken, there is no point in measuring. Notice that an action might in fact be “maintain the status quo” if there is a planned change that would otherwise go ahead in the absence of this result.</p>
|
||||
|
||||
<p>When asked about this, the answer from the readability team was straightforward: if the benefit was enough to justify the costs of the process, they would link to the research and the data on the FAQ about readability and advertise it to set expectations.</p>
|
||||
</dd>
|
||||
<dt>If we get a negative result, will appropriate action be taken?</dt>
|
||||
<dd>
|
||||
<p>We ask this question because in many cases, we find that a negative result will not change a decision. There might be other inputs into a decision that would override any negative result. If that is the case, it might not be worth measuring in the first place. This is the question that stops most of the projects that our research team takes on; we learn that the decision makers were interested in knowing the results, but for other reasons, they will not choose to change course.</p>
|
||||
|
||||
<p>In the case of readability, however, we had a strong statement of action from the team. It committed that, if our analysis showed that the costs either outweighed the benefit or the benefits were negligible, the team would kill the process. As different programming languages have different levels of maturity in formatters and static analyses, this evaluation would happen on a per-language basis.</p>
|
||||
</dd>
|
||||
<dt>Who is going to decide to take action on the result, and when would they do it?</dt>
|
||||
<dd>
|
||||
<p>We ask this to ensure that the person requesting the measurement is the one who is empowered to take action (or is doing so directly on their behalf). Ultimately, the goal of measuring our software process is to help people make business decisions. It’s important to understand who that individual is, including what form of data convinces them. Although the best research includes a variety of approaches (everything from structured interviews to statistical analyses of logs), there might be limited time in which to provide decision makers with the data they need. In those cases, it might be best to cater to the decision maker. Do they tend to make decisions by empathizing through the stories that can be retrieved from interviews?<sup><a data-type="noteref" id="ch01fn77-marker" href="ch07.html#ch01fn77">2</a></sup> Do they trust survey results or logs data? Do they feel comfortable with complex statistical analyses? If the decider doesn’t believe the form of the result in principle, there is again no point in measuring the process.</p>
|
||||
|
||||
<p>In the case of readability, we had a clear decision maker for each programming language. Two language teams, Java and C++, actively reached out to us for assistance, and the others were waiting to see what happened with those languages first.<sup><a data-type="noteref" id="ch01fn78-marker" href="ch07.html#ch01fn78">3</a></sup> The decision makers trusted engineers’ self-reported experiences for understanding happiness and learning, but the decision makers wanted to see “hard numbers” based on logs data for velocity and code quality. This meant that we needed to include both qualitative and quantitative analysis for these metrics. There was not a hard deadline for this work, but there was an internal conference that would make for a useful time for an announcement if there was going to be a change. That deadline gave us several months in which to complete the work.</p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
||||
<p>By asking these questions, we find that in many cases, measurement is simply not worthwhile…and that’s OK! There are many good reasons to not measure the impact of a tool or process on productivity. Here are some examples that we’ve seen:</p>
|
||||
|
||||
<dl>
|
||||
<dt>You can’t afford to change the process/tools right now</dt>
|
||||
<dd>There might be time constraints or financial constraints that prevent this. For example, you might determine that if only you switched to a faster build tool, it would save hours of time every week. However, the switchover will mean pausing development while everyone converts over, and there’s a major funding deadline approaching such that you cannot afford the interruption. Engineering trade-offs are not evaluated in a vacuum—in a case like this, it’s important to realize that the broader context completely justifies delaying action on a result.</dd>
|
||||
<dt>Any results will soon be invalidated by other factors</dt>
|
||||
<dd>
<p>Examples here might include measuring the software process of an organization just before a planned reorganization, or measuring the amount of technical debt for a deprecated system.</p>
</dd>

<dt>The decision maker has strong opinions, and you are unlikely to be able to provide a large enough body of evidence, of the right type, to change their beliefs</dt>

<dd>
<p>This comes down to knowing your audience. Even at Google, we sometimes find people who have unwavering beliefs on a topic due to their past experiences. We have found stakeholders who never trust survey data because they do not believe self-reports. We’ve also found stakeholders who are swayed best by a compelling narrative that was informed by a small number of interviews. And, of course, there are stakeholders who are swayed only by logs analysis. In all cases, we attempt to triangulate on the truth using mixed methods, but if a stakeholder is limited to believing only in methods that are not appropriate for the problem, there is no point in doing the work.</p>
</dd>
|
||||
<dt>The results will be used only as vanity metrics to support something you were going to do anyway</dt>
|
||||
<dd>This is perhaps the most common reason we tell people at Google not to measure a software process. Many times, people have planned a decision for multiple reasons, and improving the software development process is only one benefit of several. For example, the release tool team at Google once requested a measurement of a planned change to the release workflow system. Due to the nature of the change, it was obvious that the change would not be worse than the current state, but they didn’t know if it was a minor improvement or a large one. We asked the team: if it turns out to be only a minor improvement, would you spend the resources to implement the feature anyway, even if it didn’t look to be worth the investment? The answer was yes! The feature happened to improve productivity, but this was a side effect: it was also more performant and lowered the release tool team’s maintenance burden.</dd>
|
||||
<dt>The only metrics available are not precise enough to measure the problem and can be confounded by other factors</dt>
|
||||
<dd>In some cases, the metrics needed (see the upcoming section on how to identify metrics) are simply unavailable. In these cases, it can be tempting to measure using other metrics that are less precise (lines of code written, for example). However, any results from these metrics will be uninterpretable. If the metric confirms the stakeholders’ preexisting beliefs, they might end up proceeding with their plan without consideration that the metric is not an accurate measure. If it does not confirm their beliefs, the imprecision of the metric itself provides an easy explanation, and the stakeholder might, again, proceed with their plan.</dd>
|
||||
</dl>
|
||||
|
||||
<p>When you are successful at measuring your software process, you aren’t setting out to prove a hypothesis correct or incorrect; <em>success means giving a stakeholder the data they need to make a decision</em>. If that stakeholder won’t use the data, the project is always a failure. We should only measure a software process when a concrete decision will be made based on the outcome. For the readability team, there was a clear decision to be made. If the metrics showed the process to be beneficial, they would publicize the result. If not, the process would be abolished. Most important, the readability team had the authority to make this decision.<a contenteditable="false" data-primary="engineering productivity, measuring" data-secondary="assessing worth of measuring" data-startref="ix_engprdmeas" data-type="indexterm" id="id-NRS9IMfDcd"> </a><a contenteditable="false" data-primary="metrics" data-secondary="assessing worth of measuring" data-startref="ix_mtrmeas" data-type="indexterm" id="id-2mSdHwfqcD"> </a></p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="selecting_meaningful_metrics_with_goals">
|
||||
<h1>Selecting Meaningful Metrics with Goals and Signals</h1>
|
||||
|
||||
<p>After we decide to measure a software process, we need to determine what metrics to use. <a contenteditable="false" data-primary="metrics" data-secondary="meaningful, selecting with goals and signals" data-type="indexterm" id="ix_mtrGSM"> </a><a contenteditable="false" data-primary="engineering productivity, measuring" data-secondary="selecting meaningful metrics with goals and signals" data-type="indexterm" id="ix_engprdGS"> </a>Clearly, lines of code (LOC) won’t do,<sup><a data-type="noteref" id="ch01fn79-marker" href="ch07.html#ch01fn79">4</a></sup> but how do we actually measure engineering productivity?</p>
|
||||
|
||||
<p>At Google, we use the Goals/Signals/Metrics (GSM) framework <a contenteditable="false" data-primary="signals" data-secondary="Goals/Signals/Metrics (GSM) framework" data-type="indexterm" id="id-PRSqCJHRTJ"> </a>to guide metrics <span class="keep-together">creation.</span><a contenteditable="false" data-primary="Goals/Signals/Metrics (GSM) framework" data-type="indexterm" id="ix_GSM"> </a></p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>A <em>goal</em> is a desired end result. It’s phrased in terms of what you want to understand at a high level and should not contain references to specific ways to measure it.<a contenteditable="false" data-primary="goals" data-secondary="defined" data-type="indexterm" id="id-aGSOILCDCxhoT4"> </a></p>
|
||||
</li>
|
||||
<li>
|
||||
<p>A <em>signal</em> is how you might know that you’ve achieved the end result.<a contenteditable="false" data-primary="signals" data-secondary="defined" data-type="indexterm" id="id-MRS7IRCEImhJTD"> </a> Signals are things we would <em>like</em> to measure, but they might not be measurable themselves.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>A <em>metric</em> is a proxy for a signal.<a contenteditable="false" data-primary="metrics" data-secondary="in GSM framework" data-type="indexterm" id="id-GRSbINCRHKhzTZ"> </a> It is the thing we actually can measure. It might not be the ideal measurement, but it is something that we believe is close enough.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>The GSM framework encourages several desirable properties when creating metrics. First, by creating goals<a contenteditable="false" data-primary="streetlight effect" data-type="indexterm" id="id-aGSwCkUQTp"> </a> first, then signals, and finally metrics, it prevents the <em>streetlight effect</em>. The term comes from the full phrase “looking for your keys under the streetlight”: if you look only where you can see, you might not be looking in the right place. With metrics, this occurs when we use the metrics that we have easily accessible and that are easy to measure, regardless of whether those metrics suit our needs. Instead, GSM forces us to think about which metrics will actually help us achieve our goals, rather than simply what we have readily available.</p>
|
||||
|
||||
<p>Second, GSM helps prevent both metrics creep and metrics bias by encouraging us to come up with the appropriate set of metrics, using a principled approach, <em>in advance</em> of actually measuring the result. Consider the case in which we select metrics without a principled approach and then the results do not meet our stakeholders' expectations. At that point, we run the risk that stakeholders will propose that we use different metrics that they believe will produce the desired result. And because we didn’t select based on a principled approach at the start, there’s no reason to say that they’re wrong! Instead, GSM encourages us to select metrics based on their ability to measure the original goals. Stakeholders can easily see that these metrics map to their <span class="keep-together">original</span> goals and agree, in advance, that this is the best set of metrics for measuring the outcomes.</p>
|
||||
|
||||
<p>Finally, GSM can show us where we have measurement coverage and where we do not. When we run through the GSM process, we list all our goals and create signals for each one. As we will see in the examples, not all signals are going to be measurable—and that’s OK! With GSM, at least we have identified what is not measurable. By identifying these missing metrics, we can assess whether it is worth creating new metrics or even worth measuring at all.</p>
|
||||
|
||||
<p>The important <a contenteditable="false" data-primary="traceability, maintaining for metrics" data-type="indexterm" id="id-NRS1CMfJTd"> </a>thing is to maintain <em>traceability</em>. For each metric, we should be able to trace back to the signal that it is meant to be a proxy for and to the goal it is trying to measure. This ensures that we know which metrics we are measuring and why we are measuring them.<a contenteditable="false" data-primary="metrics" data-secondary="meaningful, selecting with goals and signals" data-startref="ix_mtrGSM" data-type="indexterm" id="id-BRSXHmfOTM"> </a><a contenteditable="false" data-primary="engineering productivity, measuring" data-secondary="selecting meaningful metrics with goals and signals" data-startref="ix_engprdGS" data-type="indexterm" id="id-nLS9hZfaTM"> </a></p>
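<p>As a rough illustration of what such traceability can look like in practice, here is a minimal sketch in Python. The class names and example strings are hypothetical illustrations rather than an internal Google schema; the point is only that each metric records the signal it proxies, and each signal records the goal it supports.</p>

<pre data-type="programlisting" data-code-language="python">from dataclasses import dataclass

@dataclass(frozen=True)
class Goal:
    description: str

@dataclass(frozen=True)
class Signal:
    description: str
    goal: Goal        # every signal ties back to a goal

@dataclass(frozen=True)
class Metric:
    description: str
    signal: Signal    # every metric is a proxy for a signal

quality = Goal("Engineers write higher-quality code as a result of the readability process.")
self_judged = Signal(
    "Engineers with readability judge their code to be of higher quality.", quality)
survey_metric = Metric(
    "Quarterly survey: proportion of engineers satisfied with the quality of their own code.",
    self_judged)

# Tracing back from the metric to the goal it is ultimately meant to measure.
assert survey_metric.signal.goal is quality</pre>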
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="goals">
|
||||
<h1>Goals</h1>
|
||||
|
||||
<p>A goal should be written in terms of a desired property, without reference to any <span class="keep-together">metric.</span><a contenteditable="false" data-primary="Goals/Signals/Metrics (GSM) framework" data-secondary="goals" data-type="indexterm" id="id-dBSpIwIJfG"> </a><a contenteditable="false" data-primary="engineering productivity, measuring" data-secondary="goals" data-type="indexterm" id="id-aGSzHmIwfp"> </a> By themselves, these goals are not measurable, but a good set of goals is something that everyone can agree on before proceeding onto signals and then metrics.</p>
|
||||
|
||||
<p>To make this work, we need to have identified the correct set of goals to measure in the first place. This would seem straightforward: surely the team knows the goals of their work! <a contenteditable="false" data-primary="trade-offs" data-secondary="in engineering productivity" data-type="indexterm" id="id-dBSLCBHJfG"> </a>However, our research team has found that in many cases, people forget to include all the possible <em>trade-offs within productivity</em>, which could lead to <span class="keep-together">mismeasurement.</span></p>
|
||||
|
||||
<p>Taking the readability example, let’s assume that the team was so focused on making the readability process fast and easy that it had forgotten the goal about code quality. The team set up tracking measurements for how long it takes to get through the review process and how happy engineers are with the process. One of our teammates proposes the following:</p>
|
||||
|
||||
<blockquote>
|
||||
<p>I can make your review velocity very fast: just remove code reviews entirely.</p>
|
||||
</blockquote>
|
||||
|
||||
<p>Although this is obviously an extreme example, teams forget core trade-offs all the time when measuring: they become so focused on improving velocity that they forget to measure quality (or vice versa). To combat this, our research team divides productivity into five core components. These five components are in trade-off with one another, and we encourage teams to consider goals in each of these components to ensure that they are not inadvertently improving one while driving others downward.<a contenteditable="false" data-primary="QUANTS in engineering productivity metrics" data-type="indexterm" id="id-GRSKCEcYfO"> </a> To help people remember all five components, we use the mnemonic “QUANTS”:</p>
|
||||
|
||||
<dl class="pagebreak-before">
|
||||
<dt><strong>Qu</strong>ality of the code</dt>
|
||||
<dd>What is the quality of the code produced?<a contenteditable="false" data-primary="quality of code" data-type="indexterm" id="id-2mSNC3I8TOfj"> </a><a contenteditable="false" data-primary="code" data-secondary="quality of" data-type="indexterm" id="id-BRSjIOIOTpfj"> </a> Are the test cases good enough to prevent regressions? How good is an architecture at mitigating risk and changes?</dd>
|
||||
<dt><strong>A</strong>ttention from engineers</dt>
|
||||
<dd>How frequently do engineers reach a state of flow?<a contenteditable="false" data-primary="attention from engineers (QUANTS)" data-type="indexterm" id="id-nLSMC2haTjf8"> </a> How much are they distracted by notifications? Does a tool encourage engineers to context switch?</dd>
|
||||
<dt>I<strong>n</strong>tellectual complexity</dt>
|
||||
<dd>How much cognitive load is required to complete a task?<a contenteditable="false" data-primary="intellectual complexity (QUANTS)" data-type="indexterm" id="id-XRSRCdcMTJfE"> </a> What is the inherent complexity of the problem being solved? Do engineers need to deal with unnecessary complexity?</dd>
|
||||
<dt><strong>T</strong>empo and velocity</dt>
|
||||
<dd>How quickly can engineers accomplish their tasks?<a contenteditable="false" data-primary="tempo and velocity (QUANTS)" data-type="indexterm" id="id-gXSPCWfpTgfx"> </a> How fast can they push their releases out? How many tasks do they complete in a given timeframe?</dd>
|
||||
<dt><strong>S</strong>atisfaction</dt>
|
||||
<dd>How happy are engineers with their tools? <a contenteditable="false" data-primary="satisfaction (QUANTS)" data-type="indexterm" id="id-ERSzCKuNT8fP"> </a>How well does a tool meet engineers' needs? How satisfied are they with their work and their end product? Are engineers feeling burned out?</dd>
|
||||
</dl>
|
||||
|
||||
<p>Going back to the readability example, our research team worked with the readability team to identify several productivity goals of the readability process:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Quality of the code</dt>
|
||||
<dd>Engineers write higher-quality code as a result of the readability process; they write more consistent code as a result of the readability process; and they contribute to a culture of code health as a result of the readability process.</dd>
|
||||
<dt>Attention from engineers</dt>
|
||||
<dd>We did not have any attention goal for readability. This is OK! Not all questions about engineering productivity involve trade-offs in all five areas.</dd>
|
||||
<dt>Intellectual complexity</dt>
|
||||
<dd>Engineers learn about the Google codebase and best coding practices as a result of the readability process, and they receive mentoring during the readability <span class="keep-together">process.</span></dd>
|
||||
<dt>Tempo and velocity</dt>
|
||||
<dd>Engineers complete work tasks faster and more efficiently as a result of the readability process.</dd>
|
||||
<dt>Satisfaction</dt>
|
||||
<dd>Engineers see the benefit of the readability process and have positive feelings about participating in it.</dd>
|
||||
</dl>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="signals">
|
||||
<h1>Signals</h1>
|
||||
|
||||
<p>A signal is the way in which we will know we’ve achieved our goal.<a contenteditable="false" data-primary="Goals/Signals/Metrics (GSM) framework" data-secondary="signals" data-type="indexterm" id="id-dBSLCwIztG"> </a> Not all signals are measurable, but that’s acceptable at this stage.<a contenteditable="false" data-primary="engineering productivity, measuring" data-secondary="signals" data-type="indexterm" id="id-aGSOImI0tp"> </a> There is not a 1:1 relationship between signals and goals. Every goal should have at least one signal, but they might have more. Some goals might also share a signal. <a data-type="xref" href="ch07.html#signals_and_goals">Table 7-1</a> shows some example signals for the goals of the readability process measurement.</p>
|
||||
|
||||
<table class="border" id="signals_and_goals">
|
||||
<caption><span class="label">Table 7-1. </span>Signals and goals</caption>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Goals</th>
|
||||
<th>Signals</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td>Engineers write higher-quality code as a result of the readability process.</td>
|
||||
<td>Engineers who have been granted readability judge their code to be of higher quality than engineers who have not been granted readability.<br>
|
||||
The readability process has a positive impact on code quality.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Engineers learn about the Google codebase and best coding practices as a result of the readability process.</td>
|
||||
<td>Engineers report learning from the readability process.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Engineers receive mentoring during the readability process.</td>
|
||||
<td>Engineers report positive interactions with experienced Google engineers who serve as reviewers during the readability process.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Engineers complete work tasks faster and more efficiently as a result of the readability process.</td>
|
||||
<td>Engineers who have been granted readability judge themselves to be more productive than engineers who have not been granted readability.<br>
|
||||
Changes written by engineers who have been granted readability are faster to review than changes written by engineers who have not been granted readability.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Engineers see the benefit of the readability process and have positive feelings about participating in it.</td>
|
||||
<td>Engineers view the readability process as being worthwhile.</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="metrics">
|
||||
<h1>Metrics</h1>
|
||||
|
||||
<p>Metrics are where we finally determine how we will measure the signal.<a contenteditable="false" data-primary="engineering productivity, measuring" data-secondary="metrics" data-type="indexterm" id="id-aGSwCmIYup"> </a><a contenteditable="false" data-primary="metrics" data-secondary="in GSM framework" data-type="indexterm" id="id-MRS7IjI3u2"> </a><a contenteditable="false" data-primary="Goals/Signals/Metrics (GSM) framework" data-secondary="metrics" data-type="indexterm" id="id-GRSEHjIwuO"> </a> Metrics are not the signal themselves; they are the measurable proxy of the signal. Because they are a proxy, they might not be a perfect measurement. For this reason, some signals might have multiple metrics as we try to triangulate on the underlying signal.</p>
|
||||
|
||||
<p>For example, to measure whether engineers' code is reviewed faster after readability, we might use a combination of both survey data and logs data. Neither of these metrics really provides the underlying truth. (Human perceptions are fallible, and logs metrics might not capture the entire picture of the time an engineer spends reviewing a piece of code, or they might be confounded by factors unknown at the time, such as the size or difficulty of a code change.) However, if these metrics show different results, it signals that possibly one of them is incorrect and we need to explore further. If they are the same, we have more confidence that we have reached some kind of truth.</p>
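<p>As a minimal sketch of this triangulation step, suppose we already have a survey-based estimate and a logs-based estimate of the same signal. The function below, including its name and its 20% tolerance, is a hypothetical illustration rather than a prescribed method; it simply flags large disagreements for further investigation.</p>

<pre data-type="programlisting" data-code-language="python">def metrics_agree(survey_estimate: float, logs_estimate: float,
                  tolerance: float = 0.20) -> bool:
    """Return True if two proxy metrics for the same signal roughly agree."""
    baseline = max(abs(survey_estimate), abs(logs_estimate), 1e-9)
    return abs(survey_estimate - logs_estimate) / baseline &lt;= tolerance

# Example: self-reported median review time (minutes) versus review time from logs.
if not metrics_agree(survey_estimate=45.0, logs_estimate=90.0):
    print("Proxies disagree; at least one metric may not capture the signal.")</pre>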
|
||||
|
||||
<p>Additionally, some signals might not have any associated metric because the signal might simply be unmeasurable at this time. Consider, for example, measuring code quality. Although academic literature has proposed many proxies for code quality, none of them have truly captured it. For readability, we faced a choice: either use a poor proxy and possibly make a decision based on it, or simply acknowledge that this is a point that cannot currently be measured. Ultimately, we decided not to capture this as a quantitative measure, though we did ask engineers to self-rate their code quality.</p>
|
||||
|
||||
<p>Following the GSM framework is a great way to clarify the goals for why you are measuring your software process and how it will actually be measured. However, it’s still possible that the metrics selected are not telling the complete story because they are not capturing the desired signal. At Google, we use qualitative data to validate our metrics and ensure that they are capturing the intended signal.<a contenteditable="false" data-primary="Goals/Signals/Metrics (GSM) framework" data-startref="ix_GSM" data-type="indexterm" id="id-NRS1CdU4ud"> </a></p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="using_data_to_validate_metrics">
|
||||
<h1>Using Data to Validate Metrics</h1>
|
||||
|
||||
<p>As an example, we once created a metric for measuring each engineer’s median build latency; the<a contenteditable="false" data-primary="engineering productivity, measuring" data-secondary="validating metrics with data" data-type="indexterm" id="ix_engprdval"> </a> goal <a contenteditable="false" data-primary="metrics" data-secondary="using data to validate" data-type="indexterm" id="ix_mtrdata"> </a>was to capture the “typical experience” of engineers’ build latencies. We then ran an <em>experience sampling study</em>. In this style of study, engineers are interrupted in the context of doing a task of interest to answer a few questions. After an engineer started a build, we automatically sent them a small survey about their experiences and expectations of build latency. However, in a few cases, the engineers responded that they had not started a build! It turned out that automated tools were starting up builds, but the engineers were not blocked on these results and so it didn’t “count” toward their “typical experience.” We then adjusted the metric to exclude such builds.<sup><a data-type="noteref" id="ch01fn80-marker" href="ch07.html#ch01fn80">5</a></sup></p>
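<p>A minimal sketch of the adjusted metric might look like the following. The log format, field names, and the is_automated flag are hypothetical assumptions for illustration, not the real logs schema; the key step is simply excluding tool-initiated builds before taking each engineer’s median.</p>

<pre data-type="programlisting" data-code-language="python">import statistics
from collections import defaultdict

def median_build_latency(build_logs):
    """build_logs: an iterable of dicts such as
    {"engineer": "alice", "latency_s": 312.0, "is_automated": False}."""
    per_engineer = defaultdict(list)
    for build in build_logs:
        if build["is_automated"]:
            continue  # tool-initiated builds don't block the engineer
        per_engineer[build["engineer"]].append(build["latency_s"])
    # One median per engineer, approximating their "typical experience."
    return {eng: statistics.median(latencies)
            for eng, latencies in per_engineer.items()}</pre>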
|
||||
|
||||
<p>Quantitative metrics are useful because they give you power and scale. You can measure the experience of engineers across the entire company over a large period of time and have confidence in the results. However, they don’t provide any context or narrative. Quantitative metrics don’t explain why an engineer chose to use an antiquated tool to accomplish their task, or why they took an unusual workflow, or why they circumvented a standard process.<a contenteditable="false" data-primary="qualitative metrics" data-type="indexterm" id="id-GRSKCMHXSO"> </a> Only qualitative studies can provide this information, and only qualitative studies can then provide insight on the next steps to improve a process.</p>
|
||||
|
||||
<p class="pagebreak-before">Consider now the signals presented in <a data-type="xref" href="ch07.html#goalscomma_signalscomma_and_metrics">Table 7-2</a>. What metrics might you create to measure each of those? Some of these signals might be measurable by analyzing tool and code logs. Others are measurable only by directly asking engineers. Still others might not be perfectly measurable—how do we truly measure code quality, for <span class="keep-together">example?</span></p>
|
||||
|
||||
<p>Ultimately, when evaluating the impact of readability on productivity, we ended up with a combination of metrics from three sources. First, we had a survey that was specifically about the readability process. <a contenteditable="false" data-primary="sampling bias" data-type="indexterm" id="id-2mSNCjUESD"> </a><a contenteditable="false" data-primary="recency bias" data-type="indexterm" id="id-BRSjIgU1SM"> </a><a contenteditable="false" data-primary="recall bias" data-type="indexterm" id="id-nLSJHMUESM"> </a>This survey was given to people after they completed the process; this allowed us to get their immediate feedback about the process. This hopefully avoids recall bias,<sup><a data-type="noteref" id="ch01fn81-marker" href="ch07.html#ch01fn81">6</a></sup> but it does introduce both recency bias<sup><a data-type="noteref" id="ch01fn82-marker" href="ch07.html#ch01fn82">7</a></sup> and sampling bias.<sup><a data-type="noteref" id="ch01fn83-marker" href="ch07.html#ch01fn83">8</a></sup> Second, we used a large-scale quarterly survey to track items that were not specifically about readability; instead, they were purely about metrics that we expected readability should affect. Finally, we used fine-grained logs metrics from our developer tools to determine how much time the logs claimed it took engineers to complete specific tasks.<sup><a data-type="noteref" id="ch01fn84-marker" href="ch07.html#ch01fn84">9</a></sup> <a data-type="xref" href="ch07.html#goalscomma_signalscomma_and_metrics">Table 7-2</a> presents the complete list of metrics with their corresponding signals and goals.<a contenteditable="false" data-primary="QUANTS in engineering productivity metrics" data-secondary="in readability process study" data-type="indexterm" id="id-ERSNtRUMSD"> </a><a contenteditable="false" data-primary="Goals/Signals/Metrics (GSM) framework" data-secondary="use for metrics in readability process study" data-type="indexterm" id="id-KRSjuVUgS4"> </a></p>
|
||||
|
||||
<table class="border" id="goalscomma_signalscomma_and_metrics">
|
||||
<caption><span class="label">Table 7-2. </span>Goals, signals, and metrics</caption>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>QUANTS</th>
|
||||
<th>Goal</th>
|
||||
<th>Signal</th>
|
||||
<th>Metric</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td><strong>Qu</strong>ality of the code</td>
|
||||
<td>Engineers write higher-quality code as a result of the readability process.</td>
|
||||
<td>Engineers who have been granted readability judge their code to be of higher quality than engineers who have not been granted readability.</td>
|
||||
<td>Quarterly Survey: Proportion of engineers who report being satisfied with the quality of their own code</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td>The readability process has a positive impact on code quality.</td>
|
||||
<td>Readability Survey: Proportion of engineers reporting that readability reviews have no impact or negative impact on code quality</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td>Readability Survey: Proportion of engineers reporting that participating in the readability process has improved code quality for their team</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td>Engineers write more consistent code as a result of the readability process.</td>
|
||||
<td>Engineers are given consistent feedback and direction in code reviews by readability reviewers as a part of the readability process.</td>
|
||||
<td>Readability Survey: Proportion of engineers reporting inconsistency in readability reviewers’ comments and readability criteria.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td>Engineers contribute to a culture of code health as a result of the readability process.</td>
|
||||
<td>Engineers who have been granted readability regularly comment on style and/or readability issues in code reviews.</td>
|
||||
<td>Readability Survey: Proportion of engineers reporting that they regularly comment on style and/or readability issues in code reviews</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><strong>A</strong>ttention from engineers</td>
|
||||
<td>n/a</td>
|
||||
<td>n/a</td>
|
||||
<td>n/a</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>I<strong>n</strong>tellectual complexity</td>
|
||||
<td>Engineers learn about the Google codebase and best coding practices as a result of the readability process.</td>
|
||||
<td>Engineers report learning from the readability process.</td>
|
||||
<td>Readability Survey: Proportion of engineers reporting that they learned about four relevant topics</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td>Readability Survey: Proportion of engineers reporting that learning or gaining expertise was a strength of the readability process</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td>Engineers receive mentoring during the readability process.</td>
|
||||
<td>Engineers report positive interactions with experienced Google engineers who serve as reviewers during the readability process.</td>
|
||||
<td>Readability Survey: Proportion of engineers reporting that working with readability reviewers was a strength of the readability process</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><strong>T</strong>empo/velocity</td>
|
||||
<td>Engineers are more productive as a result of the readability process.</td>
|
||||
<td>Engineers who have been granted readability judge themselves to be more productive than engineers who have not been granted readability.</td>
|
||||
<td>Quarterly Survey: Proportion of engineers reporting that they’re highly productive</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td>Engineers report that completing the readability process positively affects their engineering velocity.</td>
|
||||
<td>Readability Survey: Proportion of engineers reporting that <em>not</em> having readability reduces team engineering velocity</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td>Changelists (CLs) written by engineers who have been granted readability are faster to review than CLs written by engineers who have not been granted readability.</td>
|
||||
<td>Logs data: Median review time for CLs from authors with readability and without readability</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td>CLs written by engineers who have been granted readability are easier to shepherd through code review than CLs written by engineers who have not been granted readability.</td>
|
||||
<td>Logs data: Median shepherding time for CLs from authors with readability and without readability</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td>CLs written by engineers who have been granted readability are faster to get through code review than CLs written by engineers who have not been granted readability.</td>
|
||||
<td>Logs data: Median time to submit for CLs from authors with readability and without readability</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td>The readability process does not have a negative impact on engineering velocity.</td>
|
||||
<td>Readability Survey: Proportion of engineers reporting that the readability process negatively impacts their velocity</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td>Readability Survey: Proportion of engineers reporting that readability reviewers responded promptly</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td>Readability Survey: Proportion of engineers reporting that timeliness of reviews was a strength of the readability process</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><strong>S</strong>atisfaction</td>
|
||||
<td>Engineers see the benefit of the readability process and have positive feelings about participating in it.</td>
|
||||
<td>Engineers view the readability process as being an overall positive experience.</td>
|
||||
<td>Readability Survey: Proportion of engineers reporting that their experience with the readability process was positive overall</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td>Engineers view the readability process as being worthwhile.</td>
|
||||
<td>Readability Survey: Proportion of engineers reporting that the readability process is worthwhile</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td>Readability Survey: Proportion of engineers reporting that the quality of readability reviews is a strength of the process</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td>Readability Survey: Proportion of engineers reporting that thoroughness is a strength of the process</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td>Engineers do not view the readability process as frustrating.</td>
|
||||
<td>Readability Survey: Proportion of engineers reporting that the readability process is uncertain, unclear, slow, or frustrating</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td> </td>
|
||||
<td>Quarterly Survey: Proportion of engineers reporting that they’re satisfied with their own engineering velocity</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="taking_action_and_tracking_results">
|
||||
<h1>Taking Action and Tracking Results</h1>
|
||||
|
||||
<p>Recall our<a contenteditable="false" data-primary="engineering productivity, measuring" data-secondary="validating metrics with data" data-startref="ix_engprdval" data-type="indexterm" id="id-GRSKCjIqsO"> </a> original <a contenteditable="false" data-primary="metrics" data-secondary="using data to validate" data-startref="ix_mtrdata" data-type="indexterm" id="id-NRS9IZIRsd"> </a>goal in this chapter: we want to take action and improve productivity.<a contenteditable="false" data-primary="engineering productivity, measuring" data-secondary="taking action and tracking results after performing research" data-type="indexterm" id="id-2mSdH3IPsD"> </a> After performing research on a topic, the team at Google always prepares a list of recommendations for how we can continue to improve.<a contenteditable="false" data-primary="recommendations on research findings" data-type="indexterm" id="id-BRSLhOIzsM"> </a> We might suggest new features to a tool, improving latency of a tool, improving documentation, removing obsolete processes, or even changing the incentive structures for the engineers. Ideally, these recommendations are “tool driven”: it does no good to tell engineers to change their process or way of thinking if the tools do not support them in doing so. We instead always assume that engineers will make the appropriate trade-offs if they have the proper data available and the suitable tools at their disposal.</p>
|
||||
|
||||
<p>For readability, our study showed that it was overall worthwhile: engineers who had achieved readability were satisfied with the process and felt they learned from it. Our logs showed that they also had their code reviewed faster and submitted it faster, even accounting for no longer needing as many reviewers. Our study also showed places for improvement with the process: engineers identified pain points that would have made the process faster or more pleasant. The language teams took these recommendations and improved the tooling and process to make it faster and to be more transparent so that engineers would have a more pleasant experience.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="conclusion-id00011">
|
||||
<h1>Conclusion</h1>
|
||||
|
||||
<p>At Google, we’ve found that staffing a team of engineering productivity specialists has widespread benefits to software engineering; rather than relying on each team to chart its own course to increase productivity, a centralized team can focus on broad-based solutions to complex problems. Such “human-based” factors are notoriously difficult to measure, and it is important for experts to understand the data being analyzed given that many of the trade-offs involved in changing engineering processes are difficult to measure accurately and often have unintended consequences. Such a team must remain data driven and aim to eliminate subjective bias.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="tlsemicolondrs-id00108">
|
||||
<h1>TL;DRs</h1>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Before measuring productivity, ask whether the result is actionable, regardless of whether the result is positive or negative. If you can’t do anything with the result, it is likely not worth measuring.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Select meaningful metrics using the GSM framework. A good metric is a reasonable proxy to the signal you’re trying to measure, and it is traceable back to your original goals.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Select metrics that cover all parts of productivity (QUANTS). By doing this, you ensure that you aren’t improving one aspect of productivity (like developer velocity) at the cost of another (like code quality).</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Qualitative metrics are metrics, too! Consider having a survey mechanism for tracking longitudinal metrics about engineers’ beliefs. Qualitative metrics should also align with the quantitative metrics; if they do not, it is likely the quantitative metrics that are incorrect.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Aim to create recommendations that are built into the developer workflow and incentive structures. Even though it is sometimes necessary to recommend additional training or documentation, change is more likely to occur if it is built into the developer’s daily habits.<a contenteditable="false" data-primary="engineering productivity, measuring" data-startref="ix_engprd" data-type="indexterm" id="id-XRSRCmCwU4I8ij"> </a></p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
<div data-type="footnotes"><p data-type="footnote" id="ch01fn76"><sup><a href="ch07.html#ch01fn76-marker">1</a></sup>Frederick P. Brooks, <em>The Mythical Man-Month: Essays on Software Engineering</em> (New York: Addison-Wesley, 1995).</p><p data-type="footnote" id="ch01fn77"><sup><a href="ch07.html#ch01fn77-marker">2</a></sup>It’s worth pointing out here that our industry currently disparages “anecdata,” and everyone has a goal of being “data driven.” Yet anecdotes continue to exist because they are powerful. An anecdote can provide context and narrative that raw numbers cannot; it can provide a deep explanation that resonates with others because it mirrors personal experience. Although our researchers do not make decisions on anecdotes, we do use and encourage techniques such as structured interviews and case studies to deeply understand phenomena and provide context to quantitative data.</p><p data-type="footnote" id="ch01fn78"><sup><a href="ch07.html#ch01fn78-marker">3</a></sup>Java and C++ have the greatest amount of tooling support. Both have mature formatters and static analysis tools that catch common mistakes. Both are also heavily funded internally. Even though other language teams, like Python, were interested in the results, clearly there was not going to be a benefit for Python to remove readability if we couldn’t even show the same benefit for Java or C++.</p><p data-type="footnote" id="ch01fn79"><sup><a href="ch07.html#ch01fn79-marker">4</a></sup>“From there it is only a small step to measuring ‘programmer productivity’ in terms of ‘number of lines of code produced per month.’ This is a very costly measuring unit because it encourages the writing of insipid code, but today I am less interested in how foolish a unit it is from even a pure business point of view. My point today is that, if we wish to count lines of code, we should not regard them as ‘lines produced’ but as ‘lines spent’: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.” Edsger Dijkstra, on the cruelty of really teaching computing science, <a href="https://oreil.ly/ABAX1">EWD Manuscript 1036</a>.</p><p data-type="footnote" id="ch01fn80"><sup><a href="ch07.html#ch01fn80-marker">5</a></sup>It has routinely been our experience at Google that when the quantitative and qualitative metrics disagree, it was because the quantitative metrics were not capturing the expected result.</p><p data-type="footnote" id="ch01fn81"><sup><a href="ch07.html#ch01fn81-marker">6</a></sup>Recall bias is the bias from memory. People are more likely to recall events that are particularly interesting or frustrating.</p><p data-type="footnote" id="ch01fn82"><sup><a href="ch07.html#ch01fn82-marker">7</a></sup>Recency bias is another form of bias from memory in which people are biased toward their most recent experience. In this case, as they just successfully completed the process, they might be feeling particularly good about it.</p><p data-type="footnote" id="ch01fn83"><sup><a href="ch07.html#ch01fn83-marker">8</a></sup>Because we asked only those people who completed the process, we aren’t capturing the opinions of those who did not complete the process.</p><p data-type="footnote" id="ch01fn84"><sup><a href="ch07.html#ch01fn84-marker">9</a></sup>There is a temptation to use such metrics to evaluate individual engineers, or perhaps even to identify high and low performers. Doing so would be counterproductive, though. 
If productivity metrics are used for performance reviews, engineers will be quick to game the metrics, and they will no longer be useful for measuring and improving productivity across the organization. The only way to make these measurements work is to let go of the idea of measuring individuals and embrace measuring the aggregate effect.</p></div></section>
|
||||
|
||||
</body>
|
||||
</html>
|
443
clones/abseil.io/resources/swe-book/html/ch08.html
Normal file
375
clones/abseil.io/resources/swe-book/html/ch09.html
Normal file
|
@ -0,0 +1,375 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="code_review-id00002">
|
||||
<h1>Code Review</h1>
|
||||
|
||||
<p class="byline">Written by Tom Manshreck and Caitlin Sadowski</p>
|
||||
|
||||
<p class="byline">Edited by Lisa Carey</p>
|
||||
|
||||
<p>Code review is a process in which code is reviewed by someone other than the author, often before the introduction of that code into a codebase. <a contenteditable="false" data-primary="code reviews" data-type="indexterm" id="ix_cdrev_chapter"> </a>Although that is a simple definition, implementations of the process of code review vary widely throughout the software industry. Some organizations have a select group of “gatekeepers” across the codebase that review changes. Others delegate code review processes to smaller teams, allowing different teams to require different levels of code review. At Google, essentially every change is reviewed before being committed, and every engineer is responsible for initiating reviews and reviewing changes.</p>
|
||||
|
||||
<p>Code reviews generally require a combination of a process and a tool supporting that process. At Google, we use<a contenteditable="false" data-primary="Critique code review tool" data-type="indexterm" id="id-RgTvHOuV"> </a> a custom code review tool, Critique, to support our process.<sup><a data-type="noteref" id="ch01fn103-marker" href="ch09.html#ch01fn103">1</a></sup> Critique is an important enough tool at Google to warrant its own chapter in this book. This chapter focuses on the process of code review as it is practiced at Google rather than the specific tool, both because these foundations are older than the tool and because most of these insights can be adapted to whatever tool you might use for code review.</p>
|
||||
|
||||
<div data-type="note" id="id-ROIYh1"><h6>Note</h6>
|
||||
<p>For more information on Critique, see <a data-type="xref" href="ch19.html#critique_googleapostrophes_code_review">Critique: Google’s Code Review Tool</a>.</p>
|
||||
</div>
|
||||
|
||||
<p>Some<a contenteditable="false" data-primary="code reviews" data-secondary="benefits of" data-type="indexterm" id="id-NoTrHyfO"> </a> of the benefits of code review, such as detecting bugs in code before they enter a codebase, are well established<sup><a data-type="noteref" id="ch01fn104-marker" href="ch09.html#ch01fn104">2</a></sup> and somewhat obvious (if imprecisely measured). Other benefits, however, are more subtle. Because the code review process at Google is so ubiquitous and extensive, we’ve noticed many of these more subtle effects, including psychological ones, which provide many benefits to an organization over time and scale.</p>
|
||||
|
||||
<section data-type="sect1" id="code_review_flow-id00005">
|
||||
<h1>Code Review Flow</h1>
|
||||
|
||||
<p>Code reviews can happen at many stages of software development. <a contenteditable="false" data-primary="code reviews" data-secondary="steps in" data-type="indexterm" id="id-MzTQHRtqUQ"> </a>At Google, code reviews take place before a change can be committed to the codebase; this stage is also known as a <em>precommit review</em>. The primary end goal of a code review is to get another engineer to consent to the change, which we denote by tagging the change as “looks good to me” (LGTM). <a contenteditable="false" data-primary="LGTM (looks good to me) stamp from reviewers" data-type="indexterm" id="id-YNTYcOtYUQ"> </a>We use this LGTM as a necessary permissions “bit” (combined with other bits noted below) to allow the change to be committed.</p>
|
||||
|
||||
<p>A typical code review at Google goes through the following steps:</p>
|
||||
|
||||
<ol>
|
||||
<li>
|
||||
<p>A user writes a change to the codebase in their workspace. This <em>author</em> then creates a snapshot of the change: a patch and corresponding description that are uploaded to the code review tool. This change produces a <em>diff</em> against the codebase, which is used to evaluate what code has changed.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>The author can use this initial patch to apply automated review comments or do self-review. When the author is satisfied with the diff of the change, they mail the change to one or more reviewers. This process notifies those reviewers, asking them to view and comment on the snapshot.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><em>Reviewers</em> open the change in the code review tool and post comments on the diff. Some comments request explicit resolution. Some are merely informational.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>The author modifies the change and uploads new snapshots based on the feedback and then replies back to the reviewers. Steps 3 and 4 may be repeated multiple times.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>After the reviewers are happy with the latest state of the change, they agree to the change and accept it by marking it as “looks good to me” (LGTM). Only one LGTM is required by default, although convention might request that all reviewers agree to the change.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>After a change is marked LGTM, the author is allowed to commit the change to the codebase, provided they <em>resolve all comments</em> and that the change is <em>approved</em>. We’ll cover approval in the next section.</p>
|
||||
</li>
|
||||
</ol>
|
||||
|
||||
<p>We’ll go over this process in more detail later in this chapter.</p>
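<p>One way to picture this flow is as a small state machine. The sketch below is a simplification with hypothetical stage names and transition conditions, not Critique’s actual data model; it maps each stage onto the numbered steps above.</p>

<pre data-type="programlisting" data-code-language="python">from enum import Enum, auto

class ReviewStage(Enum):
    DRAFT = auto()      # step 1: snapshot uploaded, not yet mailed out
    IN_REVIEW = auto()  # steps 2-4: reviewers comment, the author revises
    LGTM = auto()       # step 5: a reviewer marks the change "looks good to me"
    SUBMITTED = auto()  # step 6: comments resolved, change approved and committed

def advance(stage, mailed=False, got_lgtm=False,
            comments_resolved=False, approved=False):
    if stage is ReviewStage.DRAFT and mailed:
        return ReviewStage.IN_REVIEW
    if stage is ReviewStage.IN_REVIEW and got_lgtm:
        return ReviewStage.LGTM
    if stage is ReviewStage.LGTM and comments_resolved and approved:
        return ReviewStage.SUBMITTED
    return stage  # otherwise stay put (for example, while steps 3 and 4 repeat)</pre>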
|
||||
|
||||
<aside data-type="sidebar" id="code_is_a_liability">
|
||||
<h5>Code Is a Liability</h5>
|
||||
|
||||
<p>It’s important to remember (and accept) that code itself is a liability. <a contenteditable="false" data-primary="code" data-secondary="code as a liability, not an asset" data-type="indexterm" id="id-l1TJHwtlhbUM"> </a><a contenteditable="false" data-primary="code reviews" data-secondary="code as a liability" data-type="indexterm" id="id-6ET6totvhEUA"> </a>It might be a necessary liability, but by itself, code is simply a maintenance task to someone somewhere down the line. Much like the fuel that an airplane carries, it has weight, though it is, of course, <a href="https://oreil.ly/TmoWX">necessary for that airplane to fly</a>.</p>
|
||||
|
||||
<p>New features are often necessary, of course, but care should be taken before developing code in the first place to ensure that any new feature is warranted. Duplicated code not only is a wasted effort, it can actually cost more in time than not having the code at all; changes that could be easily performed under one code pattern often require more effort when there is duplication in the codebase. Writing entirely new code is so frowned upon that some of us have a saying: “If you’re writing it from scratch, you’re doing it wrong!”</p>
|
||||
|
||||
<p>This is especially true of library or utility code. Chances are, if you are writing a utility, someone else somewhere in a codebase the size of Google’s has probably done something similar. Tools such as those discussed in <a data-type="xref" href="ch17.html#code_search">Code Search</a> are therefore critical for both finding such utility code and preventing the introduction of duplicate code. Ideally, this research is done beforehand, and a design for anything new has been communicated to the proper groups before any new code is written.</p>
|
||||
|
||||
<p>Of course, new projects happen, new techniques are introduced, new components are needed, and so on. All that said, a code review is not an occasion to rehash or debate previous design decisions. Design decisions often take time, requiring the circulation of design proposals, debate on the design in API reviews or similar meetings, and perhaps the development of prototypes. As much as a code review of entirely new code should not come out of the blue, the code review process itself should also not be viewed as an opportunity to revisit previous decisions.</p>
|
||||
</aside>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="how_code_review_works_at_google">
|
||||
<h1>How Code Review Works at Google</h1>
|
||||
|
||||
<p>We’ve pointed out roughly how the typical code review process works, but the devil is in the details.<a contenteditable="false" data-primary="code reviews" data-secondary="how they work at Google" data-type="indexterm" id="ix_cdrevhow"> </a> This section outlines in detail how code review works at Google and how these practices allow it to scale properly over time.</p>
|
||||
|
||||
<p>There are three aspects of review <a contenteditable="false" data-primary="approvals for code changes at Google" data-type="indexterm" id="id-YNTDHnc9CQ"> </a>that require “approval” for any given change at <span class="keep-together">Google:</span></p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>A correctness and comprehension check from another engineer that the code is appropriate and does what the author claims it does. This is often a team member, though it does not need to be.<a contenteditable="false" data-primary="LGTM (looks good to me) stamp from reviewers" data-secondary="correctness and comprehension checks" data-type="indexterm" id="id-JBT6HxH7H5SpCe"> </a> This is reflected in the LGTM permissions “bit,” which will be set after a peer reviewer agrees that the code “looks good” to them.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Approval from one of the code owners that the code is appropriate for this particular part of the codebase (and can be checked into a particular directory). This approval might be implicit if the author is such an owner. Google’s codebase is a tree structure with hierarchical owners of particular directories. (See <a data-type="xref" href="ch16.html#version_control_and_branch_management">Version Control and Branch Management</a>). Owners act as gatekeepers for their particular directories. <a contenteditable="false" data-primary="LGTM (looks good to me) stamp from reviewers" data-secondary="code owner's approval and" data-type="indexterm" id="id-l1TqtEH2t1SWCB"> </a>A change might be proposed by any engineer and LGTM'ed by any other engineer, but an owner of the directory in question must also <em>approve</em> this addition to their part of the codebase. Such an owner might be a tech lead or other engineer deemed expert in that particular area of the codebase. It’s generally up to each team to decide how broadly or narrowly to assign ownership privileges.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Approval from someone with language “readability”<sup><a data-type="noteref" id="ch01fn106-marker" href="ch09.html#ch01fn106">3</a></sup> that the code conforms to the language’s style and<a contenteditable="false" data-primary="readability" data-secondary="approval for code changes at Google" data-type="indexterm" id="id-6ET6t4H9clSOCd"> </a> best practices, checking whether the code is written in the manner we expect. This approval, again, might be implicit if the author has such readability. These engineers are pulled from a company-wide pool of engineers who have been granted readability in that programming language.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Although this level of control sounds onerous—and, admittedly, it sometimes is—most reviews have one person assuming all three roles, which speeds up the process quite a bit. Importantly, the author can also assume the latter two roles, needing only an LGTM from another engineer to check code into their own codebase, provided they already have readability in that language (which owners often do).</p>
|
||||
|
||||
<p>These requirements allow the code review process to be quite flexible. A tech lead who is an owner of a project and has that code’s language readability can submit a code change with only an LGTM from another engineer.<a contenteditable="false" data-primary="LGTM (looks good to me) stamp from reviewers" data-secondary="tech leads submitting code change after" data-type="indexterm" id="id-l1TJHvhXC3"> </a> An intern without such authority can submit the same change to the same codebase, provided they get approval from an owner with language readability. The three aforementioned permission “bits” can be combined in any combination. An author can even request more than one LGTM from separate people by explicitly tagging the change as wanting an LGTM from all reviewers.</p>
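<p>To make the interplay of these three “bits” concrete, the following sketch shows how a submit gate might combine them. It is purely illustrative: it is not a depiction of Google’s actual review tooling, and every class, field, and function name here is invented for the example.</p>

<pre data-type="programlisting" data-code-language="python">
# Illustrative sketch only; not Google's review tooling. All names invented.
from dataclasses import dataclass, field


@dataclass
class Change:
    author: str
    directory: str
    language: str
    lgtm_from: set = field(default_factory=set)  # peers who marked the change LGTM
    approved_by_owner: bool = False              # an owner of `directory` approved
    approved_for_readability: bool = False       # a readability holder approved


def is_submittable(change, is_owner, has_readability):
    """Returns True once all three review bits are satisfied.

    `is_owner(user, directory)` and `has_readability(user, language)` are
    callables supplied by the caller. The author can implicitly satisfy the
    owner and readability bits, but never the LGTM bit.
    """
    lgtm_ok = any(peer != change.author for peer in change.lgtm_from)
    owner_ok = change.approved_by_owner or is_owner(change.author, change.directory)
    readability_ok = (change.approved_for_readability
                      or has_readability(change.author, change.language))
    return lgtm_ok and owner_ok and readability_ok
</pre>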
|
||||
|
||||
<p>In practice, most code reviews that require more than one approval usually go through a two-step process: gaining an LGTM from a peer engineer, and then seeking approval from appropriate code owner/readability reviewer(s). This allows the two roles to focus on different aspects of the code review and saves review time. The primary reviewer can focus on code correctness and the general validity of the code change; the code owner can focus on whether this change is appropriate for their part of the codebase without having to focus on the details of each line of code. An approver is often looking for something different than a peer reviewer, in other words. After all, someone is trying to check in code to their project/directory. They are more concerned with questions such as: “Will this code be easy or difficult to maintain?” “Does it add to my technical debt?” “Do we have the expertise to maintain it within our team?”</p>
|
||||
|
||||
<p>If all three of these types of reviews can be handled by one reviewer, why not just have those types of reviewers handle all code reviews? The short answer is scale. Separating the three roles adds flexibility to the code review process. If you are working with a peer on a new function within a utility library, you can get someone on your team to review the code for code correctness and comprehension. After several rounds (perhaps over several days), your code satisfies your peer reviewer and you get an LGTM. Now, you need only get an <em>owner</em> of the library (and owners often have appropriate readability) to approve the change.<a contenteditable="false" data-primary="code reviews" data-secondary="how they work at Google" data-startref="ix_cdrevhow" data-type="indexterm" id="id-e8TKtnU3Cn"> </a></p>
|
||||
|
||||
<aside data-type="sidebar" id="ownership">
|
||||
<h5>Ownership</h5>
|
||||
|
||||
<p class="byline">Hyrum Wright</p>
|
||||
|
||||
<p>When working on a small team in<a contenteditable="false" data-primary="code reviews" data-secondary="ownership of code" data-type="indexterm" id="ix_cdrevown"> </a> a dedicated repository, it’s common to grant the entire team access to everything in the repository. <a contenteditable="false" data-primary="ownership of code" data-type="indexterm" id="ix_own"> </a>After all, you know the other engineers, the domain is narrow enough that each of you can be experts, and small numbers constrain the effect of potential errors.</p>
|
||||
|
||||
<p>As the team grows larger, this approach can fail to scale. The result is either a messy repository split or a different approach to recording who has what knowledge and responsibilities in different parts of the repository. At Google, we call this set of knowledge and responsibilities <em>ownership</em> and the people who exercise them <em>owners</em>. This concept is different from possession of a collection of source code; rather, it implies a sense of stewardship, acting in the company’s best interest with a section of the codebase. (Indeed, “stewards” would almost certainly be a better term if we had it to do over again.)</p>
|
||||
|
||||
<p>Specially named OWNERS files list usernames of people who have ownership responsibilities for a directory and its children. These files may also contain references to other OWNERS files or external access control lists, but eventually they resolve to a list of individuals. Each subdirectory may also contain a separate OWNERS file, and the relationship is hierarchically additive: a given file is generally owned by the union of the members of all the OWNERS files above it in the directory tree. OWNERS files may have as many entries as teams like, but we encourage a relatively small and focused list to ensure responsibility is clear.</p>
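<p>As a concrete (and entirely hypothetical) illustration, an OWNERS file is usually just a short plain-text list. The directory and usernames below are invented, and the exact directive syntax varies between code review systems.</p>

<pre data-type="programlisting">
# Hypothetical OWNERS file for a directory such as depot/frobber/compressor.
# The engineers listed here (together with those listed in OWNERS files in
# parent directories) may approve changes to this directory and its children.
alice
bob
# Entries can also reference other OWNERS files or external access-control
# lists; the directive syntax for doing so is tool specific.
</pre>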
|
||||
|
||||
<p>Ownership of Google’s code conveys approval rights for code within one’s purview, but these rights also come with a set of responsibilities, such as understanding the code that is owned or knowing how to find somebody who does. Different teams have different criteria for granting ownership to new members, but we generally encourage them not to use ownership as a rite of initiation and encourage departing members to yield ownership as soon as is practical.</p>
|
||||
|
||||
<p>This distributed ownership structure enables many of the other practices we’ve outlined in this book. For example, the set of people in the root OWNERS file can act as global approvers for large-scale changes (see <a data-type="xref" href="ch22.html#large-scale_changes">Large-Scale Changes</a>) without having to bother local teams. Likewise, OWNERS files act as a kind of documentation, making it easy for people and tools to find those responsible for a given piece of code just by walking up the directory tree. When new projects are created, there’s no central authority that has to register new ownership privileges: a new OWNERS file is sufficient.</p>
|
||||
|
||||
<p>This ownership mechanism is simple, yet powerful, and has scaled well over the past two decades. It is one of the ways that Google ensures that tens of thousands of engineers can operate efficiently on billions of lines of code<a contenteditable="false" data-primary="ownership of code" data-startref="ix_own" data-type="indexterm" id="id-KvTLHAU3CyCE"> </a> in a single <a contenteditable="false" data-primary="code reviews" data-secondary="ownership of code" data-startref="ix_cdrevown" data-type="indexterm" id="id-V0TbtQUKCzCQ"> </a>repository.</p>
|
||||
</aside>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="code_review_benefits">
|
||||
<h1>Code Review Benefits</h1>
|
||||
|
||||
<p>Across the industry, code review itself is not controversial, although it is far from a universal practice.<a contenteditable="false" data-primary="code reviews" data-secondary="benefits of" data-type="indexterm" id="ix_cdrevben"> </a> Many (maybe even most) other companies and open source projects have some form of code review, and most view the process as important as a sanity check on the introduction of new code into a codebase. Software engineers understand some of the more obvious benefits of code review, even if they might not personally think it applies in all cases. But at Google, this process is generally more thorough and widespread than at most other companies.</p>
|
||||
|
||||
<p>Google’s culture, like that of a lot of software companies, is based on giving engineers wide latitude in how they do their jobs.<a contenteditable="false" data-primary="software engineers" data-secondary="code reviews and" data-type="indexterm" id="id-JBT6HXczIg"> </a> There is a recognition that strict processes tend not to work well for a dynamic company needing to respond quickly to new technologies, and that bureaucratic rules tend not to work well with creative professionals. Code review, however, is a mandate, one of the few blanket processes in which all software engineers at Google must participate. Google requires code review for almost<sup><a data-type="noteref" id="ch01fn107-marker" href="ch09.html#ch01fn107">4</a></sup> every code change to the codebase, no matter how small. This mandate does have a cost and effect on engineering velocity given that it does slow down the introduction of new code into a codebase and can impact time-to-production for any given code change. (Both of these are common complaints by software engineers of strict code review processes.) Why, then, do we require this process? Why do we believe that this is a long-term benefit?</p>
|
||||
|
||||
<p>A well-designed code review process and a culture of taking code review seriously provides the following benefits:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Checks code correctness</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Ensures the code change is comprehensible to other engineers</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Enforces consistency across the codebase</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Psychologically promotes team ownership</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Enables knowledge sharing</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Provides a historical record of the code review itself</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Many of these benefits are critical to a software organization over time, and many of them are beneficial to not only the author but also the reviewers. The following sections go into more specifics for each of these items.</p>
|
||||
|
||||
<section data-type="sect2" id="code_correctness">
|
||||
<h2>Code Correctness</h2>
|
||||
|
||||
<p>An obvious benefit of code review is that it allows a reviewer to check the “correctness” of the code change. <a contenteditable="false" data-primary="correctness of code" data-type="indexterm" id="id-e8TLHxtdfVIw"> </a><a contenteditable="false" data-primary="code reviews" data-secondary="benefits of" data-tertiary="correctness of code" data-type="indexterm" id="id-Z7T4tNtxfaI3"> </a>Having another set of eyes look over a change helps ensure that the change does what was intended. Reviewers typically look for whether a change has proper testing, is properly designed, and functions correctly and efficiently. In many cases, checking code correctness is checking whether the particular change can introduce bugs into the codebase.</p>
|
||||
|
||||
<p>Many reports point to the efficacy of code review in the prevention of future bugs in software. A study at IBM found that discovering defects earlier in a process, unsurprisingly, led to less time required to fix them later on.<sup><a data-type="noteref" id="ch01fn108-marker" href="ch09.html#ch01fn108">5</a></sup> The investment in the time for code review saved time otherwise spent in testing, debugging, and performing regression testing, provided that the code review process itself was streamlined to keep it lightweight. This latter point is important; code review processes that are heavyweight, or that don’t scale properly, become unsustainable.<sup><a data-type="noteref" id="ch01fn109-marker" href="ch09.html#ch01fn109">6</a></sup> We will get into some best practices for keeping the process lightweight later in this chapter.</p>
|
||||
|
||||
<p>To prevent the evaluation of correctness from becoming more subjective than objective, authors are generally given deference to their particular approach, whether it be in the design or the function of the introduced change. A reviewer shouldn’t propose alternatives because of personal opinion. Reviewers can propose alternatives, but only if they improve comprehension (by being less complex, for example) or functionality (by being more efficient, for example). In general, engineers are encouraged to approve changes that improve the codebase rather than wait for consensus on a more “perfect” solution. This focus tends to speed up code reviews.</p>
|
||||
|
||||
<p>As tooling becomes stronger, many correctness checks are performed automatically through <a contenteditable="false" data-primary="static analysis tools" data-secondary="for code correctness" data-type="indexterm" id="id-X3T4HYulfoIY"> </a>techniques such as static analysis and <a contenteditable="false" data-primary="automated testing" data-secondary="code correctness checks" data-type="indexterm" id="id-47T2tduEf2Ij"> </a>automated testing (though tooling might never completely obviate the value for human-based inspection of code—see <a data-type="xref" href="ch20.html#static_analysis-id00082">Static Analysis</a> for more information). Though this tooling has its limits, it has definitely lessened the need to rely on human-based code reviews for checking code <span class="keep-together">correctness.</span></p>
|
||||
|
||||
<p>That said, checking for defects during the initial code review process is still an integral part of a general “shift left” strategy, aiming to discover and resolve issues at the earliest possible time so that they don’t require escalated costs and resources farther down in the development cycle. A code review is neither a panacea nor the only check for such correctness, but it is an element of a defense-in-depth against such problems in software. As a result, code review does not need to be “perfect” to achieve results.</p>
|
||||
|
||||
<p>Surprisingly enough, checking for code correctness is not the primary benefit Google accrues from the process of code review. Checking for code correctness generally ensures that a change works, but more importance is attached to ensuring that a code change is understandable and makes sense over time and as the codebase itself scales. To evaluate those aspects, we need to look at factors other than whether the code is simply logically “correct” or understood.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="comprehension_of_code">
|
||||
<h2>Comprehension of Code</h2>
|
||||
|
||||
<p>A code review typically is the first opportunity for someone other than the author to inspect a change. <a contenteditable="false" data-primary="code reviews" data-secondary="benefits of" data-tertiary="comprehension of code" data-type="indexterm" id="id-Z7T9HNtyUaI3"> </a><a contenteditable="false" data-primary="comprehension of code" data-type="indexterm" id="id-8JT4tOtWUoIB"> </a>This perspective allows a reviewer to do something that even the best engineer cannot do: provide feedback unbiased by an author’s perspective. <em>A code review is often the first test of whether a given change is understandable to a broader audience</em>. This perspective is vitally important because code will be read many more times than it is written, and understanding and comprehension are critically important.</p>
|
||||
|
||||
<p>It is often useful to find a reviewer who has a different perspective from the author, especially a reviewer who might need, as part of their job, to maintain or use the code being proposed within the change. Unlike the deference reviewers should give authors regarding design decisions, it’s often useful to treat questions on code comprehension using the maxim “the customer is always right.” In some respect, any questions you get now will be multiplied many-fold over time, so view each question on code comprehension as valid. This doesn’t mean that you need to change your approach or your logic in response to the criticism, but it does mean that you might need to explain it more clearly.</p>
|
||||
|
||||
<p>Together, the code correctness and code comprehension checks are the main criteria for an LGTM from another engineer, which is one of the approval bits needed for an approved code review. When an engineer marks a code review as LGTM, they are saying that the code does what it says and that it is understandable. Google, however, also requires that the code be sustainably maintained, so we have additional approvals needed for code in certain cases.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="code_consistency">
|
||||
<h2>Code Consistency</h2>
|
||||
|
||||
<p>At scale, code that you write will be depended on, and eventually maintained, by someone else.<a contenteditable="false" data-primary="consistency within the codebase" data-secondary="ensuring with code reviews" data-type="indexterm" id="id-8JT0HOt7CoIB"> </a><a contenteditable="false" data-primary="code reviews" data-secondary="benefits of" data-tertiary="code consistency" data-type="indexterm" id="id-X3TKtOtqCoIY"> </a> Many others will need to read your code and understand what you did. Others (including automated tools) might need to refactor your code long after you’ve moved to another project. Code, therefore, needs to conform to some standards of consistency so that it can be understood and maintained. Code should also avoid being overly complex; simpler code is easier for others to understand and maintain as well. Reviewers can assess how well this code lives up to the standards of the codebase itself during code review. A code review, therefore, should act to ensure <em>code health</em>.</p>
|
||||
|
||||
<p>It is for maintainability that the LGTM state of a <a contenteditable="false" data-primary="LGTM (looks good to me) stamp from reviewers" data-secondary="separation from readability approval" data-type="indexterm" id="id-X3T4HqcqCoIY"> </a>code review (indicating code correctness and comprehension) is separated from that of readability approval. Readability approvals can be granted only by individuals who have successfully gone through the process of code readability training in a particular programming language. For example, Java code requires approval from an engineer who has “Java readability.”</p>
|
||||
|
||||
<p>A readability approver is tasked with reviewing <a contenteditable="false" data-primary="readability" data-secondary="ensuring with code reviews" data-type="indexterm" id="id-47TrHMSrC2Ij"> </a>code to ensure that it follows agreed-on best practices for that particular programming language, is consistent with the codebase for that language within Google’s code repository, and avoids being overly complex. Code that is consistent and simple is easier to understand and easier for tools to update when it comes time for refactoring, making it more resilient. If a particular pattern is always done in one fashion in the codebase, it’s easier to write a tool to refactor it.</p>
|
||||
|
||||
<p>Additionally, code might be written only once, but it will be read dozens, hundreds, or even thousands of times. Having code that is consistent across the codebase improves comprehension for all of engineering, and this consistency even affects the process of code review itself. Consistency sometimes clashes with functionality; a readability reviewer may prefer a less complex change that may not be functionally “better” but is easier to understand.</p>
|
||||
|
||||
<p>With a more consistent codebase, it is easier for engineers to step in and review code on someone else’s projects. Engineers might occasionally need to look outside the team for help in a code review. Being able to reach out and ask experts to review the code, knowing they can expect the code itself to be consistent, allows those engineers to focus more properly on code correctness and comprehension.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="psychological_and_cultural_benefits">
|
||||
<h2>Psychological and Cultural Benefits</h2>
|
||||
|
||||
<p>Code review also has important cultural benefits: it reinforces to software engineers that code is not “theirs” but in fact part of a collective enterprise.<a contenteditable="false" data-primary="culture" data-secondary="cultural benefits of code reviews" data-type="indexterm" id="id-X3T4HOtgIoIY"> </a><a contenteditable="false" data-primary="psychological benefits of code reviews" data-type="indexterm" id="id-47T2tatBI2Ij"> </a><a contenteditable="false" data-primary="code reviews" data-secondary="benefits of" data-tertiary="psychological and cultural" data-type="indexterm" id="id-dZT8cVtnIxIJ"> </a> Such psychological benefits can be subtle but are still important. Without code review, most engineers would naturally gravitate toward personal style and their own approach to software design. The code review process forces an author to not only let others have input, but to compromise for the sake of the greater good.</p>
|
||||
|
||||
<p>It is human nature to be proud of one’s craft and to be reluctant to open up one’s code to criticism by others. It is also natural to be somewhat hesitant to welcome critical feedback about code that one writes. The code review process provides a mechanism to mitigate what might otherwise be an emotionally charged interaction. Code review, when it works best, not only challenges an engineer’s assumptions but also does so in a prescribed, neutral manner, tempering criticism that might otherwise be directed at the author if delivered unsolicited. After all, the process <em>requires</em> critical review (we in fact call our code review tool “Critique”), so you can’t fault a reviewer for doing their job and being critical. The code review process itself, therefore, can act as the “bad cop,” whereas the reviewer can still be seen as the “good cop.”</p>
|
||||
|
||||
<p>Of course, not all, or even most, engineers need such psychological devices. But buffering such criticism through the process of code review often provides a much gentler introduction for most engineers to the expectations of the team. Many engineers joining Google, or a new team, are intimidated by code review. It is easy to think that any form of critical review reflects negatively on a person’s job performance. But over time, almost all engineers come to expect to be challenged when sending a code review and come to value the advice and questions offered through this process (though, admittedly, this sometimes takes a while).</p>
|
||||
|
||||
<p>Another psychological benefit of code review is validation. Even the most capable engineers can suffer from imposter syndrome and be too self-critical. A process like code review acts as validation and recognition for one’s work. Often, the process involves an exchange of ideas and knowledge sharing (covered in the next section), which benefits both the reviewer and the reviewee. As an engineer grows in their domain knowledge, it’s sometimes difficult for them to get positive feedback on how they improve. The process of code review can provide that mechanism.</p>
|
||||
|
||||
<p>The process of initiating a code review also forces all authors to take a little extra care with their changes. Many software engineers are not perfectionists; most will admit that code that “gets the job done” is better than code that is perfect but that takes too long to develop. Without code review, it’s natural that many of us would cut corners, even with the full intention of correcting such defects later. “Sure, I don’t have all of the unit tests done, but I can do that later.” A code review forces an engineer to resolve those issues before sending the change. Collecting the components of a change for code review psychologically forces an engineer to make sure that all of their ducks are in a row. The little moment of reflection that comes before sending off your change is the perfect time to read through your change and make sure you’re not missing anything.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="knowledge_sharing-id00052">
|
||||
<h2>Knowledge Sharing</h2>
|
||||
|
||||
<p>One of the most important, but underrated, benefits of code review is in knowledge sharing.<a contenteditable="false" data-primary="knowledge sharing" data-secondary="as benefit of code reviews" data-type="indexterm" id="id-47TrHatlT2Ij"> </a><a contenteditable="false" data-primary="code reviews" data-secondary="benefits of" data-tertiary="knowledge sharing" data-type="indexterm" id="id-dZT3tVt1TxIJ"> </a> Most authors pick reviewers who are experts, or at least knowledgeable, in the area under review. The review process allows reviewers to impart domain knowledge to the author, allowing the reviewer(s) to offer suggestions, new techniques, or advisory information to the author. (Reviewers can even mark some comments “FYI,” requiring no action; they are simply added as an aid to the author.) Authors who become particularly proficient in an area of the codebase will often become owners as well, who then in turn will be able to act as reviewers for other engineers.</p>
|
||||
|
||||
<p>Part of the code review process of feedback and confirmation involves asking questions on why the change is done in a particular way. This exchange of information facilitates knowledge sharing. In fact, many code reviews involve an exchange of information both ways: the authors as well as the reviewers can learn new techniques and patterns from code review. At Google, reviewers may even directly share suggested edits with an author within the code review tool itself.</p>
|
||||
|
||||
<p>An engineer may not read every email sent to them, but they tend to respond to every code review sent. This knowledge sharing can occur across time zones and projects as well, using Google’s scale to disseminate information quickly to engineers in all corners of the codebase. Code review is a perfect time for knowledge transfer: it is timely and actionable. (Many engineers at Google “meet” other engineers first through their code reviews!)</p>
|
||||
|
||||
<p>Given the amount of time Google engineers spend in code review, the knowledge accrued is quite significant. A Google engineer’s primary task is still programming, of course, but a large chunk of their time is still spent in code review. The code review process provides one of the primary ways that software engineers interact with one another and exchange information about coding techniques. Often, new patterns are advertised within the context of code review, sometimes through refactorings such as large-scale changes.</p>
|
||||
|
||||
<p>Moreover, because each change becomes part of the codebase, code review acts as a historical record. Any engineer can inspect the Google codebase and determine when some particular pattern was introduced and bring up the actual code review in <span class="keep-together">question</span>. Often, that archeology provides insights to many more engineers than the original author and reviewer(s).<a contenteditable="false" data-primary="code reviews" data-secondary="benefits of" data-startref="ix_cdrevben" data-type="indexterm" id="id-12T9tVhnT5Iw"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="code_review_best_practices">
|
||||
<h1>Code Review Best Practices</h1>
|
||||
|
||||
<p>Code review can, admittedly, introduce friction<a contenteditable="false" data-primary="code reviews" data-secondary="best practices" data-type="indexterm" id="ix_cdrevbp"> </a> and delay to an organization. Most of these issues are not problems with code review per se, but with an organization’s chosen implementation of code review. Keeping the code review process running smoothly at Google is no different, and it requires a number of best practices to ensure that code review is worth the effort put into the process. Most of those practices emphasize keeping the process nimble and quick so that code review can scale properly.</p>
|
||||
|
||||
<section data-type="sect2" id="be_polite_and_professional">
|
||||
<h2>Be Polite and Professional</h2>
|
||||
|
||||
<p>As pointed out in the Culture section of this book, Google heavily fosters a culture of trust and respect. <a contenteditable="false" data-primary="politeness and professionalism in code reviews" data-type="indexterm" id="id-l1TJHwtVcATM"> </a><a contenteditable="false" data-primary="code reviews" data-secondary="best practices" data-tertiary="being polite and professional" data-type="indexterm" id="id-6ET6tot9ceTA"> </a>This filters down into our perspective on code review. A software engineer needs an LGTM from only one other engineer to satisfy our requirement on code comprehension, for example. Many engineers make comments and LGTM a change with the understanding that the change can be submitted after those changes are made, without any additional rounds of review. That said, code reviews can introduce anxiety and stress to even the most capable engineers. It is critically important to keep all feedback and criticism firmly in the professional realm.<a contenteditable="false" data-primary="professionalism in code reviews" data-type="indexterm" id="id-9ATWcot4cRTL"> </a></p>
|
||||
|
||||
<p>In general, reviewers should defer to authors on particular approaches and only point out alternatives if the author’s approach is deficient. If an author can demonstrate that several approaches are equally valid, the reviewer should accept the preference of the author. Even in those cases, if defects are found in an approach, consider the review a learning opportunity (for both sides!). All comments should remain strictly professional. Reviewers should be careful about jumping to conclusions based on a code author’s particular approach. It’s better to ask questions on why something was done the way it was before assuming that approach is wrong.</p>
|
||||
|
||||
<p>Reviewers should be prompt with their feedback. At Google, we expect feedback from a code review within 24 (working) hours. If a reviewer is unable to complete a review in that time, it’s good practice (and expected) to respond that they’ve at least seen the change and will get to the review as soon as possible. Reviewers should avoid responding to the code review in piecemeal fashion. Few things annoy an author more than getting feedback from a review, addressing it, and then continuing to get unrelated further feedback in the review process.</p>
|
||||
|
||||
<p>As much as we expect professionalism on the part of the reviewer, we expect professionalism on the part of the author as well. Remember that you are not your code, and that this change you propose is not “yours” but the team’s. After you check that piece of code into the codebase, it is no longer yours in any case. Be receptive to <span class="keep-together">questions</span> on your approach, and be prepared to explain why you did things in certain ways. Remember that part of the responsibility of an author is to make sure this code is understandable and maintainable for the future.</p>
|
||||
|
||||
<p>It’s important to treat each reviewer comment within a code review as a TODO item; a particular comment might not need to be accepted without question, but it should at least be addressed. If you disagree with a reviewer’s comment, let them know why, and don’t mark a comment as resolved until each side has had a chance to offer alternatives. One common way to keep such debates civil if an author doesn’t agree with a reviewer is to offer an alternative and ask the reviewer to PTAL (please take another look). Remember that code review is a learning opportunity for both the reviewer and the author. That insight often helps to mitigate any chances for disagreement.</p>
|
||||
|
||||
<p>By the same token, if you are an owner of code and responding to a code review within your codebase, be amenable to changes from an outside author. As long as the change is an improvement to the codebase, you should still give deference to the author that the change indicates something that could and should be improved.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="write_small_changes">
|
||||
<h2>Write Small Changes</h2>
|
||||
|
||||
<p>Probably the most important practice to keep the code review process nimble is to keep changes small. <a contenteditable="false" data-primary="changes to code" data-secondary="writing small changes" data-type="indexterm" id="id-6ETWHot0SeTA"> </a><a contenteditable="false" data-primary="code reviews" data-secondary="best practices" data-tertiary="writing small changes" data-type="indexterm" id="id-9ATBtotOSRTL"> </a>A code review should ideally be easy to digest and focus on a single issue, both for the reviewer and the author. Google’s code review process discourages massive changes consisting of fully formed projects, and reviewers can rightfully reject such changes as being too large for a single review. Smaller changes also prevent engineers from wasting time waiting for reviews on larger changes, reducing downtime. These small changes have benefits further down in the software development process as well. It is far easier to determine the source of a bug within a change if that particular change is small enough to narrow it down.</p>
|
||||
|
||||
<p>That said, it’s important to acknowledge that a code review process that relies on small changes is sometimes difficult to reconcile with the introduction of major new features. A set of small, incremental code changes can be easier to digest individually, but more difficult to comprehend within a larger scheme. Some engineers at Google admittedly are not fans of the preference given to small changes. Techniques exist for managing such code changes (development on integration branches, management of changes using a diff base different than HEAD), but those techniques inevitably involve more overhead. Consider the optimization for small changes just that: an optimization, and allow your process to accommodate the occasional larger change.</p>
|
||||
|
||||
<p>“Small” changes should generally be limited to about 200 lines of code. A small change should be easy on a reviewer and, almost as important, not be so cumbersome that additional changes are delayed waiting for an extensive review. Most changes at Google are expected to be reviewed within about a day.<sup><a data-type="noteref" id="ch01fn110-marker" href="ch09.html#ch01fn110">7</a></sup> (This doesn’t necessarily mean that the review is over within a day, but that initial feedback is provided within a day.) About 35% of the changes at Google are to a single file.<sup><a data-type="noteref" id="ch01fn111-marker" href="ch09.html#ch01fn111">8</a></sup> Being easy on a reviewer allows for quicker changes to the codebase and benefits the author as well. The author wants a quick review; waiting on an extensive review for a week or so would likely impact follow-on changes. A small initial review also can prevent much more expensive wasted effort on an incorrect approach further down the line.</p>
|
||||
|
||||
<p>Because code reviews are typically small, it’s common for almost all code reviews at Google to be reviewed by one and only one person. Were that not the case—if a team were expected to weigh in on all changes to a common codebase—there is no way the process itself would scale. By keeping the code reviews small, we enable this optimization. It’s not uncommon for multiple people to comment on any given change—most code reviews are sent to a team member, but also CC’d to appropriate teams—but the primary reviewer is still the one whose LGTM is desired, and only one LGTM is necessary for any given change.<a contenteditable="false" data-primary="LGTM (looks good to me) stamp from reviewers" data-secondary="from primary reviewer" data-type="indexterm" id="id-Z7T9H0uZSbT3"> </a> Any other comments, though important, are still optional.</p>
|
||||
|
||||
<p>Keeping changes small also allows the “approval” reviewers to more quickly approve any given changes. They can quickly inspect whether the primary code reviewer did due diligence and focus purely on whether this change augments the codebase while maintaining code health over time.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="write_good_change_descriptions">
|
||||
<h2>Write Good Change Descriptions</h2>
|
||||
|
||||
<p>A change description should indicate its type of change on the first line, as a summary.<a contenteditable="false" data-primary="changes to code" data-secondary="writing good change descriptions" data-type="indexterm" id="id-9ATQHotWuRTL"> </a><a contenteditable="false" data-primary="code reviews" data-secondary="best practices" data-tertiary="writing good change descriptions" data-type="indexterm" id="id-e8TKtxt6uDTw"> </a><a contenteditable="false" data-primary="documentation" data-secondary="for code changes" data-type="indexterm" id="id-Z7TpcNtvubT3"> </a> The first line is prime real estate and is used to provide summaries within the code review tool itself, to act as the subject line in any associated emails, and to become the visible line Google engineers see in a history summary within Code Search (see <a data-type="xref" href="ch17.html#code_search">Code Search</a>), so that first line is important.<a contenteditable="false" data-primary="Code Search" data-type="indexterm" id="id-X3TMuOtzuLTY"> </a></p>
|
||||
|
||||
<p>Although the first line should be a summary of the entire change, the description should still go into detail on what is being changed <em>and why</em>. A description of “Bug fix” is not helpful to a reviewer or a future code archeologist. If several related modifications were made in the change, enumerate them within a list (while still keeping it on message and small). The description is the historical record for this change, and tools such as Code Search allow you to find who wrote what line in any particular change in the codebase. Drilling down into the original change is often useful when trying to fix a bug.</p>
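<p>For example, a change description along the following lines gives a reviewer, and any future code archeologist, a useful summary line plus enough context to understand what changed and why. The project and details here are invented purely for illustration.</p>

<pre data-type="programlisting">
Frobber compressor: cap the dictionary size to avoid OOMs on large inputs

Very large inputs could grow the internal dictionary without bound and
eventually exhaust memory on 32-bit builds. This change:
  - Caps the dictionary at a configurable maximum (default 64 MiB).
  - Adds a regression test that feeds a pathologically large input.
  - Documents the new option in the compressor README.

No behavior change for inputs below the cap.
</pre>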
|
||||
|
||||
<p>Descriptions aren’t the only opportunity for adding documentation to a change. When writing a public API, you generally don’t want to leak implementation details, but by all means do so within the actual implementation, where you should comment liberally. If a reviewer does not understand why you did something, even if it is correct, it is a good indicator that such code needs better structure or better comments (or both). If, during the code review process, a new decision is reached, update the change description, or add appropriate comments within the implementation. A code review is not just something that you do in the present time; it is something you do to record what you did for posterity.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="keep_reviewers_to_a_minimum">
|
||||
<h2>Keep Reviewers to a Minimum</h2>
|
||||
|
||||
<p>Most code reviews at<a contenteditable="false" data-primary="code reviews" data-secondary="best practices" data-tertiary="keeping reviewers to a minimum" data-type="indexterm" id="id-e8TLHxtLhDTw"> </a> Google are reviewed by precisely one reviewer.<sup><a data-type="noteref" id="ch01fn112-marker" href="ch09.html#ch01fn112">9</a></sup> Because the code <a contenteditable="false" data-primary="reviewers of code, keeping to a minimum" data-type="indexterm" id="id-8JTocOtahVTB"> </a>review process allows the bits on code correctness, owner acceptance, and language readability to be handled by one individual, the code review process scales quite well across an organization the size of Google.</p>
|
||||
|
||||
<p>There is a tendency within the industry, and within individuals, to try to get additional input (and unanimous consent) from a cross-section of engineers. After all, each additional reviewer can add their own particular insight to the code review in question. But we’ve found that this leads to diminishing returns; the most important LGTM is the first one, and subsequent ones don’t add as much as you might think to the equation. The cost of additional reviewers quickly outweighs their value.</p>
|
||||
|
||||
<p>The code review process is optimized around the trust we place in our engineers to do the right thing. In certain cases, it can be useful to get a particular change reviewed by multiple people, but even in those cases, those reviewers should focus on different aspects of the same change.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="automate_where_possible">
|
||||
<h2>Automate Where Possible</h2>
|
||||
|
||||
<p>Code review is a human process, and that human input is important, but if there are components of the code process that can be automated, try to do so.<a contenteditable="false" data-primary="code reviews" data-secondary="best practices" data-tertiary="automating where possible" data-type="indexterm" id="id-Z7T9HNtxfbT3"> </a> Opportunities to automate mechanical human tasks should be explored; investments in proper tooling reap dividends.<a contenteditable="false" data-primary="automation" data-secondary="of code reviews" data-type="indexterm" id="id-8JT4tOtyfVTB"> </a> At Google, our code review tooling allows authors to automatically submit and automatically sync changes to the source control system upon approval (usually used for fairly simple changes).</p>
|
||||
|
||||
<p>One of the most important technological improvements regarding automation over the past few years is automatic static analysis of a given code change (see <a data-type="xref" href="ch20.html#static_analysis-id00082">Static Analysis</a>). Rather than require authors to run tests, linters, or formatters, the current Google code review tooling provides most of that utility automatically through<a contenteditable="false" data-primary="presubmits" data-type="indexterm" id="id-X3TKtqclfLTY"> </a> what is known as <em>presubmits</em>. A presubmit process is run when a change is initially sent to a reviewer. Before that change is sent, the presubmit process can detect a variety of problems with the existing change, reject the current change (and prevent sending an awkward email to a reviewer), and ask the original author to fix the change first. Such automation not only helps out with the code review process itself, it also allows the reviewers to focus on more important concerns than formatting.<a contenteditable="false" data-primary="code reviews" data-secondary="best practices" data-startref="ix_cdrevbp" data-type="indexterm" id="id-dZT7SYcLfnTJ"> </a></p>
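<p>Conceptually, a presubmit is just an automated gate that runs the project’s formatters, linters, and tests and refuses to send the change for review if any of them fail. The sketch below is illustrative only: it is not Google’s presubmit infrastructure, and the specific commands are assumptions that would differ from project to project.</p>

<pre data-type="programlisting" data-code-language="python">
# Illustrative presubmit sketch; not Google's infrastructure. The commands
# below are assumptions and would be replaced by a project's own tooling.
import subprocess
import sys

CHECKS = [
    ("format", ["yapf", "--diff", "--recursive", "."]),
    ("lint", ["pylint", "mypackage"]),
    ("tests", ["pytest", "-q"]),
]


def run_presubmit() -> int:
    failures = []
    for name, cmd in CHECKS:
        print(f"presubmit: running {name} ...")
        if subprocess.run(cmd).returncode != 0:
            failures.append(name)
    if failures:
        print(f"presubmit failed ({', '.join(failures)}); fix before sending for review.")
        return 1
    print("presubmit passed; the change is ready to send for review.")
    return 0


if __name__ == "__main__":
    sys.exit(run_presubmit())
</pre>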
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="types_of_code_reviews">
|
||||
<h1>Types of Code Reviews</h1>
|
||||
|
||||
<p>All code reviews are not alike! Different<a contenteditable="false" data-primary="code reviews" data-secondary="types of" data-type="indexterm" id="ix_cdrevtyp"> </a> types of code review require different levels of focus on the various aspects of the review process. Code changes at Google generally fall into one of the following buckets (though there is sometimes overlap):</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Greenfield reviews and new feature development</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Behavioral changes, improvements, and optimizations</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Bug fixes and rollbacks</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Refactorings and large-scale changes</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<section data-type="sect2" id="greenfield_code_reviews">
|
||||
<h2>Greenfield Code Reviews</h2>
|
||||
|
||||
<p>The least common type of code review is that of entirely new code, a so-called <em>greenfield review</em>.<a contenteditable="false" data-primary="greenfield code reviews" data-type="indexterm" id="id-e8TKtxt7Sqsw"> </a><a contenteditable="false" data-primary="code reviews" data-secondary="types of" data-tertiary="greenfield reviews" data-type="indexterm" id="id-Z7TpcNtZS9s3"> </a> A greenfield review is the most important time to evaluate whether the code will stand the test of time: that it will be easier to maintain as time and scale change the underlying assumptions of the code. Of course, the introduction of entirely new code should not come as a surprise. As mentioned earlier in this chapter, code is a liability, so the introduction of entirely new code should generally solve a real problem rather than simply provide yet another alternative. <a contenteditable="false" data-primary="design reviews for new code or projects" data-type="indexterm" id="id-8JTwSOtXSAsB"> </a>At Google, we generally require new code and/or projects to undergo an extensive design review, apart from a code review. A code review is not the time to debate design decisions already made in the past (and by the same token, a code review is not the time to introduce the design of a proposed API).</p>
|
||||
|
||||
<p>To ensure that code is sustainable, a greenfield review should ensure that an API matches an agreed design (which may require reviewing a design document) and is tested <em>fully</em>, with all API endpoints having some form of unit test, and that those tests fail when the code’s assumptions change. (See <a data-type="xref" href="ch11.html#testing_overview">Testing Overview</a>). <a contenteditable="false" data-primary="ownership of code" data-secondary="for greenfield reviews" data-type="indexterm" id="id-8JToczcXSAsB"> </a>The code should also have proper owners (one of the first reviews in a new project is often of a single OWNERS file for the new directory), be sufficiently commented, and provide supplemental documentation, if needed.<a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="greenfield reviews necessitating for a project" data-type="indexterm" id="id-X3T0SqcJSgsY"> </a> A greenfield review might also necessitate the introduction of a project into the continuous integration system. (See <a data-type="xref" href="ch23.html#continuous_integration">Continuous Integration</a>).</p>
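<p>As a minimal illustration of the kind of test a greenfield review looks for, the following self-contained example pins down both the behavior of a toy API and one of its design assumptions, so the test fails loudly if that assumption later changes. The library here is a stand-in invented for this example.</p>

<pre data-type="programlisting" data-code-language="python">
# Self-contained, illustrative example; the "compressor" API is a toy
# stand-in, not a real Google library.
import unittest
import zlib

MAX_INPUT_BYTES = 1024 * 1024  # design assumption: inputs are capped at 1 MiB


class InputTooLargeError(ValueError):
    """Raised when an input exceeds the documented cap."""


def compress(data: bytes) -> bytes:
    if len(data) > MAX_INPUT_BYTES:
        raise InputTooLargeError(f"input of {len(data)} bytes exceeds the cap")
    return zlib.compress(data)


def decompress(data: bytes) -> bytes:
    return zlib.decompress(data)


class CompressTest(unittest.TestCase):

    def test_round_trip_preserves_data(self):
        payload = b"some payload " * 100
        self.assertEqual(decompress(compress(payload)), payload)

    def test_rejects_oversized_input(self):
        # Encodes the cap assumption; if the cap is removed or changed,
        # this test fails and forces that conversation during review.
        with self.assertRaises(InputTooLargeError):
            compress(b"x" * (MAX_INPUT_BYTES + 1))


if __name__ == "__main__":
    unittest.main()
</pre>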
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="behavioral_changescomma_improvementscom">
|
||||
<h2>Behavioral Changes, Improvements, and Optimizations</h2>
|
||||
|
||||
<p>Most changes at Google generally fall into the broad category of modifications to existing code within the codebase.<a contenteditable="false" data-primary="code reviews" data-secondary="types of" data-tertiary="behavioral changes, improvements, and optimizations" data-type="indexterm" id="id-e8TLHxt6uqsw"> </a> These additions may include modifications to API endpoints,<a contenteditable="false" data-primary="improvements to existing code, code reviews for" data-type="indexterm" id="id-Z7T4tNtvu9s3"> </a> improvements to existing implementations, or optimizations for other factors such as performance.<a contenteditable="false" data-primary="optimizations of existing code, code reviews for" data-type="indexterm" id="id-8JTocOtMuAsB"> </a> Such changes are the bread and butter of most software engineers.</p>
|
||||
|
||||
<p>In each of these cases, the guidelines that apply to a greenfield review also apply: is this change necessary, and does this change improve the codebase? Some of the best modifications to a codebase are actually deletions! Getting rid of dead or obsolete code is one of the best ways to improve the overall code health of a codebase.</p>
|
||||
|
||||
<p>Any behavioral modifications <a contenteditable="false" data-primary="behaviors" data-secondary="code reviews for changes in" data-type="indexterm" id="id-8JT0HlSMuAsB"> </a>should necessarily include revisions to appropriate tests for any new API behavior. Augmentations to the implementation should be tested in a Continuous Integration (CI) system to ensure that those modifications don’t break any underlying assumptions of the existing tests. As well, optimizations should of course ensure that they don’t affect those tests and might need to include performance benchmarks for the reviewers to consult. Some optimizations might also require benchmark tests.</p>
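<p>As an illustration of the sort of benchmark an author of an optimization might attach, the sketch below compares a current and a proposed implementation with Python’s timeit module. Both functions are invented stand-ins for whatever code is actually under review.</p>

<pre data-type="programlisting" data-code-language="python">
# Illustrative microbenchmark; both implementations are invented stand-ins.
import timeit


def old_parse(line: str) -> list:
    """Current implementation."""
    return [field.strip() for field in line.split(",")]


def new_parse(line: str) -> list:
    """Proposed (hypothetically faster) implementation."""
    return list(map(str.strip, line.split(",")))


if __name__ == "__main__":
    sample = "alpha, beta , gamma, delta " * 50
    for name, fn in [("old_parse", old_parse), ("new_parse", new_parse)]:
        seconds = timeit.timeit(lambda: fn(sample), number=10_000)
        print(f"{name}: {seconds:.3f}s for 10,000 calls")
</pre>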
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="bug_fixes_and_rollbacks">
|
||||
<h2>Bug Fixes and Rollbacks</h2>
|
||||
|
||||
<p>Inevitably, you will need to submit a change for<a contenteditable="false" data-primary="code reviews" data-secondary="types of" data-tertiary="bug fixes and rollbacks" data-type="indexterm" id="id-Z7T9HNt3h9s3"> </a> a bug fix to your <a contenteditable="false" data-primary="bug fixes" data-type="indexterm" id="id-8JT4tOtahAsB"> </a>codebase. <em>When doing so, avoid the temptation to address other issues</em>. Not only does this risk increasing the size of the code review, it also makes it more difficult to perform regression testing or for others to roll back your change. A bug fix should focus solely on fixing the indicated bug and (usually) updating associated tests to catch the error that occurred in the first place.</p>
|
||||
|
||||
<p>Addressing the bug with a revised test is often necessary. The bug surfaced because existing tests were either inadequate, or the code had certain assumptions that were not met. As a reviewer of a bug fix, it is important to ask for updates to unit tests if applicable.</p>
|
||||
|
||||
<p>Sometimes, a code change in a codebase as large as Google’s causes some dependency to fail that was either not detected properly by tests or that unearths an untested part of the codebase. <a contenteditable="false" data-primary="rollbacks" data-type="indexterm" id="id-X3T4HeSahgsY"> </a>In those cases, Google allows such changes to be “rolled back,” usually by the affected downstream customers. A rollback consists of a change that essentially undoes the previous change. Such rollbacks can be created in seconds because they just revert the previous change to a known state, but they still require a code review.</p>
|
||||
|
||||
<p>It also becomes critically important that any change that could cause a potential rollback (and that includes all changes!) be as small and atomic as possible so that a rollback, if needed, does not cause further breakages on other dependencies that can be difficult to untangle. At Google, we’ve seen developers start to depend on new code very quickly after it is submitted, and rollbacks sometimes break these developers as a result. Small changes help to mitigate these concerns, both because of their atomicity, and because reviews of small changes tend to be done quickly.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="refactorings_and_large-scale_changes">
|
||||
<h2>Refactorings and Large-Scale Changes</h2>
|
||||
|
||||
<p>Many changes at Google are automatically generated: the author of the change isn’t a person, but a machine.<a contenteditable="false" data-primary="code reviews" data-secondary="types of" data-tertiary="refactorings and large-scale changes" data-type="indexterm" id="id-8JT0HOtyfAsB"> </a><a contenteditable="false" data-primary="refactorings" data-secondary="code reviews for" data-type="indexterm" id="id-X3TKtOtlfgsY"> </a> We discuss more <a contenteditable="false" data-primary="large-scale changes" data-secondary="code reviews for" data-type="indexterm" id="id-47TacatEfjsj"> </a>about the large-scale change (LSC) process in <a data-type="xref" href="ch22.html#large-scale_changes">Large-Scale Changes</a>, but even machine-generated changes require review. In cases where the change is considered low risk, it is reviewed by designated reviewers who have approval privileges for our entire codebase. But for cases in which the change might be risky or otherwise requires local domain expertise, individual engineers might be asked to review automatically generated changes as part of their normal workflow.</p>
|
||||
|
||||
<p>At first look, a review for an automatically generated change should be handled the same as any other code review: the reviewer should check for correctness and applicability of the change. However, we encourage reviewers to limit comments in the associated change and only flag concerns that are specific to their code, not the underlying tool or LSC generating the changes. While the specific change might be machine generated, the overall process generating these changes has already been reviewed, and individual teams cannot hold a veto over the process, or it would not be possible to scale such changes across the organization. If there is a concern about the underlying tool or process, reviewers can escalate out of band to an LSC oversight group for more information.</p>
|
||||
|
||||
<p>We also encourage reviewers of automatic changes to avoid expanding their scope. When reviewing a new feature or a change written by a teammate, it is often reasonable to ask the author to address related concerns within the same change, so long as the request still follows the earlier advice to keep the change small. This does not apply to automatically generated changes because the human running the tool might have hundreds of changes in flight, and even a small percentage of changes with review comments or unrelated questions limits the scale at which the human can effectively operate the tool.<a contenteditable="false" data-primary="code reviews" data-secondary="types of" data-startref="ix_cdrevtyp" data-type="indexterm" id="id-47TrHMSEfjsj"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="conclusion-id00013">
|
||||
<h1>Conclusion</h1>
|
||||
|
||||
<p>Code review is one of the most important and critical processes at Google. Code review acts as the glue connecting engineers with one another, and the code review process is the primary developer workflow upon which almost all other processes must hang, from testing to static analysis to CI. A code review process must scale appropriately, and for that reason, best practices, including small changes and rapid feedback and iteration, are important to maintain developer satisfaction and appropriate production velocity.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="tlsemicolondrs-id00110">
|
||||
<h1>TL;DRs</h1>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Code review has many benefits, including ensuring code correctness, comprehension, and consistency across a codebase.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Always check your assumptions through someone else; optimize for the reader.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Provide the opportunity for critical feedback while remaining professional.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Code review is important for knowledge sharing throughout an organization.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Automation is critical for scaling the process.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>The code review itself provides a historical record.<a contenteditable="false" data-primary="code reviews" data-startref="ix_cdrev_chapter" data-type="indexterm" id="id-X3T4HVHahltDiM"> </a></p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
<div data-type="footnotes"><p data-type="footnote" id="ch01fn103"><sup><a href="ch09.html#ch01fn103-marker">1</a></sup>We also use <a href="https://www.gerritcodereview.com">Gerrit</a> to review Git code, primarily for our open source projects. However, Critique is the primary tool of a typical software engineer at Google.</p><p data-type="footnote" id="ch01fn104"><sup><a href="ch09.html#ch01fn104-marker">2</a></sup>Steve McConnell, <em>Code Complete</em> (Redmond: Microsoft Press, 2004).</p><p data-type="footnote" id="ch01fn106"><sup><a href="ch09.html#ch01fn106-marker">3</a></sup>At Google, “readability” does not refer simply to comprehension, but to the set of styles and best practices that allow code to be maintainable to other engineers. See <a data-type="xref" href="ch03.html#knowledge_sharing">Knowledge Sharing</a>.</p><p data-type="footnote" id="ch01fn107"><sup><a href="ch09.html#ch01fn107-marker">4</a></sup>Some changes to documentation and configurations might not require a code review, but it is often still preferable to obtain such a review.</p><p data-type="footnote" id="ch01fn108"><sup><a href="ch09.html#ch01fn108-marker">5</a></sup>“Advances in Software Inspection,” <em>IEEE Transactions on Software Engineering</em>, SE-12(7): 744–751, July 1986. Granted, this study took place before robust tooling and automated testing had become so important in the software development process, but the results still seem relevant in the modern software age.</p><p data-type="footnote" id="ch01fn109"><sup><a href="ch09.html#ch01fn109-marker">6</a></sup>Rigby, Peter C. and Christian Bird. 2013. "Convergent contemporary software peer review practices." ESEC/FSE 2013: <em>Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering</em>, August 2013: 202-212. <a href="https://dl.acm.org/doi/10.1145/2491411.2491444"><em>https://dl.acm.org/doi/10.1145/2491411.2491444</em></a>.</p><p data-type="footnote" id="ch01fn110"><sup><a href="ch09.html#ch01fn110-marker">7</a></sup>Caitlin Sadowski, Emma Söderberg, Luke Church, Michal Sipko, and Alberto Bacchelli, <a href="https://oreil.ly/m7FnJ">"Modern code review: a case study at Google."</a></p><p data-type="footnote" id="ch01fn111"><sup><a href="ch09.html#ch01fn111-marker">8</a></sup>Ibid.</p><p data-type="footnote" id="ch01fn112"><sup><a href="ch09.html#ch01fn112-marker">9</a></sup>Ibid.</p></div></section>
|
||||
|
||||
</body>
|
||||
</html>
|
549
clones/abseil.io/resources/swe-book/html/ch10.html
Normal file
|
@ -0,0 +1,549 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="documentation-id00040">
|
||||
<h1>Documentation</h1>
|
||||
|
||||
<p class="byline">Written by Tom Manshreck</p>
|
||||
|
||||
<p class="byline">Edited by Riona MacNamara</p>
|
||||
|
||||
<p>Of the complaints most engineers have about writing, using, and maintaining code, a singular common frustration <a contenteditable="false" data-primary="documentation" data-type="indexterm" id="ix_docx"> </a>is the lack of quality documentation. “What are the side effects of this method?” “I got an error after step 3.” “What does this acronym mean?” “Is this document up to date?” Every software engineer has voiced complaints about the quality, quantity, or sheer lack of documentation throughout their career, and the software engineers at Google are no different.</p>
|
||||
|
||||
<p>Technical writers and project managers may help, but software engineers will always need to write most documentation themselves. Engineers, therefore, need the proper tools and incentives to do so effectively. The key to making it easier for them to write quality documentation is to introduce processes and tools that scale with the organization and that tie into their existing workflow.</p>
|
||||
|
||||
<p>Overall, the state of engineering documentation in the late 2010s is similar to the state of software testing in the late 1980s. Everyone recognizes that more effort needs to be made to improve it, but there is not yet organizational recognition of its critical benefits. That is changing, if slowly. At Google, our most successful efforts have been when documentation is <em>treated like code</em> and incorporated into the traditional engineering workflow, making it easier for engineers to write and maintain simple <span class="keep-together">documents.</span></p>
|
||||
|
||||
<section data-type="sect1" id="what_qualifies_as_documentationquestion">
|
||||
<h1>What Qualifies as Documentation?</h1>
|
||||
|
||||
<p>When we refer to “documentation,” we’re talking about every<a contenteditable="false" data-primary="documentation" data-secondary="about" data-type="indexterm" id="id-yLfJSeUQte"> </a> supplemental text that an engineer needs to write to do their job: not only standalone documents, but code comments as well. (In fact, most of the documentation an engineer at Google writes comes in the form of code comments.) We’ll discuss the various types of engineering documents further in this chapter.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="why_is_documentation_neededquestion_mar">
|
||||
<h1>Why Is Documentation Needed?</h1>
|
||||
|
||||
<p>Quality documentation has tremendous benefits for an engineering organization. Code and APIs become more comprehensible, reducing mistakes.<a contenteditable="false" data-primary="documentation" data-secondary="benefits of" data-type="indexterm" id="ix_docxben"> </a> Project teams are more focused when their design goals and team objectives are clearly stated. Manual processes are easier to follow when the steps are clearly outlined. Onboarding new members to a team or code base takes much less effort if the process is clearly <span class="keep-together">documented.</span></p>
|
||||
|
||||
<p>But because documentation’s benefits are all necessarily downstream, they generally don’t pay off immediately for the author. Unlike testing, which (as we’ll see) quickly provides benefits to a programmer, documentation generally requires more effort up front and doesn’t provide clear benefits to an author until later. But, like investments in testing, the investment made in documentation will pay for itself over time. After all, you might write a document only once,<sup><a data-type="noteref" id="ch01fn113-marker" href="ch10.html#ch01fn113">1</a></sup> but it will be read hundreds, perhaps thousands of times afterward; its initial cost is amortized across all the future readers. Not only does documentation scale over time, but it is critical for the rest of the organization to scale as well. It helps answer questions like these:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Why were these design decisions made?</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Why did we implement this code in this manner?</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Why did <em>I</em> implement this code in this manner, if you’re looking at your own code two years later?</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>If documentation conveys all these benefits, why is it generally considered “poor” by engineers? One reason, as we’ve mentioned, is that the benefits aren’t <em>immediate</em>, especially to the writer. But there are several other reasons:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Engineers often view writing as a skill separate from that of programming. (We’ll try to illustrate that this isn’t quite the case, and even where it is, it isn’t necessarily a separate skill from that of software engineering.)</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Some engineers don’t feel like they are capable writers. But you don’t need a robust command of English<sup><a data-type="noteref" id="ch01fn114-marker" href="ch10.html#ch01fn114">2</a></sup> to produce workable documentation. You just need to step outside yourself a bit and see things from the audience’s perspective.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Writing documentation is often more difficult because of limited tools support or integration into the developer workflow.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Documentation is viewed as an extra burden—something else to maintain—rather than something that will make maintenance of their existing code easier.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Not every engineering team needs a technical writer (and even if that were the case, there aren’t enough of them). This means that engineers will, by and large, write most of the documentation themselves. So, instead of forcing engineers to become technical writers, we should instead think about how to make writing documentation easier for engineers. Deciding how much effort to devote to documentation is a decision your organization will need to make at some point.</p>
|
||||
|
||||
<p>Documentation benefits several different groups. Even to the writer, documentation provides the following benefits:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>It helps<a contenteditable="false" data-primary="APIs" data-secondary="benefits of documentation to" data-type="indexterm" id="id-BKf1SJSoSyCrho"> </a> formulate an API. Writing documentation is one of the surest ways to figure out if your API makes sense. Often, the writing of the documentation itself leads engineers to reevaluate design decisions that otherwise wouldn’t be questioned. If you can’t explain it and can’t define it, you probably haven’t designed it well enough.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>It provides a road map for maintenance and a historical record. Tricks in code should be avoided, in any case, but good comments help out a great deal when you’re staring at code you wrote two years ago, trying to figure out what’s wrong.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>It makes your code look more professional and drives traffic. Developers will naturally assume that a well-documented API is a better-designed API. That’s not always the case, but they are often highly correlated. Although this benefit sounds cosmetic, it’s not quite so: whether a product has good documentation is usually a pretty good indicator of how well a product will be maintained.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>It will prompt fewer questions from other users. This is probably the biggest benefit over time to someone writing the documentation. If you have to explain something to someone more than once, it usually makes sense to document that process.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>As great as these benefits are to the writer of documentation, the lion’s share of documentation’s benefits will naturally accrue to the reader. Google’s C++ Style Guide notes the maxim “<a href="https://oreil.ly/zCsPc">optimize for the reader</a>.” This maxim applies not just to code, but to the comments around code, or the documentation set attached to an API. Much like testing, the effort you put into writing good documents will reap benefits many times over its lifetime. Documentation is critical over time, and reaps tremendous benefits for especially critical code as an organization scales.<a contenteditable="false" data-primary="documentation" data-secondary="benefits of" data-startref="ix_docxben" data-type="indexterm" id="id-O5fJUvc7hl"> </a></p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="documentation_is_like_code">
|
||||
<h1>Documentation Is Like Code</h1>
|
||||
|
||||
<p>Software engineers who write in a single, primary programming language still often reach for different languages to solve specific problems. <a contenteditable="false" data-primary="documentation" data-secondary="treating as code" data-type="indexterm" id="ix_docxsim"> </a>An engineer might write shell scripts or Python to run command-line tasks, or they might write most of their backend code in C++ but write some middleware code in Java, and so on. Each language is a tool in the toolbox.</p>
|
||||
|
||||
<p>Documentation should be no different: it’s a tool, written in a different language (usually English) to accomplish a particular task. Writing documentation is not much different than writing code. Like a programming language, it has rules, a particular syntax, and style decisions, often to accomplish a similar purpose as that within code: enforce consistency, improve clarity, and avoid (comprehension) errors. Within technical documentation, grammar is important not because one needs rules, but to standardize the voice and avoid confusing or distracting the reader. Google requires a certain comment style for many of its languages for this reason.</p>
|
||||
|
||||
<p>Like code, documents should also have owners. Documents without owners become stale and difficult to maintain. Clear ownership also makes it easier to handle documentation through existing developer workflows: bug tracking systems, code review tooling, and so forth. Of course, documents with different owners can still conflict with one another.<a contenteditable="false" data-primary="canonical documentation" data-type="indexterm" id="id-Zvf9SzT6CD"> </a> In those cases, it is important to designate <em>canonical</em> documentation: determine the primary source and consolidate other associated documents into that primary source (or deprecate the duplicates).</p>
|
||||
|
||||
<p>The prevalent<a contenteditable="false" data-primary="go/ links" data-secondary="use with canonical documentation" data-type="indexterm" id="id-W1f3SwIKCY"> </a> usage of “go/ links” at Google (see <a data-type="xref" href="ch03.html#knowledge_sharing">Knowledge Sharing</a>) makes this process easier. Documents with straightforward go/ links often become the canonical source of truth. One other way to promote canonical documents is to associate them directly with the code they document by placing them directly under source control and alongside the source code itself.</p>
|
||||
|
||||
<p>Documentation is often so tightly coupled to code that it should, as much as possible, be treated <a href="https://oreil.ly/G0LBo"><em>as code</em></a>. That is, your documentation should:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Have internal policies or rules to be followed</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Be placed under source control</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Have clear ownership responsible for maintaining the docs</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Undergo reviews for changes (and change <em>with</em> the code it documents)</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Have issues tracked, as bugs are tracked in code</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Be periodically evaluated (tested, in some respect)</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>If possible, be measured for aspects such as accuracy, freshness, etc. (tools have still not caught up here)</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>The more engineers treat documentation as “one of” the necessary tasks of software development, the less they will resent the upfront costs of writing, and the more they will reap the long-term benefits. In addition, making the task of documentation easier reduces those upfront costs. </p>
|
||||
|
||||
<aside data-type="sidebar" id="callout_the_google_wiki">
|
||||
<h5>Case Study: The Google Wiki</h5>
|
||||
|
||||
<p>When Google was much smaller and leaner, it had few technical writers. The easiest way to share information <a contenteditable="false" data-primary="documentation" data-secondary="treating as code" data-tertiary="Google wiki and" data-type="indexterm" id="id-O5f3S7UDCECL"> </a>was through our own internal wiki (GooWiki).<a contenteditable="false" data-primary="Google wiki (GooWiki)" data-type="indexterm" id="id-73fVUaU5CdCq"> </a> At first, this seemed like a reasonable approach; all engineers shared a single documentation set and could update it as needed.</p>
|
||||
|
||||
<p>But as Google scaled, problems with a wiki-style approach became apparent. Because there were no true owners for documents, many became obsolete.<sup><a data-type="noteref" id="ch01fn115-marker" href="ch10.html#ch01fn115">3</a></sup> Because no process was put in place for adding new documents, duplicate documents and document sets began appearing. GooWiki had a flat namespace, and people were not good at applying any hierarchy to the documentation sets. At one point, there were 7 to 10 documents (depending on how you counted them) on setting up Borg, our production compute environment, only a few of which seemed to be maintained, and most were specific to certain teams with certain permissions and assumptions.</p>
|
||||
|
||||
<p>Another problem with GooWiki became apparent over time: the people who could fix the documents were not the people who used them. New users discovering bad documents either couldn’t confirm that the documents were wrong or didn’t have an easy way to report errors. They knew something was wrong (because the document didn’t work), but they couldn’t “fix” it. Conversely, the people best able to fix the documents often didn’t need to consult them after they were written. As Google grew, the documentation became so poor that its quality was the number one developer complaint on our annual developer surveys.</p>
|
||||
|
||||
<p>The way to improve the situation was to move important documentation under the same sort of source control that was being used to track code changes.<a contenteditable="false" data-primary="source control" data-secondary="moving documentation to" data-type="indexterm" id="id-Mef9SbIZC3CW"> </a> Documents began to have their own owners, canonical locations within the source tree, and processes for identifying bugs and fixing them; the documentation began to dramatically improve. Additionally, the way documentation was written and maintained began to look the same as how code was written and maintained. Errors in the documents could be reported within our bug tracking software. Changes to the documents could be handled using the existing code review process. Eventually, engineers began to fix the documents themselves or send changes to technical writers (who were often the owners).</p>
|
||||
|
||||
<p>Moving documentation to source control was initially met with a lot of controversy. Many engineers were convinced that doing away with the GooWiki, that bastion of freedom of information, would lead to poor quality because the bar for documentation (requiring a review, requiring owners for documents, etc.) would be higher. But that wasn’t the case. The documents became better.</p>
|
||||
|
||||
<p>The introduction of Markdown as a common documentation formatting language also helped<a contenteditable="false" data-primary="Markdown" data-type="indexterm" id="id-lOfRSlt8ClC9"> </a> because it made it easier for engineers to understand how to edit documents without needing specialized expertise in HTML or CSS. <a contenteditable="false" data-primary="code" data-secondary="embedding documentation in with g3doc" data-type="indexterm" id="id-m7fQUDtJCRCV"> </a><a contenteditable="false" data-primary="g3doc" data-type="indexterm" id="id-e8f9HbtOCvCx"> </a>Google eventually introduced its own framework for embedding documentation within code: <a href="https://oreil.ly/YjrTD">g3doc</a>. With that framework, documentation improved further, as documents existed side by side with the source code within the engineer’s development environment. Now, engineers could update the code and its associated documentation in the same change (a practice for which we’re still trying to improve adoption).</p>
|
||||
|
||||
<p>The key difference was that maintaining documentation became a similar experience to maintaining code: engineers filed bugs, made changes to documents in changelists, sent changes to reviews by experts, and so on. Leveraging of existing developer workflows, rather than creating new ones, was a key benefit.<a contenteditable="false" data-primary="documentation" data-secondary="treating as code" data-startref="ix_docxsim" data-type="indexterm" id="id-m7f6SAhJCRCV"> </a></p>
|
||||
</aside>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="know_your_audience">
|
||||
<h1>Know Your Audience</h1>
|
||||
|
||||
<p>One of the most important mistakes that engineers make when writing documentation is to write only for themselves. <a contenteditable="false" data-primary="documentation" data-secondary="knowing your audience" data-type="indexterm" id="ix_docxaud"> </a>It’s natural to do so, and writing for yourself is not without value: after all, you might need to look at this code in a few years and try to figure out what you once meant. You also might be of approximately the same skill set as someone reading your document. But if you write only for yourself, you are going to make certain assumptions, and given that your document might be read by a very wide audience (all of engineering, external developers), even a few lost readers is a large cost. As an organization grows, mistakes in documentation become more prominent, and your assumptions often do not apply.</p>
|
||||
|
||||
<p>Instead, before you begin writing, you should (formally or informally) identify the audience(s) your documents need to satisfy. A design document might need to persuade decision makers. A tutorial might need to provide very explicit instructions to someone utterly unfamiliar with your codebase. An API might need to provide complete and accurate reference information for any users of that API, be they experts or novices. Always try to identify a primary audience and write to that audience.</p>
|
||||
|
||||
<p>Good documentation need not be polished or “perfect.” One mistake engineers make when writing documentation is assuming they need to be much better writers. By that measure, few software engineers would write. Think about writing like you do about testing or any other process you need to do as an engineer. Write to your audience, in the voice and style that they expect. If you can read, you can write. Remember that your audience is standing where you once stood, but <em>without your new domain knowledge</em>. So you don’t need to be a great writer; you just need to get someone like you as familiar with the domain as you now are. (And as long as you get a stake in the ground, you can improve this document over time.)</p>
|
||||
|
||||
<section data-type="sect2" id="types_of_audiences">
|
||||
<h2>Types of Audiences</h2>
|
||||
|
||||
<p>We’ve pointed out that you should write at the skill level and domain knowledge appropriate for your audience. <a contenteditable="false" data-primary="documentation" data-secondary="knowing your audience" data-tertiary="types of audiences" data-type="indexterm" id="id-DBfASbU0IDce"> </a>But who precisely is your audience? Chances are, you have multiple audiences based on one or more of the following criteria:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Experience level (expert programmers, or junior engineers who might not even be familiar—gulp!—with the language).<a contenteditable="false" data-primary="experience levels for documentation audiences" data-type="indexterm" id="id-BKf1SJSoSpHJI6cM"> </a></p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Domain knowledge (team members, or other engineers in your organization who are familiar only with API endpoints).<a contenteditable="false" data-primary="domain knowledge of documentation audiences" data-type="indexterm" id="id-wAfzSkSqUwHnIxcw"> </a></p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Purpose (end users who might need your API to do a specific task and need to find that<a contenteditable="false" data-primary="purpose of documentation users" data-type="indexterm" id="id-O5f3SKSMH5H0IxcW"> </a> information quickly, or software gurus who are responsible for the guts of a particularly hairy implementation that you hope no one else needs to <span class="keep-together">maintain).</span></p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>In some cases, different audiences require different writing styles, but in most cases, the trick is to write in a way that applies as broadly to your different audience groups as possible. Often, you will need to explain a complex topic to both an expert and a novice. Writing for the expert with domain knowledge may allow you to cut corners, but you’ll confuse the novice; conversely, explaining everything in detail to the novice will doubtless annoy the expert.</p>
|
||||
|
||||
<p>Obviously, writing such documents is a balancing act and there’s no silver bullet, but one thing we’ve found is that it helps to keep your documents <em>short</em>. Write descriptively enough to explain complex topics to people unfamiliar with the topic, but don’t lose or annoy experts. Writing a short document often requires you to write a longer one (getting all the information down) and then do an edit pass, removing duplicate information where you can. This might sound tedious, but keep in mind that this expense is spread across all the readers of the documentation.<a contenteditable="false" data-primary="Pascal, Blaise" data-type="indexterm" id="id-73fVUMI6ILcq"> </a> As Blaise Pascal once said, “If I had more time, I would have written you a shorter letter.” By keeping a document short and clear, you will ensure that it will satisfy both an expert and a novice.</p>
|
||||
|
||||
<p>Another important audience distinction is based on how a user encounters a <span class="keep-together">document:</span></p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p><em>Seekers</em> are engineers who <em>know what they want</em> and want to know if what they are looking at fits the bill. <a contenteditable="false" data-primary="seekers (of documentation)" data-type="indexterm" id="id-xYfeHGSYSet0IYcR"> </a>A key pedagogical device for this audience is <span class="keep-together"><em>consistency</em></span>. If you are writing reference documentation for this group—within a code file, for example—you will want to have your comments follow a similar format so that readers can quickly scan a reference and see whether they find what they are looking for.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><em>Stumblers</em> might not know exactly what they want. <a contenteditable="false" data-primary="stumblers, documentation for" data-type="indexterm" id="id-xYf6UGSmUet0IYcR"> </a>They might have only a vague idea of how to implement what they are working with. The key for this audience is <em>clarity</em>. Provide overviews or introductions (at the top of a file, for example) that explain the purpose of the code they are looking at. It’s also useful to identify when a doc is <em>not</em> appropriate for an audience. A lot of documents at Google begin with a “TL;DR statement” such as “TL;DR: if you are not interested in C++ compilers at Google, you can stop reading now.”</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Finally, one important audience distinction is between that of a customer (e.g., a user of an API) and that of a <a contenteditable="false" data-primary="providers, documentation for" data-type="indexterm" id="id-Mef9SAh0I1cW"> </a>provider (e.g., a member of the project team).<a contenteditable="false" data-primary="customers, documentation for" data-type="indexterm" id="id-xYf6UrhkIwcr"> </a> As much as possible, documents intended for one should be kept apart from documents intended for the other. Implementation details are important to a team member for maintenance purposes; end users should not need to read such information. Often, engineers denote design decisions within the reference API of a library they publish. Such reasonings belong more appropriately in specific documents (design documents) or, at best, within the implementation details of code hidden behind an interface.<a contenteditable="false" data-primary="documentation" data-secondary="knowing your audience" data-startref="ix_docxaud" data-type="indexterm" id="id-lOfqHdhkI4c9"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="documentation_types">
|
||||
<h1>Documentation Types</h1>
|
||||
|
||||
<p>Engineers write many different types of documentation as part of their work: design<a contenteditable="false" data-primary="documentation" data-secondary="types of" data-type="indexterm" id="ix_docxtyp"> </a> documents, code comments, how-to documents, project pages, and more. These all count as “documentation.” But it is important to know the different types, and to <em>not mix types</em>. A document should have, in general, a singular purpose, and stick to it. Just as an API should do one thing and do it well, avoid trying to do several things within one document. Instead, break out those pieces more logically.</p>
|
||||
|
||||
<p>There are several main types of documents that software engineers often need to write:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Reference documentation, including code comments</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Design documents</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Tutorials</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Conceptual documentation</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Landing pages</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>It was common in the early days of Google for teams to have monolithic wiki pages with bunches of links (many broken or obsolete), some conceptual information about how the system worked, an API reference, and so on, all sprinkled together. Such documents fail because they don’t serve a single purpose (and they also get so long that no one will read them; some notorious wiki pages scrolled through several dozen screens). Instead, make sure your document has a singular purpose, and if adding something to that page doesn’t make sense, you probably want to find, or even create, another document for that purpose.</p>
|
||||
|
||||
<section data-type="sect2" id="reference_documentation">
|
||||
<h2>Reference Documentation</h2>
|
||||
|
||||
<p>Reference<a contenteditable="false" data-primary="reference documentation" data-type="indexterm" id="ix_refdoc"> </a> documentation is the most common type that engineers<a contenteditable="false" data-primary="documentation" data-secondary="types of" data-tertiary="reference" data-type="indexterm" id="ix_docxtypref"> </a> need to write; indeed, they often need to write some form of reference documents every day. By reference documentation, we mean anything that documents the usage of code within the codebase.<a contenteditable="false" data-primary="comments" data-secondary="code" data-type="indexterm" id="id-73fQHaUms3fq"> </a> Code comments are the most common form of reference documentation that an engineer must maintain. Such comments can be divided into two basic camps: API comments versus implementation comments.<a contenteditable="false" data-primary="APIs" data-secondary="API comments" data-type="indexterm" id="id-XEfwT4U0sDfB"> </a> Remember the audience differences between these two: API comments don’t need to discuss implementation details or design decisions and can’t assume a user is as versed in the API as the author. <a contenteditable="false" data-primary="implementation comments" data-type="indexterm" id="id-MefwIBUMs8fW"> </a>Implementation comments, on the other hand, can assume a lot more domain knowledge of the reader, though be careful in assuming too much: people leave projects, and sometimes it’s safer to be methodical about exactly why you wrote this code the way you did.</p>
|
||||
|
||||
<p>Most reference documentation, even when provided as separate documentation from the code, is generated from <a contenteditable="false" data-primary="codebase" data-secondary="comments in, reference documentation generated from" data-type="indexterm" id="id-O5f3SAH4sZfL"> </a>comments within the codebase itself. (As it should; reference documentation should be single-sourced as much as possible.) Some languages such as Java or Python have specific commenting frameworks (Javadoc, PyDoc, GoDoc) meant to make generation of this reference documentation easier. <a contenteditable="false" data-primary="programming languages" data-secondary="reference documentation" data-type="indexterm" id="id-73fVUxHms3fq"> </a>Other languages, such as C++, have no standard “reference documentation” implementation, but because C++ separates out its API surface (in header or <em>.h</em> files) from the implementation (<em>.cc</em> files), header files are often a natural place to document a C++ API.<a contenteditable="false" data-primary="APIs" data-secondary="C++, documentation for" data-type="indexterm" id="id-xYfvI8HJsKfr"> </a><a contenteditable="false" data-primary="C++" data-secondary="APIs, reference documentation for" data-type="indexterm" id="id-lOfQsYHqskf9"> </a></p>
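<p>As a brief sketch (the <code>Frobber</code> names below are invented for illustration, not an actual Google API), the split between API comments and implementation comments often falls naturally along the header/implementation boundary in C++:</p>
<pre data-type="programlisting">// frobber.h
//
// An API comment, written for users of the API: it says what the function
// does and makes no reference to implementation details.
//
// Frobnicates the given input and returns the frobnicated result as a string.
string Frobnicate(string input);

// frobber.cc
//
// An implementation comment, written for maintainers: it records the "why."
//
// We cache the most recent result here because, in this hypothetical example,
// profiling showed that callers tend to frobnicate the same input repeatedly.</pre>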
|
||||
|
||||
<p>Google takes this approach: a C++ API deserves to have its reference documentation live within the header file. Other reference documentation is embedded directly in the Java, Python, and Go source code as well. Because Google’s Code Search browser (see <a data-type="xref" href="ch17.html#code_search">Code Search</a>) is so robust, we’ve found little benefit to providing separate generated reference documentation. Users in Code Search not only search code easily, they can usually find the original definition of that code as the top result. Having the documentation alongside the code’s definitions also makes the documentation easier to discover and maintain.</p>
|
||||
|
||||
<p>We all know that code comments are essential to a well-documented API. But what precisely is a “good” comment? Earlier in this chapter, we identified two major audiences for reference documentation: seekers and stumblers. Seekers know what they want; stumblers don’t. The key win for seekers is a consistently commented codebase so that they can quickly scan an API and find what they are looking for. The key win for stumblers is clearly identifying the purpose of an API, often at the top of a file header. We’ll walk through some code comments in the subsections that follow. The code commenting guidelines that follow apply to C++, but similar rules are in place at Google for other languages.</p>
|
||||
|
||||
<section data-type="sect3" id="file_comments">
|
||||
<h3>File comments</h3>
|
||||
|
||||
<p>Almost all<a contenteditable="false" data-primary="reference documentation" data-secondary="file comments" data-type="indexterm" id="id-xYf5S0UJs6s1fb"> </a> code files at Google <a contenteditable="false" data-primary="file comments" data-type="indexterm" id="id-lOfLUxUqswsKfJ"> </a>must contain a file comment. (Some header files that contain only one utility function, etc., might deviate from this standard.) File comments should begin with a header of the following form:</p>
|
||||
|
||||
<div data-type="example" id="id-xMceHpsJsKfr">
|
||||
<pre data-type="programlisting">// -----------------------------------------------------------------------------
|
||||
// str_cat.h
|
||||
// -----------------------------------------------------------------------------
|
||||
//
|
||||
// This header file contains functions for efficiently concatenating and appending
|
||||
// strings: StrCat() and StrAppend(). Most of the work within these routines is
|
||||
// actually handled through use of a special AlphaNum type, which was designed
|
||||
// to be used as a parameter type that efficiently manages conversion to
|
||||
// strings and avoids copies in the above operations.
|
||||
…</pre>
|
||||
</div>
|
||||
|
||||
<p>Generally, a file comment should begin with an outline of what’s contained in the code you are reading. It should identify the code’s main use cases and intended audience (in the preceding case, developers who want to concatenate strings). Any API that cannot be succinctly described in the first paragraph or two is usually the sign of an API that is not well thought out. Consider breaking the API into separate components in those cases.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="class_comments">
|
||||
<h3>Class comments</h3>
|
||||
|
||||
<p>Most modern programming <a contenteditable="false" data-primary="reference documentation" data-secondary="class comments" data-type="indexterm" id="id-lOfRSxUQtwsKfJ"> </a>languages are object oriented.<a contenteditable="false" data-primary="class comments" data-type="indexterm" id="id-m7fQUMUMtpsnf5"> </a> Class comments are therefore important for defining the API “objects” in use in a codebase. All public classes (and structs) at Google must contain a class comment describing the class/struct, important methods of that class, and the purpose of the class. Generally, class comments should be “nouned” with documentation emphasizing their object aspect. That is, say, “The Foo class contains x, y, z, allows you to do Bar, and has the following Baz aspects,” and so on.</p>
|
||||
|
||||
<p>Class comments should generally begin with a comment of the following form:</p>
|
||||
|
||||
<div data-type="example" id="id-mocaTDtesQfV">
|
||||
<pre data-type="programlisting">// -----------------------------------------------------------------------------
|
||||
// AlphaNum
|
||||
// -----------------------------------------------------------------------------
|
||||
//
|
||||
// The AlphaNum class acts as the main parameter type for StrCat() and
|
||||
// StrAppend(), providing efficient conversion of numeric, boolean, and
|
||||
// hexadecimal values (through the Hex type) into strings.</pre>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="function_comments">
|
||||
<h3>Function comments</h3>
|
||||
|
||||
<p>All free functions, or public methods of a class, at Google must also contain a function comment <a contenteditable="false" data-primary="reference documentation" data-secondary="function comments" data-type="indexterm" id="id-m7f6SMUkhpsnf5"> </a>describing what the function <em>does</em>.<a contenteditable="false" data-primary="function comments" data-type="indexterm" id="id-apf9HvUrhvsGfq"> </a> Function comments should stress the <em>active</em> nature of their use, beginning with an indicative verb describing what the function does and what is returned.</p>
|
||||
|
||||
<p>Function comments should generally begin with a comment of the following form:</p>
|
||||
|
||||
<div data-type="example" id="id-ekcRT9hMsGfx">
|
||||
<pre data-type="programlisting">// StrCat()
|
||||
//
|
||||
// Merges the given strings or numbers, using no delimiter(s),
|
||||
// returning the merged result as a string.
|
||||
…</pre>
|
||||
</div>
|
||||
|
||||
<p>Note that starting a function comment with a declarative verb introduces consistency across a header file. A seeker can quickly scan an API and read just the verb to get an idea of whether the function is appropriate: “Merges, Deletes, Creates,” and so on.</p>
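<p>For example, in a header commented this way (a hypothetical snippet, invented for illustration), a seeker can skim just the leading verbs to find the right call:</p>
<pre data-type="programlisting">// Merges the given strings, using no delimiter, and returns the merged result.
string MergeStrings(string first, string second);

// Deletes the record with the given ID, returning true on success.
bool DeleteRecord(int record_id);

// Creates a new, empty record and returns its ID.
int CreateRecord();</pre>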
|
||||
|
||||
<p>Some documentation styles (and some documentation generators) require various forms of boilerplate on function comments, like "Returns:", "Throws:", and so forth, but at Google we haven’t found them to be necessary. It is often clearer to present such information in a single prose comment that’s not broken up into artificial section boundaries:</p>
|
||||
|
||||
<div data-type="example" id="id-qKcBt5hysZfL">
|
||||
<pre data-type="programlisting">// Creates a new record for a customer with the given name and address,
|
||||
// and returns the record ID, or throws `DuplicateEntryError` if a
|
||||
// record with that name already exists.
|
||||
<span>int</span> <span>AddCustomer</span><span>(</span><span>string</span> name<span>,</span> <span>string</span> address<span>);</span></pre>
|
||||
</div>
|
||||
|
||||
<p>Notice how the postcondition, parameters, return value, and exceptional cases are naturally documented together (in this case, in a single sentence), because they are not independent of one another. Adding explicit boilerplate sections would make the comment more verbose and <a contenteditable="false" data-primary="reference documentation" data-startref="ix_refdoc" data-type="indexterm" id="id-YGfDSXhZhOs4f0"> </a>repetitive, but<a contenteditable="false" data-primary="documentation" data-secondary="types of" data-startref="ix_docxtypref" data-tertiary="reference" data-type="indexterm" id="id-LmfeUOhdhesWfx"> </a> no clearer (and arguably less clear). </p>
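<p>For contrast, here is a hypothetical recasting of that same comment with explicit boilerplate sections. It conveys nothing new, but the comment grows longer and the relationships among the parameters, return value, and error case become harder to see at a glance:</p>
<pre data-type="programlisting">// AddCustomer()
//
// Creates a new record for a customer.
//
// Args:
//   name: the name of the customer.
//   address: the address of the customer.
//
// Returns:
//   The ID of the new record.
//
// Throws:
//   `DuplicateEntryError` if a record with that name already exists.
int AddCustomer(string name, string address);</pre>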
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="design_docs">
|
||||
<h2>Design Docs</h2>
|
||||
|
||||
<p>Most teams at Google require an approved design document before starting work on any major project.<a contenteditable="false" data-primary="design documents" data-type="indexterm" id="id-O5f3S7UXtZfL"> </a><a contenteditable="false" data-primary="documentation" data-secondary="types of" data-tertiary="design documents" data-type="indexterm" id="id-73fVUaU7t3fq"> </a> A software engineer typically writes the proposed design document using a specific design doc template approved by the team. Such documents are designed to be collaborative, so they are often shared in Google Docs, which has good collaboration tools. Some teams require such design documents to be discussed and debated at specific team meetings, where the finer points of the design can be discussed or critiqued by experts. In some respects, these design discussions act as a form of code review before any code is written.</p>
|
||||
|
||||
<p>Because the development of a design document is one of the first processes an engineer undertakes before deploying a new system, it is also a convenient place to ensure that various concerns are covered. The canonical design document templates at Google require engineers to consider aspects of their design such as security implications, internationalization, storage requirements and privacy concerns, and so on. In most cases, such parts of those design documents are reviewed by experts in those domains.</p>
|
||||
|
||||
<p>A good design document should cover the goals of the design, its implementation strategy, and propose key design decisions with an emphasis on their individual trade-offs. The best design documents suggest design goals and cover alternative designs, denoting their strong and weak points.</p>
|
||||
|
||||
<p>A good design document, once approved, also acts not only as a historical record, but as a measure of whether the project successfully achieved its goals. Most teams archive their design documents in an appropriate location within their team documents so that they can review them at a later time. It’s often useful to review a design document before a product is launched to ensure that the stated goals when the design document was written remain the stated goals at launch (and if they do not, either the document or the product can be adjusted accordingly).</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="tutorials">
|
||||
<h2>Tutorials</h2>
|
||||
|
||||
<p>Every software engineer, when they join a new team, will want to get up to speed as quickly as possible.<a contenteditable="false" data-primary="documentation" data-secondary="types of" data-tertiary="tutorials" data-type="indexterm" id="id-73feSaU9h3fq"> </a><a contenteditable="false" data-primary="tutorials" data-type="indexterm" id="id-XEfYU4UvhDfB"> </a> Having a tutorial that walks someone through the setup of a new project is invaluable; “Hello World” has established itself as one of the best ways to ensure that all team members start off on the right foot. <a contenteditable="false" data-primary="“Hello World” tutorials" data-primary-sortas="Hello" data-type="indexterm" id="id-MefnHBUKh8fW"> </a>This goes for documents as well as code. Most projects deserve a “Hello World” document that assumes nothing and gets the engineer to make something “real” happen.</p>
|
||||
|
||||
<p>Often, the best time to write a tutorial, if one does not yet exist, is when you first join a team. (It’s also the best time to find bugs in any existing tutorial you are following.) Get a notepad or other way to take notes, and write down everything you need to do along the way, assuming no domain knowledge or special setup constraints; after you’re done, you’ll likely know what mistakes you made during the process—and why—and can then edit down your steps to get a more streamlined tutorial. Importantly, write <em>everything</em> you need to do along the way; try not to assume any particular setup, permissions, or domain knowledge. If you do need to assume some other setup, state that clearly in the beginning of the tutorial as a set of prerequisites.</p>
|
||||
|
||||
<p>Most tutorials require you to perform a number of steps, in order. In those cases, number those steps explicitly. If the focus of the tutorial is on the <em>user</em> (say, for external developer documentation), then number each action that a user needs to undertake. Don’t number actions that the system may take in response to such user actions. <a contenteditable="false" data-primary="tutorials" data-secondary="example of a bad tutorial" data-type="indexterm" id="id-xYf6UWT7hKfr"> </a>It is critical to explicitly number every step when doing this. Nothing is more annoying than an error on step 4 that happens because you forgot to tell someone to properly authorize their username, for example.</p>
|
||||
|
||||
<section data-type="sect3" id="example_a_bad_tutorial">
|
||||
<h3>Example: A bad tutorial</h3>
|
||||
|
||||
<ol>
|
||||
<li>
|
||||
<p>Download the package from our server at http://example.com</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Copy the shell script to your home directory</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Execute the shell script</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>The foobar system will communicate with the authentication system</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Once authenticated, foobar will bootstrap a new database named “baz”</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Test “baz” by executing a SQL command on the command line</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Type: CREATE DATABASE my_foobar_db;</p>
|
||||
</li>
|
||||
</ol>
|
||||
|
||||
<p>In the preceding procedure, steps 4 and 5 happen on the server end. It’s unclear whether the user needs to do anything, but they don’t, so those side effects can be mentioned as part of step 3. It’s also unclear whether steps 6 and 7 are different. (They aren’t.) Combine all atomic user operations into single steps so that the user knows they need to do something at each step in the process. Also, if your tutorial has user-visible input or output, denote that on separate lines (often using the convention of a <code>monospaced bold</code> font).</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="example_a_bad_tutorial_made_better">
|
||||
<h3>Example: A bad tutorial made better</h3>
|
||||
|
||||
<ol>
|
||||
<li>
|
||||
<p>Download the <a contenteditable="false" data-primary="tutorials" data-secondary="example, bad tutorial made better" data-type="indexterm" id="id-m7f6S0SGSWUYsmhpf3"> </a>package from our server at <em>http://example.com</em>:</p>
|
||||
|
||||
<pre data-type="programlisting">$ curl -I <strong>http://example.com</strong></pre>
|
||||
</li>
|
||||
<li>
|
||||
<p>Copy the shell script to your home directory:</p>
|
||||
|
||||
<pre data-type="programlisting"><strong>$ cp foobar.sh ~</strong></pre>
|
||||
</li>
|
||||
<li>
|
||||
<p>Execute the shell script in your home directory:</p>
|
||||
|
||||
<pre data-type="programlisting"><strong>$ cd ~; foobar.sh</strong></pre>
|
||||
|
||||
<p>The foobar system will first communicate with the authentication system. Once authenticated, foobar will bootstrap a new database named “baz” and open an input shell.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Test “baz” by executing a SQL command on the command line:</p>
|
||||
|
||||
<pre data-type="programlisting"><strong>baz:$ CREATE DATABASE my_foobar_db;</strong></pre>
|
||||
</li>
|
||||
</ol>
|
||||
|
||||
<p>Note how each step requires specific user intervention. If, instead, the tutorial had a focus on some other aspect (e.g., a document about the “life of a server”), number those steps from the perspective of that focus (what the server does).</p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="conceptual_documentation">
|
||||
<h2>Conceptual Documentation</h2>
|
||||
|
||||
<p>Some code requires deeper explanations or insights than can be obtained simply by reading the reference documentation.<a contenteditable="false" data-primary="conceptual documentation" data-type="indexterm" id="id-XEfpS4UZCDfB"> </a><a contenteditable="false" data-primary="documentation" data-secondary="types of" data-tertiary="conceptual" data-type="indexterm" id="id-MefqUBUZC8fW"> </a> In those cases, we need conceptual documentation to provide overviews of the APIs or systems. Some examples of conceptual documentation might be a library overview for a popular API, a document describing the life cycle of data within a server, and so on. In almost all cases, a conceptual document is meant to augment, not replace, a reference documentation set. Often this leads to duplication of some information, but with a purpose: to promote clarity. In those cases, it is not necessary for a conceptual document to cover all edge cases (though a reference should cover those cases religiously). In this case, sacrificing some accuracy is acceptable for clarity. The main point of a conceptual document is to impart understanding.</p>
|
||||
|
||||
<p>“Concept” documents are the most difficult forms of documentation to write. As a result, they are often the most neglected type of document within a software engineer’s toolbox. One problem engineers face when writing conceptual documentation is that it often cannot be embedded directly within the source code because there isn’t a canonical location to place it. <a contenteditable="false" data-primary="APIs" data-secondary="conceptual documentation and" data-type="indexterm" id="id-Mef9SKHZC8fW"> </a>Some APIs have a relatively broad API surface area, in which case, a file comment might be an appropriate place for a “conceptual” explanation of the API. But often, an API works in conjunction with other APIs and/or modules. The only logical place to document such complex behavior is through a separate conceptual document. If comments are the unit tests of documentation, conceptual documents are the integration tests.</p>
|
||||
|
||||
<p>Even when an API is appropriately scoped, it often makes sense to provide a separate conceptual document. For example, Abseil’s <code>StrFormat</code> library covers a variety of concepts that accomplished users of the API should understand. In those cases, both internally and externally, we provide a <a href="https://oreil.ly/TMwSj">format concepts document</a>.</p>
|
||||
|
||||
<p>A concept document needs to be useful to a broad audience: both experts and novices alike. Moreover, it needs to emphasize <em>clarity</em>, so it often needs to sacrifice completeness (something best reserved for a reference) and (sometimes) strict accuracy. That’s not to say a conceptual document should intentionally be inaccurate; it just means that it should focus on common usage and leave rare usages or side effects for reference documentation.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="landing_pages">
|
||||
<h2>Landing Pages</h2>
|
||||
|
||||
<p>Most engineers are members of a team, and most teams have a “team page” somewhere on their company’s intranet. <a contenteditable="false" data-primary="landing pages" data-type="indexterm" id="id-Mef9SBU8c8fW"> </a><a contenteditable="false" data-primary="documentation" data-secondary="types of" data-tertiary="landing pages" data-type="indexterm" id="id-xYf6U0UXcKfr"> </a>Often, these sites are a bit of a mess: a typical landing page might contain some interesting links, sometimes several documents titled “read this first!”, and some information both for the team and for its customers. Such documents start out useful but rapidly turn into disasters; because they become so cumbersome to maintain, they will eventually get so obsolete that they will be fixed by only the brave or the desperate.</p>
|
||||
|
||||
<p>Luckily, such documents look intimidating, but are actually straightforward to fix: ensure that a landing page clearly identifies its purpose, and then include <em>only</em> links to other pages for more information. If something on a landing page is doing more than being a traffic cop, it is <em>not doing its job</em>. If you have a separate setup document, link to that from the landing page as a separate document. If you have too many links on the landing page (your page should not scroll multiple screens), consider breaking up the pages by taxonomy, under different sections.</p>
|
||||
|
||||
<p>Most poorly configured landing pages serve two different purposes: they are the “goto” page for someone who is a user of your product or API, or they are the home page for a team. Don’t have the page serve both masters—it will become confusing. Create a separate “team page” as an internal page apart from the main landing page. What the team needs to know is often quite different from what a customer of your API needs to know.<a contenteditable="false" data-primary="documentation" data-secondary="types of" data-startref="ix_docxtyp" data-type="indexterm" id="id-lOfRSaTZckf9"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="documentation_reviews">
|
||||
<h1>Documentation Reviews</h1>
|
||||
|
||||
<p>At Google, all code needs to be reviewed, and our code review process is well understood and accepted.<a contenteditable="false" data-primary="documentation reviews" data-type="indexterm" id="ix_docrev"> </a> In general, documentation also needs review (though this is less universally accepted). If you want to “test” whether your documentation works, you should generally have someone else review it.</p>
|
||||
|
||||
<p>A technical document benefits from three different types of reviews, each emphasizing different aspects:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>A technical review, for accuracy. <a contenteditable="false" data-primary="technical reviews" data-type="indexterm" id="id-DBfASaS5SxTWu4"> </a>This review is usually done by a subject matter expert, often another member of your team. Often, this is part of a code review itself.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>An audience review, for clarity. <a contenteditable="false" data-primary="audience reviews" data-type="indexterm" id="id-BKf1SJSvUATkuo"> </a>This is usually someone unfamiliar with the domain. This might be someone new to your team or a customer of your API.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>A writing review, for consistency.<a contenteditable="false" data-primary="writing reviews (for technical documents)" data-type="indexterm" id="id-wAfzSkSmHOTWua"> </a> This is often a technical writer or volunteer.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Of course, some of these lines are sometimes blurred, but if your document is high profile or might end up being externally published, you probably want to ensure that it receives more types of reviews. (We’ve used a similar review process for this book.) Any document tends to benefit from the aforementioned reviews, even if some of those reviews are ad hoc. That said, even getting one reviewer to review your text is preferable to having no one review it.</p>
|
||||
|
||||
<p class="pagebreak-before">Importantly, if documentation is tied into the engineering workflow, it will often improve over time. Most documents at Google now implicitly go through an audience review because at some point, their audience will be using them, and hopefully letting you know when they aren’t working (via bugs or other forms of feedback).</p>
|
||||
|
||||
<aside data-type="sidebar" id="callout_the_developer_guide_library">
|
||||
<h5>Case Study: The Developer Guide Library</h5>
|
||||
|
||||
<p>As mentioned earlier, there were problems associated with having most (almost all) engineering documentation contained within a shared wiki: little ownership of important documentation, competing documentation, obsolete information, and difficulty in filing bugs or issues with documentation. But this problem was not seen in some documents: the Google C++ style guide was owned by a select group of senior engineers (style arbiters) who managed it. The document was kept in good shape because certain people cared about it. They implicitly owned that document. The document was also canonical: there was only one C++ style guide.</p>
|
||||
|
||||
<p>As previously mentioned, documentation that sits directly within source code is one way to promote the establishment of canonical documents; if the documentation sits alongside the source code, it should usually be the most applicable (hopefully). At Google, each API usually has a separate <em>g3doc</em> directory where such documents live (written as Markdown files and readable within our Code Search browser). Having the documentation exist alongside the source code not only establishes de facto ownership, it makes the documentation seem more wholly “part” of the code.</p>
|
||||
|
||||
<p>Some documentation sets, however, cannot exist very logically within source code. A “C++ developer guide” for Googlers, for example, has no obvious place to sit within the source code. <a contenteditable="false" data-primary="C++" data-secondary="developer guide for Googlers" data-type="indexterm" id="id-Mef9S1Tkt0uW"> </a>There is no master “C++” directory where people will look for such information. In this case (and others that crossed API boundaries), it became useful to create standalone documentation sets in their own depot. Many of these pulled together existing, associated documents into a common set, with common navigation and look-and-feel. Such documents were noted as “Developer Guides” and, like the code in the codebase, were under source control in a specific documentation depot, with this depot organized by topic rather than API. Often, technical writers managed these developer guides, because they were better at explaining topics across API boundaries.</p>
|
||||
|
||||
<p>Over time, these developer guides became canonical. Users who wrote competing or supplementary documents became amenable to adding their documents to the canonical document set after it was established, and then deprecating their competing documents. Eventually, the C++ style guide became part of a larger “C++ Developer Guide.” As the documentation set became more comprehensive and more authoritative, its quality also improved. Engineers began logging bugs because they knew someone was maintaining these documents. Because the documents were locked down under source control, with proper owners, engineers also began sending changelists directly to the technical writers.</p>
<p>The<a contenteditable="false" data-primary="go/ links" data-secondary="use with canonical documentation" data-type="indexterm" id="id-lOfRSMsQtAu9"> </a> introduction of go/ links (see <a data-type="xref" href="ch03.html#knowledge_sharing">Knowledge Sharing</a>) allowed most documents to, in effect, more easily establish themselves as canonical on any given topic. Our C++ Developer Guide became established at “go/cpp,” for example. With better internal search, go/ links, and the integration of multiple documents into a common documentation set, such canonical documentation sets became more authoritative and robust over time.<a contenteditable="false" data-primary="documentation reviews" data-startref="ix_docrev" data-type="indexterm" id="id-e8f9Hrs0tbux"> </a></p>
</aside>
</section>
<section data-type="sect1" id="documentation_philosophy">
<h1>Documentation Philosophy</h1>
<p>Caveat: the following section is more of a treatise on technical writing<a contenteditable="false" data-primary="documentation" data-secondary="philosophy" data-type="indexterm" id="ix_docxphil"> </a> best practices (and personal opinion) than a description of “how Google does it.” Consider it optional reading for software engineers, though understanding these concepts will likely help you write technical information more easily.</p>
<section data-type="sect2" id="whocomma_whatcomma_whencomma_wherecomma">
<h2>WHO, WHAT, WHEN, WHERE, and WHY</h2>
<p>Most technical documentation answers a “HOW” question. <a contenteditable="false" data-primary="documentation" data-secondary="philosophy" data-tertiary="who, what, why, when, where, and how" data-type="indexterm" id="id-BKf1SbU2HYFp"> </a>How does this work? How do I program to this API? How do I set up this server? As a result, there’s a tendency for software engineers to jump straight into the “HOW” on any given document and ignore the other questions associated with it: the WHO, WHAT, WHEN, WHERE, and WHY. <a contenteditable="false" data-primary="who, what, when, where, and why questions, answering in documentation" data-type="indexterm" id="id-wAflU5UmHKFd"> </a>It’s true that none of those are generally as important as the HOW—a design document is an exception because an equivalent aspect is often the WHY—but without a proper framing of technical documentation, documents end up confusing. Try to address the other questions in the first two paragraphs of any <span class="keep-together">document:</span></p>
<ul>
<li>
<p>WHO was discussed previously: that’s the audience. But sometimes you also need to explicitly call out and address the audience in a document. Example: “This document is for new engineers on the Secret Wizard project.”</p>
</li>
<li>
<p>WHAT identifies the purpose of this document: “This document is a tutorial designed to start a Frobber server in a test environment.” Sometimes, merely writing the WHAT helps you frame the document appropriately. If you start adding information that isn’t applicable to the WHAT, you might want to move that information into a separate document.</p>
</li>
<li>
<p>WHEN identifies when this document was created, reviewed, or updated. Documents in source code have this date noted implicitly, and some other publishing schemes automate this as well. But, if not, make sure to note the date on which the document was written (or last revised) on the document itself.</p>
</li>
<li>
<p>WHERE is often implicit as well, but decide where the document should live. Usually, the preference should be under some sort of version control, ideally <em>with the source code it documents</em>. But other formats work for different purposes as well. At Google, we often use Google Docs for easy collaboration, particularly on design issues. At some point, however, any shared document becomes less of a discussion and more of a stable historical record. At that point, move it to someplace more permanent, with clear ownership, version control, and responsibility.</p>
</li>
<li>
<p>WHY sets up the purpose for the document. Summarize what you expect someone to take away from the document after reading it. A good rule of thumb is to establish the WHY in the introduction to a document. When you write the summary, verify whether you’ve met your original expectations (and revise <span class="keep-together">accordingly).</span></p>
</li>
</ul>
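<p>Putting these together, the opening of a document might look something like the following sketch. The project, audience, and dates are invented, reusing the hypothetical Frobber tutorial mentioned in the list above:</p>
<pre data-type="programlisting"># Running a Frobber Server in a Test Environment

Audience: new engineers on the Frobber project
Last reviewed: 2019-02-27 by username

This document is a tutorial for starting a Frobber server in a test
environment. After reading it, you should be able to bring up a local
Frobber instance and run the smoke tests against it. It does not cover
production deployment, which belongs in a separate operations guide.</pre>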
</section>
<section data-type="sect2" id="the_beginningcomma_middlecomma_and_end">
<h2>The Beginning, Middle, and End</h2>
<p>All documents—indeed, all parts of documents—have a beginning, middle, and end. <a contenteditable="false" data-primary="documentation" data-secondary="philosophy" data-tertiary="beginning, middle, and end sections" data-type="indexterm" id="id-wAfzS5U8TKFd"> </a>Although it sounds amazingly<a contenteditable="false" data-primary="beginning, middle, and end sections for documents" data-type="indexterm" id="id-O5fJU7UpTxFL"> </a> silly, most documents should often have, at a minimum, those three sections. A document with only one section has only one thing to say, and very few documents have only one thing to say. Don’t be afraid to add sections to your document; they break up the flow into logical pieces and provide readers with a roadmap of what the document covers.</p>
<p>Even the simplest document usually has more than one thing to say. Our popular “<span class="keep-together">C++</span> Tips of the Week” have traditionally been very short, focusing on one small piece of advice. However, even here, having sections helps. Traditionally, the first section denotes the problem, the middle section goes through the recommended solutions, and the conclusion summarizes the takeaways. Had the document consisted of only one section, some readers would doubtless have difficulty teasing out the important points.</p>
<p>Most engineers loathe redundancy, and with good reason.<a contenteditable="false" data-primary="redundancy in documentation" data-type="indexterm" id="id-73feSwTWTmFq"> </a> But in documentation, redundancy is often useful. An important point buried within a wall of text can be difficult to remember or tease out. On the other hand, placing that point at a more prominent location early can lose context provided later on. Usually, the solution is to introduce and summarize the point within an introductory paragraph, and then use the rest of the section to make your case in a more detailed fashion. In this case, redundancy helps the reader understand the importance of what is being stated.</p>
</section>
<section data-type="sect2" id="the_parameters_of_good_documentation">
<h2>The Parameters of Good Documentation</h2>
<p>There are usually three aspects of good <a contenteditable="false" data-primary="completeness, accuracy, and clarity in documentation" data-type="indexterm" id="id-O5f3S7U8IxFL"> </a>documentation: completeness, accuracy, and clarity.<a contenteditable="false" data-primary="documentation" data-secondary="philosophy" data-tertiary="parameters of good documentation" data-type="indexterm" id="id-73fVUaU6ImFq"> </a> You rarely get all three within the same document; as you try to make a document more “complete,” for example, clarity can begin to suffer. If you try to document every possible use case of an API, you might end up with an incomprehensible mess.<a contenteditable="false" data-primary="programming languages" data-secondary="documenting" data-type="indexterm" id="id-XEfzH4UeIGFB"> </a> For programming languages, being completely accurate in all cases (and documenting all possible side effects) can also affect clarity. For other documents, trying to be clear about a complicated topic can subtly affect the accuracy of the document; you might decide to ignore some rare side effects in a conceptual document, for example, because the point of the document is to familiarize someone with the usage of an API, not provide a dogmatic overview of all intended behavior.</p>
<p>In each case, a “good document” is defined as the document that is <em>doing its intended job</em>. As a result, you rarely want a document doing more than one job. For each document (and for each document type), decide on its focus and adjust the writing appropriately. Writing a conceptual document? You probably don’t need to cover every part of the API. Writing a reference? You probably want this complete, but perhaps must sacrifice some clarity. Writing a landing page? Focus on organization and keep discussion to a minimum. All of this adds up to quality, which, admittedly, is stubbornly difficult to accurately measure.</p>
<p>How can you quickly improve the quality of a document? Focus on the needs of the audience. Often, less is more. For example, one mistake engineers often make is adding design decisions or implementation details to an API document. Much like you should ideally separate the interface from an implementation within a well-designed API, you should avoid discussing design decisions in an API document. Users don’t need to know this information. Instead, put those decisions in a specialized document for that purpose (usually a design doc).</p>
</section>
<section data-type="sect2" id="deprecating_documents">
<h2>Deprecating Documents</h2>
<p>Just like old code can cause problems, so can old documents.<a contenteditable="false" data-primary="deprecation" data-secondary="of old documentation" data-type="indexterm" id="id-73feSaUmsmFq"> </a><a contenteditable="false" data-primary="documentation" data-secondary="philosophy" data-tertiary="deprecating documents" data-type="indexterm" id="id-XEfYU4U0sGFB"> </a> Over time, documents become stale, obsolete, or (often) abandoned. Try as much as possible to avoid abandoned documents, but when a document no longer serves any purpose, either remove it or identify it as obsolete (and, if available, indicate where to go for new information). Even for unowned documents, someone adding a note that “This no longer works!” is more helpful than saying nothing and leaving something that seems authoritative but no longer works.</p>
<p>At Google, we often attach “freshness dates” to documentation. Such documents note the last time a document was reviewed, and metadata in the documentation set will send email reminders when the document hasn’t been touched in, for example, three months. Such freshness dates, as shown in the following example—and tracking your documents as bugs—can help make a documentation set easier to maintain over time, which is the main concern for a document:</p>
<div data-type="example" id="id-XacwTksDFn">
<pre data-type="programlisting"><!--*
# Document freshness: For more information, see <a class="orm:hideurl" href="https://goto.google.com/fresh-source">go/fresh-source</a>.
freshness: { owner: `username` reviewed: '2019-02-27' }
*--></pre>
</div>
<p>Users who own such a document have an incentive to keep that freshness date current (and if the document is under source control, that requires a code review). As a result, it’s a low-cost means to ensure that a document is looked over from time to time. At Google, we found that including the owner of a document in this freshness date within the document itself with a byline of “Last reviewed by...” led to increased adoption as well.<a contenteditable="false" data-primary="documentation" data-secondary="philosophy" data-startref="ix_docxphil" data-type="indexterm" id="id-xYf5SqIJspFr"> </a></p>
</section>
</section>
<section data-type="sect1" id="when_do_you_need_technical_writersquest">
<h1>When Do You Need Technical Writers?</h1>
<p>When<a contenteditable="false" data-primary="documentation" data-secondary="when you need technical writers" data-type="indexterm" id="id-DBfASbUGiB"> </a> Google was young and growing, there weren’t enough technical writers in software engineering. (That’s still the case.) Those projects deemed important tended to receive a technical writer, regardless of whether that team really needed one.<a contenteditable="false" data-primary="technical writers, writing documentation" data-type="indexterm" id="id-BKf9UbU3iB"> </a> The idea was that the writer could relieve the team of some of the burden of writing and maintaining documents and (theoretically) allow the important project to achieve greater velocity. This turned out to be a bad assumption.</p>
<p>We learned that most engineering teams can write documentation for themselves (their team) perfectly fine; it’s only when they are writing documents for another audience that they tend to need help because it’s difficult to write to another audience. The feedback loop within your team regarding documents is more immediate, the domain knowledge and assumptions are clearer, and the perceived needs are more obvious. Of course, a technical writer can often do a better job with grammar and organization, but supporting a single team isn’t the best use of a limited and specialized resource; it doesn’t scale. It introduced a perverse incentive: become an important project and your software engineers won’t need to write documents. Discouraging engineers from writing documents turns out to be the opposite of what you want to do.</p>
<p>Because they are a limited resource, technical writers should generally focus on tasks that software engineers <em>don’t</em> need to do as part of their normal duties. Usually, this involves writing documents that cross API boundaries. Project Foo might clearly know what documentation Project Foo needs, but it probably has a less clear idea what Project Bar needs. A technical writer is better able to stand in as a person unfamiliar with the domain. In fact, it’s one of their critical roles: to challenge the assumptions your team makes about the utility of your project. It’s one of the reasons why many, if not most, software engineering technical writers tend to focus on this specific type of API documentation.</p>
</section>
<section data-type="sect1" id="conclusion-id00014">
<h1>Conclusion</h1>
<p>Google has made good strides in addressing documentation quality over the past decade, but to be frank, documentation at Google is not yet a first-class citizen. For comparison, engineers have gradually accepted that testing is necessary for any code change, no matter how small. As well, testing tooling is robust, varied and plugged into an engineering workflow at various points. Documentation is not ingrained at nearly the same level.</p>
<p>To be fair, there’s not necessarily the same need to address documentation as with testing. Tests can be made atomic (unit tests) and can follow prescribed form and function. Documents, for the most part, cannot. Tests can be automated, and schemes to automate documentation are often lacking. Documents are necessarily subjective; the quality of the document is measured not by the writer, but by the reader, and often quite asynchronously. That said, there is a recognition that documentation is important, and processes around document development are improving. In this author’s opinion, the quality of documentation at Google is better than in most software engineering shops.</p>
<p>To change the quality of engineering documentation, engineers—and the entire engineering organization—need to accept that they are both the problem and the solution. Rather than throw up their hands at the state of documentation, they need to realize that producing quality documentation is part of their job and saves them time and effort in the long run. For any piece of code that you expect to live more than a few months, the extra cycles you put in documenting that code will not only help others, it will help you maintain that code as well.</p>
</section>
<section data-type="sect1" id="tlsemicolondrs-id00112">
<h1>TL;DRs</h1>
<ul>
<li>
<p>Documentation is hugely important over time and scale.</p>
</li>
<li>
<p>Documentation changes should leverage the existing developer workflow.</p>
</li>
<li>
<p>Keep documents focused on one purpose.</p>
</li>
<li>
<p>Write for your audience, not yourself.<a contenteditable="false" data-primary="documentation" data-startref="ix_docx" data-type="indexterm" id="id-XEfpSmSKT6UYUp"> </a></p>
</li>
</ul>
</section>
<div data-type="footnotes"><p data-type="footnote" id="ch01fn113"><sup><a href="ch10.html#ch01fn113-marker">1</a></sup>OK, you will need to maintain it and revise it occasionally.</p><p data-type="footnote" id="ch01fn114"><sup><a href="ch10.html#ch01fn114-marker">2</a></sup>English is still the primary language for most programmers, and most technical documentation for programmers relies on an understanding of English.</p><p data-type="footnote" id="ch01fn115"><sup><a href="ch10.html#ch01fn115-marker">3</a></sup>When we deprecated GooWiki, we found that around 90% of the documents had no views or updates in the previous few months.</p></div></section>
</body>
</html>
434
clones/abseil.io/resources/swe-book/html/ch11.html
Normal file
@ -0,0 +1,434 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Software Engineering at Google</title>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
</head>
<body data-type="book">
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="testing_overview">
<h1>Testing Overview</h1>
<p class="byline">Written by Adam Bender</p>
<p class="byline">Edited by Tom Manshreck</p>
<p>Testing has always been a part of programming. <a contenteditable="false" data-primary="testing" data-type="indexterm" id="ix_tst1"> </a>In fact, the first time you wrote a computer program you almost certainly threw some sample data at it to see whether it performed as you expected. For a long time, the state of the art in software testing resembled a very similar process, largely manual and error prone. However, since the early 2000s, the software industry’s approach to testing has evolved dramatically to cope with the size and complexity of modern software systems. Central to that evolution has been the practice of developer-driven, automated testing.</p>
<p>Automated testing can prevent bugs from escaping into the wild and affecting your users. <a contenteditable="false" data-primary="bugs" data-secondary="catching later in development, costs of" data-type="indexterm" id="id-RaUrskhd"> </a>The later in the development cycle a bug is caught, the more expensive it is; exponentially so in many cases.<sup><a data-type="noteref" id="ch01fn116-marker" href="ch11.html#ch01fn116">1</a></sup> However, “catching bugs” is only part of the motivation. An equally important reason why you want to test your software is to support the ability to change. Whether you’re adding new features, doing a refactoring focused on code health, or undertaking a larger redesign, automated testing can quickly catch mistakes, and this makes it possible to change software with confidence.</p>
<p>Companies that can iterate faster can adapt more rapidly to changing technologies, market conditions, and customer tastes. If you have a robust testing practice, you needn’t fear change—you can embrace it as an essential quality of developing software. The more and faster you want to change your systems, the more you need a fast way to test them.</p>
<p class="pagebreak-before">The act of writing tests also improves the design of your systems. As the first clients of your code, a test can tell you much about your design choices. Is your system too tightly coupled to a database? Does the API support the required use cases? Does your system handle all of the edge cases? Writing automated tests forces you to confront these issues early on in the development cycle. Doing so generally leads to more modular software that enables greater flexibility later on.</p>
<p>Much ink has been spilled about the subject of testing software, and for good reason: for such an important practice, doing it well still seems to be a mysterious craft to many. At Google, while we have come a long way, we still face difficult problems getting our processes to scale reliably across the company. In this chapter, we’ll share what we have learned to help further the conversation.</p>
<section data-type="sect1" id="why_do_we_write_testsquestion_mark">
<h1>Why Do We Write Tests?</h1>
<p>To better understand how to get the most out of testing, let’s start from the beginning. <a contenteditable="false" data-primary="testing" data-secondary="reasons for writing tests" data-type="indexterm" id="ix_tstwhy"> </a>When we talk about automated testing, what are we really talking about?</p>
<p>The simplest test is defined by:</p>
<ul>
<li>
<p>A single behavior you are testing, usually a method or API that you are calling</p>
</li>
<li>
<p>A specific input, some value that you pass to the API</p>
</li>
<li>
<p>An observable output or behavior</p>
</li>
<li>
<p>A controlled environment such as a single isolated process</p>
</li>
</ul>
<p>When you execute a test like this, passing the input to the system and verifying the output, you will learn whether the system behaves as you expect. Taken in aggregate, hundreds or thousands of simple tests (usually called a <em>test suite</em>) can tell you how well your entire product conforms to its intended design and, more important, when it doesn’t.<a contenteditable="false" data-primary="test suite" data-type="indexterm" id="id-1JUaCzhOcR"> </a></p>
<p>Creating and maintaining a healthy test suite takes real effort. As a codebase grows, so too will the test suite. It will begin to face challenges like instability and slowness. A failure to address these problems will cripple a test suite. Keep in mind that tests derive their value from the trust engineers place in them. If testing becomes a productivity sink, constantly inducing toil and uncertainty, engineers will lose trust and begin to find workarounds. A bad test suite can be worse than no test suite at all.</p>
<p>In addition to empowering companies to build great products quickly, testing is becoming critical to ensuring the safety of important products and services in our lives. Software is more involved in our lives than ever before, and defects can cause more than a little annoyance: they can cost massive amounts of money, loss of property, or, worst of all, loss of life.<sup><a data-type="noteref" id="ch01fn117-marker" href="ch11.html#ch01fn117">2</a></sup></p>
<p>At Google, we have determined that testing cannot be an afterthought. Focusing on quality and testing is part of how we do our jobs. We have learned, sometimes painfully, that failing to build quality into our products and services inevitably leads to bad outcomes. As a result, we have built testing into the heart of our engineering <span class="keep-together">culture.</span></p>
<section data-type="sect2" id="the_story_of_google_web_server">
<h2>The Story of Google Web Server</h2>
<p>In Google’s early days, engineer-driven testing was often assumed to be of little importance.<a contenteditable="false" data-primary="testing" data-secondary="reasons for writing tests" data-tertiary="Google Web Server, story of" data-type="indexterm" id="id-XKUVsrCrcVcx"> </a> Teams regularly relied on smart people to get the software right. A few systems ran large integration tests, but mostly it was the Wild West. <a contenteditable="false" data-primary="Google Web Server (GWS)" data-type="indexterm" id="id-rbUOCpCDcGcN"> </a>One product in particular seemed to suffer the worst: it was called the Google Web Server, also known as GWS.</p>
<p>GWS is the web server responsible for serving Google Search queries and is as important to Google Search as air traffic control is to an airport. Back in 2005, as the project swelled in size and complexity, productivity had slowed dramatically. Releases were becoming buggier, and it was taking longer and longer to push them out. Team members had little confidence when making changes to the service, and often found out something was wrong only when features stopped working in production. (At one point, more than 80% of production pushes contained user-affecting bugs that had to be rolled back.)</p>
<p>To address these problems, the tech lead (TL) of GWS decided to institute a policy of engineer-driven, automated testing. As part of this policy, all new code changes were required to include tests, and those tests would be run continuously. Within a year of instituting this policy, the number of emergency pushes <em>dropped by half</em>. This drop occurred despite the fact that the project was seeing a record number of new changes every quarter. Even in the face of unprecedented growth and change, testing brought renewed productivity and confidence to one of the most critical projects at Google. Today, GWS has tens of thousands of tests, and releases almost every day with relatively few customer-visible failures.</p>
<p>The changes in GWS marked a watershed for testing culture at Google as teams in other parts of the company saw the benefits of testing and moved to adopt similar tactics.</p>
<p class="pagebreak-before">One of the key insights the GWS experience taught us was that you can’t rely on programmer ability alone to avoid product defects. <a contenteditable="false" data-primary="bugs" data-secondary="not prevented by programmer ability alone" data-type="indexterm" id="id-NxU3sySyc3cv"> </a>Even if each engineer writes only the occasional bug, after you have enough people working on the same project, you will be swamped by the ever-growing list of defects. Imagine a hypothetical 100-person team whose engineers are so good that they each write only a single bug a month. Collectively, this group of amazing engineers still produces five new bugs every workday. Worse yet, in a complex system, fixing one bug can often cause another, as engineers adapt to known bugs and code around them.</p>
<p>The best teams find ways to turn the collective wisdom of their members into a benefit for the entire team. That is exactly what automated testing does. After an engineer on the team writes a test, it is added to the pool of common resources available to others. Everyone else on the team can now run the test and will benefit when it detects an issue. <a contenteditable="false" data-primary="debugging versus testing" data-type="indexterm" id="id-63UzsRTEc8c8"> </a>Contrast this with an approach based on debugging, wherein each time a bug occurs, an engineer must pay the cost of digging into it with a debugger. The cost in engineering resources is night and day and was the fundamental reason GWS was able to turn its fortunes around.</p>
</section>
<section data-type="sect2" id="testing_at_the_speed_of_modern_developm">
<h2>Testing at the Speed of Modern Development</h2>
<p>Software systems are growing larger and ever more complex.<a contenteditable="false" data-primary="testing" data-secondary="automating to keep up with modern development" data-type="indexterm" id="id-rbUlspCXfGcN"> </a> A typical application or service at Google is made up of thousands or millions of lines of code. It uses hundreds of libraries or frameworks and must be delivered via unreliable networks to an increasing number of platforms running with an uncountable number of configurations. To make matters worse, new versions are pushed to users frequently, sometimes multiple times each day. This is a far cry from the world of shrink-wrapped software that saw updates only once or twice a year.</p>
<p>The ability for humans to manually validate every behavior in a system has been unable to keep pace with the explosion of features and platforms in most software. Imagine what it would take to manually test all of the functionality of<a contenteditable="false" data-primary="Google Search" data-secondary="manually testing functionality of" data-type="indexterm" id="id-dVU0sLHPfPc7"> </a> Google Search, like finding flights, movie times, relevant images, and of course web search results (see <a data-type="xref" href="ch11.html#screenshots_of_two_complex_google_searc">Figure 11-1</a>). Even if you can determine how to solve that problem, you then need to multiply that workload by every language, country, and device Google Search must support, and don’t forget to check for things like accessibility and security. Attempting to assess product quality by asking humans to manually interact with every feature just doesn’t scale. When it comes to testing, there is one clear answer: automation.</p>
<figure id="screenshots_of_two_complex_google_searc"><img alt="Screenshots of two complex Google search results" src="images/seag_1101.png">
<figcaption><span class="label">Figure 11-1. </span>Screenshots of two complex Google search results</figcaption>
</figure>
</section>
<section data-type="sect2" id="writecomma_runcomma_react">
<h2>Write, Run, React</h2>
<p>In its purest form, automating<a contenteditable="false" data-primary="testing" data-secondary="write, run, react in automating testing" data-type="indexterm" id="ix_tstwrrre"> </a> testing consists of three activities: writing tests, running tests, and reacting to test failures. An automated test is a small bit of code, usually a single function or method, that calls into an isolated part of a larger system that you want to test. The test code sets up an expected environment, calls into the system, usually with a known input, and verifies the result. Some of the tests are very small, exercising a single code path; others are much larger and can involve entire systems, like a mobile operating system or web browser.</p>
<p><a data-type="xref" href="ch11.html#an_example_test">An example test</a> presents a deliberately simple test in Java using no frameworks or testing libraries. This is not how you would write an entire test suite, but at its core every automated test looks similar to this very simple example.</p>
<div data-type="example" id="an_example_test">
<h5><span class="label">Example 11-1. </span>An example test</h5>
<pre data-code-language="java" data-type="programlisting">// Verifies a Calculator class can handle negative results.
public static void main(String[] args) {
Calculator calculator = new Calculator();
int expectedResult = -3;
int actualResult = calculator.subtract(2, 5); // Given 2, Subtracts 5.
assert(expectedResult == actualResult);
}
</pre>
</div>
<p>Unlike the QA processes of yore, in which rooms of dedicated software testers pored over new versions of a system, exercising every possible behavior, the engineers who build systems today play an active and integral role in writing and running automated tests for their own code. Even in companies where QA is a prominent organization, developer-written tests are commonplace. At the speed and scale that today’s systems are being developed, the only way to keep up is by sharing the development of tests around the entire engineering staff.</p>
<p>Of course, writing tests is different from writing <em>good tests</em>. It can be quite difficult to train tens of thousands of engineers to write good tests. We will discuss what we have learned about writing good tests in the chapters that follow.</p>
<p>Writing tests is only the first step in the process of automated testing. After you have written tests, you need to run them. Frequently. At its core, automated testing consists of repeating the same action over and over, only requiring human attention when something breaks. We will discuss this Continuous Integration (CI) and testing in <a data-type="xref" href="ch23.html#continuous_integration">Continuous Integration</a>. By expressing<a contenteditable="false" data-primary="code" data-secondary="expressing tests as" data-type="indexterm" id="id-2KUPCvTjUAcx"> </a> tests as code instead of a manual series of steps, we can run them every time the code changes—easily thousands of times per day. Unlike human testers, machines never grow tired or bored.</p>
<p>Another benefit of having tests expressed as code is that it is easy to modularize them for execution in various environments. Testing the behavior of Gmail in Firefox requires no more effort than doing so in Chrome, provided you have configurations for both of these systems.<sup><a data-type="noteref" id="ch01fn118-marker" href="ch11.html#ch01fn118">3</a></sup> Running tests for a user interface (UI) in Japanese or German can be done using the same test code as for English.</p>
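<p>As a rough illustration, the same test body can simply be invoked once per configuration. The <code>Greeter</code> class below is a toy stand-in invented for this sketch; a real suite would rely on a test framework’s parameterization support rather than a <code>main</code> method:</p>
<pre data-code-language="java" data-type="programlisting">// Sketch: one test body, reused across locale configurations.
public class GreeterLocaleTest {

  // Toy system under test: greets a user in a configured language.
  static class Greeter {
    private final String locale;
    Greeter(String locale) { this.locale = locale; }
    String greet(String name) {
      if (locale.equals("de")) { return "Hallo, " + name; }
      if (locale.equals("ja")) { return "こんにちは、" + name; }
      return "Hello, " + name;
    }
  }

  static void checkGreetingMentionsName(String locale) {
    Greeter greeter = new Greeter(locale);
    String greeting = greeter.greet("Ada");
    assert greeting.contains("Ada") : "bad greeting for locale " + locale;
  }

  public static void main(String[] args) {
    // The same test code runs against three configurations.
    checkGreetingMentionsName("en");
    checkGreetingMentionsName("de");
    checkGreetingMentionsName("ja");
  }
}
</pre>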
<p>Products and services under active development will inevitably experience test failures.<a contenteditable="false" data-primary="failures" data-secondary="addressing test failures" data-type="indexterm" id="id-xbUpsDcmUVcV"> </a> What really makes a testing process effective is how it addresses test failures. Allowing failing tests to pile up quickly defeats any value they were providing, so it is imperative not to let that happen. Teams that prioritize fixing a broken test within minutes of a failure are able to keep confidence high and failure isolation fast, and therefore derive more value out of their tests.<a contenteditable="false" data-primary="culture" data-secondary="healthy automated testing culture" data-type="indexterm" id="id-wbUlC4cQUAcJ"> </a></p>
<p>In summary, a healthy automated testing culture encourages everyone to share the work of writing tests. Such a culture also ensures that tests are run regularly. Last, and perhaps most important, it places an emphasis on fixing broken tests quickly so as to maintain high confidence in the process.<a contenteditable="false" data-primary="testing" data-secondary="write, run, react in automating testing" data-startref="ix_tstwrrre" data-type="indexterm" id="id-wbUvsdfQUAcJ"> </a></p>
</section>
<section data-type="sect2" id="benefits_of_testing_code">
<h2>Benefits of Testing Code</h2>
<p>To developers coming<a contenteditable="false" data-primary="code" data-secondary="benefits of testing" data-type="indexterm" id="ix_cdtst"> </a> from organizations that<a contenteditable="false" data-primary="testing" data-secondary="benefits of testing code" data-type="indexterm" id="ix_tstcd"> </a> don’t have a strong testing culture, the idea of writing tests as a means of improving productivity and velocity might seem antithetical. After all, the act of writing tests can take just as long (if not longer!) than implementing a feature would take in the first place. On the contrary, at Google, we’ve found that investing in software tests provides several key benefits to developer <span class="keep-together">productivity:</span></p>
<dl>
<dt>Less debugging</dt>
<dd>As you would expect, tested code has fewer defects when it is submitted. Critically, it also has fewer defects throughout its existence; most of them will be caught before the code is submitted. A piece of code at Google is expected to be modified dozens of times in its lifetime. It will be changed by other teams and even automated code maintenance systems. A test written once continues to pay dividends and prevent costly defects and annoying debugging sessions through the lifetime of the project. Changes to a project, or the dependencies of a project, that break a test can be quickly detected by test infrastructure and rolled back before the problem is ever released to production.</dd>
<dt>Increased confidence in changes</dt>
<dd>All software changes. Teams with good tests can review and accept changes to their project with confidence because all important behaviors of their project are continuously verified. Such projects encourage refactoring. Changes that refactor code while preserving existing behavior should (ideally) require no changes to existing tests.</dd>
<dt>Improved documentation</dt>
<dd>Software documentation is notoriously unreliable. From outdated requirements to missing edge cases, it is common for documentation to have a tenuous relationship to the code. Clear, focused tests that exercise one behavior at a time function as executable documentation. If you want to know what the code does in a particular case, look at the test for that case. Even better, when requirements change and new code breaks an existing test, we get a clear signal that the “documentation” is now out of date. Note that tests work best as documentation only if care is taken to keep them clear and concise.</dd>
<dt>Simpler reviews</dt>
<dd>All code at Google is reviewed by at least one other engineer before it can be submitted (see <a data-type="xref" href="ch09.html#code_review-id00002">Code Review</a> for more details). A code reviewer spends less effort verifying code works as expected if the code review includes thorough tests that demonstrate code correctness, edge cases, and error conditions. Instead of the tedious effort needed to mentally walk each case through the code, the reviewer can verify that each case has a passing test.</dd>
<dt>Thoughtful design</dt>
<dd>Writing tests for new code is a practical means of exercising the API design of the code itself. If new code is difficult to test, it is often because the code being tested has too many responsibilities or difficult-to-manage dependencies. Well-designed code should be modular, avoiding tight coupling and focusing on specific responsibilities. Fixing design issues early often means less rework later.</dd>
<dt>Fast, high-quality releases</dt>
<dd>With a healthy automated test suite, teams can release new versions of their application with confidence. Many projects at Google release a new version to production every day—even large projects with hundreds of engineers and thousands of code changes submitted every day. This would not be possible<a contenteditable="false" data-primary="testing" data-secondary="benefits of testing code" data-startref="ix_tstcd" data-type="indexterm" id="id-gbUmseu2HZuqc0"> </a> without<a contenteditable="false" data-primary="code" data-secondary="benefits of testing" data-startref="ix_cdtst" data-type="indexterm" id="id-LkUXCzuRHVuxc8"> </a> automated testing.<a contenteditable="false" data-primary="testing" data-secondary="reasons for writing tests" data-startref="ix_tstwhy" data-type="indexterm" id="id-7eUwHXuLHdubcv"> </a></dd>
</dl>
</section>
</section>
<section data-type="sect1" id="designing_a_test_suite">
<h1>Designing a Test Suite</h1>
<p>Today, Google operates at a massive scale, but we<a contenteditable="false" data-primary="testing" data-secondary="designing a test suite" data-type="indexterm" id="ix_tstdessu"> </a> haven’t always been so large, and the foundations of our approach were laid long ago. Over the years, as our codebase has grown, we have learned a lot about how to approach the design and execution of a test suite, often by making mistakes and cleaning up afterward.</p>
<p>One of the lessons we learned fairly early on is that engineers favored writing larger, system-scale tests, but that these tests were slower, less reliable, and more difficult to debug than smaller tests. Engineers, fed up with debugging the system-scale tests, asked themselves, “Why can’t we just test one server at a time?” or, “Why do we need to test a whole server at once? We could test smaller modules individually.” Eventually, the desire to reduce pain led teams to develop smaller and smaller tests, which turned out to be faster, more stable, and generally less painful.</p>
<p>This led to a lot of discussion around the company about the exact meaning of “small.” Does small mean unit test? What about integration tests, what size are those? We have come to the conclusion that there are two distinct dimensions for every test case: size and scope. Size refers to the resources that are required to run a test case: things like memory, processes, and time. Scope refers to the specific code paths we are verifying. Note that executing a line of code is different from verifying that it worked as expected. Size and scope are interrelated but distinct concepts.</p>
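<p>A tiny sketch makes that last distinction concrete; the <code>Calculator</code> here is a toy class defined inline for the example:</p>
<pre data-code-language="java" data-type="programlisting">// Executing code is not the same as verifying it.
public class ExecuteVersusVerify {

  static class Calculator {
    int add(int a, int b) { return a + b; }
  }

  public static void main(String[] args) {
    Calculator calculator = new Calculator();

    // This line executes add(), so coverage tools will count it,
    // but nothing checks the result: a bug would slip through.
    calculator.add(2, 2);

    // This line executes add() and verifies its observable output.
    assert calculator.add(2, 2) == 4;
  }
}
</pre>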
<section data-type="sect2" id="test_size">
<h2>Test Size</h2>
<p>At Google, we classify every one of our tests into a size<a contenteditable="false" data-primary="test sizes" data-type="indexterm" id="ix_tstsz"> </a> and encourage engineers to always write<a contenteditable="false" data-primary="testing" data-secondary="designing a test suite" data-tertiary="test size" data-type="indexterm" id="id-DOU0C1CzhmfQ"> </a> the smallest possible test for a given piece of functionality. A test’s size is determined not by its number of lines of code, but by how it runs, what it is allowed to do, and how many resources it consumes. In fact, in some cases, our definitions of small, medium, and large are actually encoded as constraints the testing infrastructure can enforce on a test. We go into the details in a moment, but in brief, <em>small tests</em> run in a single process, <em>medium tests</em> run on a single machine, and <em>large tests</em> run wherever they want, as demonstrated in <a data-type="xref" href="ch11.html#test_sizes">Figure 11-2</a>.<sup><a data-type="noteref" id="ch01fn119-marker" href="ch11.html#ch01fn119">4</a></sup></p>
<figure id="test_sizes"><img alt="Test sizes" src="images/seag_1103.png">
<figcaption><span class="label">Figure 11-2. </span>Test sizes</figcaption>
</figure>
<p>We make this distinction, as opposed to the more traditional “unit” or “integration,” because the most important qualities we want from our test suite are speed and determinism, regardless of the scope of the test. Small tests, regardless of the scope, are almost always faster and more deterministic than tests that involve more infrastructure or consume more resources. Placing restrictions on small tests makes speed and determinism much easier to achieve. As test sizes grow, many of the restrictions are relaxed. Medium tests have more flexibility but also more risk of nondeterminism. Larger tests are saved for only the most complex and difficult testing scenarios. Let’s take a closer look at the exact constraints imposed on each type of test.</p>
<section data-type="sect3" id="small_tests">
<h3>Small tests</h3>
<p>Small tests are the most constrained of the three test sizes. <a contenteditable="false" data-primary="test sizes" data-secondary="small tests" data-type="indexterm" id="id-rbUlspCdh1h1fy"> </a><a contenteditable="false" data-primary="small tests" data-type="indexterm" id="id-dVUnCPC9hqhNfD"> </a>The primary constraint is that small tests must run in a single process. In many languages, we restrict this even further to say that they must run on a single thread. This means that the code performing the test must run in the same process as the code being tested. You can’t run a server and have a separate test process connect to it. It also means that you can’t run a third-party program such as a database as part of your test.</p>
<p>The other important constraints on small tests are that they aren’t allowed to sleep, perform I/O operations,<sup><a data-type="noteref" id="ch01fn120-marker" href="ch11.html#ch01fn120">5</a></sup> or make any other blocking calls. This means that small tests aren’t allowed to access the network or disk. Testing code that relies on these sorts of operations requires the use of test doubles (see <a data-type="xref" href="ch13.html#test_doubles">Test Doubles</a>) to replace the heavyweight dependency with a lightweight, in-process dependency.</p>
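<p>As a sketch of what this looks like in practice, the following small test replaces a database-backed user store with an in-memory fake. <code>UserStore</code>, <code>FakeUserStore</code>, and <code>UserGreeter</code> are invented for this example:</p>
<pre data-code-language="java" data-type="programlisting">// A small test: a single process, no sleeping, no I/O, no network.
// The heavyweight database dependency is replaced by an in-process fake.
public class UserGreeterTest {

  interface UserStore {
    String displayNameFor(String userId);
  }

  // Test double: a lightweight, in-memory stand-in for the real store.
  static class FakeUserStore implements UserStore {
    public String displayNameFor(String userId) {
      return userId.equals("u1") ? "Ada" : null;
    }
  }

  // Toy system under test.
  static class UserGreeter {
    private final UserStore store;
    UserGreeter(UserStore store) { this.store = store; }
    String greet(String userId) {
      String name = store.displayNameFor(userId);
      return name == null ? "Hello, stranger" : "Hello, " + name;
    }
  }

  public static void main(String[] args) {
    UserGreeter greeter = new UserGreeter(new FakeUserStore());
    assert greeter.greet("u1").equals("Hello, Ada");
    assert greeter.greet("unknown").equals("Hello, stranger");
  }
}
</pre>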
<p>The purpose of these restrictions is to ensure that small tests don’t have access to the main sources of test slowness or nondeterminism. A test that runs on a single process and never makes blocking calls can effectively run as fast as the CPU can handle. It’s difficult (but certainly not impossible) to accidentally make such a test slow or nondeterministic. The constraints on small tests provide a sandbox that prevents engineers from shooting themselves in the foot.</p>
<p>These restrictions might seem excessive at first, but consider a modest suite of a couple hundred small test cases running throughout the day.<a contenteditable="false" data-primary="flaky tests" data-type="indexterm" id="id-NxU3sRhNhvhGfr"> </a> If even a few of them fail nondeterministically (often called <a href="https://oreil.ly/NxC4A">flaky tests</a>), tracking down the cause becomes a serious drain on productivity.<a contenteditable="false" data-primary="nondeterministic behavior in tests" data-type="indexterm" id="id-pbUoHxh0hAhyf0"> </a> At Google’s scale, such a problem could grind our testing infrastructure to a halt.</p>
<p>At Google, we encourage engineers to try to write small tests whenever possible, regardless of the scope of the test, because it keeps the entire test suite running fast and reliably. For more discussion on small versus unit tests, see <a data-type="xref" href="ch12.html#unit_testing">Unit Testing</a>.</p>
</section>
<section data-type="sect3" id="medium_tests">
<h3>Medium tests</h3>
<p>The constraints placed on small tests can be too restrictive for many interesting kinds of tests. <a contenteditable="false" data-primary="test sizes" data-secondary="medium tests" data-type="indexterm" id="id-dVU0sPC8SqhNfD"> </a><a contenteditable="false" data-primary="medium tests" data-type="indexterm" id="id-nbUNCmCYSQhGfk"> </a>The next rung up the ladder of test sizes is the medium test. Medium tests can span multiple processes, use threads, and can make blocking calls, including network calls, to <code>localhost</code>. The only remaining restriction is that medium tests aren’t allowed to make network calls to any system other than <code>localhost</code>. In other words, the test must be contained within a single machine.</p>
<p>The ability to run multiple processes opens up a lot of possibilities. For example, you could run a database instance to validate that the code you’re testing integrates correctly in a more realistic setting. Or you could test a combination of web UI and server code. Tests of web applications often involve tools like <a href="https://oreil.ly/W27Uf">WebDriver</a> that start a real browser and control it remotely via the test process.</p>
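<p>The following is a minimal, self-contained sketch of a medium test using only JDK classes: it starts a real HTTP server on an ephemeral <code>localhost</code> port and exercises it over the network. A real medium test would more likely start a separate server binary or database process, and would use a test framework rather than a <code>main</code> method:</p>
<pre data-code-language="java" data-type="programlisting">import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

// A medium test: real network calls are made, but only to localhost.
public class HealthEndpointMediumTest {
  public static void main(String[] args) throws IOException {
    // Arrange: start a real HTTP server on an ephemeral localhost port.
    HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
    server.createContext("/health", new HttpHandler() {
      public void handle(HttpExchange exchange) throws IOException {
        byte[] body = "ok".getBytes();
        exchange.sendResponseHeaders(200, body.length);
        OutputStream out = exchange.getResponseBody();
        out.write(body);
        exchange.close();
      }
    });
    server.start();

    try {
      // Act: exercise the endpoint over a real localhost connection.
      int port = server.getAddress().getPort();
      URL url = new URL("http://localhost:" + port + "/health");
      HttpURLConnection connection = (HttpURLConnection) url.openConnection();

      // Assert: the endpoint reports healthy.
      assert connection.getResponseCode() == 200;
    } finally {
      server.stop(0);
    }
  }
}
</pre>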
<p>Unfortunately, with increased flexibility comes increased potential for tests to become slow and nondeterministic. Tests that span processes or are allowed to make blocking calls are dependent on the operating system and third-party processes to be fast and deterministic, which isn’t something we can guarantee in general. Medium tests still provide a bit of protection by preventing access to remote machines via the network, which is far and away the biggest source of slowness and nondeterminism in most systems. Still, when writing medium tests, the "safety" is off, and engineers need to be much more careful.</p>
</section>
<section data-type="sect3" id="large_tests">
<h3>Large tests</h3>
<p>Finally, we have<a contenteditable="false" data-primary="test sizes" data-secondary="large tests" data-type="indexterm" id="id-nbU0smCXTQhGfk"> </a> large tests. <a contenteditable="false" data-primary="large tests" data-seealso="larger testing" data-type="indexterm" id="id-NxURCzC1TvhGfr"> </a>Large tests remove the <code>localhost</code> restriction imposed on medium tests, allowing the test and the system being tested to span across multiple machines. For example, the test might run against a system in a remote cluster.</p>
<p>As before, increased flexibility comes with increased risk. Having to deal with a system that spans multiple machines and the network connecting them increases the chance of slowness and nondeterminism significantly compared to running on a single machine. We mostly reserve large tests for full-system end-to-end tests that are more about validating configuration than pieces of code, and for tests of legacy components for which it is impossible to use test doubles. We’ll talk more about use cases for large tests in <a data-type="xref" href="ch14.html#larger_testing">Larger Testing</a>. Teams at Google will frequently isolate their large tests from their small or medium tests, running them only during the build and release process so as not to impact developer workflow.</p>
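<p>A large test is mostly distinguished by where the system under test lives. The sketch below assumes a deployed instance whose address is supplied by the test environment; the environment variable and endpoint are invented for illustration:</p>
<pre data-code-language="java" data-type="programlisting">import java.net.HttpURLConnection;
import java.net.URL;

// A large test: the system under test runs elsewhere, for example in a
// remote staging cluster, so the localhost restriction no longer applies.
public class SearchEndToEndLargeTest {
  public static void main(String[] args) throws Exception {
    // The deployed system's address comes from the test environment.
    String baseUrl = System.getenv("STAGING_BASE_URL");
    if (baseUrl == null) {
      System.out.println("STAGING_BASE_URL not set; nothing to test against.");
      return;
    }

    HttpURLConnection connection =
        (HttpURLConnection) new URL(baseUrl + "/healthz").openConnection();
    connection.setConnectTimeout(5000);
    connection.setReadTimeout(5000);

    // A genuine end-to-end test would issue real queries and inspect the
    // responses; this sketch only asserts that the system is reachable.
    assert connection.getResponseCode() == 200;
  }
}
</pre>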
<aside class="pagebreak-before less_space" data-type="sidebar" id="flaky_tests_are_expensive">
<h5>Case Study: Flaky Tests Are Expensive</h5>
<p>If you have a few thousand tests, each with a very tiny bit of nondeterminism, running all day, occasionally one will probably fail (flake). <a contenteditable="false" data-primary="flaky tests" data-secondary="expense of" data-type="indexterm" id="id-pbU8szCZt6TahxfO"> </a>As the number of tests grows, statistically so will the number of flakes. If each test has even a 0.1% chance of failing when it should not, and you run 10,000 tests per day, you will be investigating 10 flakes per day. Each investigation takes time away from something more productive that your team could be doing.</p>
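<p>Spelled out as code, the expectation in that example is a single multiplication:</p>
<pre data-code-language="java" data-type="programlisting">// Expected flake investigations per day =
//     (test executions per day) x (probability a passing test flakes).
public class FlakeMath {
  public static void main(String[] args) {
    double runsPerDay = 10000;
    double flakeRate = 0.001; // a 0.1% chance of a spurious failure

    double expectedFlakesPerDay = runsPerDay * flakeRate;
    System.out.println(expectedFlakesPerDay); // prints 10.0
  }
}
</pre>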
<p>In some cases, you can limit the impact of flaky tests by automatically rerunning them when they fail. This is effectively trading CPU cycles for engineering time. At low levels of flakiness, this trade-off makes sense. Just keep in mind that rerunning a test is only delaying the need to address the root cause of flakiness.</p>
<p>If test flakiness continues to grow, you will experience something much worse than lost productivity: a loss of confidence in the tests. It doesn’t take many flake investigations before a team loses trust in the test suite. After that happens, engineers will stop reacting to test failures, eliminating any value the test suite provided. Our experience suggests that as you approach 1% flakiness, the tests begin to lose value. At Google, our flaky rate hovers around 0.15%, which implies thousands of flakes every day. We fight hard to keep flakes in check, including actively investing engineering hours to fix them.</p>
<p>In most cases, flakes <a contenteditable="false" data-primary="nondeterministic behavior in tests" data-type="indexterm" id="id-xbUpsVh0toTvhJf1"> </a>appear because of nondeterministic behavior in the tests themselves. Software provides many sources of nondeterminism: clock time, thread scheduling, network latency, and more. Learning how to isolate and stabilize the effects of randomness is not easy. Sometimes, effects are tied to low-level concerns like hardware interrupts or browser rendering engines. A good automated test infrastructure should help engineers identify and mitigate any nondeterministic behavior.</p>
</aside>
</section>
<section data-type="sect3" id="properties_common_to_all_test_sizes">
<h3>Properties common to all test sizes</h3>
<p>All tests should strive to be hermetic: a test should contain all of the information necessary to set up, execute, and tear down its environment.<a contenteditable="false" data-primary="test sizes" data-secondary="properties common to all sizes" data-type="indexterm" id="id-NxU3szCOIvhGfr"> </a> Tests should assume as little as possible about the outside environment, such as the order in which the tests are run. For example, they should not rely on a shared database. This constraint becomes more challenging with larger tests, but effort should still be made to ensure isolation.</p>
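<p>One concrete way to stay hermetic is for the test to create, use, and delete its own scratch environment. The sketch below does this with a temporary directory; <code>ConfigWriter</code> is a toy class invented for the example:</p>
<pre data-code-language="java" data-type="programlisting">import java.nio.file.Files;
import java.nio.file.Path;

// A hermetic test: it creates everything it needs (a scratch directory),
// exercises the behavior, and cleans up after itself. Nothing depends on
// shared state, fixed paths, or the order in which tests run.
public class ConfigWriterHermeticTest {

  // Toy system under test: writes a config file into a directory.
  static class ConfigWriter {
    Path write(Path dir, String contents) throws Exception {
      Path file = dir.resolve("app.cfg");
      Files.writeString(file, contents);
      return file;
    }
  }

  public static void main(String[] args) throws Exception {
    // Set up: a private, throwaway environment for this test only.
    Path scratch = Files.createTempDirectory("config-writer-test");
    try {
      Path written = new ConfigWriter().write(scratch, "retries=3");

      // Verify the observable behavior.
      assert Files.readString(written).equals("retries=3");
    } finally {
      // Tear down: leave nothing behind for other tests to trip over.
      Files.deleteIfExists(scratch.resolve("app.cfg"));
      Files.deleteIfExists(scratch);
    }
  }
}
</pre>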
<p>A test should contain <em>only</em> the information required to exercise the behavior in question. Keeping tests clear and simple aids reviewers in verifying that the code does what it says it does. <a contenteditable="false" data-primary="failures" data-secondary="clear code aiding in diagnosing test failures" data-type="indexterm" id="id-pbUyCRHKIAhyf0"> </a>Clear code also aids in diagnosing failures when they occur. We like to say that “a test should be obvious upon inspection.” Because there are no tests for the tests themselves, they require manual review as an important check on correctness. As a corollary to this, we also <a href="https://oreil.ly/fQSuk">strongly discourage the use of control flow statements like conditionals and loops in a test</a>. More complex test flows risk containing bugs themselves and make it more difficult to determine the cause of a test failure.</p>
<p>Remember that tests are often revisited only when something breaks. When you are called to fix a broken test that you have never seen before, you will be thankful someone took the time to make it easy to understand. Code is read far more than it is written, so make sure you write the test you’d like to read!</p>
<section data-type="sect4" id="test_sizes_in_practice">
<h4>Test sizes in practice</h4>
<p>Having precise definitions of test sizes has allowed us to create tools to enforce them. <a contenteditable="false" data-primary="test sizes" data-secondary="in practice" data-type="indexterm" id="id-2KUNsZCZh3IEhOfG"> </a>Enforcement enables us to scale our test suites and still make certain guarantees about speed, resource utilization, and stability. The extent to which these definitions are enforced at Google varies by language. For example, we run all Java tests using a custom security manager that will cause all tests tagged as small to fail if they attempt to do something prohibited, such as establish a network <span class="keep-together">connection.</span></p>
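<p>As a toy illustration of the idea (and emphatically not Google’s actual tooling), a JVM-level hook can reject prohibited operations outright. Note that <code>SecurityManager</code> is deprecated in recent Java releases, so this is only a sketch of the enforcement concept:</p>
<pre data-code-language="java" data-type="programlisting">// Sketch: fail fast if test code tries to open a network connection.
public class NoNetworkEnforcer {

  static class NoNetworkSecurityManager extends SecurityManager {
    @Override
    public void checkConnect(String host, int port) {
      throw new SecurityException(
          "small tests may not open network connections (tried "
              + host + ":" + port + ")");
    }

    @Override
    public void checkPermission(java.security.Permission permission) {
      // Permit everything else in this simplified sketch.
    }
  }

  public static void main(String[] args) {
    System.setSecurityManager(new NoNetworkSecurityManager());

    // Any test code run after this point that attempts a network call
    // will now fail with a SecurityException.
  }
}
</pre>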
</section>
</section>
</section>
<section data-type="sect2" id="test_scope">
<h2>Test Scope</h2>
<p>Though we at Google put a lot of emphasis on test size, another important property to consider is test scope.<a contenteditable="false" data-primary="scope of tests" data-type="indexterm" id="ix_scptst"> </a><a contenteditable="false" data-primary="testing" data-secondary="designing a test suite" data-tertiary="test scope" data-type="indexterm" id="ix_tstdessusc"> </a> Test scope refers to how much code is being validated by a given test. Narrow-scoped tests (commonly called <em>unit tests</em>) are designed to validate the logic in a small, focused part of the codebase, like an individual class or method. <a contenteditable="false" data-primary="unit testing" data-secondary="narrow-scoped tests (or unit tests)" data-type="indexterm" id="id-rbUjtpC4S2fN"> </a>Medium-scoped tests (commonly called <em>integration tests</em>) are designed to verify interactions between a small number of components; for example, between a server and its database. <a contenteditable="false" data-primary="integration tests" data-type="indexterm" id="id-nbUPSmCYSqfK"> </a>Large-scoped tests (commonly referred to by names like <em>functional tests</em>, <em>end-to-end tests</em>, or <em>system tests</em>) are designed to validate the interaction of several distinct parts of the system, or emergent behaviors that aren’t expressed in a single class or method.<a contenteditable="false" data-primary="system tests" data-type="indexterm" id="id-kbU8fPC1SafZ"> </a><a contenteditable="false" data-primary="end-to-end tests" data-type="indexterm" id="id-2KUeUZCgSnfx"> </a><a contenteditable="false" data-primary="functional tests" data-type="indexterm" id="id-xbU3unC8SvfV"> </a></p>
|
||||
|
||||
<p>It’s important to note that when we talk about unit tests as being narrowly scoped, we’re referring to the code that is being <em>validated</em>, not the code that is being <em>executed</em>. It’s <a contenteditable="false" data-primary="dependencies" data-secondary="test scope and" data-type="indexterm" id="id-rbUJHdH4S2fN"> </a>quite common for a class to have many dependencies or other classes it refers to, and these dependencies will naturally be invoked while testing the target class. Though some <a href="https://oreil.ly/Lj-t3">other testing strategies</a> make heavy use <a contenteditable="false" data-primary="test doubles" data-type="indexterm" id="id-nbUxhvHYSqfK"> </a>of test doubles (fakes or mocks) to avoid executing code outside of the system under test, at Google, we prefer to keep the real dependencies in place when it is feasible to do so. <a data-type="xref" href="ch13.html#test_doubles">Test Doubles</a> discusses this issue in more detail.</p>
|
||||
|
||||
<p>Narrow-scoped tests tend to be small, and broad-scoped tests tend to be medium or large, but this isn’t always the case. For example, it’s possible to write a broad-scoped test of a server endpoint that covers all of its normal parsing, request validation, and business logic, which is nevertheless small because it uses doubles to stand in for all out-of-process dependencies like a database or filesystem. Similarly, it’s possible to write a narrow-scoped test of a single method that must be medium sized. For <span class="keep-together">example,</span> modern web frameworks often bundle HTML and JavaScript together, and testing a UI component like a date picker often requires running an entire browser, even to validate a single code path.</p>
|
||||
|
||||
<p>Just as we encourage tests of smaller size, at Google, we also encourage engineers to write tests of narrower scope.<a contenteditable="false" data-primary="test sizes" data-secondary="test scope and" data-type="indexterm" id="id-rbUlsXh4S2fN"> </a> As a very rough guideline, we tend to aim to have a mix of around 80% of our tests being narrow-scoped unit tests that validate the majority of our business logic; 15% medium-scoped integration tests that validate the interactions between two or more components; and 5% end-to-end tests that validate the entire system. <a data-type="xref" href="ch11.html#googleapostrophes_version_of_mike_cohna">Figure 11-3</a> depicts how we can visualize this as a pyramid.</p>
|
||||
|
||||
<figure id="googleapostrophes_version_of_mike_cohna"><img alt="Google’s version of Mike Cohn’s test pyramid; percentages are by test case count, and every team’s mix will be a little different" src="images/seag_1104.png">
|
||||
<figcaption><span class="label">Figure 11-3. </span>Google’s version of Mike Cohn’s test pyramid;<sup><a data-type="noteref" id="ch01fn121-marker" href="ch11.html#ch01fn121">6</a></sup> percentages are by test case count, and every team’s mix will be a little different</figcaption>
|
||||
</figure>
|
||||
|
||||
<p>Unit tests form an excellent base because they are fast, stable, and dramatically narrow the scope and reduce the cognitive load required to identify all the possible behaviors a class or function has. <a contenteditable="false" data-primary="hourglass antipattern in testing" data-type="indexterm" id="id-nbU0sxTYSqfK"> </a><a contenteditable="false" data-primary="ice cream cone antipattern in testing" data-type="indexterm" id="id-NxURCpTnSwfv"> </a>Additionally, they make failure diagnosis quick and painless.<a contenteditable="false" data-primary="antipatterns in test suites" data-type="indexterm" id="id-63UqHRT3S4f8"> </a> Two antipatterns to be aware of are the "ice cream cone" and the "hourglass," as illustrated in <a data-type="xref" href="ch11.html#test_suite_antipatterns">Figure 11-4</a>.</p>
|
||||
|
||||
<p>With the ice cream cone, engineers write many end-to-end tests but few integration or unit tests. Such suites tend to be slow, unreliable, and difficult to work with. This pattern often appears in projects that start as prototypes and are quickly rushed to production, never stopping to address testing debt.</p>
|
||||
|
||||
<p>The hourglass involves many end-to-end tests and many unit tests but few integration tests. It isn’t quite as bad as the ice cream cone, but it still results in many end-to-end test failures that could have been caught quicker and more easily with a suite of medium-scope tests. The hourglass pattern occurs when tight coupling makes it difficult to instantiate individual dependencies in isolation.</p>
|
||||
|
||||
<figure id="test_suite_antipatterns"><img alt="Test suite antipatterns" src="images/seag_1105.png">
|
||||
<figcaption><span class="label">Figure 11-4. </span>Test suite antipatterns</figcaption>
|
||||
</figure>
|
||||
|
||||
<p>Our recommended mix of tests is determined by our two primary goals: engineering productivity and product confidence. Favoring unit tests gives us high confidence quickly, and early in the development process. Larger tests act as sanity checks as the product develops; they should not be viewed as a primary method for catching bugs.</p>
|
||||
|
||||
<p>When considering your own mix, you might want a different balance. If you emphasize integration testing, you might discover that your test suites take longer to run but catch more issues between components. When you emphasize unit tests, your test suites can complete very quickly, and you will catch many common logic bugs. But, unit tests cannot verify the interactions between components, like <a href="https://oreil.ly/mALqH">a contract between two systems developed by different teams</a>. A good test suite contains a blend of different test sizes and scopes that are appropriate to the local architectural and organizational realities.<a contenteditable="false" data-primary="scope of tests" data-startref="ix_scptst" data-type="indexterm" id="id-xbUeCQu8SvfV"> </a><a contenteditable="false" data-primary="tests" data-secondary="designing a test suite" data-startref="ix_tstdessusc" data-tertiary="test scope" data-type="indexterm" id="id-wbU3HXumSjfJ"> </a></p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="the_beyonceacutesemicolon_rule">
|
||||
<h2>The Beyoncé Rule</h2>
|
||||
|
||||
<p>We are often asked, when coaching new hires, which behaviors or properties actually need to be tested? <a contenteditable="false" data-primary="testing" data-secondary="designing a test suite" data-tertiary="Beyoncé Rule" data-type="indexterm" id="id-OwUls2C9TJfV"> </a><a contenteditable="false" data-primary="Beyoncé Rule" data-type="indexterm" id="id-XKUaCrCmTXfx"> </a>The straightforward answer is: test everything that you don’t want to break. In other words, if you want to be confident that a system exhibits a particular behavior, the only way to be sure it will is to write an automated test for it. This includes all of the usual suspects like testing performance, behavioral correctness, accessibility, and security. It also includes less obvious properties like testing how a system handles failure.</p>
|
||||
|
||||
<p>We have a name for this general philosophy: we call it the <a href="https://oreil.ly/X7_-z">Beyoncé Rule</a>. Succinctly, it can be stated as follows: “If you liked it, then you shoulda put a test on it.” The Beyoncé Rule is often invoked by infrastructure teams that are responsible for making changes across the entire codebase. If unrelated infrastructure changes pass all of your tests but still break your team’s product, you are on the hook for fixing it and adding the additional tests.</p>
|
||||
|
||||
<aside data-type="sidebar" id="testing_for_failure">
|
||||
<h5>Testing for Failure</h5>
|
||||
|
||||
<p>One of the most important situations a system must account for is failure.<a contenteditable="false" data-primary="failures" data-secondary="testing for system failure" data-type="indexterm" id="id-dVU0sPCLtkTNfD"> </a> Failure is inevitable, but waiting for an actual catastrophe to find out how well a system responds to a catastrophe is a recipe for pain. Instead of waiting for a failure, write automated tests that simulate common kinds of failures. This includes simulating exceptions or errors in unit tests and injecting Remote Procedure Call (RPC) errors or latency in integration and end-to-end tests. It can also include much larger disruptions that affect the real production network using techniques like <a href="https://oreil.ly/iOO4F">Chaos Engineering</a>. A predictable and controlled response to adverse conditions is a hallmark of a reliable system.<a contenteditable="false" data-primary="chaos engineering" data-type="indexterm" id="id-NxUKHzCKtrTGfr"> </a></p>
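<p>By way of illustration, a hypothetical unit test can simulate such a failure by handing the system under test a dependency that always throws; the interface and client below are invented for this sketch and assume JUnit 4 with the Truth assertion library:</p>

<pre data-code-language="java" data-type="programlisting">import static com.google.common.truth.Truth.assertThat;

import org.junit.Test;

public class FailureHandlingTest {

  // A minimal dependency and client defined inline so the sketch is self-contained.
  interface ProfileService {
    String fetchDisplayName(String userId);
  }

  static class GreetingClient {
    private final ProfileService profiles;

    GreetingClient(ProfileService profiles) {
      this.profiles = profiles;
    }

    /** Falls back to a generic greeting if the backend fails. */
    String greet(String userId) {
      try {
        return "Hello, " + profiles.fetchDisplayName(userId) + "!";
      } catch (RuntimeException e) {
        return "Hello!";
      }
    }
  }

  @Test
  public void greetFallsBackWhenBackendThrows() {
    // Simulate the failure instead of waiting for a real outage.
    ProfileService brokenBackend = userId -> {
      throw new RuntimeException("simulated RPC failure");
    };

    assertThat(new GreetingClient(brokenBackend).greet("user-1")).isEqualTo("Hello!");
  }
}</pre>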
|
||||
</aside>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="a_note_on_code_coverage">
|
||||
<h2>A Note on Code Coverage</h2>
|
||||
|
||||
<p>Code coverage is a measure of which lines of feature code are exercised by which tests.<a contenteditable="false" data-primary="testing" data-secondary="designing a test suite" data-tertiary="code coverage" data-type="indexterm" id="id-XKUVsrCyIXfx"> </a><a contenteditable="false" data-primary="code coverage" data-type="indexterm" id="id-rbUOCpCaI2fN"> </a> If you have 100 lines of code and your tests execute 90 of them, you have 90% code coverage.<sup><a data-type="noteref" id="ch01fn122-marker" href="ch11.html#ch01fn122">7</a></sup> Code coverage is often held up as the gold standard metric for understanding test quality, and that is somewhat unfortunate. It is possible to exercise a lot of lines of code with a few tests, never checking that each line is doing anything useful. That’s because code coverage only measures that a line was invoked, not what happened as a result. (We recommend only measuring coverage from small tests to avoid coverage inflation that occurs when executing larger tests.)</p>
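<p>As a small, hypothetical illustration of why an executed line is not the same as a verified behavior, the following JUnit 4 test reports full line coverage of the class it exercises while asserting nothing about the result:</p>

<pre data-code-language="java" data-type="programlisting">import org.junit.Test;

public class CoverageWithoutVerificationTest {

  // A trivial class under test, defined inline so the example is self-contained.
  static class Calculator {
    int add(int a, int b) {
      return a + b;
    }
  }

  // Every line of add() is executed, so line coverage reports 100%, yet this test
  // would still pass if add() returned the wrong value, because nothing is checked.
  @Test
  public void executesEveryLineButChecksNothing() {
    new Calculator().add(2, 2);
  }
}</pre>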
|
||||
|
||||
<p>An even more insidious problem with code coverage is that, like other metrics, it quickly becomes a goal unto itself. It is common for teams to establish a bar for expected code coverage—for instance, 80%. At first, that sounds eminently reasonable; surely you want to have at least that much coverage. In practice, what happens is that instead of treating 80% like a floor, engineers treat it like a ceiling. Soon, changes begin landing with no more than 80% coverage. After all, why do more work than the metric requires?</p>
|
||||
|
||||
<p>A better way to approach the quality of your test suite is to think about the behaviors that are tested. Do you have confidence that everything your customers expect to work will work? Do you feel confident you can catch breaking changes in your dependencies? Are your tests stable and reliable? Questions like these are a more holistic way to think about a test suite. Every product and team is going to be different; some will have difficult-to-test interactions with hardware, some involve massive datasets. Trying to answer the question “do we have enough tests?” with a single number ignores a lot of context and is unlikely to be useful. Code coverage can provide some insight into untested code, but it is not a substitute for thinking critically about how well your system is tested.<a contenteditable="false" data-primary="testing" data-secondary="designing a test suite" data-startref="ix_tstdessu" data-type="indexterm" id="id-dVU0sXtVIYf7"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="testing_at_google_scale">
|
||||
<h1>Testing at Google Scale</h1>
|
||||
|
||||
<p>Much of the guidance<a contenteditable="false" data-primary="testing" data-secondary="at Google scale" data-type="indexterm" id="ix_tstGoo"> </a> to this point can be applied to codebases of almost any size. However, we should spend some time on what we have learned testing at our very large scale. To understand how testing works at Google, you need an understanding of our development environment, the most important fact about which is that most of Google’s code is kept in a single, monolithic repository (<a href="https://oreil.ly/qSihi">monorepo</a>). Almost every line of code for every product and service we operate is all stored in one place. We have more than two billion lines of code in the repository today.</p>
|
||||
|
||||
<p>Google’s codebase experiences close to 25 million lines of change every week. Roughly half of them are made by the tens of thousands of engineers working in our monorepo, and the other half by our automated systems, in the form of configuration updates or large-scale changes (<a data-type="xref" href="ch22.html#large-scale_changes">Large-Scale Changes</a>). Many of those changes are initiated from outside the immediate project. We don’t place many limitations on the ability of engineers to reuse code.</p>
|
||||
|
||||
<p>The openness of our codebase encourages a level of co-ownership that lets everyone take responsibility for the codebase. One benefit of such openness is the ability to directly fix bugs in a product or service you use (subject to approval, of course) instead of complaining about it. This also implies that many people will make changes in a part of the codebase owned by someone else.</p>
|
||||
|
||||
<p>Another thing that makes Google a little different is that almost no teams use repository branching.<a contenteditable="false" data-primary="repository branching, not used at Google" data-type="indexterm" id="id-ZlU3sPhNUz"> </a> All changes are committed to the repository head and are immediately visible for everyone to see. Furthermore, all software builds are performed using the last committed change that our testing infrastructure has validated.<a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="system at Google" data-type="indexterm" id="id-DOU0CohaUr"> </a> When a product or service is built, almost every dependency required to run it is also built from source, also from the head of the repository. <a contenteditable="false" data-primary="TAP" data-see="Test Automation Platform" data-type="indexterm" id="id-OwUOHgh8Up"> </a>Google manages testing at this scale by use of a CI system.<a contenteditable="false" data-primary="Test Automation Platform (TAP)" data-type="indexterm" id="id-XKU1tNhRUy"> </a> One of the key components of our CI system is our Test Automation Platform (TAP).</p>
|
||||
|
||||
<div data-type="note" id="id-ZGfVS0U7"><h6>Note</h6>
|
||||
<p>For more information on TAP and our CI philosophy, see <span class="keep-together"><a data-type="xref" href="ch23.html#continuous_integration">Continuous Integration</a></span>.</p>
|
||||
</div>
|
||||
|
||||
<p>Whether you are considering our size, our monorepo, or the number of products we offer, Google’s engineering environment is complex. Every week it experiences millions of changing lines, billions of test cases being run, tens of thousands of binaries being built, and hundreds of products being updated—talk about complicated!</p>
|
||||
|
||||
<section data-type="sect2" id="the_pitfalls_of_a_large_test_suite">
|
||||
<h2>The Pitfalls of a Large Test Suite</h2>
|
||||
|
||||
<p>As a codebase grows, you will inevitably need <a contenteditable="false" data-primary="test suite" data-secondary="large, pitfalls of" data-type="indexterm" id="id-rbUlspCaIVUN"> </a>to make changes to existing code. When poorly written, automated tests can make it more difficult to make those changes.<a contenteditable="false" data-primary="brittle tests" data-type="indexterm" id="id-dVUnCPCVIAU7"> </a> Brittle tests—those that over-specify expected outcomes or rely on extensive and complicated boilerplate—can actually resist change. These poorly written tests can fail even when unrelated changes are made.</p>
|
||||
|
||||
<p>If you have ever made a five-line change to a feature only to find dozens of unrelated, broken tests, you have felt the friction of brittle tests. Over time, this friction can make a team reticent to perform necessary refactoring to keep a codebase healthy. The subsequent chapters will cover strategies that you can use to improve the robustness and quality of your tests.</p>
|
||||
|
||||
<p>Some of the worst offenders of brittle tests come from the misuse <a contenteditable="false" data-primary="mocking" data-secondary="misuse of mock objects, causing brittle tests" data-type="indexterm" id="id-nbU0sptOIAUK"> </a>of mock objects. Google’s codebase has suffered so badly from an abuse of mocking frameworks that it has led some engineers to declare “no more mocks!” Although that is a strong statement, understanding the limitations of mock objects can help you avoid misusing them.</p>
|
||||
|
||||
<div data-type="note" id="id-nofxhXIGUN"><h6>Note</h6>
|
||||
<p>For more information on working effectively with mock objects, see <a data-type="xref" href="ch13.html#test_doubles">Test Doubles</a>.</p>
|
||||
</div>
|
||||
|
||||
<p>In addition to the friction caused by brittle tests, a larger suite of tests will be slower to run. The slower a test suite, the less frequently it will be run, and the less benefit it provides. We use a number of techniques to speed up our test suite, including parallelizing execution and using faster hardware. However, these kinds of tricks are eventually swamped by a large number of individually slow test cases.</p>
|
||||
|
||||
<p>Tests can become slow for many reasons, like booting significant portions of a system, firing up an emulator before execution, processing large datasets, or waiting for disparate systems to synchronize. Tests often start fast enough but slow down as the system grows. For example, maybe you have an integration test exercising a single dependency that takes five seconds to respond, but over the years you grow to depend on a dozen services, and now the same tests take five minutes.</p>
|
||||
|
||||
<p>Tests can also become slow due to unnecessary speed limits introduced by functions like <code>sleep()</code> and <code>setTimeout()</code>. Calls to these functions are often used as naive heuristics before checking the result of nondeterministic behavior. Sleeping for half a second here or there doesn’t seem too dangerous at first; however, if a “wait-and-check” is embedded in a widely used utility, pretty soon you have added minutes of idle time to every run of your test suite. A better solution is to actively poll for a state transition with a frequency closer to microseconds. You can combine this with a timeout value in case a test fails to reach a stable state.</p>
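<p>A minimal sketch of that approach (the helper and names here are invented, not a Google library) polls at a short interval and gives up after an overall deadline instead of sleeping for a fixed amount of time:</p>

<pre data-code-language="java" data-type="programlisting">import java.time.Duration;
import java.time.Instant;
import java.util.function.BooleanSupplier;

public final class WaitUtil {
  private WaitUtil() {}

  /** Returns as soon as {@code condition} becomes true, or fails after {@code timeout}. */
  public static void awaitCondition(BooleanSupplier condition, Duration timeout)
      throws InterruptedException {
    Instant deadline = Instant.now().plus(timeout);
    while (!condition.getAsBoolean()) {
      if (Instant.now().isAfter(deadline)) {
        throw new AssertionError("Condition not met within " + timeout);
      }
      Thread.sleep(1); // Re-check roughly every millisecond rather than idling for seconds.
    }
  }
}</pre>

<p>A test would then call something like <code>WaitUtil.awaitCondition(() -> server.isReady(), Duration.ofSeconds(5))</code>, paying only as much wall-clock time as the state transition actually takes.</p>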
|
||||
|
||||
<p>Failing to keep a test suite deterministic and fast ensures that it will become a roadblock to productivity. At Google, engineers who encounter these tests have found ways to work around slowdowns, with some going as far as to skip the tests entirely when submitting changes. Obviously, this is a risky practice and should be discouraged, but if a test suite is causing more harm than good, eventually engineers will find a way to get their job done, tests or no tests.</p>
|
||||
|
||||
<p>The secret to living with a large test suite is to treat it with respect. Incentivize engineers to care about their tests; reward them as much for having rock-solid tests as you would for having a great feature launch. Set appropriate performance goals and refactor slow or marginal tests. Basically, treat your tests like production code. When simple changes begin taking nontrivial time, spend effort making your tests less brittle.</p>
|
||||
|
||||
<p>In addition to developing the proper culture, invest in your testing infrastructure by developing linters, documentation, or other assistance that makes it more difficult to write bad tests. Reduce the number of frameworks and tools you need to support to increase the efficiency of the time you invest to improve things.<sup><a data-type="noteref" id="ch01fn123-marker" href="ch11.html#ch01fn123">8</a></sup> If you don’t invest in making it easy to manage your tests, eventually engineers will decide it isn’t worth having them at all.<a contenteditable="false" data-primary="testing" data-secondary="at Google scale" data-startref="ix_tstGoo" data-type="indexterm" id="id-GoUOCrU9IGUe"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="history_of_testing_at_google">
|
||||
<h1>History of Testing at Google</h1>
|
||||
|
||||
<p>Now that we’ve discussed how Google approaches testing, it might be enlightening to learn how we got here. <a contenteditable="false" data-primary="testing" data-secondary="history at Google" data-type="indexterm" id="ix_tsthst"> </a>As mentioned previously, Google’s engineers didn’t always embrace the value of automated testing. In fact, until 2005, testing was closer to a curiosity than a disciplined practice. Most of the testing was done manually, if it was done at all. However, from 2005 to 2006, a testing revolution occurred and changed the way we approach software engineering. Its effects continue to reverberate within the company to this day.</p>
|
||||
|
||||
<p>The experience of the GWS project, which we discussed at the opening of this chapter, acted as a catalyst. It made it clear how powerful automated testing could be. Following the improvements to GWS in 2005, the practices began spreading across the entire company. The tooling was primitive. However, the volunteers, who came to be known as the Testing Grouplet, didn’t let that slow them down.</p>
|
||||
|
||||
<p>Three key initiatives helped usher automated testing into the company’s consciousness: Orientation Classes, the Test Certified program, and Testing on the Toilet. Each one had influence in a completely different way, and together they reshaped Google’s engineering culture.</p>
|
||||
|
||||
<section data-type="sect2" id="orientation_classes">
|
||||
<h2>Orientation Classes</h2>
|
||||
|
||||
<p>Even though much of the early<a contenteditable="false" data-primary="testing" data-secondary="history at Google" data-tertiary="orientation classes" data-type="indexterm" id="id-OwUls2CLheuV"> </a> engineering staff at Google eschewed testing, the pioneers of automated testing at Google knew that at the rate the company was growing, new engineers would quickly outnumber existing team members. If they could reach all the new hires in the company, it could be an extremely effective avenue for introducing cultural change. Fortunately, there was, and still is, a single choke point that all new engineering hires pass through: orientation.</p>
|
||||
|
||||
<p>Most of Google’s early orientation program concerned things like medical benefits and how Google Search worked, but starting in 2005 it also began including an hour-long discussion of the value of automated testing.<sup><a data-type="noteref" id="ch01fn124-marker" href="ch11.html#ch01fn124">9</a></sup> The class covered the various benefits of testing, such as increased productivity, better documentation, and support for refactoring. It also covered how to write a good test. For many Nooglers (new Googlers) at the time, such a class was their first exposure to this material. Most important, all of these ideas were presented as though they were standard practice at the company. The new hires had no idea that they were being used as trojan horses to sneak this idea into their unsuspecting teams.</p>
|
||||
|
||||
<p>As Nooglers joined their teams following orientation, they began writing tests and questioning those on the team who didn’t. Within only a year or two, the population of engineers who had been taught testing outnumbered the pretesting culture engineers. As a result, many new projects started off on the right foot.</p>
|
||||
|
||||
<p>Testing has now become more widely practiced in the industry, so most new hires arrive with the expectations of automated testing firmly in place. Nonetheless, orientation classes continue to set expectations about testing and connect what Nooglers know about testing outside of Google to the challenges of doing so in our very large and very complex codebase.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="test_certified">
|
||||
<h2>Test Certified</h2>
|
||||
|
||||
<p>Initially, the larger and more complex parts of our codebase appeared resistant to good testing practices.<a contenteditable="false" data-primary="testing" data-secondary="history at Google" data-tertiary="Test Certified program" data-type="indexterm" id="id-XKUVsrC2Snux"> </a> Some projects had such poor code quality that they were almost impossible to test. To give projects a clear path forward, the Testing Grouplet devised a certification program that they called Test Certified. Test Certified aimed to give teams a way to understand the maturity of their testing processes and, more critically, cookbook instructions on how to improve it.</p>
|
||||
|
||||
<p>The program was organized into five levels, and each level required some concrete actions to improve the test hygiene on the team. The levels were designed in such a way that each step up could be accomplished within a quarter, which made it a convenient fit for Google’s internal planning cadence.</p>
|
||||
|
||||
<p>Test Certified Level 1 covered the basics: set up a continuous build; start tracking code coverage; classify all your tests as small, medium, or large; identify (but don’t necessarily fix) flaky tests; and create a set of fast (not necessarily comprehensive) tests that can be run quickly. Each subsequent level added more challenges like “no releases with broken tests” or “remove all nondeterministic tests.” By Level 5, all tests were automated, fast tests were running before every commit, all nondeterminism had been removed, and every behavior was covered. An internal dashboard applied social pressure by showing the level of every team. It wasn’t long before teams were competing with one another to climb the ladder.</p>
|
||||
|
||||
<p>By the time the Test Certified program was replaced by an automated approach in 2015 (more on pH later), it had helped more than 1,500 projects improve their testing culture.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="testing_on_the_toilet">
|
||||
<h2>Testing on the Toilet</h2>
|
||||
|
||||
<p>Of all the methods the Testing Grouplet used to try<a contenteditable="false" data-primary="testing" data-secondary="history at Google" data-tertiary="Testing on the Toilet (TotT)" data-type="indexterm" id="id-rbUlspCkTnuN"> </a> to improve testing at Google, perhaps none was<a contenteditable="false" data-primary="Testing on the Toilet (TotT)" data-type="indexterm" id="id-dVUnCPClTmu7"> </a> more off-beat than Testing on the Toilet (TotT). The goal of TotT was fairly simple: actively raise awareness about testing across the entire company. The question is, what’s the best way to do that in a company with employees scattered around the world?</p>
|
||||
|
||||
<p>The Testing Grouplet considered the idea of a regular email newsletter, but given the heavy volume of email everyone deals with at Google, it was likely to become lost in the noise. After a little bit of brainstorming, someone proposed the idea of posting flyers in the restroom stalls as a joke. We quickly recognized the genius in it: the bathroom is one place that everyone must visit at least once each day, no matter what. Joke or not, the idea was cheap enough to implement that it had to be tried.</p>
|
||||
|
||||
<p>In April 2006, a short writeup covering how to improve testing in Python appeared in restroom stalls across Google. This first episode was posted by a small band of volunteers. To say the reaction was polarized is an understatement; some saw it as an invasion of personal space, and they objected strongly. Mailing lists lit up with complaints, but the TotT creators were content: the people complaining were still talking about testing.</p>
|
||||
|
||||
<p>Ultimately, the uproar subsided and TotT quickly became a staple of Google culture. To date, engineers from across the company have produced several hundred episodes, covering almost every aspect of testing imaginable (in addition to a variety of other technical topics). New episodes are eagerly anticipated and some engineers even volunteer to post the episodes around their own buildings. We intentionally limit each episode to exactly one page, challenging authors to focus on the most important and actionable advice. A good episode contains something an engineer can take back to the desk immediately and try.</p>
|
||||
|
||||
<p>Ironically for a publication that appears in one of the more private locations, TotT has had an outsized public impact. Most external visitors see an episode at some point in their visit, and such encounters often lead to funny conversations about how Googlers always seem to be thinking about code. Additionally, TotT episodes make great blog posts, something the original TotT authors recognized early on. They began publishing <a href="https://oreil.ly/86Nho">lightly edited versions publicly</a>, helping to share our experience with the industry at large.</p>
|
||||
|
||||
<p>Despite starting as a joke, TotT has had the longest run and the most profound impact of any of the testing initiatives started by the Testing Grouplet.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="testing_culture_today">
|
||||
<h2>Testing Culture Today</h2>
|
||||
|
||||
<p>Testing culture at Google today has come a long way from 2005.<a contenteditable="false" data-primary="testing" data-secondary="history at Google" data-tertiary="contemporary testing culture" data-type="indexterm" id="id-dVU0sPCVImu7"> </a><a contenteditable="false" data-primary="culture" data-secondary="testing culture today at Google" data-type="indexterm" id="id-nbUNCmCOIKuK"> </a> Nooglers still attend orientation classes on testing, and TotT continues to be distributed almost weekly. However, the expectations of testing have more deeply embedded themselves in the daily developer workflow.</p>
|
||||
|
||||
<p>Every code change at Google is required to go through code review. And every change is expected to include both the feature code and tests. Reviewers are expected to review the quality and correctness of both. In fact, it is perfectly reasonable to block a change if it is missing tests.</p>
|
||||
|
||||
<p>As a replacement for Test Certified, one of our engineering productivity teams recently launched a tool called Project Health (pH). <a contenteditable="false" data-primary="Project Health (pH) tool" data-type="indexterm" id="id-NxU3sktOILuv"> </a>The pH tool continuously gathers dozens of metrics on the health of a project, including test coverage and test latency, and makes them available internally. pH is measured on a scale of one (worst) to five (best). A pH-1 project is seen as a problem for the team to address. Almost every team that runs a continuous build automatically gets a pH score.</p>
|
||||
|
||||
<p>Over time, testing has become an integral part of Google’s engineering culture. We have myriad ways to reinforce its value to engineers across the company. Through a combination of training, gentle nudges, mentorship, and, yes, even a little friendly competition, we have created the clear expectation that testing is everyone’s job.</p>
|
||||
|
||||
<p>Why didn’t we start by mandating the writing of tests?</p>
|
||||
|
||||
<p>The Testing Grouplet had considered asking for a testing mandate from senior leadership but quickly decided against it. Any mandate on how to develop code would be seriously counter to Google culture and likely slow the progress, independent of the idea being mandated. The belief was that successful ideas would spread, so the focus became demonstrating success.</p>
|
||||
|
||||
<p>If engineers were deciding to write tests on their own, it meant that they had fully accepted the idea and were likely to keep doing the right thing—even if no one was compelling them to.<a contenteditable="false" data-primary="testing" data-secondary="history at Google" data-startref="ix_tsthst" data-type="indexterm" id="id-2KUNsKI7Ijux"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="the_limits_of_automated_testing">
|
||||
<h1>The Limits of Automated Testing</h1>
|
||||
|
||||
<p>Automated testing<a contenteditable="false" data-primary="automated testing" data-secondary="limits of" data-type="indexterm" id="id-1JU3skCbFR"> </a> is not suitable for all testing tasks. <a contenteditable="false" data-primary="testing" data-secondary="automated, limits of" data-type="indexterm" id="id-ZlUbCjCYFz"> </a>For example, testing the quality of search results often involves human judgment. We conduct targeted, internal studies using Search Quality Raters who execute real queries and record their impressions. Similarly, it is difficult to capture the nuances of audio and video quality in an automated test, so we often use human judgment to evaluate the performance of telephony or video-calling systems.</p>
|
||||
|
||||
<p>In addition to qualitative judgements, there are certain creative assessments at which humans excel. For example, searching for complex security vulnerabilities is something that humans do better than automated systems. After a human has discovered and understood a flaw, it can be added to an automated security testing system like Google’s <a href="https://oreil.ly/6_W_q">Cloud Security Scanner</a> where it can be run continuously and at scale.</p>
|
||||
|
||||
<p>A more<a contenteditable="false" data-primary="exploratory testing" data-type="indexterm" id="id-DOUEsRtvFr"> </a> generalized term for this technique is Exploratory Testing. Exploratory Testing is a fundamentally creative endeavor in which someone treats the application under test as a puzzle to be broken, maybe by executing an unexpected set of steps or by inserting unexpected data. When conducting an exploratory test, the specific problems to be found are unknown at the start. They are gradually uncovered by probing commonly overlooked code paths or unusual responses from the application. As with the detection of security vulnerabilities, as soon as an exploratory test discovers an issue, an automated test should be added to prevent future regressions.</p>
|
||||
|
||||
<p class="pagebreak-before">Using automated testing to cover well-understood behaviors enables the expensive and qualitative efforts of human testers to focus on the parts of your products for which they can provide the most value—and avoid boring them to tears in the <span class="keep-together">process.</span></p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="conclusion-id00015">
|
||||
<h1>Conclusion</h1>
|
||||
|
||||
<p>The adoption of developer-driven automated testing has been one of the most transformational software engineering practices at Google. It has enabled us to build larger systems with larger teams, faster than we ever thought possible. It has helped us keep up with the increasing pace of technological change. Over the past 15 years, we have successfully transformed our engineering culture to elevate testing into a cultural norm. Despite the company growing by a factor of almost 100 times since the journey began, our commitment to quality and testing is stronger today than it has ever been.</p>
|
||||
|
||||
<p>This chapter has been written to help orient you to how Google thinks about testing. In the next few chapters, we are going to dive even deeper into some key topics that have helped shape our understanding of what it means to write good, stable, and reliable tests. We will discuss the what, why, and how of unit tests, the most common kind of test at Google. We will wade into the debate on how to effectively use test doubles in tests through techniques such as faking, stubbing, and interaction testing. Finally, we will discuss the challenges with testing larger and more complex systems, like many of those we have at Google.</p>
|
||||
|
||||
<p>At the conclusion of these three chapters, you should have a much deeper and clearer picture of the testing strategies we use and, more important, why we use them.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="tlsemicolondrs-id00113">
|
||||
<h1>TL;DRs</h1>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Automated testing is foundational to enabling software to change.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>For tests to scale, they must be automated.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>A balanced test suite is necessary for maintaining healthy test coverage.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>“If you liked it, you should have put a test on it.”</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Changing the testing culture in organizations<a contenteditable="false" data-primary="testing" data-startref="ix_tst1" data-type="indexterm" id="id-dVU0sNs9hjCEsD"> </a> takes time.</p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
<div data-type="footnotes"><p data-type="footnote" id="ch01fn116"><sup><a href="ch11.html#ch01fn116-marker">1</a></sup>See <a href="https://oreil.ly/27R87">"Defect Prevention: Reducing Costs and Enhancing Quality."</a></p><p data-type="footnote" id="ch01fn117"><sup><a href="ch11.html#ch01fn117-marker">2</a></sup>See <a href="https://oreil.ly/lhO7Z">"Failure at Dhahran."</a></p><p data-type="footnote" id="ch01fn118"><sup><a href="ch11.html#ch01fn118-marker">3</a></sup>Getting the behavior right across different browsers and languages is a different story! But, ideally, the end-user experience should be the same for everyone.</p><p data-type="footnote" id="ch01fn119"><sup><a href="ch11.html#ch01fn119-marker">4</a></sup>Technically, we have four sizes of test at Google: small, medium, large, and <em>enormous</em>. The internal difference between large and enormous is actually subtle and historical; so, in this book, most descriptions of large actually apply to our notion of enormous.</p><p data-type="footnote" id="ch01fn120"><sup><a href="ch11.html#ch01fn120-marker">5</a></sup>There is a little wiggle room in this policy. Tests are allowed to access a filesystem if they use a hermetic, in-memory implementation.</p><p data-type="footnote" id="ch01fn121"><sup><a href="ch11.html#ch01fn121-marker">6</a></sup>Mike Cohn, <em>Succeeding with Agile: Software Development Using Scrum</em> (New York: Addison-Wesley Professional, 2009).</p><p data-type="footnote" id="ch01fn122"><sup><a href="ch11.html#ch01fn122-marker">7</a></sup>Keep in mind that there are different kinds of coverage (line, path, branch, etc.), and each says something different about which code has been tested. In this simple example, line coverage is being used.</p><p data-type="footnote" id="ch01fn123"><sup><a href="ch11.html#ch01fn123-marker">8</a></sup>Each supported language at Google has one standard test framework and one standard mocking/stubbing library. One set of infrastructure runs most tests in all languages across the entire codebase.</p><p data-type="footnote" id="ch01fn124"><sup><a href="ch11.html#ch01fn124-marker">9</a></sup>This class was so successful that an updated version is still taught today. In fact, it is one of the longest-running orientation classes in the company’s history.</p></div></section>
|
||||
|
||||
</body>
|
||||
</html>
|
863
clones/abseil.io/resources/swe-book/html/ch12.html
Normal file
|
@ -0,0 +1,863 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="unit_testing">
|
||||
<h1>Unit Testing</h1>
|
||||
|
||||
<p class="byline">Written by Erik Kuefler</p>
|
||||
|
||||
<p class="byline">Edited by Tom Manshreck</p>
|
||||
|
||||
<p>The previous chapter introduced two of the main axes along which Google classifies tests: <em>size</em> and <em>scope</em>. <a contenteditable="false" data-primary="unit testing" data-type="indexterm" id="ix_untst"> </a>To recap, size refers to the resources consumed by a test and what it is allowed to do, and scope refers to how much code a test is intended to validate. Though Google has clear definitions for test size, scope tends to be a little fuzzier. We use the term <em>unit test</em> to refer to tests of relatively narrow scope, such as of a single class or method. Unit tests are usually small in size, but this isn’t always the case.</p>
|
||||
|
||||
<p>After preventing bugs, the most important purpose of a test is to improve engineers’ productivity. <a contenteditable="false" data-primary="engineering productivity" data-secondary="improving with testing" data-type="indexterm" id="id-RQtZC0Uo"> </a>Compared to broader-scoped tests, unit tests have many properties that make them an excellent way to optimize productivity:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>They tend to be small according to Google’s definitions of test size.<a contenteditable="false" data-primary="test sizes" data-secondary="unit tests" data-type="indexterm" id="id-oRtVCwCjCoSL"> </a> Small tests are fast and deterministic, allowing developers to run them frequently as part of their workflow and get immediate feedback.<a contenteditable="false" data-primary="small tests" data-type="indexterm" id="id-xRtjHLCYCLS1"> </a></p>
|
||||
</li>
|
||||
<li>
|
||||
<p>They tend to be easy to write at the same time as the code they’re testing, allowing engineers to focus their tests on the code they’re working on without having to set up and understand a larger system.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>They promote high levels of test coverage because they are quick and easy to write. High test coverage allows engineers to make changes with confidence that they aren’t breaking anything.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>They tend to make it easy to understand what’s wrong when they fail because each test is conceptually simple and focused on a particular part of the system.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>They can serve as documentation and examples, showing engineers how to use the part of the system being tested and how that system is intended to work.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Due to their many advantages, most tests written at Google are unit tests, and as a rule of thumb, we encourage engineers to aim for a mix of about 80% unit tests and 20% broader-scoped tests. This advice, coupled with the ease of writing unit tests and the speed with which they run, means that engineers run a <em>lot</em> of unit tests—it’s not at all unusual for an engineer to execute thousands of unit tests (directly or indirectly) during the average workday.</p>
|
||||
|
||||
<p>Because they make up such a big part of engineers’ lives, Google puts a lot of focus on <em>test maintainability</em>. Maintainable tests<a contenteditable="false" data-primary="maintainability of tests" data-type="indexterm" id="id-E1tdHqsQ"> </a> are ones that "just work": after writing them, engineers don’t need to think about them again until they fail, and those failures indicate real bugs with clear causes. The bulk of this chapter focuses on exploring the idea of maintainability and techniques for achieving it.</p>
|
||||
|
||||
<section data-type="sect1" id="the_importance_of_maintainability">
|
||||
<h1>The Importance of Maintainability</h1>
|
||||
|
||||
<p>Imagine this scenario: Mary wants to add a simple<a contenteditable="false" data-primary="unit testing" data-secondary="maintainability of tests, importance of" data-type="indexterm" id="id-JvtpCYHqf1"> </a> new feature to the product and is able to implement it quickly, perhaps requiring only a couple dozen lines of code. But when she goes to check in her change, she gets a screen full of errors back from the automated testing system. She spends the rest of the day going through those failures one by one. In each case, the change introduced no actual bug, but broke some of the assumptions that the test made about the internal structure of the code, requiring those tests to be updated. Often, she has difficulty figuring out what the tests were trying to do in the first place, and the hacks she adds to fix them make those tests even more difficult to understand in the future. Ultimately, what should have been a quick job ends up taking hours or even days of busywork, killing Mary’s productivity and sapping her morale.</p>
|
||||
|
||||
<p>Here, testing had the opposite of its intended effect by draining productivity rather than improving it while not meaningfully increasing the quality of the code under test. This scenario is far too common, and Google engineers struggle with it every day. There’s no magic bullet, but many engineers at Google have been working to develop sets of patterns and practices to alleviate these problems, which we encourage the rest of the company to follow.</p>
|
||||
|
||||
<p>The problems Mary ran into weren’t her fault, and there was nothing she could have done to avoid them: bad tests must be fixed before they are checked in, lest they impose a drag on future engineers. Broadly speaking, the issues she encountered fall into two categories. First, the tests she was working with were <em>brittle</em>: they broke in response to a harmless and unrelated change that introduced no real bugs. Second, the tests were <em>unclear</em>: once they failed, it was difficult to determine what was wrong, how to fix it, and what those tests were supposed to be doing in the first place.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="preventing_brittle_tests">
|
||||
<h1>Preventing Brittle Tests</h1>
|
||||
|
||||
<p>As just defined, a <a contenteditable="false" data-primary="unit testing" data-secondary="preventing brittle tests" data-type="indexterm" id="ix_untstbr"> </a>brittle test<a contenteditable="false" data-primary="brittle tests" data-secondary="preventing" data-type="indexterm" id="ix_brittst"> </a> is one that fails in the face of an unrelated change to production code that does not introduce any real bugs.<sup><a data-type="noteref" id="ch01fn125-marker" href="ch12.html#ch01fn125">1</a></sup> Such tests must be diagnosed and fixed by engineers as part of their work. In small codebases with only a few engineers, having to tweak a few tests for every change might not be a big problem. But if a team regularly writes brittle tests, test maintenance will inevitably consume a larger and larger proportion of the team’s time as they are forced to comb through an increasing number of failures in an ever-growing test suite. If a set of tests needs to be manually tweaked by engineers for each change, calling it an "automated test suite" is a bit of a stretch!</p>
|
||||
|
||||
<p>Brittle tests cause pain in codebases of any size, but they become particularly acute at Google’s scale. An individual engineer might easily run thousands of tests in a single day during the course of their work, and a single large-scale change (see <a data-type="xref" href="ch22.html#large-scale_changes">Large-Scale Changes</a>) can trigger hundreds of thousands of tests. At this scale, spurious breakages that affect even a small percentage of tests can waste huge amounts of engineering time. Teams at Google vary quite a bit in terms of how brittle their test suites are, but we’ve identified a few practices and patterns that tend to make tests more robust to change.</p>
|
||||
|
||||
<section data-type="sect2" id="strive_for_unchanging_tests">
|
||||
<h2>Strive for Unchanging Tests</h2>
|
||||
|
||||
<p>Before talking about patterns for avoiding brittle tests, we need to answer a question: just how <a contenteditable="false" data-primary="brittle tests" data-secondary="preventing" data-tertiary="striving for unchanging tests" data-type="indexterm" id="id-jRtDCVHZI8hQ"> </a>often should we expect to need to change a test after writing it? Any time spent updating old tests is time that can’t be spent on more valuable work. Therefore, <em>the ideal test is unchanging</em>: after it’s written, it never needs to change unless the requirements of the system under test change.<a contenteditable="false" data-primary="unchanging tests" data-type="indexterm" id="id-Obt7cpHLIEhd"> </a></p>
|
||||
|
||||
<p>What does this look like in practice? We need to think about the kinds of changes that engineers make to production code and how we should expect tests to respond to those changes. <a contenteditable="false" data-primary="changes to code" data-secondary="types of changes to production code" data-type="indexterm" id="id-2VtvCLcWIrhw"> </a>Fundamentally, there are four kinds of changes:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Pure refactorings</dt>
|
||||
<dd>When an engineer refactors the internals<a contenteditable="false" data-primary="refactorings" data-type="indexterm" id="id-yRtyCQHyI6IlhA"> </a> of a system without modifying its interface, whether for performance, clarity, or any other reason, the system’s tests shouldn’t need to change. The role of tests in this case is to ensure that the refactoring didn’t change the system’s behavior. Tests that need to be changed during a refactoring indicate that either the change is affecting the system’s behavior and isn’t a pure refactoring, or that the tests were not written at an appropriate level of abstraction. Google’s reliance on large-scale changes (described in <a data-type="xref" href="ch22.html#large-scale_changes">Large-Scale Changes</a>) to do such refactorings makes this case particularly important for us.</dd>
|
||||
<dt>New features</dt>
|
||||
<dd>When an engineer adds new<a contenteditable="false" data-primary="features, new" data-type="indexterm" id="id-1VtaCyILIoIDhW"> </a> features or behaviors to an existing system, the system’s existing behaviors should remain unaffected. The engineer must write new tests to cover the new behaviors, but they shouldn’t need to change any existing tests. As with refactorings, a change to existing tests when adding new features suggests unintended consequences of that feature or inappropriate tests.</dd>
|
||||
<dt>Bug fixes</dt>
|
||||
<dd>Fixing a bug is much like <a contenteditable="false" data-primary="bug fixes" data-type="indexterm" id="id-qRtDCqSEIZIbh8"> </a>adding a new feature: the presence of the bug suggests that a case was missing from the initial test suite, and the bug fix should include that missing test case. Again, bug fixes typically shouldn’t require updates to existing tests.</dd>
|
||||
<dt>Behavior changes</dt>
|
||||
<dd>Changing a system’s existing behavior is the one case when we expect to have to make updates to the system’s existing tests.<a contenteditable="false" data-primary="behaviors" data-secondary="updates to tests for changes in" data-type="indexterm" id="id-rRt6CzsxIdImhL"> </a> Note that such changes tend to be significantly more expensive than the other three types. A system’s users are likely to rely on its current behavior, and changes to that behavior require coordination with those users to avoid confusion or breakages. Changing a test in this case indicates that we’re breaking an explicit contract of the system, whereas changes in the previous cases indicate that we’re breaking an unintended contract. Low-level libraries will often invest significant effort in avoiding the need to ever make a behavior change so as not to break their users.</dd>
|
||||
</dl>
|
||||
|
||||
<p>The takeaway is that after you write a test, you shouldn’t need to touch that test again as you refactor the system, fix bugs, or add new features. This understanding is what makes it possible to work with a system at scale: expanding it requires writing only a small number of new tests related to the change you’re making rather than potentially having to touch every test that has ever been written against the system. Only breaking changes in a system’s behavior should require going back to change its tests, and in such situations, the cost of updating those tests tends to be small relative to the cost of updating all of the system’s users.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="test_via_public_apis">
|
||||
<h2>Test via Public APIs</h2>
|
||||
|
||||
<p>Now that we<a contenteditable="false" data-primary="brittle tests" data-secondary="preventing" data-tertiary="testing via public APIs" data-type="indexterm" id="ix_brittstAPI"> </a> understand <a contenteditable="false" data-primary="APIs" data-secondary="testing via public APIs" data-type="indexterm" id="ix_APItst"> </a>our goal, let’s look at some practices for making sure that tests don’t need to change unless the requirements of the system being tested change. By far the most important way to ensure this is to write tests that invoke the system being tested in the same way its users would; that is, make calls against its public API <a href="https://oreil.ly/ijat0">rather than its implementation details</a>. If tests work the same way as the system’s users, by definition, change that breaks a test might also break a user. As an additional bonus, such tests can serve as useful examples and documentation for users.</p>
|
||||
|
||||
<p>Consider <a data-type="xref" href="ch12.html#example_onetwo-onedot_a_transaction_api">A transaction API</a>, which validates a transaction and saves it to a database.</p>
|
||||
|
||||
<div data-type="example" id="example_onetwo-onedot_a_transaction_api">
|
||||
<h5><span class="label">Example 12-1. </span>A transaction API</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">public void processTransaction(Transaction transaction) {
|
||||
if (isValid(transaction)) {
|
||||
saveToDatabase(transaction);
|
||||
}
|
||||
}
|
||||
|
||||
private boolean isValid(Transaction t) {
|
||||
return t.getAmount() < t.getSender().getBalance();
|
||||
}
|
||||
|
||||
private void saveToDatabase(Transaction t) {
|
||||
String s = t.getSender() + "," + t.getRecipient() + "," + t.getAmount();
|
||||
database.put(t.getId(), s);
|
||||
}
|
||||
|
||||
public void setAccountBalance(String accountName, int balance) {
|
||||
// Write the balance to the database directly
|
||||
}
|
||||
|
||||
public int getAccountBalance(String accountName) {
|
||||
// Read transactions from the database to determine the account balance
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<p>A tempting way to test this code would be to remove the "private" visibility modifiers and test the implementation logic directly, as demonstrated in <a data-type="xref" href="ch12.html#example_onetwo-twodot_a_naive_test_of_a">A naive test of a transaction API’s implementation</a>.</p>
|
||||
|
||||
<div data-type="example" id="example_onetwo-twodot_a_naive_test_of_a">
|
||||
<h5><span class="label">Example 12-2. </span>A naive test of a transaction API’s implementation</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">@Test
|
||||
public void emptyAccountShouldNotBeValid() {
|
||||
assertThat(processor.isValid(newTransaction().setSender(EMPTY_ACCOUNT)))
|
||||
.isFalse();
|
||||
}
|
||||
|
||||
@Test
|
||||
public void shouldSaveSerializedData() {
|
||||
processor.saveToDatabase(newTransaction()
|
||||
.setId(123)
|
||||
.setSender("me")
|
||||
.setRecipient("you")
|
||||
.setAmount(100));
|
||||
assertThat(database.get(123)).isEqualTo("me,you,100");
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<p>This test interacts with the transaction processor in a much different way than its real users would: it peers into the system’s internal state and calls methods that aren’t publicly exposed as part of the system’s API. As a result, the test is brittle, and almost any refactoring of the system under test (such as renaming its methods, factoring them out into a helper class, or changing the serialization format) would cause the test to break, even if such a change would be invisible to the class’s real users.</p>
|
||||
|
||||
<p>Instead, the same test coverage can be achieved by testing only against the class’s public API, as shown in <a data-type="xref" href="ch12.html#example_onetwo-threedot_testing_the_pub">Testing the public API</a>.<sup><a data-type="noteref" id="ch01fn127-marker" href="ch12.html#ch01fn127">2</a></sup></p>
<div data-type="example" id="example_onetwo-threedot_testing_the_pub">
<h5><span class="label">Example 12-3. </span>Testing the public API</h5>

<pre data-code-language="java" data-type="programlisting">@Test
public void shouldTransferFunds() {
  processor.setAccountBalance("me", 150);
  processor.setAccountBalance("you", 20);

  processor.processTransaction(newTransaction()
      .setSender("me")
      .setRecipient("you")
      .setAmount(100));

  assertThat(processor.getAccountBalance("me")).isEqualTo(50);
  assertThat(processor.getAccountBalance("you")).isEqualTo(120);
}

@Test
public void shouldNotPerformInvalidTransactions() {
  processor.setAccountBalance("me", 50);
  processor.setAccountBalance("you", 20);

  processor.processTransaction(newTransaction()
      .setSender("me")
      .setRecipient("you")
      .setAmount(100));

  assertThat(processor.getAccountBalance("me")).isEqualTo(50);
  assertThat(processor.getAccountBalance("you")).isEqualTo(20);
}</pre>
</div>
<p>Tests using only public APIs are, by definition, accessing the system under test in the same manner that its users would. Such tests are more realistic and less brittle because they form explicit contracts: if such a test breaks, it implies that an existing user of the system will also be broken. Testing only these contracts means that you’re free to do whatever internal refactoring of the system you want without having to worry about making tedious changes to tests.</p>
<p>It’s not always clear what constitutes a "public API," and the question really gets to the heart of what a "unit" is in unit testing.<a contenteditable="false" data-primary="units (in unit testing)" data-type="indexterm" id="id-vRtDC0tDUPhR"> </a><a contenteditable="false" data-primary="public APIs" data-type="indexterm" id="id-nRtRHlt9Uyhp"> </a> Units can be as small as an individual function or as broad as a set of several related packages/modules. When we say "public API" in this context, we’re really talking about the API exposed by that unit to third parties outside of the team that owns the code. This doesn’t always align with the notion of visibility provided by some programming languages; for example, classes in Java might define themselves as "public" to be accessible by other packages in the same unit but are not intended for use by other parties outside of the unit. Some languages like Python have no built-in notion of visibility (often relying on conventions like prefixing private method names with underscores), and build systems like <a href="https://bazel.build">Bazel</a> can further restrict who is allowed to depend on APIs declared public by the programming language.</p>
<p>Defining an appropriate scope for a unit<a contenteditable="false" data-primary="scope of tests" data-secondary="defining scope for a unit" data-type="indexterm" id="id-nRt2C1u9Uyhp"> </a> and hence what should be considered the public API is more art than science, but here are some rules of thumb:</p>
<ul>
<li>
<p>If a method or class exists only to support one or two other classes (i.e., it is a "helper class"), it probably shouldn’t be considered its own unit, and its functionality should be tested through those classes instead of directly.</p>
</li>
<li>
<p>If a package or class is designed to be accessible by anyone without having to consult with its owners, it almost certainly constitutes a unit that should be tested directly, where its tests access the unit in the same way that the users would.</p>
</li>
<li>
<p>If a package or class can be accessed only by the people who own it, but it is designed to provide a general piece of functionality useful in a range of contexts (i.e., it is a "support library"), it should also be considered a unit and tested directly. This will usually create some redundancy in testing given that the support library’s code will be covered both by its own tests and the tests of its users. However, such redundancy can be valuable: without it, a gap in test coverage could be introduced if one of the library’s users (and its tests) were ever removed.</p>
</li>
</ul>
<p>At Google, we’ve found that engineers sometimes need to be persuaded that testing via public APIs is better than testing against implementation details. The reluctance is understandable because it’s often much easier to write tests focused on the piece of code you just wrote rather than figuring out how that code affects the system as a whole. Nevertheless, we have found it valuable to encourage such practices, as the extra upfront effort pays for itself many times over in reduced maintenance burden. Testing against public APIs won’t completely prevent brittleness, but it’s the most important thing you can do to ensure that your tests fail only in the event of meaningful changes to your system.<a contenteditable="false" data-primary="APIs" data-secondary="testing via public APIs" data-startref="ix_APItst" data-type="indexterm" id="id-A2tLC9iDUeh8"> </a><a contenteditable="false" data-primary="brittle tests" data-secondary="preventing" data-startref="ix_brittstAPI" data-tertiary="testing via public APIs" data-type="indexterm" id="id-zRt6HbiqU7ho"> </a></p>
</section>

<section data-type="sect2" id="test_statecomma_not_interactions">
<h2>Test State, Not Interactions</h2>
<p>Another way that tests commonly<a contenteditable="false" data-primary="brittle tests" data-secondary="preventing" data-tertiary="testing state, not interactions" data-type="indexterm" id="id-Obt0CpHOSEhd"> </a> depend on implementation details involves not which methods of the system the test calls, but how the results of those calls are verified. <a contenteditable="false" data-primary="state testing" data-type="indexterm" id="id-yRtJHQHYSJhZ"> </a>In general, there are two ways to verify that a system under test behaves as expected. <a contenteditable="false" data-primary="interaction testing" data-type="indexterm" id="id-GqtvcxHYSdhg"> </a>With <em>state testing</em>, you observe the system itself to see what it looks like after invoking it. With <em>interaction testing</em>, you instead check that the system took an expected sequence of actions on its collaborators <a href="https://oreil.ly/3S8AL">in response to invoking it</a>. Many tests will perform a combination of state and interaction <span class="keep-together">validation.</span></p>
<p>Interaction tests tend to be more brittle than state tests for the same reason that it’s more brittle to test a private method than to test a public method: interaction tests check <em>how</em> a system arrived at its result, whereas usually you should care only <em>what</em> the result is. <a data-type="xref" href="ch12.html#example_onetwo-fourdot_a_brittle_intera">A brittle interaction test</a> illustrates a test that uses a test double (explained further in <a data-type="xref" href="ch13.html#test_doubles">Test Doubles</a>) to verify how <a contenteditable="false" data-primary="test doubles" data-secondary="using in brittle interaction test" data-type="indexterm" id="id-qRtxUjcYSnh6"> </a>a system interacts with a database.</p>
<div data-type="example" id="example_onetwo-fourdot_a_brittle_intera">
<h5><span class="label">Example 12-4. </span>A brittle interaction test</h5>

<pre data-code-language="java" data-type="programlisting">@Test
public void shouldWriteToDatabase() {
  accounts.createUser("foobar");
  verify(database).put("foobar");
}</pre>
</div>
<p>The test verifies that a specific call was made against a database API, but there are a couple different ways it could go wrong:</p>
<ul>
<li>
<p>If a bug in the system under test causes the record to be deleted from the database shortly after it was written, the test will pass even though we would have wanted it to fail.</p>
</li>
<li>
<p>If the system under test is refactored to call a slightly different API to write an equivalent record, the test will fail even though we would have wanted it to pass.</p>
</li>
</ul>
<p>It’s much less brittle to directly test against the state of the system, as demonstrated in <a data-type="xref" href="ch12.html#example_onetwo-fivedot_testing_against">Testing against state</a>.</p>
<div data-type="example" id="example_onetwo-fivedot_testing_against">
<h5><span class="label">Example 12-5. </span>Testing against state</h5>

<pre data-code-language="java" data-type="programlisting">@Test
public void shouldCreateUsers() {
  accounts.createUser("foobar");
  assertThat(accounts.getUser("foobar")).isNotNull();
}</pre>
</div>
<p>This test more accurately expresses what we care about: the state of the system under test after interacting with it.</p>
<p>The most common reason for problematic interaction tests is an overreliance on mocking frameworks. <a contenteditable="false" data-primary="mocking frameworks" data-secondary="overreliance on" data-type="indexterm" id="id-vRtDCLhLSPhR"> </a>These frameworks make it easy to create test doubles that record and verify every call made against them, and to use those doubles in place of real objects in tests. This strategy leads directly to brittle interaction tests, and so we tend to prefer the use of real objects over mocked objects, as long as the real objects are fast and deterministic.</p>
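<p>To make this preference concrete, here is a minimal sketch of how the interaction test above might instead be written against a real, in-memory collaborator and verified through state. (The <code>InMemoryDatabase</code> and <code>AccountManager</code> names are assumptions made for this illustration, not APIs defined in this chapter.)</p>

<pre data-code-language="java" data-type="programlisting">@Test
public void shouldCreateUsers_usingARealCollaborator() {
  // A fast, deterministic in-memory implementation stands in for the real
  // database, so no mocking framework is needed. (Hypothetical classes.)
  InMemoryDatabase database = new InMemoryDatabase();
  AccountManager accounts = new AccountManager(database);

  accounts.createUser("foobar");

  // Verify the observable state through the public API rather than
  // verifying which calls were made on the database.
  assertThat(accounts.getUser("foobar")).isNotNull();
}</pre>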
<div data-type="note" id="id-vqhgtxSmhd"><h6>Note</h6>
<p>For a more extensive discussion of test doubles and mocking frameworks, when they should be used, and safer alternatives, see <a data-type="xref" href="ch13.html#test_doubles">Test Doubles</a>.</p>
</div>
</section>
</section>

<section data-type="sect1" id="writing_clear_tests">
<h1>Writing Clear Tests</h1>
<p>Sooner or later, even if we’ve completely <a contenteditable="false" data-primary="unit testing" data-secondary="preventing brittle tests" data-startref="ix_untstbr" data-type="indexterm" id="id-6VtaCDHZtr"> </a>avoided<a contenteditable="false" data-primary="brittle tests" data-secondary="preventing" data-startref="ix_brittst" data-type="indexterm" id="id-YmtQH6Hytv"> </a> brittleness, our tests will fail. Failure<a contenteditable="false" data-primary="unit testing" data-secondary="writing clear tests" data-type="indexterm" id="ix_untstclr"> </a> is a good <a contenteditable="false" data-primary="clear tests, writing" data-type="indexterm" id="ix_clrtst"> </a>thing—test failures provide useful signals to engineers, and are one of the main ways that a unit test provides value.</p>
<p><a contenteditable="false" data-primary="failures" data-secondary="reasons for test failures" data-type="indexterm" id="id-YmtLCYcytv"> </a> Test failures happen for one of two reasons:<sup><a data-type="noteref" id="ch01fn129-marker" href="ch12.html#ch01fn129">3</a></sup></p>
<ul>
<li>
<p>The system under test has a problem or is incomplete. This result is exactly what tests are designed for: alerting you to bugs so that you can fix them.</p>
</li>
<li>
<p>The test itself is flawed. In this case, nothing is wrong with the system under test, but the test was specified incorrectly. If this was an existing test rather than one that you just wrote, this means that the test is brittle. The previous section discussed how to avoid brittle tests, but it’s rarely possible to eliminate them entirely.</p>
</li>
</ul>
<p>When a test fails, an engineer’s first job is to identify which of these cases the failure falls into and then to diagnose the actual problem. The speed at which the engineer can do so depends on the test’s <em>clarity</em>. A clear test is one whose purpose for existing and reason for failing is immediately clear to the engineer diagnosing a failure. Tests fail to achieve clarity when their reasons for failure aren’t obvious or when it’s difficult to figure out why they were originally written. Clear tests also bring other benefits, such as documenting the system under test and more easily serving as a basis for new tests.</p>
<p>Test clarity becomes significant over time. Tests will often outlast the engineers who wrote them, and the requirements and understanding of a system will shift subtly as it ages. It’s entirely possible that a failing test might have been written years ago by an engineer no longer on the team, leaving no way to figure out its purpose or how to fix it. This stands in contrast with unclear production code, whose purpose you can usually determine with enough effort by looking at what calls it and what breaks when it’s removed. With an unclear test, you might never understand its purpose, since removing the test will have no effect other than (potentially) introducing a subtle hole in test coverage.</p>
<p>In the worst case, these obscure tests just end up getting deleted when engineers can’t figure out how to fix them. Not only does removing such tests introduce a hole in test coverage, but it also indicates that the test has been providing zero value for perhaps the entire period it has existed (which could have been years).</p>
<p>For a test suite to scale and be useful over time, it’s important that each individual test in that suite be as clear as possible. This section explores techniques and ways of thinking about tests to achieve clarity.</p>
<section data-type="sect2" id="make_your_tests_complete_and_concise">
<h2>Make Your Tests Complete and Concise</h2>
<p>Two high-level properties that <a contenteditable="false" data-primary="unit testing" data-secondary="writing clear tests" data-tertiary="making tests complete and concise" data-type="indexterm" id="id-lRtQCDH8fetq"> </a>help tests achieve <a contenteditable="false" data-primary="clear tests, writing" data-secondary="making tests complete and concise" data-type="indexterm" id="id-qRt9HdHkfat6"> </a>clarity are <a href="https://oreil.ly/lqwyG">completeness and conciseness</a>. A <a contenteditable="false" data-primary="completeness and conciseness in tests" data-type="indexterm" id="id-rRtGIPHkfJtZ"> </a>test is <em>complete</em> when its body contains all of the information a reader needs in order to understand how it arrives at its result. A test is <em>concise</em> when it contains no other distracting or irrelevant information. <a data-type="xref" href="ch12.html#example_onetwo-sixdot_an_incomplete_and">An incomplete and cluttered test</a> shows a test that is neither complete nor concise:</p>
<div data-type="example" id="example_onetwo-sixdot_an_incomplete_and">
<h5><span class="label">Example 12-6. </span>An incomplete and cluttered test</h5>

<pre data-code-language="java" data-type="programlisting">@Test
public void shouldPerformAddition() {
  Calculator calculator = new Calculator(new RoundingStrategy(),
      "unused", ENABLE_COSINE_FEATURE, 0.01, calculusEngine, false);
  int result = calculator.calculate(newTestCalculation());
  assertThat(result).isEqualTo(5); // Where did this number come from?
}</pre>
</div>
<p>The test is passing a lot of irrelevant information into the constructor, and the actual important parts of the test are hidden inside of a helper method. The test can be made more complete by clarifying the inputs of the helper method, and more concise by using another helper to hide the irrelevant details of constructing the calculator, as illustrated in <a data-type="xref" href="ch12.html#example_onetwo-sevendot_a_completecomma">A complete, concise test</a>.</p>
<div data-type="example" id="example_onetwo-sevendot_a_completecomma">
<h5><span class="label">Example 12-7. </span>A complete, concise test</h5>

<pre data-code-language="java" data-type="programlisting">@Test
public void shouldPerformAddition() {
  Calculator calculator = newCalculator();
  int result = calculator.calculate(newCalculation(2, Operation.PLUS, 3));
  assertThat(result).isEqualTo(5);
}</pre>
</div>
<p>Ideas we discuss later, especially around code sharing, will tie back to completeness and conciseness.<a contenteditable="false" data-primary="DRY (Don’t Repeat Yourself) principle" data-secondary="violating for clearer tests" data-type="indexterm" id="id-vRtDCxSnfEtR"> </a> In particular, it can often be worth violating the DRY (Don’t Repeat Yourself) principle if it leads to clearer tests. Remember: a <em>test’s body should contain all of the information needed to understand it without containing any irrelevant or distracting information</em>.</p>
</section>

<section data-type="sect2" id="test_behaviorscomma_not_methods">
<h2>Test Behaviors, Not Methods</h2>
<p>The first instinct of many engineers is to try to match the structure of their tests to the structure of their code such that every production method has a corresponding test method.<a contenteditable="false" data-primary="behaviors" data-secondary="testing instead of methods" data-type="indexterm" id="ix_behtst"> </a><a contenteditable="false" data-primary="clear tests, writing" data-secondary="testing behaviors, not methods" data-type="indexterm" id="ix_clrtstbeh"> </a><a contenteditable="false" data-primary="unit testing" data-secondary="writing clear tests" data-tertiary="testing behaviors, not methods" data-type="indexterm" id="ix_untstclrbeh"> </a><a contenteditable="false" data-primary="method-driven tests" data-type="indexterm" id="id-vRt8IdHmhEtR"> </a> This pattern can be convenient at first, but over time it leads to problems: as the method being tested grows more complex, its test also grows in complexity and becomes more difficult to reason about. For example, consider the snippet of code in <a data-type="xref" href="ch12.html#example_onetwo-eightdot_a_transaction_s">A transaction snippet</a>, which displays the results of a transaction.</p>
<div data-type="example" id="example_onetwo-eightdot_a_transaction_s">
<h5><span class="label">Example 12-8. </span>A transaction snippet</h5>

<pre data-code-language="java" data-type="programlisting">public void displayTransactionResults(User user, Transaction transaction) {
  ui.showMessage("You bought a " + transaction.getItemName());
  if (user.getBalance() < LOW_BALANCE_THRESHOLD) {
    ui.showMessage("Warning: your balance is low!");
  }
}</pre>
</div>
<p>It wouldn’t be uncommon to find a test covering both of the messages that might be shown by the <a contenteditable="false" data-primary="clear tests, writing" data-secondary="testing behaviors, not methods" data-tertiary="method-driven test" data-type="indexterm" id="id-rRt6CDIPhJtZ"> </a>method, as <a contenteditable="false" data-primary="method-driven tests" data-secondary="example test" data-type="indexterm" id="id-vRtnHWImhEtR"> </a>presented in <a data-type="xref" href="ch12.html#example_onetwo-ninedot_a_method-driven">A method-driven test</a>.</p>
<div data-type="example" id="example_onetwo-ninedot_a_method-driven">
<h5><span class="label">Example 12-9. </span>A method-driven test</h5>

<pre data-code-language="java" data-type="programlisting">@Test
public void testDisplayTransactionResults() {
  transactionProcessor.displayTransactionResults(
      newUserWithBalance(
          LOW_BALANCE_THRESHOLD.plus(dollars(2))),
      new Transaction("Some Item", dollars(3)));

  assertThat(ui.getText()).contains("You bought a Some Item");
  assertThat(ui.getText()).contains("your balance is low");
}</pre>
</div>
<p>With such tests, it’s likely that the test started out covering only the first method. Later, an engineer expanded the test when the second message was added (violating the idea of unchanging tests that we discussed earlier). This modification sets a bad precedent: as the method under test becomes more complex and implements more functionality, its unit test will become increasingly convoluted and grow more and more difficult to work with.</p>
<p>The problem is that framing tests around methods can naturally encourage unclear tests because a single method often does a few different things under the hood and might have several tricky edge and corner cases. <a contenteditable="false" data-primary="clear tests, writing" data-secondary="testing behaviors, not methods" data-tertiary="behavior-driven test" data-type="indexterm" id="id-P0t1CrTghZtO"> </a>There’s a better way: rather than writing a test for each method, write a test for each <em>behavior.</em><sup><a data-type="noteref" id="ch01fn130-marker" href="ch12.html#ch01fn130">4</a></sup> A behavior is any guarantee that a system makes about how it will respond to a series of inputs while in a particular state.<sup><a data-type="noteref" id="ch01fn132-marker" href="ch12.html#ch01fn132">5</a></sup> Behaviors can often be expressed using the words <a href="https://oreil.ly/I9IvR">"given," "when," and "then"</a>: “<em>Given</em> that a bank <a contenteditable="false" data-primary="given/when/then, expressing behaviors" data-type="indexterm" id="id-QdtMTLTAhlt6"> </a>account is empty, <em>when</em> attempting to withdraw money from it, <em>then</em> the transaction is rejected." The mapping between methods and behaviors is many-to-many: most nontrivial methods implement multiple behaviors, and some behaviors rely on the interaction of multiple methods. The previous example can be rewritten using behavior-driven tests, as presented in <a data-type="xref" href="ch12.html#example_onetwo-onezerodot_a_behavior-dr">A behavior-driven test</a>.</p>
<div data-type="example" id="example_onetwo-onezerodot_a_behavior-dr">
<h5><span class="label">Example 12-10. </span>A behavior-driven test</h5>

<pre data-code-language="java" data-type="programlisting">@Test
public void displayTransactionResults_showsItemName() {
  transactionProcessor.displayTransactionResults(
      new User(), new Transaction("Some Item"));
  assertThat(ui.getText()).contains("You bought a Some Item");
}

@Test
public void displayTransactionResults_showsLowBalanceWarning() {
  transactionProcessor.displayTransactionResults(
      newUserWithBalance(
          LOW_BALANCE_THRESHOLD.plus(dollars(2))),
      new Transaction("Some Item", dollars(3)));
  assertThat(ui.getText()).contains("your balance is low");
}</pre>
</div>
<p>The extra boilerplate required to split apart the single test is <a href="https://oreil.ly/hcoon">more than worth it</a>, and the resulting tests are much clearer than the original test. Behavior-driven tests tend to be clearer than method-oriented tests for several <span class="keep-together">reasons.</span> First, they read more like natural language, allowing them to be naturally understood rather than requiring laborious mental parsing. Second, they more clearly express <a href="https://oreil.ly/dAd3k">cause and effect</a> because each test is more limited in scope. Finally, the fact that each test is short and descriptive makes it easier to see what functionality is already tested and encourages engineers to add new streamlined test methods instead of piling onto existing methods.</p>
<section data-type="sect3" id="structure_tests_to_emphasize_behaviors">
<h3>Structure tests to emphasize behaviors</h3>
<p>Thinking about tests as being coupled to behaviors instead of methods significantly affects how they should be structured.<a contenteditable="false" data-primary="clear tests, writing" data-secondary="testing behaviors, not methods" data-tertiary="structuring tests to emphasize behaviors" data-type="indexterm" id="id-0VtyC3Hnhwhkt0"> </a><a contenteditable="false" data-primary="behaviors" data-secondary="testing instead of methods" data-tertiary="structuring tests to emphasize behaviors" data-type="indexterm" id="id-dRt0HYHBhZhmtR"> </a><a data-primary="behaviors"> </a> Remember that every behavior has three parts: a "given" component that defines how the system is set up, a "when" component that defines the action to be taken on the system, and a "then" component that validates the result.<sup><a data-type="noteref" id="ch01fn134-marker" href="ch12.html#ch01fn134">6</a></sup> Tests are clearest when this structure is explicit.<a contenteditable="false" data-primary="given/when/then, expressing behaviors" data-secondary="well-structured test with" data-type="indexterm" id="id-NztOULHOhehjtp"> </a> Some frameworks like <a href="https://cucumber.io">Cucumber</a> and <a href="http://spockframework.org">Spock</a> directly bake in given/when/then. Other languages can use whitespace and optional comments to make the structure stand out, such as that shown in <a data-type="xref" href="ch12.html#example_onetwo-oneonedot_a_well-structu">A well-structured test</a>.</p>
<div data-type="example" id="example_onetwo-oneonedot_a_well-structu">
<h5><span class="label">Example 12-11. </span>A well-structured test</h5>

<pre data-code-language="java" data-type="programlisting">@Test
public void transferFundsShouldMoveMoneyBetweenAccounts() {
  // Given two accounts with initial balances of $150 and $20
  Account account1 = newAccountWithBalance(usd(150));
  Account account2 = newAccountWithBalance(usd(20));

  // When transferring $100 from the first to the second account
  bank.transferFunds(account1, account2, usd(100));

  // Then the new account balances should reflect the transfer
  assertThat(account1.getBalance()).isEqualTo(usd(50));
  assertThat(account2.getBalance()).isEqualTo(usd(120));
}</pre>
</div>
<p>This level of description isn’t always necessary in trivial tests, and it’s usually sufficient to omit the comments and rely on whitespace to make the sections clear. However, explicit comments can make more sophisticated tests easier to understand. This pattern makes it possible to read tests at three levels of granularity:</p>
<ol>
<li>
<p>A reader can start by looking at the test method name (discussed below) to get a rough description of the behavior being tested.</p>
</li>
<li>
<p>If that’s not enough, the reader can look at the given/when/then comments for a formal description of the behavior.</p>
</li>
<li>
<p>Finally, a reader can look at the actual code to see precisely how that behavior is expressed.</p>
</li>
</ol>
<p>This pattern is <a contenteditable="false" data-primary="assertions" data-secondary="among multiple calls to the system under test" data-type="indexterm" id="id-Nzt0CxSOhehjtp"> </a>most commonly violated by interspersing assertions among multiple calls to the system under test (i.e., combining the "when" and "then" blocks). Merging the "then" and "when" blocks in this way can make the test less clear because it makes it difficult to distinguish the action being performed from the expected result.</p>
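<p>As a brief, hedged illustration of that violation (this sketch reuses the account names from <a data-type="xref" href="ch12.html#example_onetwo-oneonedot_a_well-structu">A well-structured test</a> purely for comparison; it is not an example from the chapter), interleaving actions and assertions with no visible structure makes it hard to tell which action each expectation is checking:</p>

<pre data-code-language="java" data-type="programlisting">@Test
public void shouldMoveMoneyBetweenAccounts_unclearVersion() {
  // Setup, actions, and assertions are interleaved with no visible structure,
  // so it is hard to tell which action each expectation is checking.
  Account account1 = newAccountWithBalance(usd(150));
  Account account2 = newAccountWithBalance(usd(20));
  bank.transferFunds(account1, account2, usd(100));
  assertThat(account1.getBalance()).isEqualTo(usd(50));
  bank.transferFunds(account2, account1, usd(10));
  assertThat(account1.getBalance()).isEqualTo(usd(60));
  assertThat(account2.getBalance()).isEqualTo(usd(110));
}</pre>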
<p>When a test does want to validate each step in a multistep process, it’s acceptable to define alternating sequences of when/then blocks. Long blocks can also be made more descriptive by splitting them up with the word "and." <a data-type="xref" href="ch12.html#example_onetwo-onetwodot_alternating_wh">Alternating when/then blocks within a test</a> shows what a relatively complex, behavior-driven test might look like.<a contenteditable="false" data-primary="given/when/then, expressing behaviors" data-secondary="alternating when/then blocks" data-type="indexterm" id="id-LPtMHkT0hxhztY"> </a></p>
<div data-type="example" id="example_onetwo-onetwodot_alternating_wh">
<h5><span class="label">Example 12-12. </span>Alternating when/then blocks within a test</h5>

<pre data-code-language="java" data-type="programlisting">@Test
public void shouldTimeOutConnections() {
  // Given two users
  User user1 = newUser();
  User user2 = newUser();

  // And an empty connection pool with a 10-minute timeout
  Pool pool = newPool(Duration.minutes(10));

  // When connecting both users to the pool
  pool.connect(user1);
  pool.connect(user2);

  // Then the pool should have two connections
  assertThat(pool.getConnections()).hasSize(2);

  // When waiting for 20 minutes
  clock.advance(Duration.minutes(20));

  // Then the pool should have no connections
  assertThat(pool.getConnections()).isEmpty();

  // And each user should be disconnected
  assertThat(user1.isConnected()).isFalse();
  assertThat(user2.isConnected()).isFalse();
}</pre>
</div>
<p>When writing such tests, be careful to ensure that you’re not inadvertently testing multiple behaviors at the same time. Each test should cover only a single behavior, and the vast majority of unit tests require only one "when" and one "then" block.</p>
</section>

<section data-type="sect3" id="name_tests_after_the_behavior_being_tes">
<h3>Name tests after the behavior being tested</h3>
<p>Method-oriented<a contenteditable="false" data-primary="clear tests, writing" data-secondary="testing behaviors, not methods" data-tertiary="naming tests after behavior being tested" data-type="indexterm" id="id-dRtRCYHgtZhmtR"> </a> tests are usually named <a contenteditable="false" data-primary="behaviors" data-secondary="testing instead of methods" data-tertiary="naming tests after behavior being tested" data-type="indexterm" id="id-QdtAHVHatdhDtn"> </a>after the method being tested (e.g., a test for the <code>updateBalance</code> method is usually called <code>testUpdateBalance</code>). With more focused behavior-driven tests, we have a lot more flexibility and the chance to convey useful information in the test’s name. The test name is very important: it will often be the first or only token visible in failure reports, so it’s your best opportunity to communicate the problem when the test breaks. It’s also the most straightforward way to express the intent of the test.</p>
<p>A test’s name should summarize the behavior it is testing. A good name describes both the actions that are being taken on a system <a href="https://oreil.ly/8eqqv">and the expected outcome</a>. Test names will sometimes include additional information like the state of the system or its environment before taking action on it. Some languages and frameworks make this easier than others by allowing tests to be nested within one another and named using strings, such as in <a data-type="xref" href="ch12.html#example_onetwo-onethreedot_some_sample">Some sample nested naming patterns</a>, which uses <a href="https://jasmine.github.io">Jasmine</a>.</p>
<div data-type="example" id="example_onetwo-onethreedot_some_sample">
<h5><span class="label">Example 12-13. </span>Some sample nested naming patterns</h5>

<pre data-type="programlisting">describe("multiplication", function() {
  describe("with a positive number", function() {
    var positiveNumber = 10;
    it("is positive with another positive number", function() {
      expect(positiveNumber * 10).toBeGreaterThan(0);
    });
    it("is negative with a negative number", function() {
      expect(positiveNumber * -10).toBeLessThan(0);
    });
  });
  describe("with a negative number", function() {
    var negativeNumber = -10;
    it("is negative with a positive number", function() {
      expect(negativeNumber * 10).toBeLessThan(0);
    });
    it("is positive with another negative number", function() {
      expect(negativeNumber * -10).toBeGreaterThan(0);
    });
  });
});</pre>
</div>
<p class="pagebreak-before">Other languages <a contenteditable="false" data-primary="method-driven tests" data-secondary="sample method naming patterns" data-type="indexterm" id="id-Nzt0CAUktehjtp"> </a>require us to encode all of this information in a method name, leading to method naming patterns like that shown in <a data-type="xref" href="ch12.html#example_onetwo-onefourdot_some_sample_m">Some sample method naming patterns</a>.</p>
<div data-type="example" id="example_onetwo-onefourdot_some_sample_m">
<h5><span class="label">Example 12-14. </span>Some sample method naming patterns</h5>

<pre data-type="programlisting">multiplyingTwoPositiveNumbersShouldReturnAPositiveNumber
multiply_positiveAndNegative_returnsNegative
divide_byZero_throwsException</pre>
</div>
<p>Names like this are much more verbose than we’d normally want to write for methods in production code, but the use case is different: we never need to write code that calls these, and their names frequently need to be read by humans in reports. Hence, the extra verbosity is warranted.</p>
<p>Many different naming strategies are acceptable so long as they’re used consistently within a single test class. A good trick if you’re stuck is to try starting the test name with the word "should." When taken with the name of the class being tested, this naming scheme allows the test name to be read as a sentence. For example, a test of a <code>BankAccount</code> class named <code>shouldNotAllowWithdrawalsWhenBalanceIsEmpty</code> can be read as "BankAccount should not allow withdrawals when balance is empty." By reading the names of all the test methods in a suite, you should get a good sense of the behaviors implemented by the system under test. Such names also help ensure that the test stays focused on a single behavior: if you need to use the word "and" in a test name, there’s a good chance that you’re actually testing multiple behaviors and should be writing multiple tests!<a contenteditable="false" data-primary="behaviors" data-secondary="testing instead of methods" data-startref="ix_behtst" data-type="indexterm" id="id-mRtkcns6tYhAtr"> </a><a contenteditable="false" data-primary="clear tests, writing" data-secondary="testing behaviors, not methods" data-startref="ix_clrtstbeh" data-type="indexterm" id="id-BOt0Iys6tohktd"> </a><a contenteditable="false" data-primary="unit testing" data-secondary="writing clear tests" data-startref="ix_untstclrbeh" data-tertiary="testing behaviors, not methods" data-type="indexterm" id="id-3VtVUzsEtvhota"> </a></p>
</section>
</section>

<section data-type="sect2" id="donapostrophet_put_logic_in_tests">
<h2>Don’t Put Logic in Tests</h2>
<p>Clear tests are trivially correct upon inspection; that is, it is obvious that a test is doing the correct thing just from glancing at it.<a contenteditable="false" data-primary="unit testing" data-secondary="writing clear tests" data-tertiary="leaving logic out of tests" data-type="indexterm" id="id-9xtLC2HWtltk"> </a><a contenteditable="false" data-primary="clear tests, writing" data-secondary="leaving logic out of tests" data-type="indexterm" id="id-rRtyHPH7tJtZ"> </a> This is possible in test code because each test needs to handle only a particular set of inputs, whereas production code must be generalized to handle any input. <a contenteditable="false" data-primary="logic, not putting in tests" data-type="indexterm" id="id-vRtjcdHytEtR"> </a>For production code, we’re able to write tests that ensure complex logic is correct. But test code doesn’t have that luxury—if you feel like you need to write a test to verify your test, something has gone wrong!</p>
<p>Complexity is most often introduced in the form of <em>logic</em>. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals.<a contenteditable="false" data-primary="programming languages" data-secondary="logic in" data-type="indexterm" id="id-vRtnHocytEtR"> </a> When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen.<a contenteditable="false" data-primary="bugs" data-secondary="logic concealing a bug in a test" data-type="indexterm" id="id-nRtycacGtntp"> </a> It doesn’t take much logic to make a test more difficult to reason about. For example, does the test in <a data-type="xref" href="ch12.html#example_onetwo-onefivedot_logic_conceal">Logic concealing a bug</a> <a href="https://oreil.ly/yJDqh">look correct to you</a>?</p>
<div data-type="example" id="example_onetwo-onefivedot_logic_conceal">
<h5><span class="label">Example 12-15. </span>Logic concealing a bug</h5>

<pre data-code-language="java" data-type="programlisting">@Test
public void shouldNavigateToAlbumsPage() {
  String baseUrl = "http://photos.google.com/";
  Navigator nav = new Navigator(baseUrl);
  nav.goToAlbumPage();
  assertThat(nav.getCurrentUrl()).isEqualTo(baseUrl + "/albums");
}</pre>
</div>
<p>There’s not much logic here: really just one string concatenation. But if we simplify the test by removing that one bit of logic, a bug immediately becomes clear, as demonstrated in <a data-type="xref" href="ch12.html#example_onetwo-onesixdot_a_test_without">A test without logic reveals the bug</a>.</p>
<div data-type="example" id="example_onetwo-onesixdot_a_test_without">
<h5><span class="label">Example 12-16. </span>A test without logic reveals the bug</h5>

<pre data-code-language="java" data-type="programlisting">@Test
public void shouldNavigateToAlbumsPage() {
  Navigator nav = new Navigator("http://photos.google.com/");
  nav.goToAlbumPage();
  assertThat(nav.getCurrentUrl())
      .isEqualTo("http://photos.google.com//albums"); // Oops!
}</pre>
</div>
<p>When the whole string is written out, we can see right away that we’re expecting two slashes in the URL instead of just one. If the production code made a similar mistake, this test would fail to detect a bug. Duplicating the base URL was a small price to pay for making the test more descriptive and meaningful (see the discussion of DAMP versus DRY tests later in this chapter).</p>
<p>If humans are bad at spotting bugs from string concatenation, we’re even worse at spotting bugs that come from more sophisticated programming constructs like loops and conditionals. The lesson is clear: in test code, stick to straight-line code over clever logic, and consider tolerating some duplication when it makes the test more descriptive and meaningful. We’ll discuss ideas around duplication and code sharing later in this chapter.</p>
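<p>As a hedged illustration of why loops in particular are risky (the names below are invented for this sketch and do not appear elsewhere in the chapter), consider a test that loops over a collection of cases: if that collection unexpectedly turns out to be empty, the loop body never runs and the test passes while verifying nothing at all.</p>

<pre data-code-language="java" data-type="programlisting">@Test
public void shouldMarkAllOverdueInvoicesAsLate() {
  // Hypothetical example: if getOverdueInvoices() unexpectedly returns an
  // empty list, the loop body never executes and the test silently passes
  // without asserting anything.
  for (Invoice invoice : billingSystem.getOverdueInvoices()) {
    assertThat(invoice.isMarkedLate()).isTrue();
  }
}</pre>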
</section>

<section data-type="sect2" id="write_clear_failure_messages">
<h2>Write Clear Failure Messages</h2>
<p>One last aspect of clarity has to do not with how a test is written, but with what an engineer sees when it fails. <a contenteditable="false" data-primary="unit testing" data-secondary="writing clear tests" data-tertiary="writing clear failure messages" data-type="indexterm" id="id-rRt6CPH3uJtZ"> </a><a contenteditable="false" data-primary="clear tests, writing" data-secondary="writing clear failure messages" data-type="indexterm" id="id-vRtnHdHQuEtR"> </a><a contenteditable="false" data-primary="failures" data-secondary="writing clear failure messages for tests" data-type="indexterm" id="id-nRtycMHluntp"> </a>In an ideal world, an engineer could diagnose a problem just from reading its failure message in a log or report without ever having to look at the test itself. A good failure message contains much the same information as the test’s name: it should clearly express the desired outcome, the actual outcome, and any relevant parameters.</p>
<p class="pagebreak-before">Here’s an example of a bad failure message:</p>
<pre data-type="programlisting">Test failed: account is closed</pre>
<p>Did the test fail because the account was closed, or was the account expected to be closed and the test failed because it wasn’t? A better failure message clearly distinguishes the expected from the actual state and gives more context about the result:</p>
<pre data-type="programlisting">Expected an account in state CLOSED, but got account:
<{name: "my-account", state: "OPEN"}</pre>
<p>Good libraries can help make it easier to write useful failure messages.<a contenteditable="false" data-primary="assertions" data-secondary="in Java test, using Truth library" data-type="indexterm" id="id-zRtDCBTpuBto"> </a><a contenteditable="false" data-primary="Java" data-secondary="assertion in a test using Truth library" data-type="indexterm" id="id-bRtgHrT0u3tR"> </a> Consider the assertions in <a data-type="xref" href="ch12.html#example_onetwo-onesevendot_an_assertion">An assertion using the Truth library</a> in a Java test, the first of which <a contenteditable="false" data-primary="Truth assertion library" data-type="indexterm" id="id-dRtNIJTLuxtO"> </a>uses classical JUnit asserts, and the second of which uses <a href="https://truth.dev">Truth</a>, an assertion library developed by Google:</p>
<div data-type="example" id="example_onetwo-onesevendot_an_assertion">
<h5><span class="label">Example 12-17. </span>An assertion using the Truth library</h5>

<pre data-code-language="java" data-type="programlisting">Set<String> colors = ImmutableSet.of("red", "green", "blue");
assertTrue(colors.contains("orange")); // JUnit
assertThat(colors).contains("orange"); // Truth</pre>
</div>
<p>Because the first assertion only receives a Boolean value, it is only able to give a generic error message like "expected <true> but was <false>," which isn’t very informative in a failing test output. Because the second assertion explicitly receives the subject of the assertion, it is able to give <a href="https://oreil.ly/RFUEN">a much more useful error message</a>: <span class="keep-together">"AssertionError:</span> <[red, green, blue]> should have contained <orange>."</p>
<p>Not all<a contenteditable="false" data-primary="assertions" data-secondary="test assertion in Go" data-type="indexterm" id="id-dRtRCehLuxtO"> </a> languages have<a contenteditable="false" data-primary="Go programming language" data-secondary="test assertion in" data-type="indexterm" id="id-QdtAHyhpult6"> </a> such helpers available, but it should always be possible to manually specify the important information in the failure message. For example, test assertions in Go conventionally look like <a data-type="xref" href="ch12.html#example_onetwo-oneeightdot_a_test_asser">A test assertion in Go</a>.</p>
<div data-type="example" id="example_onetwo-oneeightdot_a_test_asser">
<h5><span class="label">Example 12-18. </span>A test assertion in Go</h5>

<pre data-code-language="go" data-type="programlisting">result := Add(2, 3)
if result != 5 {
  t.Errorf("Add(2, 3) = %v, want %v", result, 5)
}</pre>
</div>
</section>
</section>

<section data-type="sect1" id="tests_and_code_sharing_dampcomma_not_dr">
<h1>Tests and Code Sharing: DAMP, Not DRY</h1>
<p>One final<a contenteditable="false" data-primary="unit testing" data-secondary="writing clear tests" data-startref="ix_untstclr" data-type="indexterm" id="id-YmtLC6HYuv"> </a> aspect of <a contenteditable="false" data-primary="clear tests, writing" data-startref="ix_clrtst" data-type="indexterm" id="id-jRteHVHBuW"> </a>writing clear tests and avoiding brittleness has to do with code sharing. <a contenteditable="false" data-primary="code sharing, tests and" data-type="indexterm" id="ix_cdsh"> </a><a contenteditable="false" data-primary="unit testing" data-secondary="tests and code sharing, DAMP, not DRY" data-type="indexterm" id="ix_untstcdsh"> </a>Most software attempts to achieve a principle called DRY—"Don’t Repeat Yourself." DRY states <a contenteditable="false" data-primary="DRY (Don’t Repeat Yourself) principle" data-secondary="tests and code sharing, DAMP, not DRY" data-type="indexterm" id="ix_DRY"> </a>that software is easier to maintain if every concept is canonically represented in one place and code duplication is kept to a minimum. This approach is especially valuable in making changes easier because an engineer needs to update only one piece of code rather than tracking down multiple references. The downside to such consolidation is that it can make code unclear, requiring readers to follow chains of references to understand what the code is doing.</p>
<p>In normal production code, that downside is usually a small price to pay for making code easier to change and work with. But this cost/benefit analysis plays out a little differently in the context of test code. Good tests are designed to be stable, and in fact you usually <em>want</em> them to break when the system being tested changes. So DRY doesn’t have quite as much benefit when it comes to test code. At the same time, the costs of complexity are greater for tests: production code has the benefit of a test suite to ensure that it keeps working as it becomes complex, whereas tests must stand by themselves, risking bugs if they aren’t self-evidently correct. As mentioned earlier, something has gone wrong if tests start becoming complex enough that it feels like they need their own tests to ensure that they’re working properly.<a contenteditable="false" data-primary="Descriptive And Meaningful Phrases" data-see="DAMP" data-type="indexterm" id="id-2Vt6HLc1um"> </a></p>
<p>Instead of<a contenteditable="false" data-primary="DAMP" data-type="indexterm" id="id-2VtvCqI1um"> </a> being completely DRY, test code should <a contenteditable="false" data-primary="code sharing, tests and" data-secondary="test that is too DRY" data-type="indexterm" id="id-ObtwHeIEuj"> </a>often strive to be <a href="https://oreil.ly/5VPs2">DAMP</a>—that is, to promote "Descriptive And Meaningful Phrases." A little bit of duplication is OK in tests so long as that duplication makes the test simpler and clearer.<a contenteditable="false" data-primary="DRY (Don’t Repeat Yourself) principle" data-secondary="tests and code sharing, DAMP, not DRY" data-tertiary="test that is too DRY" data-type="indexterm" id="id-GqtmIgIpur"> </a> To illustrate, <a data-type="xref" href="ch12.html#example_onetwo-oneninedot_a_test_that_i">A test that is too DRY</a> presents some tests that are far too DRY.</p>
<div data-type="example" id="example_onetwo-oneninedot_a_test_that_i">
<h5><span class="label">Example 12-19. </span>A test that is too DRY</h5>

<pre data-code-language="java" data-type="programlisting">@Test
public void shouldAllowMultipleUsers() {
  List<User> users = createUsers(false, false);
  Forum forum = createForumAndRegisterUsers(users);
  validateForumAndUsers(forum, users);
}

@Test
public void shouldNotAllowBannedUsers() {
  List<User> users = createUsers(true);
  Forum forum = createForumAndRegisterUsers(users);
  validateForumAndUsers(forum, users);
}

// Lots more tests...

private static List<User> createUsers(boolean... banned) {
  List<User> users = new ArrayList<>();
  for (boolean isBanned : banned) {
    users.add(newUser()
        .setState(isBanned ? State.BANNED : State.NORMAL)
        .build());
  }
  return users;
}

private static Forum createForumAndRegisterUsers(List<User> users) {
  Forum forum = new Forum();
  for (User user : users) {
    try {
      forum.register(user);
    } catch(BannedUserException ignored) {}
  }
  return forum;
}

private static void validateForumAndUsers(Forum forum, List<User> users) {
  assertThat(forum.isReachable()).isTrue();
  for (User user : users) {
    assertThat(forum.hasRegisteredUser(user))
        .isEqualTo(user.getState() == State.BANNED);
  }
}</pre>
</div>
<p>The problems<a contenteditable="false" data-primary="unit testing" data-secondary="tests and code sharing, DAMP, not DRY" data-tertiary="DAMP test" data-type="indexterm" id="id-yRtyCeSxu3"> </a> in this code should be apparent based on the previous discussion of clarity. For one, although the test bodies are very concise, they are not complete: important details are hidden away in helper methods that the reader can’t see without having to scroll to a completely different part of the file. <a contenteditable="false" data-primary="DAMP" data-secondary="test rewritten to be DAMP" data-type="indexterm" id="id-GqtNHmSpur"> </a><a contenteditable="false" data-primary="code sharing, tests and" data-secondary="tests should be DAMP" data-type="indexterm" id="id-1VtvcASxu8"> </a>Those helpers are also full of logic that makes them more difficult to verify at a glance (did you spot the bug?). The test becomes much clearer when it’s rewritten to use DAMP, as shown in <a data-type="xref" href="ch12.html#example_onetwo-twozerodot_tests_should">Tests should be DAMP</a>.</p>
<div data-type="example" id="example_onetwo-twozerodot_tests_should">
<h5><span class="label">Example 12-20. </span>Tests should be DAMP</h5>

<pre data-code-language="java" data-type="programlisting">@Test
public void shouldAllowMultipleUsers() {
  User user1 = newUser().setState(State.NORMAL).build();
  User user2 = newUser().setState(State.NORMAL).build();

  Forum forum = new Forum();
  forum.register(user1);
  forum.register(user2);

  assertThat(forum.hasRegisteredUser(user1)).isTrue();
  assertThat(forum.hasRegisteredUser(user2)).isTrue();
}

@Test
public void shouldNotRegisterBannedUsers() {
  User user = newUser().setState(State.BANNED).build();

  Forum forum = new Forum();
  try {
    forum.register(user);
  } catch(BannedUserException ignored) {}

  assertThat(forum.hasRegisteredUser(user)).isFalse();
}</pre>
</div>
<p>These tests have more duplication, and the test bodies are a bit longer, but the extra verbosity is worth it. Each individual test is far more meaningful and can be understood entirely without leaving the test body. A reader of these tests can feel confident that the tests do what they claim to do and aren’t hiding any bugs.</p>
<p>DAMP is not a replacement for DRY; it is complementary to it. <a contenteditable="false" data-primary="DRY (Don’t Repeat Yourself) principle" data-secondary="tests and code sharing, DAMP, not DRY" data-tertiary="DAMP as complement to DRY" data-type="indexterm" id="id-lRtQCQfdu1"> </a><a contenteditable="false" data-primary="DAMP" data-secondary="complementary to DRY, not a replacement" data-type="indexterm" id="id-qRt9Hzflud"> </a>Helper methods and test infrastructure can still help make tests clearer by making them more concise, factoring out repetitive steps whose details aren’t relevant to the particular behavior being tested. The important point is that such refactoring should be done with an eye toward making tests more descriptive and meaningful, and not solely in the name of reducing repetition. The rest of this section will explore common patterns for sharing code across tests.</p>
<section data-type="sect2" id="shared_values">
<h2>Shared Values</h2>
<p>Many tests are structured<a contenteditable="false" data-primary="unit testing" data-secondary="tests and code sharing, DAMP, not DRY" data-tertiary="shared values" data-type="indexterm" id="id-9xtLC2HAhNuk"> </a> by defining a set of shared <a contenteditable="false" data-primary="code sharing, tests and" data-secondary="shared values" data-type="indexterm" id="id-rRtyHPHPhEuZ"> </a>values to be used by tests and then by defining the tests that cover various cases for how these values interact. <a data-type="xref" href="ch12.html#example_onetwo-twoonedot_shared_values">Shared values with ambiguous names</a> illustrates what such tests look like.</p>
<div data-type="example" id="example_onetwo-twoonedot_shared_values">
<h5><span class="label">Example 12-21. </span>Shared values with ambiguous names</h5>

<pre data-code-language="java" data-type="programlisting">private static final Account ACCOUNT_1 = Account.newBuilder()
    .setState(AccountState.OPEN).setBalance(50).build();

private static final Account ACCOUNT_2 = Account.newBuilder()
    .setState(AccountState.CLOSED).setBalance(0).build();

private static final Item ITEM = Item.newBuilder()
    .setName("Cheeseburger").setPrice(100).build();

// Hundreds of lines of other tests...

@Test
public void canBuyItem_returnsFalseForClosedAccounts() {
  assertThat(store.canBuyItem(ITEM, ACCOUNT_2)).isFalse();
}

@Test
public void canBuyItem_returnsFalseWhenBalanceInsufficient() {
  assertThat(store.canBuyItem(ITEM, ACCOUNT_1)).isFalse();
}</pre>
</div>
<p>This strategy can make tests very concise, but it causes problems as the test suite grows. For one, it can be difficult to understand why a particular value was chosen for a test. In <a data-type="xref" href="ch12.html#example_onetwo-twoonedot_shared_values">Shared values with ambiguous names</a>, the test names fortunately clarify which scenarios are being tested, but you still need to scroll up to the definitions to confirm that <code>ACCOUNT_1</code> and <code>ACCOUNT_2</code> are appropriate for those scenarios. More descriptive constant names (e.g., <code>CLOSED_ACCOUNT</code> and <code>ACCOUNT_WITH_LOW_BALANCE</code>) help a bit, but they still make it more difficult to see the exact details of the value being tested, and the ease of reusing these values can encourage engineers to do so even when the name doesn’t exactly describe what the test needs.</p>
<p>Engineers are usually drawn to using shared constants because constructing individual values in each test can be verbose. <a contenteditable="false" data-primary="helper methods" data-secondary="shared values in" data-type="indexterm" id="id-nRt2CZUkh8up"> </a>A better way to accomplish this goal is to construct data <a href="https://oreil.ly/Jc4VJ">using helper methods</a> (see <a data-type="xref" href="ch12.html#example_onetwo-twotwodot_shared_values">Shared values using helper methods</a>) that require the test author to specify only values they care about, and setting reasonable defaults<sup><a data-type="noteref" id="ch01fn139-marker" href="ch12.html#ch01fn139">7</a></sup> for all other values. This construction is trivial to do in languages that support named parameters, but languages without named parameters<a contenteditable="false" data-primary="Builder pattern" data-type="indexterm" id="id-bRtMUZUmhwuR"> </a> can use constructs such as the <em>Builder</em> pattern to emulate them (often with the assistance of tools such as <a href="https://oreil.ly/cVYK6">AutoValue</a>):</p>
|
||||
|
||||
<div data-type="example" id="example_onetwo-twotwodot_shared_values">
|
||||
<h5><span class="label">Example 12-22. </span>Shared values using helper methods</h5>
|
||||
|
||||
<pre data-code-language="python" data-type="programlisting"># A helper method wraps a constructor by defining arbitrary defaults for
|
||||
# each of its parameters.
|
||||
def newContact(
|
||||
firstName="Grace", lastName="Hopper", phoneNumber="555-123-4567"):
|
||||
return Contact(firstName, lastName, phoneNumber)
|
||||
|
||||
# Tests call the helper, specifying values for only the parameters that they
|
||||
# care about.
|
||||
def test_fullNameShouldCombineFirstAndLastNames(self):
|
||||
contact = newContact(firstName="Ada", lastName="Lovelace")
|
||||
self.assertEqual(contact.fullName(), "Ada Lovelace")
|
||||
|
||||
// Languages like Java that don’t support named parameters can emulate them
|
||||
// by returning a mutable "builder" object that represents the value under
|
||||
// construction.
|
||||
private static Contact.Builder newContact() {
|
||||
return Contact.newBuilder()
|
||||
.setFirstName("Grace")
|
||||
.setLastName("Hopper")
|
||||
.setPhoneNumber("555-123-4567");
|
||||
}
|
||||
|
||||
// Tests then call methods on the builder to overwrite only the parameters
|
||||
// that they care about, then call build() to get a real value out of the
|
||||
// builder.
|
||||
@Test
|
||||
public void fullNameShouldCombineFirstAndLastNames() {
|
||||
Contact contact = newContact()
|
||||
.setFirstName("Ada")
|
||||
.setLastName("Lovelace")
|
||||
.build();
|
||||
assertThat(contact.getFullName()).isEqualTo("Ada Lovelace");
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<p>Using helper methods to construct these values allows each test to create the exact values it needs without having to worry about specifying irrelevant information or conflicting with other tests.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="shared_setup">
|
||||
<h2>Shared Setup</h2>
|
||||
|
||||
<p>A related way tha<a contenteditable="false" data-primary="unit testing" data-secondary="tests and code sharing, DAMP, not DRY" data-tertiary="shared setup" data-type="indexterm" id="id-rRt6CPH7tEuZ"> </a>t tests share code is via setup/initialization logic. <a contenteditable="false" data-primary="code sharing, tests and" data-secondary="shared setup" data-type="indexterm" id="id-vRtnHdHyteuR"> </a>Many test frameworks allow engineers to define methods to execute before each test in a suite is run. Used appropriately, these methods can make tests clearer and more concise by obviating the repetition of tedious and irrelevant initialization logic. Used inappropriately, these methods can harm a test’s completeness by hiding important details in a separate initialization method.</p>
|
||||
|
||||
<p>The best use case for setup methods is to construct the object under test and its collaborators. This is useful when the majority of tests don’t care about the specific arguments used to construct those objects and can let them stay in their default states. The same idea also applies to stubbing return values for test doubles, which is a concept that we explore in more detail in <a data-type="xref" href="ch13.html#test_doubles">Test Doubles</a>.</p>
|
||||
|
||||
<p>One risk in using setup methods is that <a contenteditable="false" data-primary="dependencies" data-secondary="on values in shared setup methods" data-type="indexterm" id="id-nRt2C0IGt8up"> </a>they can lead to unclear tests if those tests begin to depend on the particular values used in setup. For example, the test in <a data-type="xref" href="ch12.html#example_onetwo-twothreedot_dependencies">Dependencies on values in setup methods</a> seems incomplete because a reader of the test needs to go hunting to discover where the string "Donald Knuth" came from.</p>
|
||||
|
||||
<div data-type="example" id="example_onetwo-twothreedot_dependencies">
|
||||
<h5><span class="label">Example 12-23. </span>Dependencies on values in setup methods</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">private NameService nameService;
|
||||
private UserStore userStore;
|
||||
|
||||
@Before
|
||||
public void setUp() {
|
||||
nameService = new NameService();
|
||||
nameService.set("user1", "Donald Knuth");
|
||||
userStore = new UserStore(nameService);
|
||||
}
|
||||
|
||||
// [... hundreds of lines of tests ...]
|
||||
|
||||
@Test
|
||||
public void shouldReturnNameFromService() {
|
||||
UserDetails user = userStore.get("user1");
|
||||
assertThat(user.getName()).isEqualTo("Donald Knuth");
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<p>Tests like these that explicitly care about particular values should state those values directly, overriding the default defined in the setup method if need be. The resulting test contains slightly more repetition, as shown in <a data-type="xref" href="ch12.html#example_onetwo-twofourdot_overriding_va">Overriding values in setup methods</a>, but the result is far more descriptive and meaningful.</p>
|
||||
|
||||
<div data-type="example" id="example_onetwo-twofourdot_overriding_va">
|
||||
<h5><span class="label">Example 12-24. </span>Overriding values in setup methods</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">private NameService nameService;
|
||||
private UserStore userStore;
|
||||
|
||||
@Before
|
||||
public void setUp() {
|
||||
nameService = new NameService();
|
||||
nameService.set("user1", "Donald Knuth");
|
||||
userStore = new UserStore(nameService);
|
||||
}
|
||||
|
||||
@Test
|
||||
public void shouldReturnNameFromService() {
|
||||
nameService.set("user1", "Margaret Hamilton");
|
||||
UserDetails user = userStore.get("user1");
|
||||
assertThat(user.getName()).isEqualTo("Margaret Hamilton");
|
||||
}</pre>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="shared_helpers_and_validation">
|
||||
<h2>Shared Helpers and Validation</h2>
|
||||
|
||||
<p>The last common way that code is shared across tests is via "helper methods" called from the body of the test methods.<a contenteditable="false" data-primary="unit testing" data-secondary="tests and code sharing, DAMP, not DRY" data-tertiary="shared helpers and validation" data-type="indexterm" id="id-vRtDCdHQueuR"> </a><a contenteditable="false" data-primary="code sharing, tests and" data-secondary="shared helpers and validation" data-type="indexterm" id="id-nRtRHMHlu8up"> </a> We already discussed how helper methods can be a useful way for concisely constructing test values—this usage is warranted, but other types of helper methods can be dangerous.</p>
|
||||
|
||||
<p>One common type of helper is a method that performs a common set of assertions against a system under test.<a contenteditable="false" data-primary="helper methods" data-secondary="shared helpers and validation" data-type="indexterm" id="id-nRt2Caclu8up"> </a><a contenteditable="false" data-primary="validation, shared helpers and" data-type="indexterm" id="id-P0t0H9cLu3uO"> </a> The extreme example is a <code>validate</code> method called at the end of every test method, which performs a set of fixed checks against the system under test. Such a validation strategy can be a bad habit to get into because tests using this approach are less behavior driven. With such tests, it is much more difficult to determine the intent of any particular test and to infer what exact case the author had in mind when writing it. When bugs are introduced, this strategy can also make them more difficult to localize because they will frequently cause a large number of tests to start failing.</p>
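<p>To make the antipattern concrete, the following is a hypothetical sketch of such a catch-all <code>validate</code> helper (the <code>Transaction</code> type, its accessors, and the <code>processor</code> under test are invented purely for illustration). Because every test funnels through the same fixed checks, no individual test states the behavior it actually cares about:</p>

<pre data-code-language="java" data-type="programlisting">// A catch-all helper that every test calls: the checks are fixed, so no
// test expresses its own intent, and one regression can fail all of them.
private void validate(Transaction transaction) {
  assertThat(transaction.getStatus()).isEqualTo(Status.COMPLETED);
  assertThat(transaction.getErrors()).isEmpty();
  assertThat(transaction.getAuditLog()).isNotEmpty();
}

@Test
public void processTransaction_recordsAuditEntry() {
  Transaction transaction = processor.process(newTransaction());
  // The detail this test cares about is buried among unrelated checks,
  // so a reader cannot tell which assertion captures its intent.
  validate(transaction);
}</pre>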
|
||||
|
||||
<p class="pagebreak-before">More focused validation methods can still be useful, however. The best validation helper methods assert a <em>single conceptual fact</em> about their inputs, in contrast to general-purpose validation methods that cover a range of conditions. Such methods can be particularly helpful when the condition that they are validating is conceptually simple but requires looping or conditional logic to implement that would reduce clarity were it included in the body of a test method. For example, the helper method in <a data-type="xref" href="ch12.html#example_onetwo-twofivedot_a_conceptuall">A conceptually simple test</a> might be useful in a test covering several different cases around account access.</p>
|
||||
|
||||
<div data-type="example" id="example_onetwo-twofivedot_a_conceptuall">
|
||||
<h5><span class="label">Example 12-25. </span>A conceptually simple test</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">private void assertUserHasAccessToAccount(User user, Account account) {
|
||||
for (long userId : account.getUsersWithAccess()) {
|
||||
if (user.getId() == userId) {
|
||||
return;
|
||||
}
|
||||
}
|
||||
fail(user.getName() + " cannot access " + account.getName());
|
||||
}</pre>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="defining_test_infrastructure">
|
||||
<h2>Defining Test Infrastructure</h2>
|
||||
|
||||
<p>The techniques we’ve discussed so far cover sharing code across<a contenteditable="false" data-primary="unit testing" data-secondary="tests and code sharing, DAMP, not DRY" data-tertiary="defining test infrastructure" data-type="indexterm" id="id-nRt2CMHbF8up"> </a> methods in a single test class or suite.<a contenteditable="false" data-primary="code sharing, tests and" data-secondary="defining test infrastructure" data-type="indexterm" id="id-P0t0HZHMF3uO"> </a> Sometimes, it can also be valuable to share code across multiple test suites. <a contenteditable="false" data-primary="test infrastructure" data-type="indexterm" id="id-A2t0cbH8FDu8"> </a>We refer to this sort of code as <em>test infrastructure</em>. Though it is usually more valuable in integration or end-to-end tests, carefully designed test infrastructure can make unit tests much easier to write in some circumstances.</p>
|
||||
|
||||
<p>Custom test infrastructure must be approached more carefully than the code sharing that happens within a single test suite. In many ways, test infrastructure code is more similar to production code than it is to other test code given that it can have many callers that depend on it and can be difficult to change without introducing breakages. Most engineers aren’t expected to make changes to the common test infrastructure while testing their own features. Test infrastructure needs to be treated as its own separate product, and accordingly, <em>test infrastructure must always have its own tests</em>.</p>
|
||||
|
||||
<p>Of course, most of the test infrastructure that most engineers use comes in the form of well-known third-party libraries like <a href="https://junit.org">JUnit</a>. A huge number of such libraries are available, and standardizing on them within an organization should happen as early and universally as possible. For example, Google many years ago mandated Mockito as the only mocking framework that should be used in new Java tests and banned new tests from using other mocking frameworks. This edict produced some grumbling at the time from people comfortable with other frameworks, but today, it’s universally seen as a good move that made our tests easier to understand and work with.<a contenteditable="false" data-primary="DRY (Don’t Repeat Yourself) principle" data-secondary="tests and code sharing, DAMP, not DRY" data-startref="ix_DRY" data-type="indexterm" id="id-zRt6HjILFyuo"> </a><a contenteditable="false" data-primary="code sharing, tests and" data-startref="ix_cdsh" data-type="indexterm" id="id-bRtYcMIWFwuR"> </a><a contenteditable="false" data-primary="unit testing" data-secondary="tests and code sharing, DAMP, not DRY" data-startref="ix_untstcdsh" data-type="indexterm" id="id-0Vt3IWIJF8uo"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="conclusion-id00016">
|
||||
<h1>Conclusion</h1>
|
||||
|
||||
<p>Unit tests are one of the most powerful tools that we as software engineers have to make sure that our systems keep working over time in the face of unanticipated changes. But with great power comes great responsibility, and careless use of unit testing can result in a system that requires much more effort to maintain and takes much more effort to change without actually improving our confidence in said <span class="keep-together">system.</span></p>
|
||||
|
||||
<p>Unit tests at Google are far from perfect, but we’ve found tests that follow the practices outlined in this chapter to be orders of magnitude more valuable than those that don’t. We hope they’ll help you to improve the quality of your own tests!</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="tlsemicolondrs-id00114">
|
||||
<h1>TL;DRs</h1>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Strive for unchanging tests.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Test via public APIs.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Test state, not interactions.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Make your tests complete and concise.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Test behaviors, not methods.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Structure tests to emphasize behaviors.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Name tests after the behavior being tested.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Don’t put logic in tests.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Write clear failure messages.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Follow DAMP over DRY when sharing<a contenteditable="false" data-primary="unit testing" data-startref="ix_untst" data-type="indexterm" id="id-vRtDC2CmhvHbil"> </a> code for tests.</p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
<div data-type="footnotes"><p data-type="footnote" id="ch01fn125"><sup><a href="ch12.html#ch01fn125-marker">1</a></sup>Note that this is slightly different from a <em>flaky test</em>, which fails nondeterministically without any change to production code.</p><p data-type="footnote" id="ch01fn127"><sup><a href="ch12.html#ch01fn127-marker">2</a></sup>This is sometimes called the "<a href="https://oreil.ly/8zSZg">Use the front door first principle</a>."</p><p data-type="footnote" id="ch01fn129"><sup><a href="ch12.html#ch01fn129-marker">3</a></sup>These are also the same two reasons that a test can be "flaky." Either the system under test has a nondeterministic fault, or the test is flawed such that it sometimes fails when it should pass.</p><p data-type="footnote" id="ch01fn130"><sup><a href="ch12.html#ch01fn130-marker">4</a></sup>See <a href="https://testing.googleblog.com/2014/04/testing-on-toilet-test-behaviors-not.html"><em class="hyperlink">https://testing.googleblog.com/2014/04/testing-on-toilet-test-behaviors-not.html</em></a> and <a href="https://dannorth.net/introducing-bdd"><em class="hyperlink">https://dannorth.net/introducing-bdd</em></a>.</p><p data-type="footnote" id="ch01fn132"><sup><a href="ch12.html#ch01fn132-marker">5</a></sup>Furthermore, a <em>feature</em> (in the product sense of the word) can be expressed as a collection of behaviors.</p><p data-type="footnote" id="ch01fn134"><sup><a href="ch12.html#ch01fn134-marker">6</a></sup>These components are sometimes referred to as "arrange," "act," and "assert."</p><p data-type="footnote" id="ch01fn139"><sup><a href="ch12.html#ch01fn139-marker">7</a></sup>In many cases, it can even be useful to slightly randomize the default values returned for fields that aren’t explicitly set. This helps to ensure that two different instances won’t accidentally compare as equal, and makes it more difficult for engineers to hardcode dependencies on the defaults.</p></div></section>
|
||||
|
||||
</body>
|
||||
</html>
|
757
clones/abseil.io/resources/swe-book/html/ch13.html
Normal file
|
@ -0,0 +1,757 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="test_doubles">
|
||||
<h1>Test Doubles</h1>
|
||||
|
||||
<p class="byline">Written by Andrew Trenk and Dillon Bly</p>
|
||||
|
||||
<p class="byline">Edited by Tom Manshreck</p>
|
||||
|
||||
<p>Unit tests are a critical tool for keeping developers productive and reducing defects in code. <a contenteditable="false" data-primary="test doubles" data-type="indexterm" id="ix_tstdbl"> </a>Although they can be easy to write for simple code, writing them becomes difficult as code becomes more complex.</p>
|
||||
|
||||
<p>For example, imagine trying to write a test for a function that sends a request to an external server and then stores the response in a database. Writing a handful of tests might be doable with some effort. But if you need to write hundreds or thousands of tests like this, your test suite will likely take hours to run, and could become flaky due to issues like random network failures or tests overwriting one another’s data.</p>
|
||||
|
||||
<p>Test doubles come in handy in such cases. A <a href="https://oreil.ly/vbpiU"><em>test double</em></a> is an object or function that can stand in for a real implementation in a test, similar to how a stunt double can stand in for an actor in a movie. <a contenteditable="false" data-primary="mocking" data-seealso="test doubles" data-type="indexterm" id="id-z2CpsgSX"> </a>The use of test doubles is often referred to as <em>mocking</em>, but we avoid that term in this chapter because, as we’ll see, that term is also used to refer to more specific aspects of test doubles.</p>
|
||||
|
||||
<p>Perhaps the most obvious type of test double is a simpler implementation of an object that behaves similarly to the real implementation, such as an in-memory database. Other types of test doubles can make it possible to validate specific details of your system, such as by making it easy to trigger a rare error condition, or ensuring a heavyweight function is called without actually executing the function’s <span class="keep-together">implementation.</span></p>
|
||||
|
||||
<p>The previous two chapters introduced the concept of <em>small tests</em> and discussed why they should comprise the majority of tests in a test suite. However, production code often doesn’t fit within the constraints of small tests due to communication across multiple processes or machines. Test doubles can be much more lightweight than real implementations, allowing you to write many small tests that execute quickly and are not flaky.</p>
|
||||
|
||||
<section data-type="sect1" id="the_impact_of_test_doubles_on_software">
|
||||
<h1>The Impact of Test Doubles on Software Development</h1>
|
||||
|
||||
<p>The use of test doubles introduces a few complications to software development that require some trade-offs to be made. <a contenteditable="false" data-primary="test doubles" data-secondary="impact on software development" data-type="indexterm" id="id-1GC5HksXh5"> </a>The concepts introduced here are discussed in more depth throughout this chapter:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Testability</dt>
|
||||
<dd>To use test doubles, a codebase needs to be designed to be <em>testable</em>—it should be possible for tests to swap out real implementations with test doubles. For example, code that calls a database needs to be flexible enough to be able to use a test double in place of a real database. If the codebase isn’t designed with testing in mind and you later decide that tests are needed, it can require a major commitment to refactor the code to support the use of test doubles.</dd>
|
||||
<dt>Applicability</dt>
|
||||
<dd>Although proper application of test doubles can provide a powerful boost to engineering velocity, their improper use can lead to tests that are brittle, complex, and less effective. These downsides are magnified when test doubles are used improperly across a large codebase, potentially resulting in major losses in productivity for engineers. In many cases, test doubles are not suitable and engineers should prefer to use real implementations instead.</dd>
|
||||
<dt>Fidelity</dt>
|
||||
<dd><em>Fidelity</em> refers to how closely the behavior of a test double resembles the behavior of the real implementation that it’s replacing. <a contenteditable="false" data-primary="fidelity" data-secondary="of test doubles" data-type="indexterm" id="id-pMCZs0SdfkhX"> </a>If the behavior of a test double significantly differs from the real implementation, tests that use the test double likely wouldn’t provide much value—for example, imagine trying to write a test with a test double for a database that ignores any data added to the database and always returns empty results. But perfect fidelity might not be feasible; test doubles often need to be vastly simpler than the real implementation in order to be suitable for use in tests. In many situations, it is appropriate to use a test double even without perfect fidelity. Unit tests that use test doubles often need to be supplemented by larger-scope tests that exercise the real implementation.</dd>
|
||||
</dl>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="test_doubles_at_google">
|
||||
<h1>Test Doubles at Google</h1>
|
||||
|
||||
<p>At Google, we’ve seen countless examples of the benefits to productivity and software quality<a contenteditable="false" data-primary="test doubles" data-secondary="at Google" data-type="indexterm" id="id-BjCJHos1U2"> </a> that test doubles can bring to a codebase, as well as the negative impact they can cause when used improperly. The practices we follow at Google have evolved over time based on these experiences. Historically, we had few guidelines on how to <span class="keep-together">effectively</span> use test doubles, but best practices evolved as we saw common patterns and antipatterns arise in many teams’ codebases.</p>
|
||||
|
||||
<p>One lesson we learned the hard way is the danger<a contenteditable="false" data-primary="mocking frameworks" data-secondary="over reliance on" data-type="indexterm" id="id-a1C5HzfbUo"> </a> of overusing mocking frameworks, which allow you to easily create test doubles (we will discuss mocking frameworks in more detail later in this chapter). When mocking frameworks first came into use at Google, they seemed like a hammer fit for every nail—they made it very easy to write highly focused tests against isolated pieces of code without having to worry about how to construct the dependencies of that code. It wasn’t until several years and countless tests later that we began to realize the cost of such tests: though these tests were easy to write, we suffered greatly given that they required constant effort to maintain while rarely finding bugs. The pendulum at Google has now begun swinging in the other direction, with many engineers avoiding mocking frameworks in favor of writing more realistic tests.</p>
|
||||
|
||||
<p>Even though the practices discussed in this chapter are generally agreed upon at Google, the actual application of them varies widely from team to team. This variance stems from engineers having inconsistent knowledge of these practices, inertia in an existing codebase that doesn’t conform to these practices, or teams doing what is easiest for the short term without thinking about the long-term implications.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="basic_concepts">
|
||||
<h1>Basic Concepts</h1>
|
||||
|
||||
<p>Before we dive into how to effectively use test doubles, let’s cover some of the basic concepts related to them. These build the foundation for best practices that we will discuss later in this chapter.<a contenteditable="false" data-primary="test doubles" data-secondary="example" data-type="indexterm" id="id-a1C5Hxs1Co"> </a></p>
|
||||
|
||||
<section data-type="sect2" id="an_example_test_double">
|
||||
<h2>An Example Test Double</h2>
|
||||
|
||||
<p>Imagine an ecommerce site that needs to process credit card payments. At its core, it might have something like the code shown in <a data-type="xref" href="ch13.html#example_onethree-onedot_a_credit_card_s">A credit card service</a>.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-onedot_a_credit_card_s">
|
||||
<h5><span class="label">Example 13-1. </span>A credit card service</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">class PaymentProcessor {
|
||||
private <strong>CreditCardService creditCardService</strong>;
|
||||
...
|
||||
boolean <strong>makePayment</strong>(CreditCard creditCard, Money amount) {
|
||||
if (creditCard.isExpired()) { return false; }
|
||||
boolean success =<strong>
|
||||
creditCardService</strong>.chargeCreditCard(creditCard, amount);
|
||||
return success;
|
||||
}
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<p>It would be infeasible to use a real credit card service in a test (imagine all the transaction fees from running the test!), but a test double could be used in its place to <em>simulate</em> the behavior of the real system. The code in <a data-type="xref" href="ch13.html#example_onethree-twodot_a_trivial_test">A trivial test double</a> shows an extremely simple test double.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-twodot_a_trivial_test">
|
||||
<h5><span class="label">Example 13-2. </span>A trivial test double</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">class TestDoubleCreditCardService implements CreditCardService {
|
||||
@Override
|
||||
public boolean chargeCreditCard(CreditCard creditCard, Money amount) {
|
||||
return true;
|
||||
}
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<p>Although this test double doesn’t look very useful, using it in a test still allows us to test some of the logic in the <code>makePayment()</code> method. For example, in <a data-type="xref" href="ch13.html#example_onethree-threedot_using_the_tes">Using the test double</a>, we can validate that the method behaves properly when the credit card is expired because the code path that the test exercises doesn’t rely on the behavior of the credit card service.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-threedot_using_the_tes">
|
||||
<h5><span class="label">Example 13-3. </span>Using the test double</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">@Test public void cardIsExpired_returnFalse() {
|
||||
boolean success = <strong>paymentProcessor</strong>.makePayment(EXPIRED_CARD, AMOUNT);
|
||||
assertThat(success).isFalse();
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<p>The following sections in this chapter will discuss how to make use of test doubles in more complex situations than this one.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="seams">
|
||||
<h2>Seams</h2>
|
||||
|
||||
<p>Code is said to be <a href="https://oreil.ly/yssV2"><em>testable</em></a> if it is written in a way that makes it possible to write unit tests for<a contenteditable="false" data-primary="testability" data-secondary="testable code" data-type="indexterm" id="id-AjCRs9sOIACy"> </a> the code.<a contenteditable="false" data-primary="seams" data-type="indexterm" id="id-pMCxfQsYIXCX"> </a><a contenteditable="false" data-primary="test doubles" data-secondary="seams" data-type="indexterm" id="id-0OCVI9sdIXCj"> </a> A <a href="https://oreil.ly/pFSFf"><em>seam</em></a> is a way to make code testable by allowing for the use of test doubles—it makes it possible to use different dependencies for the system under test rather than the dependencies used in a production environment.</p>
|
||||
|
||||
<p><a href="https://oreil.ly/og9p9"><em>Dependency injection</em></a> is a common technique for introducing seams.<a contenteditable="false" data-primary="dependency injection" data-secondary="introducing seams with" data-type="indexterm" id="id-pMCZskfYIXCX"> </a> In short, when a class utilizes dependency injection, any classes it needs to use (i.e., the class’s <em>dependencies</em>) are passed to it rather than instantiated directly, making it possible for these dependencies to be substituted in tests.</p>
|
||||
|
||||
<p><a data-type="xref" href="ch13.html#example_onethree-fourdot_dependency_inj">Dependency injection</a> shows an example of dependency injection. Rather than the constructor creating an instance of <code>CreditCardService</code>, it accepts an instance as a parameter.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-fourdot_dependency_inj">
|
||||
<h5><span class="label">Example 13-4. </span>Dependency injection</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">class PaymentProcessor {
|
||||
private CreditCardService creditCardService;
|
||||
|
||||
PaymentProcessor(<strong>CreditCardService creditCardService</strong>) {
|
||||
this.creditCardService = creditCardService;
|
||||
}
|
||||
...
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<p>The code that calls this constructor is responsible for creating an appropriate <code>CreditCardService</code> instance. Whereas the production code can pass in an implementation of <code>CreditCardService</code> that communicates with an external server, the test can pass in a test double, as demonstrated in <a data-type="xref" href="ch13.html#example_onethree-fivedot_passing_in_a_t">Passing in a test double</a>.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-fivedot_passing_in_a_t">
|
||||
<h5><span class="label">Example 13-5. </span>Passing in a test double</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">PaymentProcessor paymentProcessor =
|
||||
new PaymentProcessor(new <strong>TestDoubleCreditCardService</strong>());</pre>
|
||||
</div>
|
||||
|
||||
<p>To reduce boilerplate associated with manually specifying constructors, automated dependency injection frameworks can be used for constructing object graphs automatically. <a contenteditable="false" data-primary="dependency injection" data-secondary="frameworks for" data-type="indexterm" id="id-wbCoHEtwIjC9"> </a>At Google, <a href="https://github.com/google/guice">Guice</a> and <a href="https://google.github.io/dagger">Dagger</a> are automated dependency injection frameworks that are commonly used for Java code.</p>
|
||||
|
||||
<p>With dynamically typed languages such as Python or JavaScript, it is possible to dynamically replace individual functions or object methods. Dependency injection is less important in these languages because this capability makes it possible to use real implementations of dependencies in tests while only overriding functions or methods of the dependency that are unsuitable for tests.</p>
|
||||
|
||||
<p>Writing testable code requires an upfront investment. <a contenteditable="false" data-primary="testability" data-secondary="writing testable code early" data-type="indexterm" id="id-57C3HJUQIaC1"> </a>It is especially critical early in the lifetime of a codebase because the later testability is taken into account, the more difficult it is to apply to a codebase. Code written without testing in mind typically needs to be refactored or rewritten before you can add appropriate tests.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="mocking_frameworks">
|
||||
<h2>Mocking Frameworks</h2>
|
||||
|
||||
<p>A <em>mocking framework</em> is a <a contenteditable="false" data-primary="mocking frameworks" data-secondary="about" data-type="indexterm" id="id-pMCZsQsGcXCX"> </a>software <a contenteditable="false" data-primary="test doubles" data-secondary="mocking frameworks" data-type="indexterm" id="id-0OC7f9socXCj"> </a>library that makes it easier to create test doubles within tests; it allows you to replace an object with a <em>mock</em>, which is a test double whose behavior is specified inline in a test. The use of mocking frameworks reduces boilerplate because you don’t need to define a new class each time you need a test <span class="keep-together">double.</span></p>
|
||||
|
||||
<p><a data-type="xref" href="ch13.html#example_onethree-sixdot_mocking_framewo">Mocking frameworks</a> demonstrates the <a contenteditable="false" data-primary="Mockito" data-secondary="example of use" data-type="indexterm" id="id-0OCbsWfocXCj"> </a>use of <a href="https://site.mockito.org">Mockito</a>, a mocking framework for Java. Mockito creates a test double for <code>CreditCardService</code> and instructs it to return a specific value.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-sixdot_mocking_framewo">
|
||||
<h5><span class="label">Example 13-6. </span>Mocking frameworks</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">class PaymentProcessorTest {
|
||||
...
|
||||
PaymentProcessor paymentProcessor;
|
||||
|
||||
// Create a test double of CreditCardService with just one line of code.
|
||||
<strong>@Mock CreditCardService mockCreditCardService</strong>;
|
||||
@Before public void setUp() {
|
||||
// Pass in the test double to the system under test.
|
||||
paymentProcessor = new PaymentProcessor(<strong>mockCreditCardService</strong>);
|
||||
}
|
||||
@Test public void chargeCreditCardFails_returnFalse() {
|
||||
// Give some behavior to the test double: it will return false
|
||||
// anytime the chargeCreditCard() method is called. The usage of
|
||||
// “any()” for the method’s arguments tells the test double to
|
||||
// return false regardless of which arguments are passed.
|
||||
when(<strong>mockCreditCardService.chargeCreditCard(</strong>any(), any()))
|
||||
.thenReturn(<strong>false</strong>);
|
||||
boolean success = paymentProcessor.makePayment(CREDIT_CARD, AMOUNT);
|
||||
assertThat(success).isFalse();
|
||||
}
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<p>Mocking frameworks<a contenteditable="false" data-primary="C++" data-secondary="googlemock mocking framework" data-type="indexterm" id="id-LjC2HWc5c6Cw"> </a> exist for most <a contenteditable="false" data-primary="mocking frameworks" data-secondary="for major programming languages" data-type="indexterm" id="id-JjCXsacZczCZ"> </a>major programming languages.<a contenteditable="false" data-primary="Java" data-secondary="Mockito mocking framework for" data-type="indexterm" id="id-wbCyfDc2cjC9"> </a> At Google, we use Mockito for Java, <a href="https://github.com/google/googletest">the googlemock component of Googletest</a> for C++, and <a href="https://oreil.ly/clzvH">unittest.mock</a> for Python.<a contenteditable="false" data-primary="Python" data-secondary="unittest.mock framework for" data-type="indexterm" id="id-yoCDSpcWc2CN"> </a></p>
|
||||
|
||||
<p>Although mocking frameworks facilitate easier usage of test doubles, they come with some significant caveats given that their overuse will often make a codebase more difficult to maintain. We cover some of these problems later in this chapter.</p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="techniques_for_using_test_doubles">
|
||||
<h1>Techniques for Using Test Doubles</h1>
|
||||
|
||||
<p>There are three primary techniques for using test doubles.<a contenteditable="false" data-primary="test doubles" data-secondary="techniques for using" data-type="indexterm" id="ix_tstdbluse"> </a> This section presents a brief introduction to these techniques to give you a quick overview of what they are and how they differ. Later sections in this chapter go into more details on how to effectively apply them.</p>
|
||||
|
||||
<p>An engineer who is aware of the distinctions between these techniques is more likely to know the appropriate technique to use when faced with the need to use a test double.</p>
|
||||
|
||||
<section data-type="sect2" id="faking-id00042">
|
||||
<h2>Faking</h2>
|
||||
|
||||
<p>A <a href="https://oreil.ly/rymnI"><em>fake</em></a> is a lightweight implementation of an API that behaves similar <a contenteditable="false" data-primary="faking" data-type="indexterm" id="id-pMCZsQsYIvuX"> </a>to the real implementation but isn’t suitable <a contenteditable="false" data-primary="test doubles" data-secondary="techniques for using" data-tertiary="faking" data-type="indexterm" id="id-0OC7f9sdI6uj"> </a>for production; for example, an in-memory database. <a data-type="xref" href="ch13.html#example_onethree-sevendot_a_simple_fake">A simple fake</a> presents an example of faking.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-sevendot_a_simple_fake">
|
||||
<h5><span class="label">Example 13-7. </span>A simple fake</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">// Creating the fake is fast and easy.
|
||||
AuthorizationService <strong>fakeAuthorizationService</strong> =
|
||||
new FakeAuthorizationService();
|
||||
AccessManager accessManager = new AccessManager(<strong>fakeAuthorizationService</strong>);
|
||||
|
||||
// Unknown user IDs shouldn’t have access.
|
||||
assertThat(accessManager.userHasAccess(USER_ID)).isFalse();
|
||||
|
||||
// The user ID should have access after it is added to
|
||||
// the authorization service.
|
||||
<strong>fakeAuthorizationService</strong>.addAuthorizedUser(new User(USER_ID));
|
||||
assertThat(accessManager.userHasAccess(USER_ID)).isTrue();</pre>
|
||||
</div>
|
||||
|
||||
<p>Using a fake is often the ideal technique when you need to use a test double, but a fake might not exist for an object you need to use in a test, and writing one can be challenging because you need to ensure that it has similar behavior to the real implementation, now and in the future.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="stubbing-id00089">
|
||||
<h2>Stubbing</h2>
|
||||
|
||||
<p><a href="https://oreil.ly/gmShS"><em>Stubbing</em></a> is the process of giving behavior to a function that otherwise has no behavior on its own—you specify to the function exactly what values to return (that is, you <em>stub</em> the return values).<a contenteditable="false" data-primary="test doubles" data-secondary="techniques for using" data-tertiary="stubbing" data-type="indexterm" id="id-LjCJfAs5c9uw"> </a><a contenteditable="false" data-primary="stubbing" data-type="indexterm" id="id-JjC4I1sZcVuZ"> </a></p>
|
||||
|
||||
<p><a data-type="xref" href="ch13.html#example_onethree-eightdot_stubbing">Stubbing</a> illustrates stubbing. <a contenteditable="false" data-primary="Mockito" data-secondary="stubbing example" data-type="indexterm" id="id-LjCYsVf5c9uw"> </a>The <code>when(...).thenReturn(...)</code> method calls from the Mockito mocking framework specify the behavior of the <code>lookupUser()</code> method.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-eightdot_stubbing">
|
||||
<h5><span class="label">Example 13-8. </span>Stubbing</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">// Pass in a test double that was created by a mocking framework.
|
||||
AccessManager accessManager = new AccessManager(<strong>mockAuthorizationService</strong>);
|
||||
|
||||
// The user ID shouldn’t have access if null is returned.
|
||||
when(<strong>mockAuthorizationService</strong>.lookupUser(USER_ID)).thenReturn(null);
|
||||
assertThat(accessManager.userHasAccess(USER_ID)).isFalse();
|
||||
|
||||
// The user ID should have access if a non-null value is returned.
|
||||
when(<strong>mockAuthorizationService</strong>.lookupUser(USER_ID)).thenReturn(USER);
|
||||
assertThat(accessManager.userHasAccess(USER_ID)).isTrue();</pre>
|
||||
</div>
|
||||
|
||||
<p>Stubbing is typically done <a contenteditable="false" data-primary="mocking frameworks" data-secondary="stubbing via" data-type="indexterm" id="id-JjC1HacZcVuZ"> </a>through mocking frameworks to reduce boilerplate that would otherwise be needed for manually creating new classes that hardcode return values.</p>
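<p>For a sense of the boilerplate being avoided, a hand-written stub for the authorization service from the previous examples might look something like the following sketch (the <code>AuthorizationService</code> interface and the signature of <code>lookupUser()</code> are assumed from those examples rather than taken from a real API):</p>

<pre data-code-language="java" data-type="programlisting">// An entire class exists just to hardcode one return value; a single
// when(...).thenReturn(...) call in the test replaces all of it.
class StubAuthorizationService implements AuthorizationService {
  private final User userToReturn;

  StubAuthorizationService(User userToReturn) {
    this.userToReturn = userToReturn;
  }

  @Override
  public User lookupUser(String userId) {
    return userToReturn;  // Ignores the argument; always returns the same value.
  }
}</pre>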
|
||||
|
||||
<p>Although stubbing can be a quick and simple technique to apply, it has limitations, which we’ll discuss later in this chapter.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="interaction_testing">
|
||||
<h2>Interaction Testing</h2>
|
||||
|
||||
<p><a href="https://oreil.ly/zGfFn"><em>Interaction testing</em></a> is a way to validate <em>how</em> a function is called without actually calling the implementation of the function. <a contenteditable="false" data-primary="test doubles" data-secondary="techniques for using" data-tertiary="interaction testing" data-type="indexterm" id="id-JjCwf1sYSVuZ"> </a><a contenteditable="false" data-primary="interaction testing" data-secondary="using test doubles" data-type="indexterm" id="id-wbCxIqs5Squ9"> </a>A test should fail if a function isn’t called the correct way—for example, if the function isn’t called at all, it’s called too many times, or it’s called with the wrong arguments.</p>
|
||||
|
||||
<p><a data-type="xref" href="ch13.html#example_onethree-ninedot_interaction_te">Interaction testing</a> presents an instance of interaction testing. The <code>verify(...)</code> method from the Mockito mocking framework is used to validate that <code>lookupUser()</code> is called as expected.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-ninedot_interaction_te">
|
||||
<h5><span class="label">Example 13-9. </span>Interaction testing</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">// Pass in a test double that was created by a mocking framework.
|
||||
AccessManager accessManager = new AccessManager(<strong>mockAuthorizationService</strong>);
|
||||
accessManager.userHasAccess(USER_ID);
|
||||
|
||||
// The test will fail if accessManager.userHasAccess(USER_ID) didn’t call
|
||||
// mockAuthorizationService.lookupUser(USER_ID).
|
||||
verify(<strong>mockAuthorizationService</strong>).lookupUser(USER_ID);</pre>
|
||||
</div>
|
||||
|
||||
<p>Similar to stubbing, interaction testing is typically done through mocking frameworks.<a contenteditable="false" data-primary="mocking frameworks" data-secondary="interaction testing done via" data-type="indexterm" id="id-wbCoHDc5Squ9"> </a> This reduces boilerplate compared to manually creating new classes that contain code to keep track of how often a function is called and which arguments were passed in.</p>
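<p>Without a mocking framework, that bookkeeping would have to be written by hand, roughly as in the following sketch (again assuming the <code>AuthorizationService</code> interface and <code>lookupUser()</code> signature from the earlier examples):</p>

<pre data-code-language="java" data-type="programlisting">// A hand-rolled test double that records its interactions; a verify(...)
// call on a mock replaces both this class and the manual checks against it.
class RecordingAuthorizationService implements AuthorizationService {
  private int lookupCallCount = 0;
  private String lastLookedUpUserId = null;

  @Override
  public User lookupUser(String userId) {
    lookupCallCount++;
    lastLookedUpUserId = userId;
    return null;
  }

  int getLookupCallCount() { return lookupCallCount; }
  String getLastLookedUpUserId() { return lastLookedUpUserId; }
}</pre>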
|
||||
|
||||
<p>Interaction testing is sometimes called <a href="https://oreil.ly/IfMoR"><em>mocking</em></a>. We avoid this terminology in this chapter because it can be confused with mocking frameworks, which can be used for stubbing as well as for interaction testing.<a contenteditable="false" data-primary="mocking" data-secondary="interaction testing and" data-type="indexterm" id="id-57CmswSDSYu1"> </a></p>
|
||||
|
||||
<p>As discussed later in this chapter, interaction testing is useful in certain situations but should be avoided when possible because overuse can easily result in brittle tests.<a contenteditable="false" data-primary="test doubles" data-secondary="techniques for using" data-startref="ix_tstdbluse" data-type="indexterm" id="id-57C3HmTDSYu1"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="real_implementations">
|
||||
<h1>Real Implementations</h1>
|
||||
|
||||
<p>Although test doubles can be <a contenteditable="false" data-primary="test doubles" data-secondary="using real implementations instead of" data-type="indexterm" id="ix_tstdblrl"> </a>invaluable testing tools, our first choice for tests is to use the <a contenteditable="false" data-primary="real implementations, using instead of test doubles" data-type="indexterm" id="ix_realimp"> </a>real implementations of the system under test’s dependencies; that is, the same implementations that are used in production code. Tests have higher fidelity when they execute code as it will be executed in production, and using real implementations helps accomplish this.</p>
|
||||
|
||||
<p>At Google, the preference for real implementations developed over time as we saw that overuse of mocking frameworks had a tendency to pollute tests with repetitive code that got out of sync with the real implementation and made refactoring difficult. We’ll look at this topic in more detail later in this chapter.</p>
|
||||
|
||||
<p>Preferring real implementations <a contenteditable="false" data-primary="classical testing" data-type="indexterm" id="id-AjCQH3IZFN"> </a>in tests is known as <a href="https://oreil.ly/OWw7h"><em>classical testing</em></a>. There is also a style of testing known as <em>mockist testing</em>, in which the preference is to use mocking frameworks instead of real implementations. <a contenteditable="false" data-primary="mockist testing" data-type="indexterm" id="id-LjCLIYI8FJ"> </a>Even though some people in the software industry practice mockist testing (including the <a href="https://oreil.ly/_QWy7">creators of the first mocking frameworks</a>), at Google, we have found that this style of testing is difficult to scale. It requires engineers to follow <a href="http://jmock.org/oopsla2004.pdf">strict guidelines when designing the system under test</a>, and the default behavior of most engineers at Google has been to write code in a way that is more suitable for the classical testing style.</p>
|
||||
|
||||
<section data-type="sect2" id="prefer_realism_over_isolation">
|
||||
<h2>Prefer Realism Over Isolation</h2>
|
||||
|
||||
<p>Using real implementations for dependencies makes the system under test more realistic<a contenteditable="false" data-primary="real implementations, using instead of test doubles" data-secondary="preferring realism over isolation" data-type="indexterm" id="id-0OCnH9soc0Fj"> </a> given that<a contenteditable="false" data-primary="test doubles" data-secondary="using real implementations instead of" data-tertiary="preferring realism over isolation" data-type="indexterm" id="id-LjCYsAs5c1Fw"> </a> all code in these real implementations will be executed in the test. In contrast, a test that utilizes test doubles isolates the system under test from its dependencies so that the test does not execute code in the dependencies of the system under test.</p>
|
||||
|
||||
<p>We prefer realistic tests because they give more confidence that the system under test is working properly. If unit tests rely too much on test doubles, an engineer might need to run integration tests or manually verify that their feature is working as expected in order to gain this same level of confidence. Carrying out these extra tasks can slow down development and can even allow bugs to slip through if engineers skip these tasks entirely when they are too time consuming to carry out compared to running unit tests.</p>
|
||||
|
||||
<p>Replacing all dependencies of a class with test doubles<a contenteditable="false" data-primary="dependencies" data-secondary="replacing all in a class with test doubles" data-type="indexterm" id="id-JjC1HoIZcoFZ"> </a> arbitrarily isolates the system under test to the implementation that the author happens to put directly into the class and excludes implementation that happens to be in different classes. However, a good test should be independent of implementation—it should be written in terms of the API being tested rather than in terms of how the implementation is structured.</p>
|
||||
|
||||
<p>Using real implementations can cause your test to fail if there is a bug in the real implementation. This is good! You <em>want</em> your tests to fail in such cases because it indicates that your code won’t work properly in production. <a contenteditable="false" data-primary="failures" data-secondary="bug in real implementation causing cascade of test failures" data-type="indexterm" id="id-kJC0spcNcAFx"> </a><a contenteditable="false" data-primary="bugs" data-secondary="in real implementations causing cascade of test failures" data-type="indexterm" id="id-57CWfdcZcOF1"> </a>Sometimes, a bug in a real implementation can cause a cascade of test failures because other tests that use the real implementation might fail, too. But with good developer tools, such as a <span class="keep-together">Continuous</span> Integration (CI) system, it is usually easy to track down the change that caused the failure.</p>
|
||||
|
||||
<aside data-type="sidebar" id="commercial_atdonotmock">
|
||||
<h5>Case Study: @DoNotMock</h5>
|
||||
|
||||
<p>At Google, we’ve seen enough tests that over-rely on mocking frameworks to motivate<a contenteditable="false" data-primary="@DoNotMock annotation" data-type="indexterm" id="id-57C3H0sDS4cWFM"> </a> the creation of the <code>@DoNotMock</code> annotation in Java, which is available as part of the <a href="https://github.com/google/error-prone">ErrorProne</a> static analysis tool.<a contenteditable="false" data-primary="Error Prone tool (Java)" data-secondary="@DoNotMock annotation" data-type="indexterm" id="id-YjCVIVsmSqcdFw"> </a> This annotation is a way for <a contenteditable="false" data-primary="APIs" data-secondary="declaring a type should not be mocked" data-type="indexterm" id="id-EjCNc0sdSxcdFk"> </a>API owners to declare, “this type should not be mocked because better alternatives exist.”</p>
|
||||
|
||||
<p>If an engineer attempts to use a mocking framework to create an instance of a class or interface that has been annotated as <code>@DoNotMock</code>, as demonstrated in <a data-type="xref" href="ch13.html#example_onethree-onezerodot_the_commerc">The @DoNotMock annotation</a>, they will see an error directing them to use a more suitable test strategy, such as a real implementation or a fake. This annotation is most commonly used for value objects that are simple enough to use as-is, as well as for APIs that have well-engineered fakes available.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-onezerodot_the_commerc">
|
||||
<h5><span class="label">Example 13-10. </span>The @DoNotMock annotation</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">@DoNotMock("Use SimpleQuery.create() instead of mocking.")
|
||||
public abstract class Query {
|
||||
public abstract String getQueryValue();
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<p>Why would an API owner care? In short, it severely constrains the API owner’s ability to make changes to their implementation over time. As we’ll explore later in the chapter, every time a mocking framework is used for stubbing or interaction testing, it duplicates behavior provided by the API.</p>
|
||||
|
||||
<p>When the API owner wants to change their API, they might find that it has been mocked thousands or even tens of thousands of times throughout Google’s codebase! These test doubles are very likely to exhibit behavior that violates the API contract of the type being mocked—for instance, returning null for a method that can never return null. Had the tests used the real implementation or a fake, the API owner could make changes to their implementation without first fixing thousands of flawed tests.</p>
|
||||
</aside>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="how_to_decide_when_to_use_a_real_implem">
|
||||
<h2>How to Decide When to Use a Real Implementation</h2>
|
||||
|
||||
<p>A real implementation is preferred if it is fast, deterministic, and has simple dependencies. <a contenteditable="false" data-primary="real implementations, using instead of test doubles" data-secondary="deciding when to use real implementations" data-type="indexterm" id="ix_realimpwh"> </a><a contenteditable="false" data-primary="test doubles" data-secondary="using real implementations instead of" data-tertiary="deciding when to use real implementation" data-type="indexterm" id="ix_tstdblrlwh"> </a>For example, a real implementation should be used for a <a href="https://oreil.ly/UZiXP"><em>value object</em></a>. Examples include an amount of money, a date, a geographical address, or a collection class such as a list or a map.</p>
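<p>As a small illustration, borrowing the <code>Account</code> and <code>Item</code> builders from the examples in the previous chapter (and the hypothetical <code>store</code> under test that goes with them), there is no reason to replace value objects like these with test doubles; the real things are fast, deterministic, and easy to construct:</p>

<pre data-code-language="java" data-type="programlisting">// Real value objects are used directly; mocking them would only add
// indirection without making the test faster or more deterministic.
@Test
public void canBuyItem_returnsTrueWhenBalanceSufficient() {
  Account account = Account.newBuilder()
      .setState(AccountState.OPEN).setBalance(200).build();
  Item item = Item.newBuilder().setName("Cheeseburger").setPrice(100).build();

  assertThat(store.canBuyItem(item, account)).isTrue();
}</pre>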
|
||||
|
||||
<p class="pagebreak-before">However, for more complex code, using a real implementation often isn’t feasible. There might not be an exact answer on when to use a real implementation or a test double given that there are trade-offs to be made, so you need to take the following considerations into account.</p>
|
||||
|
||||
<section data-type="sect3" id="execution_time">
|
||||
<h3>Execution time</h3>
|
||||
|
||||
<p>One of the most important qualities of unit tests is that they<a contenteditable="false" data-primary="unit testing" data-secondary="execution time for tests" data-type="indexterm" id="id-kJCRHVs8IoSjFR"> </a> should be fast—you want to<a contenteditable="false" data-primary="execution time for tests" data-type="indexterm" id="id-57Cms0sQI2SWFM"> </a> be able to continually <a contenteditable="false" data-primary="real implementations, using instead of test doubles" data-secondary="deciding when to use real implementations" data-tertiary="execution time" data-type="indexterm" id="id-yoC2f9s2IbSYFZ"> </a>run them during development so that you can get quick feedback on whether your code is working (and you also want them to finish quickly when run in a CI system). As a result, a test double can be very useful when the real implementation is slow.</p>
|
||||
|
||||
<p>How slow is too slow for a unit test? If a real implementation added one millisecond to the running time of each individual test case, few people would classify it as slow. But what if it added 10 milliseconds, 100 milliseconds, 1 second, and so on?</p>
|
||||
|
||||
<p>There is no exact answer here—it can depend on whether engineers feel a loss in productivity, and how many tests are using the real implementation (one second extra per test case may be reasonable if there are five test cases, but not if there are 500). For borderline situations, it is often simpler to use a real implementation until it becomes too slow to use, at which point the tests can be updated to use a test double instead.</p>
|
||||
|
||||
<p>Parallelization of tests<a contenteditable="false" data-primary="parallelization of tests" data-type="indexterm" id="id-MjCOH0cwIDSBF6"> </a> can also help reduce execution time. At Google, our test infrastructure makes it trivial to split up tests in a test suite to be executed across multiple servers. This increases the cost of CPU time, but it can provide a large savings in developer time. We discuss this more in <a data-type="xref" href="ch18.html#build_systems_and_build_philosophy">Build Systems and Build Philosophy</a>.</p>
|
||||
|
||||
<p>Another trade-off to be aware of: using a real implementation can result in increased build times given that the tests need to build the real implementation as well as all of its dependencies. Using a highly scalable build system like <a href="https://bazel.build">Bazel</a> can help because it caches unchanged build artifacts.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="determinism">
|
||||
<h3>Determinism</h3>
|
||||
|
||||
<p>A test is <a href="https://oreil.ly/brxJl"><em>deterministic</em></a> if, for a given version of the <a contenteditable="false" data-primary="real implementations, using instead of test doubles" data-secondary="deciding when to use real implementations" data-tertiary="determinism in tests" data-type="indexterm" id="id-yoCQs9sWcbSYFZ"> </a>system under test, running the test always results in the same outcome; that is, the test either always passes or always fails. <a contenteditable="false" data-primary="determinism in tests" data-type="indexterm" id="id-MjCjfyszcDSBF6"> </a>In contrast, a test is <a href="https://oreil.ly/5pG0f"><em>nondeterministic</em></a> if its outcome can change, even if the system under test remains unchanged.<a contenteditable="false" data-primary="nondeterministic behavior in tests" data-type="indexterm" id="id-EjCNc0sQc0SdFk"> </a> </p>
|
||||
|
||||
<p><a href="https://oreil.ly/71OFU">Nondeterminism in tests</a> can lead to flakiness—tests can occasionally fail even when there are no changes to the system under test.<a contenteditable="false" data-primary="flaky tests" data-type="indexterm" id="id-MjCxsvfzcDSBF6"> </a> As discussed in <a data-type="xref" href="ch11.html#testing_overview">Testing Overview</a>, flakiness harms the health of a test suite if developers start to distrust the results of the test and ignore failures. If use of a real implementation rarely causes flakiness, it might not warrant a response, because there is little <span class="keep-together">disruption</span> to engineers. But if flakiness happens often, it might be time to replace a real implementation with a test double because doing so will improve the fidelity of the test.</p>
|
||||
|
||||
<p>A real implementation can be much more complex compared to a test double, which increases the likelihood that it will be nondeterministic. For example, a real implementation that utilizes multithreading might occasionally cause a test to fail if the output of the system under test differs depending on the order in which the threads are executed.</p>
|
||||
|
||||
<p>A common cause of nondeterminism is code that is not <a href="https://oreil.ly/aes__">hermetic</a>; that is, it has dependencies<a contenteditable="false" data-primary="dependencies" data-secondary="external, causing nondeterminism in tests" data-type="indexterm" id="id-EjCDsZcQc0SdFk"> </a> on external services that are outside the control of a test.<a contenteditable="false" data-primary="hermetic code, nondeterminism and" data-type="indexterm" id="id-ZjCxfBcGcjSXFG"> </a> For example, a test that tries to read the contents of a web page from an HTTP server might fail if the server is overloaded or if the web page contents change. Instead, a test double should be used to prevent the test from depending on an external server. If using a test double is not feasible, another option is to use a hermetic instance of a server, which has its life cycle controlled by the test. Hermetic instances are discussed in more detail in the next chapter.</p>
|
||||
|
||||
<p>Another example of nondeterminism is code that relies on the system clock given that the output of the system under test can differ depending on the current time. Instead of relying on the system clock, a test can use a test double that hardcodes a specific time.</p>
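<p>For example, with Java’s <code>java.time</code> API, production code can read the time from an injected <code>Clock</code>, and the test can supply a fixed clock as a test double. The <code>GreetingService</code> class here is a hypothetical illustration:</p>

<pre data-code-language="java" data-type="programlisting">// Production code asks an injected Clock for the time instead of calling
// Instant.now() or System.currentTimeMillis() directly.
public class GreetingService {
  private final Clock clock;

  public GreetingService(Clock clock) {
    this.clock = clock;
  }

  public String greeting() {
    return LocalTime.now(clock).getHour() < 12 ? "Good morning!" : "Good afternoon!";
  }
}

// In the test, a fixed Clock acts as a deterministic test double for the system clock.
Clock fixedClock = Clock.fixed(Instant.parse("2024-01-01T09:00:00Z"), ZoneOffset.UTC);
assertThat(new GreetingService(fixedClock).greeting()).isEqualTo("Good morning!");</pre>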
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="dependency_construction">
|
||||
<h3>Dependency construction</h3>
|
||||
|
||||
<p>When using a real<a contenteditable="false" data-primary="real implementations, using instead of test doubles" data-secondary="deciding when to use real implementations" data-tertiary="dependency construction" data-type="indexterm" id="id-yoC0H9snSbSYFZ"> </a> implementation, you need to construct all of its dependencies.<a contenteditable="false" data-primary="dependencies" data-secondary="construction when using real implementations in tests" data-type="indexterm" id="id-MjCxsysjSDSBF6"> </a> For example, an object needs its entire dependency tree to be constructed: all objects that it depends on, all objects that these dependent objects depend on, and so on. A test double often has no dependencies, so constructing a test double can be much simpler compared to constructing a real implementation.</p>
|
||||
|
||||
<p>As an extreme example, imagine trying to create the object in the code snippet that follows in a test. It would be time-consuming to determine how to construct each individual object. Tests will also require constant maintenance because they need to be updated when the signature of these objects’ constructors is modified:</p>
|
||||
|
||||
<pre data-type="programlisting">Foo foo = new Foo(new A(new B(new C()), new D()), new E(), ..., new Z());</pre>
|
||||
|
||||
<p>It can be tempting to instead use a test double because constructing one can be trivial. For example, this is all it takes to construct a test double when using the Mockito mocking framework:</p>
|
||||
|
||||
<pre data-type="programlisting">@Mock Foo mockFoo;</pre>
|
||||
|
||||
<p class="pagebreak-before">Although creating this test double is much simpler, there are significant benefits to using the real implementation, as discussed earlier in this section. There are also often significant downsides to overusing test doubles in this way, which we look at later in this chapter. So, a trade-off needs to be made when considering whether to use a real implementation or a test double.</p>
|
||||
|
||||
<p>Rather than manually constructing the object in tests, the ideal solution is to use the same object construction code that is used in the production code, such as a factory method or automated dependency injection. To support the use case for tests, the object construction code needs to be flexible enough to be able to use test doubles rather than hardcoding the<a contenteditable="false" data-primary="test doubles" data-secondary="using real implementations instead of" data-startref="ix_tstdblrlwh" data-tertiary="deciding when to use real implementation" data-type="indexterm" id="id-6WCGHBtvSgSwFq"> </a> implementations<a contenteditable="false" data-primary="real implementations, using instead of test doubles" data-secondary="deciding when to use real implementations" data-startref="ix_realimpwh" data-type="indexterm" id="id-gAC3s6t7S6SAF4"> </a> that will be used<a contenteditable="false" data-primary="real implementations, using instead of test doubles" data-startref="ix_realimp" data-type="indexterm" id="id-bBCvfEtWSySBF7"> </a> for production.<a contenteditable="false" data-primary="test doubles" data-secondary="using real implementations instead of" data-startref="ix_tstdblrl" data-type="indexterm" id="id-mkCjIRtXSmS0FV"> </a></p>
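<p>As a minimal sketch of this idea (reusing the hypothetical <code>PaymentProcessor</code> types that appear in examples later in this chapter, plus invented real implementations), a static factory method can keep construction logic in one place while still allowing tests to pass in doubles:</p>

<pre data-code-language="java" data-type="programlisting">// Production callers use the no-argument factory; tests use the overload to
// supply test doubles. RealCreditCardServer and RealTransactionProcessor are
// hypothetical production implementations.
public final class PaymentProcessors {
  public static PaymentProcessor create() {
    return create(new RealCreditCardServer(), new RealTransactionProcessor());
  }

  public static PaymentProcessor create(
      CreditCardServer creditCardServer, TransactionProcessor transactionProcessor) {
    return new PaymentProcessor(creditCardServer, transactionProcessor);
  }

  private PaymentProcessors() {}
}

// In a test:
PaymentProcessor paymentProcessor =
    PaymentProcessors.create(fakeCreditCardServer, fakeTransactionProcessor);</pre>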
|
||||
</section>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="faking-id00043">
|
||||
<h1>Faking</h1>
|
||||
|
||||
<p>If using a real implementation is not feasible within a test, the best option is often to use a fake in its place. <a contenteditable="false" data-primary="test doubles" data-secondary="faking" data-type="indexterm" id="ix_tstdblfak"> </a><a contenteditable="false" data-primary="faking" data-type="indexterm" id="ix_fake"> </a>A fake is preferred over other test double techniques because it behaves similarly to the real implementation: the system under test shouldn’t even be able to tell whether it is interacting with a real implementation or a fake. <a data-type="xref" href="ch13.html#example_onethree-oneonedot_a_fake_file">A fake file system</a> illustrates a fake file system.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-oneonedot_a_fake_file">
|
||||
<h5><span class="label">Example 13-11. </span>A fake file system</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">// This fake implements the FileSystem interface. This interface is also
|
||||
// used by the real implementation.
|
||||
public class <strong>FakeFileSystem</strong> implements <strong>FileSystem</strong> {
|
||||
// Stores a map of file name to file contents. The files are stored in
|
||||
// memory instead of on disk since tests shouldn’t need to do disk I/O.
|
||||
private Map<String, String> <strong>files</strong> = new HashMap<>();
|
||||
@Override
|
||||
public void <strong>writeFile</strong>(String fileName, String contents) {
|
||||
// Add the file name and contents to the map.
|
||||
<strong>files</strong>.put(fileName, contents);
|
||||
}
|
||||
@Override
|
||||
public String <strong>readFile</strong>(String fileName) {
|
||||
String contents = <strong>files</strong>.get(fileName);
|
||||
// The real implementation will throw this exception if the
|
||||
// file isn’t found, so the fake must throw it too.
|
||||
if (contents == null) { throw new FileNotFoundException(fileName); }
|
||||
return contents;
|
||||
}
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<section class="pagebreak-before" data-type="sect2" id="why_are_fakes_importantquestion_mark">
|
||||
<h2 class="less_space">Why Are Fakes Important?</h2>
|
||||
|
||||
<p>Fakes can be a powerful tool for testing: they <a contenteditable="false" data-primary="faking" data-secondary="importance of fakes" data-type="indexterm" id="id-0OCnH9sdIMij"> </a>execute quickly and allow you to effectively test your code without the drawbacks of using real implementations.</p>
|
||||
|
||||
<p>A single <a contenteditable="false" data-primary="APIs" data-secondary="faking" data-type="indexterm" id="id-LjC2HVfGImiw"> </a>fake has the power to radically improve the testing experience of an API. If you scale that to a large number of fakes for all sorts of APIs, fakes can provide an enormous boost to engineering velocity across a software organization.</p>
|
||||
|
||||
<p>At the other end of the spectrum, in a software organization where fakes are rare, velocity will be slower because engineers can end up struggling with using real implementations that lead to slow and flaky tests. Or engineers might resort to other test double techniques such as stubbing or interaction testing, which, as we’ll examine later in this chapter, can result in tests that are unclear, brittle, and less effective.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="when_should_fakes_be_writtenquestion_ma">
|
||||
<h2>When Should Fakes Be Written?</h2>
|
||||
|
||||
<p>A fake requires more effort and more domain experience to create because it needs to behave similarly to the real implementation. <a contenteditable="false" data-primary="faking" data-secondary="when to write fakes" data-type="indexterm" id="id-LjC2HAs5cmiw"> </a>A fake also requires maintenance: whenever the behavior of the real implementation changes, the fake must also be updated to match this behavior. Because of this, the team that owns the real implementation should write and maintain a fake.</p>
|
||||
|
||||
<p>If a team is considering writing a fake, a trade-off needs to be made on whether the productivity improvements that will result from the use of the fake outweigh the costs of writing and maintaining it. If there are only a handful of users, it might not be worth their time, whereas if there are hundreds of users, it can result in an obvious productivity improvement.</p>
|
||||
|
||||
<p>To reduce the number of fakes that need to be maintained, a fake should typically be created only at the root of the code that isn’t feasible for use in tests. For example, if a database can’t be used in tests, a fake should exist for the database API itself rather than for each class that calls the database API.</p>
|
||||
|
||||
<p>Maintaining a fake can be burdensome if its implementation needs to be duplicated across programming languages, such as for a service that has client libraries that allow the service to be invoked from different languages. One solution for this case is to create a single fake service implementation and have tests configure the client libraries to send requests to this fake service. This approach is more heavyweight compared to having the fake written entirely in memory because it requires the test to communicate across processes. However, it can be a reasonable trade-off to make, as long as the tests can still execute quickly.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="the_fidelity_of_fakes">
|
||||
<h2>The Fidelity of Fakes</h2>
|
||||
|
||||
<p>Perhaps the most important concept surrounding<a contenteditable="false" data-primary="fidelity" data-secondary="of fakes" data-type="indexterm" id="id-JjC1H1sYSZiZ"> </a> the creation of fakes is <em>fidelity</em>; in other words, how <a contenteditable="false" data-primary="faking" data-secondary="fidelity of fakes" data-type="indexterm" id="id-kJCpfVsxSnix"> </a>closely the behavior of a fake matches the behavior of the real implementation. If the behavior of a fake doesn’t match the behavior of the real implementation, a test using that fake is not useful—a test might pass when the fake is used, but this same code path might not work properly in the real implementation.</p>
|
||||
|
||||
<p>Perfect fidelity is not always feasible. After all, the fake was necessary because the real implementation wasn’t suitable in one way or another. For example, a fake database would usually not have fidelity to a real database in terms of hard drive storage because the fake would store everything in memory.</p>
|
||||
|
||||
<p>Primarily, however, a fake should maintain fidelity to the API contracts of the real implementation. For any given input to an API, a fake should return the same output and perform the same state changes of its corresponding real implementation. For example, for a real implementation of <code>database.save(itemId)</code>, if an item is successfully saved when its ID does not yet exist but an error is produced when the ID already exists, the fake must conform to this same behavior.</p>
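<p>A fake that conforms to this contract might look like the following sketch (the <code>Database</code> interface, the <code>Item</code> type, and the exception are hypothetical, and the signatures are only illustrative):</p>

<pre data-code-language="java" data-type="programlisting">// A sketch of a fake that conforms to the save() contract described above.
public class FakeDatabase implements Database {
  // Items live in memory; fidelity to on-disk storage is not needed here.
  private final Map<String, Item> items = new HashMap<>();

  @Override
  public void save(Item item) {
    if (items.containsKey(item.id())) {
      // The real implementation produces an error when the ID already exists,
      // so the fake must behave the same way.
      throw new ItemAlreadyExistsException(item.id());
    }
    items.put(item.id(), item);
  }

  @Override
  public Item get(String itemId) {
    return items.get(itemId);
  }
}</pre>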
|
||||
|
||||
<p>One way to think about this is that the fake must have perfect fidelity to the real implementation, but <em>only from the perspective of the test</em>. For example, a fake for a hashing API doesn’t need to guarantee that the hash value for a given input is exactly the same as the hash value that is generated by the real implementation—tests likely don’t care about the specific hash value, only that the hash value is unique for a given input. If the contract of the hashing API doesn’t make guarantees of what specific hash values will be returned, the fake is still conforming to the contract even if it doesn’t have perfect fidelity to the real implementation.</p>
|
||||
|
||||
<p>Other examples where perfect fidelity typically might not be useful for fakes include latency and resource consumption. However, a fake cannot be used if you need to explicitly test for these constraints (e.g., a performance test that verifies the latency of a function call), so you would need to resort to other mechanisms, such as by using a real implementation instead of a fake.</p>
|
||||
|
||||
<p>A fake might not need to have 100% of the functionality of its corresponding real implementation, especially if such behavior is not needed by most tests (e.g., error handling code for rare edge cases). It is best to have the fake fail fast in this case; for example, raise an error if an unsupported code path is executed. This failure communicates to the engineer that the fake is not appropriate in this situation.</p>
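<p>For instance, a fake file system like the one in <a data-type="xref" href="ch13.html#example_onethree-oneonedot_a_fake_file">A fake file system</a> could fail fast on an operation it does not support rather than silently misbehave (the <code>setFilePermissions()</code> method is a hypothetical addition):</p>

<pre data-code-language="java" data-type="programlisting">// Inside FakeFileSystem: fail fast for functionality the fake does not support
// so that a test relying on it gets a clear signal instead of a wrong answer.
@Override
public void setFilePermissions(String fileName, Permissions permissions) {
  throw new UnsupportedOperationException(
      "setFilePermissions() is not supported by FakeFileSystem");
}</pre>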
|
||||
</section>
|
||||
|
||||
<section class="pagebreak-before" data-type="sect2" id="fakes_should_be_tested">
|
||||
<h2 class="less_space">Fakes Should Be Tested</h2>
|
||||
|
||||
<p>A fake must have its <em>own</em> tests to ensure that it conforms to the API of its corresponding real implementation. <a contenteditable="false" data-primary="faking" data-secondary="testing fakes" data-type="indexterm" id="id-kJC0sVsGTnix"> </a><a contenteditable="false" data-primary="testing" data-secondary="tests for fakes" data-type="indexterm" id="id-57CWf0sbT8i1"> </a>A fake without tests might initially provide realistic behavior, but without tests, this behavior can diverge over time as the real implementation evolves.</p>
|
||||
|
||||
<p>One approach to writing tests for fakes involves writing tests against the API’s public<a contenteditable="false" data-primary="contract fakes" data-type="indexterm" id="id-kJCRHxfGTnix"> </a> interface and running those tests against both the real implementation and the fake (these are known as <a href="https://oreil.ly/yuVlX"><em>contract tests</em></a>). The tests that run against the real implementation will likely be slower, but their downside is minimized because they need to be run only by the owners of the fake.</p>
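<p>One way to structure such contract tests is an abstract test class that exercises only the public interface, with one subclass per implementation. This is a sketch that assumes JUnit-style tests like the other examples in this chapter and a hypothetical <code>RealFileSystem</code> class:</p>

<pre data-code-language="java" data-type="programlisting">// The contract test exercises only the public FileSystem interface; subclasses
// decide which implementation is under test.
public abstract class FileSystemContractTest {
  abstract FileSystem createFileSystem();

  @Test public void writtenFileCanBeReadBack() throws Exception {
    FileSystem fileSystem = createFileSystem();
    fileSystem.writeFile("notes.txt", "hello");
    assertThat(fileSystem.readFile("notes.txt")).isEqualTo("hello");
  }
}

// Fast; run by everyone who depends on the fake.
public class FakeFileSystemTest extends FileSystemContractTest {
  @Override FileSystem createFileSystem() { return new FakeFileSystem(); }
}

// Slower; run by the owners of the fake to catch divergence.
public class RealFileSystemTest extends FileSystemContractTest {
  @Override FileSystem createFileSystem() { return new RealFileSystem(); }
}</pre>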
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="what_to_do_if_a_fake_is_not_available">
|
||||
<h2>What to Do If a Fake Is Not Available</h2>
|
||||
|
||||
<p>If a fake is not available, first ask the owners of the API to create one.<a contenteditable="false" data-primary="faking" data-secondary="when fakes are not available" data-type="indexterm" id="id-kJCRHVsktnix"> </a> The owners might not be familiar with the concept of fakes, or they might not realize the benefit they provide to users of an API.</p>
|
||||
|
||||
<p>If the owners of an API are unwilling or unable to create a fake, you might be able to write your own. One way to do this is to wrap all calls to the API in a single class and then create a fake version of the class that doesn’t talk to the API. Doing this can also be much simpler than creating a fake for the entire API because often you’ll need to use only a subset of the API’s behavior anyway. At Google, some teams have even contributed their fake to the owners of the API, which has allowed other teams to benefit from the fake.</p>
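<p>A brief sketch of that approach (all of the names here are hypothetical): wrap the external API behind a small interface that your code owns, and fake only that interface:</p>

<pre data-code-language="java" data-type="programlisting">// The production code depends only on this narrow, owned interface.
public interface CreditCardClient {
  ChargeResult charge(CreditCard card, Money amount);
}

// The real version delegates to the external API that has no fake.
public class RealCreditCardClient implements CreditCardClient {
  private final ExternalCreditCardApi api;

  public RealCreditCardClient(ExternalCreditCardApi api) { this.api = api; }

  @Override
  public ChargeResult charge(CreditCard card, Money amount) {
    return api.submitCharge(card, amount);
  }
}

// The fake never talks to the external API; it records charges in memory so
// tests can make assertions about them.
public class FakeCreditCardClient implements CreditCardClient {
  private final List<Money> charges = new ArrayList<>();

  @Override
  public ChargeResult charge(CreditCard card, Money amount) {
    charges.add(amount);
    return ChargeResult.success();
  }

  public List<Money> getCharges() { return charges; }
}</pre>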
|
||||
|
||||
<p>Finally, you could decide to settle on using a real implementation (and deal with the trade-offs of real implementations that are mentioned earlier in this chapter), or resort to other test double techniques (and deal with the trade-offs that we will mention later in this chapter).</p>
|
||||
|
||||
<p>In some cases, you can think of a fake as an optimization: if tests are too slow using a real implementation, you can create a fake to make them run faster. But if the speedup from a fake doesn’t outweigh the work it would take to create and maintain the fake, it would be better<a contenteditable="false" data-primary="test doubles" data-secondary="faking" data-startref="ix_tstdblfak" data-type="indexterm" id="id-MjCOH0cDtpi9"> </a> to stick with<a contenteditable="false" data-primary="faking" data-startref="ix_fake" data-type="indexterm" id="id-YjC6sycYtOip"> </a> using the real implementation.</p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="stubbing-id00091">
|
||||
<h1>Stubbing</h1>
|
||||
|
||||
<p>As discussed earlier in this chapter, stubbing<a contenteditable="false" data-primary="test doubles" data-secondary="stubbing" data-type="indexterm" id="ix_tstdblstb"> </a> is a way <a contenteditable="false" data-primary="stubbing" data-type="indexterm" id="ix_stub"> </a>for a test to hardcode behavior for a function that otherwise has no behavior on its own. It is often a quick and easy way to replace a real implementation in a test. For example, the code in <a data-type="xref" href="ch13.html#example_onethree-onetwodot_using_stubbi">Using stubbing to simulate responses</a> uses stubbing to simulate the response from a credit card server.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-onetwodot_using_stubbi">
|
||||
<h5><span class="label">Example 13-12. </span>Using stubbing to simulate responses</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">@Test public void getTransactionCount() {
|
||||
transactionCounter = new TransactionCounter(<strong>mockCreditCardServer</strong>);
|
||||
// Use stubbing to return three transactions.
|
||||
when(<strong>mockCreditCardServer</strong>.getTransactions()).thenReturn(
|
||||
newList(TRANSACTION_1, TRANSACTION_2, TRANSACTION_3));
|
||||
assertThat(transactionCounter.getTransactionCount()).isEqualTo(3);
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<section data-type="sect2" id="the_dangers_of_overusing_stubbing">
|
||||
<h2>The Dangers of Overusing Stubbing</h2>
|
||||
|
||||
<p>Because stubbing is so easy to apply in tests, it can be tempting to use this technique anytime it’s not trivial to use a real implementation.<a contenteditable="false" data-primary="stubbing" data-secondary="dangers of overusing" data-type="indexterm" id="id-LjC2HAsGIBHw"> </a> However, overuse of stubbing can result in major losses in productivity for engineers who need to maintain these tests.</p>
|
||||
|
||||
<section data-type="sect3" id="tests_become_unclear">
|
||||
<h3>Tests become unclear</h3>
|
||||
|
||||
<p>Stubbing involves writing extra code to define the behavior of the functions being stubbed.<a contenteditable="false" data-primary="tests" data-secondary="becoming unclear with overuse of stubbing" data-type="indexterm" id="id-wbCoHqs1fEI4H4"> </a> Having this extra code detracts from the intent of the test, and this code can be difficult to understand if you’re not familiar with the implementation of the system under test.</p>
|
||||
|
||||
<p>A key sign that stubbing isn’t appropriate for a test is if you find yourself mentally stepping through the system under test in order to understand why certain functions in the test are stubbed.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="tests_become_brittle">
|
||||
<h3>Tests become brittle</h3>
|
||||
|
||||
<p>Stubbing leaks<a contenteditable="false" data-primary="tests" data-secondary="becoming brittle with overuse of stubbing" data-type="indexterm" id="id-kJCRHVs8IMImHR"> </a> implementation details of your code into your test. <a contenteditable="false" data-primary="brittle tests" data-secondary="with overuse of stubbing" data-type="indexterm" id="id-57Cms0sQIWINHM"> </a>When implementation details in your production code change, you’ll need to update your tests to reflect these changes. Ideally, a good test should need to change only if user-facing behavior of an API changes; it should remain unaffected by changes to the API’s implementation.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="tests_become_less_effective">
|
||||
<h3>Tests become less effective</h3>
|
||||
|
||||
<p>With stubbing, there is no way to<a contenteditable="false" data-primary="tests" data-secondary="becoming less effective with overuse of stubbing" data-type="indexterm" id="id-57C3H0sZcWINHM"> </a> ensure the function being stubbed behaves like the real implementation, such as in a statement like that shown in the following snippet that hardcodes part of the contract of the <code>add()</code> method (<em>“If 1 and 2 are passed in, 3 will be returned”</em>):</p>
|
||||
|
||||
<pre data-type="programlisting">when(stubCalculator.add(1, 2)).thenReturn(3);</pre>
|
||||
|
||||
<p>Stubbing is a poor choice if the system under test depends on the real implementation’s contract because you will be forced to duplicate the details of the contract, and there is no way to guarantee that the contract is correct (i.e., that the stubbed function has fidelity to the real implementation).</p>
|
||||
|
||||
<p>Additionally, with stubbing there is no way to store state, which can make it difficult to test certain aspects of your code. For example, if you call <code>database.save(item)</code> on either a real implementation or a fake, you might be able to retrieve the item by calling <code>database.get(item.id())</code> given that both of these calls are accessing internal state, but with stubbing, there is no way to do this.</p>
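<p>To make the contrast concrete (using the hypothetical <code>FakeDatabase</code> sketched earlier in this chapter), a fake lets the test observe the state that <code>save()</code> actually produced, whereas a stub forces the test author to hardcode that relationship by hand:</p>

<pre data-code-language="java" data-type="programlisting">// With a fake, state written by save() is observable through get().
FakeDatabase database = new FakeDatabase();
database.save(item);
assertThat(database.get(item.id())).isEqualTo(item);

// With a stub, the test must hardcode the relationship itself, and nothing
// verifies that the real database actually behaves this way.
when(stubDatabase.get(item.id())).thenReturn(item);</pre>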
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="an_example_of_overusing_stubbing">
|
||||
<h3>An example of overusing stubbing</h3>
|
||||
|
||||
<p><a data-type="xref" href="ch13.html#example_onethree-onethreedot_overuse_of">Overuse of stubbing</a> illustrates a test that<a contenteditable="false" data-primary="tests" data-secondary="overusing stubbing, example of" data-type="indexterm" id="id-MjCxsysjSzIoH6"> </a> overuses stubbing.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-onethreedot_overuse_of">
|
||||
<h5><span class="label">Example 13-13. </span>Overuse of stubbing</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">@Test public void creditCardIsCharged() {
|
||||
// Pass in test doubles that were created by a mocking framework.
|
||||
<strong>paymentProcessor</strong> =
|
||||
new PaymentProcessor(<strong>mockCreditCardServer</strong>, <strong>mockTransactionProcessor</strong>);
|
||||
// Set up stubbing for these test doubles.
|
||||
when(<strong>mockCreditCardServer</strong>.isServerAvailable()).thenReturn(true);
|
||||
when(<strong>mockTransactionProcessor</strong>.beginTransaction()).thenReturn(transaction);
|
||||
when(<strong>mockCreditCardServer</strong>.initTransaction(transaction)).thenReturn(true);
|
||||
when(<strong>mockCreditCardServer</strong>.pay(transaction, creditCard, 500))
|
||||
.thenReturn(false);
|
||||
when(<strong>mockTransactionProcessor</strong>.endTransaction()).thenReturn(true);
|
||||
// Call the system under test.
|
||||
<strong>paymentProcessor</strong>.processPayment(creditCard, Money.dollars(500));
|
||||
// There is no way to tell if the pay() method actually carried out the
|
||||
// transaction, so the only thing the test can do is verify that the
|
||||
// pay() method was called.
|
||||
verify(<strong>mockCreditCardServer</strong>).pay(transaction, creditCard, 500);
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<p><a data-type="xref" href="ch13.html#example_onethree-onefourdot_refactoring">Refactoring a test to avoid stubbing</a> rewrites the same test but avoids using stubbing. Notice how the test is shorter and that implementation details (such as how the transaction processor is used) are not exposed in the test. <a contenteditable="false" data-primary="tests" data-secondary="refactoring to avoid stubbing" data-type="indexterm" id="id-EjCDsLIdSoIYHk"> </a>No special setup is needed because the credit card server knows how to behave.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-onefourdot_refactoring">
|
||||
<h5><span class="label">Example 13-14. </span>Refactoring a test to avoid stubbing</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">@Test public void creditCardIsCharged() {
|
||||
<strong>paymentProcessor</strong> =
|
||||
new PaymentProcessor(<strong>creditCardServer</strong>, <strong>transactionProcessor</strong>);
|
||||
// Call the system under test.
|
||||
<strong>paymentProcessor</strong>.processPayment(creditCard, Money.dollars(500));
|
||||
// Query the credit card server state to see if the payment went through.
|
||||
assertThat(<strong>creditCardServer</strong>.getMostRecentCharge(creditCard))
|
||||
.isEqualTo(500);
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<p>We obviously don’t want such a test to talk to an external credit card server, so a fake credit card server would be more suitable. If a fake isn’t available, another option is to use a real implementation that talks to a hermetic credit card server, although this will increase the execution time of the tests. (We explore hermetic servers in the next chapter.)</p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="when_is_stubbing_appropriatequestion_ma">
|
||||
<h2>When Is Stubbing Appropriate?</h2>
|
||||
|
||||
<p>Rather than a catch-all replacement for a real implementation, stubbing is appropriate <a contenteditable="false" data-primary="stubbing" data-secondary="appropriate use of" data-type="indexterm" id="id-JjC1H1sZc4HZ"> </a>when you need a function to return a specific value to get the system under test into a certain state, such as <a data-type="xref" href="ch13.html#example_onethree-onetwodot_using_stubbi">Using stubbing to simulate responses</a> that requires the system under test to return a non-empty list of transactions. Because a function’s behavior is defined inline in the test, stubbing can simulate a wide variety of return values or errors that might not be possible to trigger from a real implementation or a fake.</p>
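<p>For example, stubbing can make a dependency report a failure on demand, which might be awkward to arrange with a real server or a fake (here, <code>ServerTimeoutException</code> is a hypothetical unchecked exception):</p>

<pre data-code-language="java" data-type="programlisting">// Simulate an error condition from the credit card server so the test can
// verify how the system under test handles the failure.
when(mockCreditCardServer.pay(transaction, creditCard, 500))
    .thenThrow(new ServerTimeoutException());</pre>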
|
||||
|
||||
<p>To ensure its purpose is clear, each stubbed function should have a direct relationship with the test’s assertions.<a contenteditable="false" data-primary="assertions" data-secondary="stubbed functions having direct relationship with" data-type="indexterm" id="id-wbCoHJf2czH9"> </a> As a result, a test typically should stub out a small number of functions because stubbing out many functions can lead to tests that are less clear. A test that requires many functions to be stubbed can be a sign that stubbing is being overused, or that the system under test is too complex and should be refactored.</p>
|
||||
|
||||
<p>Note that even when stubbing is appropriate, real implementations or fakes are still preferred because they don’t expose implementation details and they give you more guarantees about the correctness of the code compared to stubbing. But stubbing can be a reasonable technique to use, as long as its usage is constrained so that tests don’t become overly complex.<a contenteditable="false" data-primary="test doubles" data-secondary="stubbing" data-startref="ix_tstdblstb" data-type="indexterm" id="id-kJCRHOINcXHx"> </a><a contenteditable="false" data-primary="stubbing" data-startref="ix_stub" data-type="indexterm" id="id-57CmsZIZc9H1"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="interaction_testing-id00047">
|
||||
<h1>Interaction Testing</h1>
|
||||
|
||||
<p>As discussed earlier in this chapter, interaction testing is a way to validate how a function<a contenteditable="false" data-primary="test doubles" data-secondary="interaction testing" data-type="indexterm" id="ix_tstdblint"> </a> is called without <a contenteditable="false" data-primary="interaction testing" data-type="indexterm" id="ix_inttst"> </a>actually calling the implementation of the function.</p>
|
||||
|
||||
<p>Mocking frameworks make it easy to perform interaction testing. However, to keep tests useful, readable, and resilient to change, it’s important to perform interaction testing only when necessary.</p>
|
||||
|
||||
<section data-type="sect2" id="prefer_state_testing_over_interaction_t">
|
||||
<h2>Prefer State Testing Over Interaction Testing</h2>
|
||||
|
||||
<p>In contrast to interaction<a contenteditable="false" data-primary="state testing" data-secondary="preferring over interaction testing" data-type="indexterm" id="id-JjC1H1swI5sZ"> </a> testing, it is preferred<a contenteditable="false" data-primary="interaction testing" data-secondary="preferring state testing over" data-type="indexterm" id="id-wbCMsqswIWs9"> </a> to test code through <a href="https://oreil.ly/k3hSR"><em>state testing</em></a>.</p>
|
||||
|
||||
<p>With state testing, you call the system under test and validate that either the correct value was returned or that some other state in the system under test was properly changed. <a data-type="xref" href="ch13.html#example_onethree-onefivedot_state_testi">State testing</a> presents an example of state testing.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-onefivedot_state_testi">
|
||||
<h5><span class="label">Example 13-15. </span>State testing</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">@Test public void sortNumbers() {
|
||||
NumberSorter <strong>numberSorter</strong> = new NumberSorter(<strong>quicksort</strong>, <strong>bubbleSort</strong>);
|
||||
// Call the system under test.
|
||||
List <strong>sortedList</strong> = <strong>numberSorter</strong>.sortNumbers(newList(3, 1, 2));
|
||||
// Validate that the returned list is sorted. It doesn’t matter which
|
||||
// sorting algorithm is used, as long as the right result was returned.
|
||||
assertThat(<strong>sortedList</strong>).isEqualTo(newList(1, 2, 3));
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<p><a data-type="xref" href="ch13.html#example_onethree-onesixdot_interaction">Interaction testing</a> illustrates a similar test scenario but instead uses interaction testing. Note how it’s impossible for this test to determine that the numbers are actually sorted, because the test doubles don’t know how to sort the numbers—all it can tell you is that the system under test tried to sort the numbers.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-onesixdot_interaction">
|
||||
<h5><span class="label">Example 13-16. </span>Interaction testing</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">@Test public void sortNumbers_quicksortIsUsed() {
|
||||
// Pass in test doubles that were created by a mocking framework.
|
||||
NumberSorter <strong>numberSorter</strong> =
|
||||
new NumberSorter(<strong>mockQuicksort</strong>, <strong>mockBubbleSort</strong>);
|
||||
|
||||
// Call the system under test.
|
||||
<strong>numberSorter</strong>.sortNumbers(newList(3, 1, 2));
|
||||
|
||||
// Validate that numberSorter.sortNumbers() used quicksort. The test
|
||||
// will fail if mockQuicksort.sort() is never called (e.g., if
|
||||
// mockBubbleSort is used) or if it’s called with the wrong arguments.
|
||||
verify(<strong>mockQuicksort</strong>).sort(newList(3, 1, 2));
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<p>At Google, we’ve found that emphasizing state testing is more scalable; it reduces test brittleness, making it easier to change and maintain code over time.</p>
|
||||
|
||||
<p>The primary<a contenteditable="false" data-primary="interaction testing" data-secondary="preferring state testing over" data-tertiary="limitations of interaction testing" data-type="indexterm" id="id-YjCoH0tpIPsp"> </a> issue with interaction testing is that it can’t tell you that the system under test is working properly; it can only validate that certain functions are called as expected. It requires you to make an assumption about the behavior of the code; for example, “<em>If <code>database.save(item)</code> is called, we assume the item will be saved to the database.</em>” State testing is preferred because it actually validates this assumption (such as by saving an item to a database and then querying the database to validate that the item exists).</p>
|
||||
|
||||
<p>Another downside of interaction testing is that it utilizes implementation details of the system under test—to validate that a function was called, you are exposing to the test that the system under test calls this function. Similar to stubbing, this extra code makes tests brittle because it leaks implementation details of your production code into tests. Some people at Google jokingly refer to tests that overuse interaction <span class="keep-together">testing</span> as <a href="https://oreil.ly/zkMDu"><em>change-detector tests</em></a> because they fail in response to any change to the production code, even if the behavior of the system under test remains unchanged.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="when_is_interaction_testing_appropriate">
|
||||
<h2>When Is Interaction Testing Appropriate?</h2>
|
||||
|
||||
<p>There are some <a contenteditable="false" data-primary="interaction testing" data-secondary="appropriate uses of" data-type="indexterm" id="id-wbCoHqs2cWs9"> </a>cases for which interaction testing is warranted:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>You cannot perform state testing because you are unable to use a real implementation or a fake (e.g., if the real implementation is too slow and no fake exists). As a fallback, you can perform interaction testing to validate that certain functions are called. Although not ideal, this does provide some basic level of confidence that the system under test is working as expected.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Differences in the number or order of calls to a function would cause undesired behavior. Interaction testing is useful because it could be difficult to validate this behavior with state testing. For example, if you expect a caching feature to reduce the number of calls to a database, you can verify that the database object is not accessed more times than expected. Using Mockito, the code might look similar to this:</p>
|
||||
|
||||
<pre data-type="programlisting">verify(databaseReader, atMostOnce()).selectRecords();</pre>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Interaction testing is not a complete replacement for state testing. If you are not able to perform state testing in a unit test, strongly consider supplementing your test suite with larger-scoped tests that do perform state testing. For instance, if you have a unit test that validates usage of a database through interaction testing, consider adding an integration test that can perform state testing against a real database. Larger-scope testing is an important strategy for risk mitigation, and we discuss it in the next <span class="keep-together">chapter.</span></p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="best_practices_for_interaction_testing">
|
||||
<h2>Best Practices for Interaction Testing</h2>
|
||||
|
||||
<p>When performing<a contenteditable="false" data-primary="interaction testing" data-secondary="best practices" data-type="indexterm" id="id-kJCRHVsxSdsx"> </a> interaction testing, following these practices can reduce some of the impact of the aforementioned downsides.</p>
|
||||
|
||||
<section data-type="sect3" id="prefer_to_perform_interaction_testing_o">
|
||||
<h3>Prefer to perform interaction testing only for state-changing functions</h3>
|
||||
|
||||
<p>When a system under<a contenteditable="false" data-primary="interaction testing" data-secondary="best practices" data-tertiary="performing only for state-changing functions" data-type="indexterm" id="id-yoC0H9sOfbSQsZ"> </a> test calls a function on a dependency, that <a contenteditable="false" data-primary="state-changing functions" data-type="indexterm" id="id-MjCxsysofDSbs6"> </a>call falls into one of two categories:</p>
|
||||
|
||||
<dl>
|
||||
<dt>State-changing</dt>
|
||||
<dd>Functions that have side effects on the world outside the system under test. Examples: <code>sendEmail()</code>, <code>saveRecord()</code>, <code>logAccess()</code>.</dd>
|
||||
<dt>Non-state-changing</dt>
|
||||
<dd>Functions that don’t <a contenteditable="false" data-primary="non-state-changing functions" data-type="indexterm" id="id-ZjCyH2Imfgf8S0sj"> </a>have side effects; they return information about the world outside the system under test and don’t modify anything. Examples: <code>getUser()</code>, <code>findResults()</code>, <code>readFile()</code>.</dd>
|
||||
</dl>
|
||||
|
||||
<p>In general, you should perform interaction testing only for functions that are state-changing. Performing interaction testing for non-state-changing functions is usually redundant given that the system under test will use the return value of the function to do other work that you can assert. The interaction itself is not an important detail for correctness, because it has no side effects.</p>
|
||||
|
||||
<p>Performing interaction testing for non-state-changing functions makes your test brittle because you’ll need to update the test anytime the pattern of interactions changes. It also makes the test less readable given that the additional assertions make it more difficult to determine which assertions are important for ensuring correctness of the code. By contrast, state-changing interactions represent something useful that your code is doing to change state somewhere else.</p>
|
||||
|
||||
<p><a data-type="xref" href="ch13.html#example_onethree-onesevendot_state-chan">State-changing and non-state-changing interactions</a> demonstrates interaction testing on both state-changing and non-state-changing functions.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-onesevendot_state-chan">
|
||||
<h5><span class="label">Example 13-17. </span>State-changing and non-state-changing interactions</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">@Test public void grantUserPermission() {
|
||||
UserAuthorizer userAuthorizer =
|
||||
new UserAuthorizer(<strong>mockUserService</strong>, <strong>mockPermissionDatabase</strong>);
|
||||
when(<strong>mockPermissionDatabase</strong>.<strong>getPermission</strong>(FAKE_USER)).thenReturn(EMPTY);
|
||||
|
||||
// Call the system under test.
|
||||
userAuthorizer.grantPermission(USER_ACCESS);
|
||||
|
||||
// addPermission() is state-changing, so it is reasonable to perform
|
||||
// interaction testing to validate that it was called.
|
||||
verify(<strong>mockPermissionDatabase</strong>).<strong>addPermission</strong>(FAKE_USER, USER_ACCESS);
|
||||
|
||||
// getPermission() is non-state-changing, so this line of code isn’t
|
||||
// needed. One clue that interaction testing may not be needed:
|
||||
// getPermission() was already stubbed earlier in this test.
|
||||
verify(<strong>mockPermissionDatabase</strong>).<strong>getPermission</strong>(FAKE_USER);
|
||||
}</pre>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="avoid_overspecification">
|
||||
<h3>Avoid overspecification</h3>
|
||||
|
||||
<p>In <a data-type="xref" href="ch12.html#unit_testing">Unit Testing</a>, we discuss why it is useful to test behaviors rather than methods. This means that a test method should focus<a contenteditable="false" data-primary="overspecification of interaction tests" data-type="indexterm" id="id-YjC6sVspIdSQsw"> </a> on verifying one behavior of a method or class rather than trying<a contenteditable="false" data-primary="interaction testing" data-secondary="best practices" data-tertiary="avoiding overspecification" data-type="indexterm" id="id-EjCZf0sWI0SVsk"> </a> to verify multiple behaviors in a single test.</p>
|
||||
|
||||
<p>When performing interaction testing, we should aim to apply the same principle by avoiding overspecifying which functions and arguments are validated. This leads to tests that are clearer and more concise. It also leads to tests that are resilient to changes made to behaviors that are outside the scope of each test, so fewer tests will fail if a change is made to the way a function is called.</p>
|
||||
|
||||
<p><a data-type="xref" href="ch13.html#example_onethree-oneeightdot_overspecif">Overspecified interaction tests</a> illustrates interaction testing with overspecification. The intention of the test is to validate that the user’s name is included in the greeting prompt, but the test will fail if unrelated behavior is changed.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-oneeightdot_overspecif">
|
||||
<h5><span class="label">Example 13-18. </span>Overspecified interaction tests</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">@Test public void displayGreeting_renderUserName() {
|
||||
when(mockUserService.getUserName()).thenReturn("Fake User");
|
||||
userGreeter.displayGreeting(); // Call the system under test.
|
||||
|
||||
// The test will fail if any of the arguments to setText() are changed.
|
||||
verify(<strong>userPrompt</strong>).<strong>setText</strong>("Fake User", "Good morning!", "Version 2.1");
|
||||
|
||||
// The test will fail if setIcon() is not called, even though this
|
||||
// behavior is incidental to the test since it is not related to
|
||||
// validating the user name.
|
||||
verify(<strong>userPrompt</strong>).<strong>setIcon</strong>(IMAGE_SUNSHINE);
|
||||
}</pre>
|
||||
</div>
|
||||
|
||||
<p><a data-type="xref" href="ch13.html#example_onethree-oneninedot_well-specif">Well-specified interaction tests</a> illustrates interaction testing with more care in specifying relevant arguments and functions. The behaviors being tested are split into separate tests, and each test validates the minimum amount necessary for ensuring<a contenteditable="false" data-primary="well-specified interaction tests" data-type="indexterm" id="id-6WCRsMSDIgSxsq"> </a> the behavior it is testing is correct.</p>
|
||||
|
||||
<div data-type="example" id="example_onethree-oneninedot_well-specif">
|
||||
<h5><span class="label">Example 13-19. </span>Well-specified interaction tests</h5>
|
||||
|
||||
<pre data-code-language="java" data-type="programlisting">@Test public void displayGreeting_renderUserName() {
|
||||
when(mockUserService.getUserName()).thenReturn("Fake User");
|
||||
userGreeter.displayGreeting(); // Call the system under test.
|
||||
verify(<strong>userPrompt</strong>).<strong>setText</strong>(eq("Fake User"), any(), any());
|
||||
}
|
||||
@Test public void displayGreeting_timeIsMorning_useMorningSettings() {
|
||||
setTimeOfDay(TIME_MORNING);
|
||||
userGreeter.displayGreeting(); // Call the system under test.
|
||||
verify(<strong>userPrompt</strong>).<strong>setText</strong>(any(), eq("Good morning!"), any());
|
||||
verify(<strong>userPrompt</strong>).<strong>setIcon</strong>(IMAGE_SUNSHINE);
|
||||
}</pre>
|
||||
</div>
|
||||
</section>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="conclusion-id00017">
|
||||
<h1>Conclusion</h1>
|
||||
|
||||
<p>We’ve <a contenteditable="false" data-primary="test doubles" data-secondary="interaction testing" data-startref="ix_tstdblint" data-type="indexterm" id="id-0OCnH9sYf1"> </a>learned that<a contenteditable="false" data-primary="interaction testing" data-startref="ix_inttst" data-type="indexterm" id="id-LjCYsAs4fJ"> </a> test doubles are crucial to engineering velocity because they can help comprehensively test your code and ensure that your tests run fast. On the other hand, misusing them can be a major drain on productivity because they can lead to tests that are unclear, brittle, and less effective. This is why it’s important for engineers to understand the best practices for how to effectively apply test doubles.</p>
|
||||
|
||||
<p>There is often no exact answer regarding whether to use a real implementation or a test double, or which test double technique to use. An engineer might need to make some trade-offs when deciding the proper approach for their use case.</p>
|
||||
|
||||
<p>Although test doubles are great for working around dependencies that are difficult to use in tests, if you want to maximize confidence in your code, at some point you still want to exercise these dependencies in tests. The next chapter will cover larger-scope testing, for which these dependencies are used regardless of their suitability for unit tests; for example, even if they are slow or nondeterministic.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="tlsemicolondrs-id00116">
|
||||
<h1>TL;DRs</h1>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>A real implementation should be preferred over a test double.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>A fake is often the ideal solution if a real implementation can’t be used in a test.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Overuse of stubbing leads to tests that are unclear and brittle.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Interaction testing should be avoided when possible: it leads to tests that are brittle because it exposes implementation<a contenteditable="false" data-primary="test doubles" data-startref="ix_tstdbl" data-type="indexterm" id="id-kJCRH3H8Ias1IR"> </a> details of the system under test.</p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
</body>
|
||||
</html>
|
838
clones/abseil.io/resources/swe-book/html/ch14.html
Normal file
|
@ -0,0 +1,838 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="larger_testing">
|
||||
<h1>Larger Testing</h1>
|
||||
|
||||
<p class="byline">Written by Joseph Graves</p>
|
||||
|
||||
<p class="byline">Edited by Tom Manshreck</p>
|
||||
|
||||
<p>In previous chapters, we have recounted how a testing culture was established at Google and how small unit tests became a fundamental part of the developer workflow.<a contenteditable="false" data-primary="larger testing" data-type="indexterm" id="ix_lrgtst"> </a> But what about other kinds of tests? It turns out that Google does indeed use many larger tests, and these comprise a significant part of the risk mitigation strategy necessary for healthy software engineering. But these tests present additional challenges to ensure that they are valuable assets and not resource sinks. In this chapter, we’ll discuss what we mean by “larger tests,” when we execute them, and best practices for keeping them effective.<a contenteditable="false" data-primary="testing" data-secondary="larger" data-see="larger testing" data-type="indexterm" id="id-RbHWhmc2"> </a></p>
|
||||
|
||||
<section data-type="sect1" id="what_are_larger_testsquestion_mark">
|
||||
<h1>What Are Larger Tests?</h1>
|
||||
|
||||
<p>As mentioned previously, Google has specific notions of test size.<a contenteditable="false" data-primary="test sizes" data-secondary="large tests" data-type="indexterm" id="id-owHwIyh7fO"> </a> Small tests are restricted to one thread, one process, one machine. Larger tests do not have the same restrictions. <a contenteditable="false" data-primary="small tests" data-type="indexterm" id="id-9mHbhqhmfR"> </a>But Google also has notions of test <em>scope</em>. A unit test necessarily is of smaller scope than an integration test.<a contenteditable="false" data-primary="scope of tests" data-type="indexterm" id="id-mAHecAhofm"> </a> And the largest-scoped tests (sometimes called end-to-end or system tests) typically involve several real dependencies and fewer test doubles.<a contenteditable="false" data-primary="test scope" data-see="scope of tests" data-type="indexterm" id="id-6YHLfwhaf9"> </a></p>
|
||||
|
||||
<p>Larger tests are many things that small tests are not. They are not bound by the same constraints; thus, they can exhibit the <a contenteditable="false" data-primary="larger testing" data-secondary="characteristics of" data-type="indexterm" id="id-9mHJI8tmfR"> </a>following characteristics:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>They may be slow. Our large tests have a default timeout of 15 minutes or 1 hour, but we also have tests that run for multiple hours or even days.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>They may be nonhermetic. Large tests may share resources with other tests and traffic.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>They may be nondeterministic. If a large test is nonhermetic, it is almost impossible to guarantee determinism: other tests or user state may interfere with it.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>So why have larger tests? Reflect back on your coding process. How do you confirm that the programs you write actually work? <a contenteditable="false" data-primary="larger testing" data-secondary="advantages of" data-type="indexterm" id="id-mAH6Iwfofm"> </a><a contenteditable="false" data-type="indexterm" data-primary="unit testing" data-secondary="limitations of unit tests" id="id-6YH1h9faf9"> </a>You might be writing and running unit tests as you go, but do you find yourself running the actual binary and trying it out yourself? And when you share this code with others, how do they test it? By running your unit tests, or by trying it out themselves?</p>
|
||||
|
||||
<p>Also, how do you know that your code continues to work during upgrades? Suppose that you have a site that uses the Google Maps API and there’s a new API version. Your unit tests likely won’t help you to know whether there are any compatibility issues. You’d probably run it and try it out to see whether anything broke.</p>
|
||||
|
||||
<p>Unit tests can give you confidence about individual functions, objects, and modules, but large tests provide more confidence that the overall system works as intended. And having actual automated tests scales in ways that manual testing does not.</p>
|
||||
|
||||
<section data-type="sect2" id="fidelity">
|
||||
<h2>Fidelity</h2>
|
||||
|
||||
<p>The primary<a contenteditable="false" data-primary="larger testing" data-secondary="fidelity of tests" data-type="indexterm" id="id-NvHAIkhgSqfd"> </a> reason larger tests exist<a contenteditable="false" data-primary="fidelity" data-secondary="of tests" data-type="indexterm" id="id-ddHrhQhlS8fV"> </a> is to address <em>fidelity</em>. Fidelity is the property by which a test is reflective of the real behavior<a contenteditable="false" data-primary="systems under test (SUTs)" data-secondary="fidelity of tests to behavior of" data-type="indexterm" id="id-8GHvc7heS4fk"> </a> of the system under test (SUT).</p>
|
||||
|
||||
<p>One way of envisioning fidelity is in terms of the environment. As <a data-type="xref" href="ch14.html#scale_of_increasing_fidelity">Figure 14-1</a> illustrates, unit tests bundle a test and a small portion of code together as a runnable unit, which ensures the code is tested but is very different from how production code runs. Production itself is, naturally, the environment of highest fidelity in testing. There is also a spectrum of interim options. A key for larger tests is to find the proper fit, because increasing fidelity also comes with increasing costs and (in the case of production) increasing risk of failure.</p>
|
||||
|
||||
<figure id="scale_of_increasing_fidelity"><img alt="Scale of increasing fidelity" src="images/seag_1401.png">
|
||||
<figcaption><span class="label">Figure 14-1. </span>Scale of increasing fidelity</figcaption>
|
||||
</figure>
|
||||
|
||||
<p>Tests can also be measured in terms of how faithful the test content is to reality. Many handcrafted, large tests are dismissed by engineers if the test data itself looks unrealistic. Test data copied from production is much more faithful to reality (having been captured that way), but a big challenge is how to create realistic test traffic <em>before</em> launching the new code. <a contenteditable="false" data-primary="AI (artificial intelligence)" data-secondary="seed data, biases in" data-type="indexterm" id="id-KWHXhRfoSPfL"> </a>This is particularly a problem in artificial intelligence (AI), for which the "seed" data often suffers from intrinsic bias. And, because most data for unit tests is handcrafted, it covers a narrow range of cases and tends to conform to the biases of the author. The uncovered scenarios missed by the data represent a fidelity gap in the tests.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="common_gaps_in_unit_tests">
|
||||
<h2>Common Gaps in Unit Tests</h2>
|
||||
|
||||
<p>Larger tests might also be necessary where smaller tests fail. <a contenteditable="false" data-primary="larger testing" data-secondary="unit tests not providing good risk mitigation coverage" data-type="indexterm" id="ix_lrgtstut"> </a><a contenteditable="false" data-primary="unit testing" data-secondary="common gaps in unit tests" data-type="indexterm" id="ix_unttst"> </a>The subsections that follow present some particular areas where unit tests do not provide good risk mitigation coverage.</p>
|
||||
|
||||
<section data-type="sect3" id="unfaithful_doubles">
|
||||
<h3>Unfaithful doubles</h3>
|
||||
|
||||
<p>A single unit test typically covers one class or module. <a contenteditable="false" data-primary="test doubles" data-secondary="unfaithful" data-type="indexterm" id="id-8GHOI7hVtVUef7"> </a><a contenteditable="false" data-primary="unit testing" data-secondary="common gaps in unit tests" data-tertiary="unfaithful test doubles" data-type="indexterm" id="id-KWHXhahntqUKfY"> </a>Test doubles (as discussed in <a data-type="xref" href="ch13.html#test_doubles">Test Doubles</a>) are frequently used to eliminate heavyweight or hard-to-test dependencies. But when those dependencies are replaced, it becomes possible that the replacement and the doubled thing do not agree.</p>
|
||||
|
||||
<p>Almost all unit tests at Google are written by the same engineer who is writing the unit under test. When those unit tests need doubles and when the doubles used are mocks, it is the engineer writing the unit test defining the mock and its intended behavior. But that engineer usually did <em>not</em> write the thing being mocked and can be misinformed about its actual behavior. The relationship between the unit under test and a given peer is a behavioral contract, and if the engineer is mistaken about the actual behavior, the understanding of the contract is invalid.</p>
|
||||
|
||||
<p>Moreover, mocks become stale.<a contenteditable="false" data-primary="mocking" data-secondary="mocks becoming stale" data-type="indexterm" id="id-MnHzILcLtrU1fv"> </a> If this mock-based unit test is not visible to the author of the real implementation and the real implementation changes, there is no signal that the test (and the code being tested) should be updated to keep up with the changes.</p>
|
||||
|
||||
<p>Note that, as mentioned in <a data-type="xref" href="ch13.html#test_doubles">Test Doubles</a>, if teams provide fakes for their own services, this concern is mostly alleviated.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="configuration_issues">
|
||||
<h3>Configuration issues</h3>
|
||||
|
||||
<p>Unit tests cover code within a given binary. <a contenteditable="false" data-primary="configuration issues with unit tests" data-type="indexterm" id="id-KWHkIahdcqUKfY"> </a><a contenteditable="false" data-primary="unit testing" data-secondary="common gaps in unit tests" data-tertiary="configuration issues" data-type="indexterm" id="id-MnH6hRhPcrU1fv"> </a>But that binary is typically not completely self-sufficient in terms of how it is executed. Usually a binary has some kind of deployment configuration or starter script. Additionally, real end-user-serving production instances have their own configuration files or configuration databases.</p>
|
||||
|
||||
<p>If there are issues with these files or the compatibility between the state defined by these stores and the binary in question, these can lead to major user issues. Unit tests alone cannot verify this compatibility.<sup><a data-type="noteref" id="ch01fn140-marker" href="ch14.html#ch01fn140">1</a></sup> Incidentally, this is a good reason to ensure that your configuration is in version control as well as your code, because then, changes to configuration can be identified as the source of bugs as opposed to introducing random external flakiness and can be built in to large tests.</p>
|
||||
|
||||
<p>At Google, configuration changes are the number one reason for our major outages. This is an area in which we have underperformed and has led to some of our most embarrassing bugs. For example, there was a global Google outage back in 2013 due to a bad network configuration push that was never tested. Configurations tend to be written in configuration languages, not production code languages. They also often have faster production rollout cycles than binaries, and they can be more difficult to test. All of these lead to a higher likelihood of failure. But at least in this case (and others), configuration was version controlled, and we could quickly identify the culprit and mitigate the issue.</p>
</section>

<section data-type="sect3" id="issues_that_arise_under_load">

<h3>Issues that arise under load</h3>
<p>At Google, unit tests are intended to be small<a contenteditable="false" data-primary="unit testing" data-secondary="common gaps in unit tests" data-tertiary="issues arising under load" data-type="indexterm" id="id-MnHzIRh4frU1fv"> </a> and fast because they need to fit into our standard test execution infrastructure and also be run many times as part of a frictionless developer workflow. But performance, load, and stress testing often require sending large volumes of traffic to a given binary. These volumes become difficult to test in the model of a typical unit test. And our large volumes are big, often thousands or millions of queries per second (in the case of ads, <a href="https://oreil.ly/brV5-">real-time bidding</a>)!</p>
</section>

<section data-type="sect3" id="unanticipated_behaviorscomma_inputscomm">

<h3>Unanticipated behaviors, inputs, and side effects</h3>
<p>Unit tests are limited by the imagination of the engineer writing them.<a contenteditable="false" data-primary="unit testing" data-secondary="common gaps in unit tests" data-tertiary="unanticipated behaviors, inputs, and side effects" data-type="indexterm" id="id-DzHAIrhrCvUYfY"> </a> That is, they can only test for anticipated behaviors and inputs. <a contenteditable="false" data-primary="behaviors" data-secondary="unanticipated, testing for" data-type="indexterm" id="id-zoHKhBhYCXU7f3"> </a>However, issues that users find with a product are mostly unanticipated (otherwise it would be unlikely that they would make it to end users as issues). This fact suggests that different test techniques are needed to test for unanticipated behaviors.</p>
<p><a href="http://hyrumslaw.com">Hyrum’s Law</a> is an important consideration here: even if we could test 100% for conformance to a strict, specified contract, the effective user contract applies to all visible behaviors, not just a stated contract.<a contenteditable="false" data-primary="Hyrum's Law" data-secondary="consideration in unit tests" data-type="indexterm" id="id-aOHrh2tlC1U4fD"> </a> It is unlikely that unit tests alone test for all visible behaviors that are not specified in the public API.</p>
</section>

<section data-type="sect3" id="emergent_behaviors_and_the_quotation_ma">

<h3>Emergent behaviors and the "vacuum effect"</h3>
<p>Unit tests are limited to the scope that they cover (especially with the widespread use of test doubles), so if behavior changes <a contenteditable="false" data-primary="unit testing" data-secondary="common gaps in unit tests" data-tertiary="emergent behaviors and the vacuum effect" data-type="indexterm" id="id-zoH4IBh7TXU7f3"> </a>in areas outside of this scope, it cannot be detected. And because unit tests are designed to be fast and reliable, they deliberately eliminate the chaos of real dependencies, network, and data. <a contenteditable="false" data-primary="vacuum effect, unit tests and" data-type="indexterm" id="id-aOHrhvh4T1U4fD"> </a>A unit test is like a problem in theoretical physics: ensconced in a vacuum, neatly hidden from the mess of the real world, which is great for speed and reliability but misses certain<a contenteditable="false" data-primary="larger testing" data-secondary="unit tests not providing good risk mitigation coverage" data-startref="ix_lrgtstut" data-type="indexterm" id="id-10H5t3hzTbUXfW"> </a> defect <a contenteditable="false" data-primary="unit testing" data-secondary="common gaps in unit tests" data-startref="ix_unttst" data-type="indexterm" id="id-AKH5cMh0TGUVfn"> </a><span class="keep-together">categories.</span></p>
</section>

</section>

<section data-type="sect2" id="why_not_have_larger_testsquestion_mark">

<h2>Why Not Have Larger Tests?</h2>
<p>In earlier chapters, we discussed many of the properties of a developer-friendly test. In particular,<a contenteditable="false" data-primary="larger testing" data-secondary="challenges and limitations of" data-type="indexterm" id="id-5XHOILh3sWfb"> </a> it needs to be as follows:</p>
<dl>
<dt>Reliable</dt>
<dd>It must not be flaky and it must provide a useful pass/fail signal.</dd>
<dt>Fast</dt>
<dd>It needs to be fast enough to not interrupt the developer workflow.</dd>
<dt>Scalable</dt>
<dd>Google needs to be able to run all such useful affected tests efficiently for presubmits and for post-submits.</dd>
</dl>
<p>Good unit tests exhibit all of these properties.<a contenteditable="false" data-type="indexterm" data-primary="unit testing" data-secondary="properties of good unit tests" id="id-KWHkI7crsPfL"> </a> Larger tests often violate all of these constraints. For example, larger tests are often flakier because they use more infrastructure than does a small unit test. They are also often much slower, both to set up as well as to run. And they have trouble scaling because of the resource and time requirements, but often also because they are not isolated—these tests can collide with one another.</p>
<p>Additionally, larger tests present two other challenges. First, there is a challenge of ownership. A unit test is clearly owned by the engineer (and team) who owns the unit. A larger test spans multiple units and thus can span multiple owners. This presents a long-term ownership challenge: who is responsible for maintaining the test and who is responsible for diagnosing issues when the test breaks? Without clear ownership, a test rots.</p>
<p>The second challenge for larger <a contenteditable="false" data-primary="standardization, lack of, in larger tests" data-type="indexterm" id="id-DzHAIRClsDfb"> </a>tests is one of standardization (or the lack thereof). Unlike unit tests, larger tests suffer a lack of standardization in terms of the infrastructure and process by which they are written, run, and debugged. The approach to larger tests is a product of a system’s architectural decisions, thus introducing variance in the type of tests required. For example, the way we build and run A-B diff regression tests in Google Ads is completely different from the way such tests are built and run in Search backends, which is different again from Drive. They use different platforms, different languages, different infrastructures, different libraries, and competing testing frameworks.</p>
<p>This lack of standardization has a significant impact. Because larger tests have so many ways of being run, they often are skipped<a contenteditable="false" data-primary="large-scale changes" data-secondary="larger tests skipped during" data-type="indexterm" id="id-zoH4IdTQswfY"> </a> during large-scale changes. (See <a data-type="xref" href="ch22.html#large-scale_changes">Large-Scale Changes</a>.) The infrastructure does not have a standard way to run those tests, and asking the people executing LSCs to know the local particulars for testing on every team doesn’t scale. Because larger tests differ in implementation from team to team, tests that actually test the integration between those teams require unifying incompatible infrastructures. And because of this lack of standardization, we cannot teach a single approach to Nooglers (new Googlers) or even more experienced engineers, which both perpetuates the situation and also leads to a lack of understanding about the motivations of such tests.</p>
</section>

</section>

<section data-type="sect1" id="larger_tests_at_google">

<h1>Larger Tests at Google</h1>
<p>When we discussed<a contenteditable="false" data-primary="larger testing" data-secondary="larger tests at Google" data-type="indexterm" id="ix_lrgtstGoo"> </a> the history of testing at Google earlier (see <a data-type="xref" href="ch11.html#testing_overview">Testing Overview</a>), we mentioned how Google Web Server (GWS) mandated automated tests in 2003 and how this was a watershed moment. However, we actually had automated tests in use before this point, but a common practice was using automated large and enormous tests. For example, AdWords created an end-to-end test back in 2001 to validate product scenarios. Similarly, in 2002, Search wrote a similar "regression test" for its indexing code, and AdSense (which had not even publicly launched yet) created its variation on the AdWords test.</p>
<p>Other "larger" testing patterns also existed circa 2002. <a contenteditable="false" data-primary="Google Search" data-secondary="larger tests at Google" data-type="indexterm" id="id-GAHeIlt6C9"> </a>The Google search frontend relied heavily on manual QA—manual versions of end-to-end test scenarios. And Gmail got its version of a "local demo" environment—a script to bring up an end-to-end Gmail environment locally with some generated test users and mail data for local manual testing.</p>
<p>When C/J Build (our first continuous build framework) launched, it did not distinguish between unit tests and other tests, but there were two critical developments that led to a split. First, Google focused on unit tests because we wanted to encourage the testing pyramid and to ensure the vast majority of written tests were unit tests. Second, when TAP replaced C/J Build as our formal continuous build system, it was only able to do so for tests that met TAP’s eligibility requirements: hermetic tests buildable at a single change that could run on our build/test cluster within a maximum time limit. Although most unit tests satisfied this requirement, larger tests mostly did not. However, this did not stop the need for other kinds of tests, and they have continued to fill the coverage gaps. C/J Build even stuck around for years specifically to handle these kinds of tests until newer systems replaced it.</p>
<section data-type="sect2" id="larger_tests_and_time">

<h2>Larger Tests and Time</h2>
<p>Throughout this book, we have looked at the influence<a contenteditable="false" data-primary="time" data-secondary="larger tests and passage of time" data-type="indexterm" id="id-O2HzIYh2fnC4"> </a> of time on software engineering, because<a contenteditable="false" data-primary="larger testing" data-secondary="larger tests at Google" data-tertiary="time and" data-type="indexterm" id="id-gKHWhMhPfaCR"> </a> Google has built software running for more than 20 years. How are larger tests influenced by the time dimension? We know that certain activities make more sense the longer the expected lifespan of code, and testing of various forms is an activity that makes sense at all levels, but the test types that are appropriate change over the expected lifetime of code.</p>
<p>As we pointed <a contenteditable="false" data-primary="unit testing" data-secondary="lifespan of software tested" data-type="indexterm" id="id-gKHYIotPfaCR"> </a>out before, unit tests begin to make sense for software with an expected lifespan from hours on up. <a contenteditable="false" data-primary="manual testing" data-type="indexterm" id="id-NvHJhvtrfBCd"> </a>At the minutes level (for small scripts), manual testing is most common, and the SUT usually runs locally, but the local demo likely <em>is</em> production, especially for one-off scripts, demos, or experiments. At longer lifespans, manual testing continues to exist, but the SUTs usually diverge because the production instance is often cloud hosted instead of locally hosted.</p>
<p>The remaining larger tests all provide value for longer-lived software, but the main concern becomes the maintainability of such tests as time increases.</p>
<p>Incidentally, this time impact might be one reason for the development <a contenteditable="false" data-primary="ice cream cone antipattern in testing" data-type="indexterm" id="id-ddH5IkfJfACV"> </a>of the "ice cream cone" testing antipattern, as mentioned in the <a data-type="xref" href="ch11.html#testing_overview">Testing Overview</a> and shown again in <a data-type="xref" href="ch14.html#the_ice_cream_cone_testing_antipattern">Figure 14-2</a>.</p>
<figure id="the_ice_cream_cone_testing_antipattern"><img alt="The ice cream cone testing antipattern" src="images/seag_1402.png">
<figcaption><span class="label">Figure 14-2. </span>The ice cream cone testing antipattern</figcaption>
</figure>
<p>When development starts with manual testing (when engineers think that code is meant to last only for minutes), those manual tests accumulate and dominate the initial overall testing portfolio. For example, it’s pretty typical to hack on a script or an app and test it out by running it, and then to continue to add features to it but continue to test it out by running it manually. This prototype eventually becomes functional and is shared with others, but no automated tests actually exist for it.</p>
<p>Even worse, if the code is difficult to unit test (because of the way it was implemented in the first place), the only automated tests that can be written are end-to-end ones, and we have inadvertently created "legacy code" within days.</p>
<p>It is <em>critical</em> for longer-term health to move toward the test pyramid within the first few days of development by building out unit tests, and then to top it off after that point by introducing automated integration tests and moving away from manual end-to-end tests. We succeeded by making unit tests a requirement for submission, but covering the gap between unit tests and manual tests is necessary for long-term health.</p>
</section>

<section data-type="sect2" id="larger_tests_at_google_scale">

<h2>Larger Tests at Google Scale</h2>
<p>It would seem that larger tests should be more necessary and more appropriate at larger scales of software, but even though <a contenteditable="false" data-primary="larger testing" data-secondary="larger tests at Google" data-tertiary="Google scale and" data-type="indexterm" id="id-gKHYIMh9CaCR"> </a>this is so, the complexity of authoring, running, maintaining, and debugging these tests increases with the growth in scale, even more so than with unit tests.</p>
<p>In a system composed of microservices or separate servers, the pattern of interconnections looks like a graph: let the number of nodes in that graph be our <em>N</em>. Every time a new node is added to this graph, there is a multiplicative effect on the number of distinct execution paths through it.</p>
<p><a data-type="xref" href="ch14.html#example_of_a_fairly_small_sut_a_social">Figure 14-3</a> depicts an imagined SUT: this <a contenteditable="false" data-primary="systems under test (SUTs)" data-secondary="larger tests for" data-type="indexterm" id="id-5XHzhAcLCnCb"> </a>system consists of a social network with users, a social graph, a stream of posts, and some ads mixed in. The ads are created by advertisers and served in the context of the social stream.<a contenteditable="false" data-type="indexterm" data-primary="UIs" data-secondary="in example of fairly small SUT" id="id-8GHPtbcgCaCk"> </a> This SUT alone consists of two groups of users, two UIs, three databases, an indexing pipeline, and six servers. There are 14 edges enumerated in the graph. Testing all of the end-to-end possibilities is already difficult. Imagine if we add more services, pipelines, and databases to this mix: photos and images, machine learning photo analysis, and so on?</p>
<figure id="example_of_a_fairly_small_sut_a_social"><img alt="Example of a fairly small SUT: a social network with advertising" src="images/seag_1403.png">
<figcaption><span class="label">Figure 14-3. </span>Example of a fairly small SUT: a social network with advertising</figcaption>
</figure>
<p>The number of distinct scenarios to test in an end-to-end way can grow exponentially or combinatorially depending on the structure of the system under test, and that growth does not scale. Therefore, as the system grows, we must find alternative larger testing strategies to keep things manageable.</p>
<p>However, the value of such tests also increases because of the decisions that were necessary to achieve this scale. This is an impact of fidelity: as we move toward larger-<em>N</em> layers of software, if the service doubles are lower fidelity (1-epsilon), the chance of bugs when putting it all together is exponential in <em>N</em>. Looking at this example SUT again, if we replace the user server and ad server with doubles and those doubles are low fidelity (e.g., 10% accurate), the likelihood of a bug is 99% (1 – (0.1 ∗ 0.1)). And that’s just with two low-fidelity doubles.</p>
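<p>To make the arithmetic explicit, assume (as a simplification) that each double deviates from the real service independently. If an SUT relies on <em>N</em> doubles, each with fidelity \(1 - \epsilon\), then</p>

<p>\[ P(\text{composed behavior matches production}) = (1 - \epsilon)^N, \qquad P(\text{undetected mismatch}) = 1 - (1 - \epsilon)^N. \]</p>

<p>The example above is the case \(N = 2\) with fidelity \(0.1\), giving \(1 - (0.1)(0.1) = 0.99\), the 99% quoted above; even two fairly good doubles at 90% fidelity still leave \(1 - (0.9)(0.9) = 0.19\), or 19%, of the composed behavior unverified.</p>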
<p>Therefore, it becomes critical to implement larger tests in ways that work well at this scale but maintain reasonably high fidelity.</p>
<aside data-type="sidebar" id="tip_quotation_marksemicolonthe_smallest">

<h5>Tip: "The Smallest Possible Test"</h5>
<p>Even for integration tests, smaller is better—a handful of large tests is preferable to an enormous one. <a contenteditable="false" data-primary="scope of tests" data-secondary="smallest possible test" data-type="indexterm" id="id-zoH4IBh1UrC3C3"> </a>And, because the scope of a test is often coupled to the scope of the SUT, finding ways to make the SUT smaller helps make the test smaller.<a contenteditable="false" data-primary="systems under test (SUTs)" data-secondary="scope of, test scope and" data-type="indexterm" id="id-aOHrhvhLUPCgCD"> </a></p>
<p>One way to achieve this test ratio when presented with a user journey that can require contributions from many internal systems is to "chain" tests, as illustrated in <a data-type="xref" href="ch14.html#chained_tests">Figure 14-4</a>, not specifically in their execution, but to create multiple smaller pairwise integration tests that represent the overall scenario. This is done by ensuring that the output of one test is used as the input to another test by persisting this output to a data repository.</p>
<figure id="chained_tests"><img alt="Chained tests" src="images/seag_1404.png">
<figcaption><span class="label">Figure 14-4. </span>Chained tests</figcaption>
</figure>
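<p>A minimal sketch of the chaining idea follows. All of the names here (the shared repository path, <code>AdsBackendClient</code>, and the two test methods) are invented for illustration; the point is only that the first pairwise test persists its output so that the second test can consume it as seeded input.</p>

<pre data-type="programlisting">// Sketch only: the first test exercises the "create ad" journey and persists
// what it created; the second exercises the "review ad" journey, seeded with
// that persisted output instead of re-running the whole end-to-end flow.
import java.nio.file.Files;
import java.nio.file.Path;

public final class ChainedTestsSketch {
  // Stand-in for a persistent data repository shared between the two tests.
  static final Path REPO = Path.of("/tmp/chained-test-data");

  // Hypothetical client for the ads backend's public API.
  interface AdsBackendClient {
    byte[] createAd(String advertiser, String title);
    boolean reviewAd(byte[] ad);
  }

  // Test 1: create an ad through the public API and persist the result.
  static void createAdFlowTest(AdsBackendClient ads) throws Exception {
    byte[] createdAd = ads.createAd("test-advertiser", "Colossal Cave Tours");
    Files.createDirectories(REPO);
    Files.write(REPO.resolve("created_ad.pb"), createdAd);
  }

  // Test 2: review an ad, seeded with the output persisted by Test 1.
  static void adReviewFlowTest(AdsBackendClient ads) throws Exception {
    byte[] seededAd = Files.readAllBytes(REPO.resolve("created_ad.pb"));
    if (!ads.reviewAd(seededAd)) {
      throw new AssertionError("expected the seeded ad to pass review");
    }
  }
}</pre>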
</aside>

</section>

</section>

<section data-type="sect1" id="structure_of_a_large_test">

<h1>Structure of a Large Test</h1>
<p>Although large tests<a contenteditable="false" data-primary="larger testing" data-secondary="larger tests at Google" data-startref="ix_lrgtstGoo" data-type="indexterm" id="id-GAHeI6hOT9"> </a> are not bound by small <a contenteditable="false" data-primary="larger testing" data-secondary="structure of a large test" data-type="indexterm" id="ix_lrgtststrct"> </a>test constraints and could conceivably consist of anything, most large tests exhibit common patterns. Large tests usually consist of a workflow with the following phases:</p>
<ul>
<li>
<p>Obtain a system under test</p>
</li>
<li>
<p>Seed necessary test data</p>
</li>
<li>
<p>Perform actions using the system under test</p>
</li>
<li>
<p>Verify behaviors</p>
</li>
</ul>
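<p>In code, that workflow tends to look something like the following sketch. The interfaces are hypothetical placeholders for whatever launcher, seeding, and client libraries a real large test would use; only the four-phase shape is the point.</p>

<pre data-type="programlisting">// A schematic large test following the four phases listed above.
public final class LargeTestSkeleton {

  interface Sut extends AutoCloseable {
    String frontendAddress();
    @Override void close();                       // shut the SUT down when done
  }

  interface SutLauncher { Sut launchHermetic(String topology); }
  interface Seeder { void seed(Sut sut, String seedFile); }
  interface FrontendClient { String fetchStream(String userId); }
  interface ClientFactory { FrontendClient connect(String address); }

  static void run(SutLauncher launcher, Seeder seeder, ClientFactory clients) {
    // 1. Obtain a system under test.
    try (Sut sut = launcher.launchHermetic("social_network_sut")) {
      // 2. Seed necessary test data.
      seeder.seed(sut, "users_and_social_graph.textproto");
      // 3. Perform actions using the system under test.
      FrontendClient client = clients.connect(sut.frontendAddress());
      String stream = client.fetchStream("user-123");
      // 4. Verify behaviors.
      if (!stream.contains("user-123")) {
        throw new AssertionError("stream did not mention the seeded user");
      }
    }
  }
}</pre>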
<section data-type="sect2" id="the_system_under_test">

<h2>The System Under Test</h2>
<p>One key <a contenteditable="false" data-primary="systems under test (SUTs)" data-type="indexterm" id="ix_SUTs"> </a>component of large tests is the <a contenteditable="false" data-primary="larger testing" data-secondary="structure of a large test" data-tertiary="systems under test (SUTs)" data-type="indexterm" id="ix_lrgtststrctSUT"> </a>aforementioned SUT (see <a data-type="xref" href="ch14.html#an_example_system_under_test_left_paren">Figure 14-5</a>). A typical unit test focuses its attention on one class or module. Moreover, the test code runs in the same process (or Java Virtual Machine [JVM], in the Java case) as the code being tested. For larger tests, the SUT is often very different; one or more separate processes with test code often (but not always) in its own process.</p>
<figure id="an_example_system_under_test_left_paren"><img alt="An example system under test (SUT)" src="images/seag_1405.png">
<figcaption><span class="label">Figure 14-5. </span>An example system under test (SUT)</figcaption>
</figure>
<p>At Google, we use many different forms of SUTs, and the scope of the SUT is one of the primary drivers of the scope of the large test itself (the larger the SUT, the larger the test). Each SUT form can be judged based on two primary factors:</p>
<dl>
<dt>Hermeticity</dt>
<dd>This is the SUT’s isolation from usages and interactions from <a contenteditable="false" data-primary="hermetic SUTs" data-type="indexterm" id="id-5XHOILh4fqcQTO"> </a>other components than the test in question. An SUT with high hermeticity will have the least exposure to sources of concurrency and infrastructure flakiness.</dd>
<dt>Fidelity</dt>
<dd>
<p>The SUT’s accuracy in reflecting the production system being tested.<a contenteditable="false" data-primary="fidelity" data-secondary="of SUTs" data-type="indexterm" id="id-KWHkIrIdcPflcoTV"> </a> An SUT with high fidelity will consist of binaries that resemble the production versions (rely on similar configurations, use similar infrastructures, and have a similar overall topology).</p>
</dd>
</dl>
<p>Often these two factors are in direct conflict.</p>
<p>Following are some examples of SUTs:</p>
<dl>
<dt>Single-process SUT</dt>
<dd>The entire system<a contenteditable="false" data-primary="single-process SUT" data-type="indexterm" id="id-MnHzIRhoS3cXTv"> </a> under test is<a contenteditable="false" data-primary="systems under test (SUTs)" data-secondary="examples of" data-type="indexterm" id="id-DzHGhrh4SEcXTY"> </a> packaged into a single binary (even if in production these are multiple separate binaries). Additionally, the test code can be packaged into the same binary as the SUT. Such a test-SUT combination can be a "small" test if everything is single-threaded, but it is the least faithful to the production topology and configuration.</dd>
<dt>Single-machine SUT</dt>
<dd>The system under test consists of one or more separate binaries (same as production) and the test is its own binary.<a contenteditable="false" data-primary="single-machine SUT" data-type="indexterm" id="id-zoH4IGcmSVcPT3"> </a> But everything runs on one machine. This is used for "medium" tests. Ideally, we use the production launch configuration of each binary when running those binaries locally for increased fidelity.</dd>
<dt>Multimachine SUT</dt>
<dd>The system under test is distributed <a contenteditable="false" data-primary="multimachine SUT" data-type="indexterm" id="id-10H1I8C5SkcyTW"> </a>across multiple machines (much like a production cloud deployment). This is even higher fidelity than the single-machine SUT, but its use makes tests "large" size and the combination is susceptible to increased network and machine flakiness.</dd>
<dt>Shared environments (staging and production)</dt>
<dd>Instead of<a contenteditable="false" data-primary="shared environment SUT" data-type="indexterm" id="id-BMHGI9SOS4c9TY"> </a> running a standalone SUT, the test just uses a shared environment. This has the lowest cost because these shared environments usually already exist, but the test might conflict with other simultaneous uses and one must wait for the code to be pushed to those environments. Production also increases the risk of end-user impact.</dd>
<dt>Hybrids</dt>
<dd>Some SUTs represent a mix: it might be <a contenteditable="false" data-primary="hybrid SUTs" data-type="indexterm" id="id-qWHaIBsMSLcNTo"> </a>possible to run some of the SUT but have it interact with a shared environment. Usually the thing being tested is explicitly run but its backends are shared. For a company as expansive as Google, it is practically impossible to run multiple copies of all of Google’s interconnected services, so some hybridization is required.</dd>
</dl>

<section data-type="sect3" id="the_benefits_of_hermetic_suts">

<h3>The benefits of hermetic SUTs</h3>
<p>The SUT in a large test can be a major source of both unreliability and long turnaround time.<a contenteditable="false" data-primary="hermetic SUTs" data-secondary="benefits of" data-type="indexterm" id="id-DzHAIrhbUEcXTY"> </a> For example, an in-production test uses the actual production system deployment. As mentioned earlier, this is popular because there is no extra overhead cost for the environment, but production tests cannot be run until the code reaches that environment, which means those tests cannot themselves block the release of the code to that environment—the SUT is too late, essentially.</p>
<p>The most common first alternative is to create a giant shared staging environment and to run tests there. This is usually done as part of some release promotion process, but it again limits test execution to only when the code is available. As an alternative, some teams will allow engineers to "reserve" time in the staging environment and to use that time window to deploy pending code and to run tests, but this does not scale with a growing number of engineers or a growing number of services, because the environment, its number of users, and the likelihood of user conflicts all quickly grow.</p>
<p>The next step is to support cloud-isolated or machine-hermetic SUTs. Such an environment improves the situation by avoiding the conflicts and reservation requirements for code release.</p>
<aside data-type="sidebar" id="callout_risks_of_testing_in_production">

<h5>Case Study: Risks of testing in production and Webdriver Torso</h5>
<p>We mentioned that testing in production can be risky. <a contenteditable="false" data-primary="production" data-secondary="risks of testing in" data-type="indexterm" id="id-AKHrIMh4fGUocmTM"> </a><a contenteditable="false" data-primary="systems under test (SUTs)" data-secondary="risks of testing in production and Webdriver Torso" data-type="indexterm" id="id-BMHyhghMfQU3cLTE"> </a>One humorous episode resulting from testing in production<a contenteditable="false" data-primary="Webdriver Torso incident" data-type="indexterm" id="id-w4HDtPhkfPU4clTn"> </a> was known as the Webdriver Torso incident. We needed a way to verify that video rendering in YouTube production was working properly and so created automated scripts to generate test videos, upload them, and verify the quality of the upload. This was done in a Google-owned YouTube channel called Webdriver Torso. But this channel was public, as were most of the videos.</p>
<p>Subsequently, this channel was publicized in <a href="https://oreil.ly/1KxVn">an article at Wired</a>, which led to its spread throughout the media and subsequent efforts to solve the mystery. Finally, <a href="https://oreil.ly/ko_kV">a blogger</a> tied everything back to Google. Eventually, we came clean by having a bit of fun with it, including a Rickroll and an Easter Egg, so everything worked out well. But we do need to think about the possibility of end-user discovery of any test data we include in production and be prepared for it.</p>
</aside>

</section>

<section data-type="sect3" id="reducing_the_size_of_your_sut_at_proble">

<h3>Reducing the size of your SUT at problem boundaries</h3>
<p>There <a contenteditable="false" data-primary="systems under test (SUTs)" data-secondary="reducing size at problem testing boundaries" data-type="indexterm" id="id-zoH4IBhQsVcPT3"> </a>are particularly painful testing boundaries that might be worth avoiding. <a contenteditable="false" data-type="indexterm" data-primary="UIs" data-secondary="tests for, unreliable and costly" id="id-aOHrhvhqsgc1TD"> </a>Tests that involve both frontends and backends become painful because user interface (UI) tests are notoriously unreliable and costly:</p>
<ul>
<li>
<p>UIs often change in look-and-feel ways that make UI tests brittle but do not actually impact the underlying behavior.</p>
</li>
<li>
<p>UIs often have asynchronous behaviors that are difficult to test.</p>
</li>
</ul>
<p>Although it is useful to have end-to-end tests of a UI of a service <a contenteditable="false" data-primary="UIs" data-secondary="end-to-end tests of service UI to its backend" data-type="indexterm" id="id-10H1IWcnskcyTW"> </a>all the way to its backend, these tests have a multiplicative maintenance cost for both the UI and the backends. Instead, if the backend provides a public API, it is often easier to split the tests into connected tests at the UI/API boundary and to use the public<a contenteditable="false" data-primary="APIs" data-secondary="service UI backend providing public API" data-type="indexterm" id="id-AKH8hkcqs4cWTn"> </a> API to drive the end-to-end tests. This is true whether the UI is a browser, command-line interface (CLI), desktop app, or mobile app.</p>
<p>Another special boundary is for third-party dependencies. Third-party systems might not have a public shared environment for testing, and in some cases, there is a cost with sending traffic to a third party. Therefore, it is not recommended to have automated tests use a real third-party API, and that dependency is an important seam at which to split tests.</p>
<p>To address this issue of size, we have made this SUT smaller by replacing its databases with in-memory databases and removing one of the servers outside the scope of the SUT that we actually care about, as shown in <a data-type="xref" href="ch14.html#a_reduced-size_sut">Figure 14-6</a>. This SUT is more likely to fit on a single machine.</p>
<figure id="a_reduced-size_sut"><img alt="A reduced-size SUT" src="images/seag_1406.png">
<figcaption><span class="label">Figure 14-6. </span>A reduced-size SUT</figcaption>
</figure>
<p>The key is to identify trade-offs between fidelity and cost/reliability, and to identify reasonable boundaries. If we can run a handful of binaries and a test and pack it all into the same machines that do our regular compiles, links, and unit test executions, we have the easiest and most stable "integration" tests for our engineers.</p>
</section>

<section data-type="sect3" id="callout_record-replay_proxies">

<h3>Record/replay proxies</h3>
<p>In the previous chapter, we discussed test doubles and approaches that can be used to decouple the class under test from its difficult-to-test dependencies.<a contenteditable="false" data-primary="record/replay systems" data-type="indexterm" id="id-aOHzIvhMHgc1TD"> </a><a contenteditable="false" data-primary="systems under test (SUTs)" data-secondary="dealing with dependent but subsidiary services" data-type="indexterm" id="id-10H4h3h2HkcyTW"> </a> We can also double entire servers and processes by using a mock, stub, or fake server or process with the equivalent API. However, there is no guarantee that the test double used actually conforms to the contract of the real thing that it is replacing.</p>
<p>One way of dealing with an SUT’s dependent but subsidiary services is to use a test double, but how does one know that the double reflects the dependency’s actual behavior? A growing approach outside of Google is to use a framework for <a href="https://oreil.ly/RADVJ">consumer-driven contract</a> tests. <a contenteditable="false" data-primary="consumer-driven contract tests" data-type="indexterm" id="id-AKH8hQtrH4cWTn"> </a>These are tests that define a contract for both the client and the provider of the service, and this contract can drive automated tests. That is, a client defines a mock of the service saying that, for these input arguments, I get a particular output. Then, the real service uses this input/output pair in a real test to ensure that it produces that output given those inputs. Two public tools for consumer-driven contract testing are <a href="https://docs.pact.io">Pact Contract Testing</a> and <a href="https://oreil.ly/szQ4j">Spring Cloud Contracts</a>. Google’s heavy dependency on protocol buffers means that we don’t use these internally.<a contenteditable="false" data-primary="Spring Cloud Contracts" data-type="indexterm" id="id-qWHWfatyHLcNTo"> </a><a contenteditable="false" data-primary="Pact Contract Testing" data-type="indexterm" id="id-W0HwCKt1HwcaTa"> </a></p>
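<p>A framework-neutral sketch of the idea looks like the following; this is not the API of Pact or Spring Cloud Contract, just the shape of the technique, with invented names.</p>

<pre data-type="programlisting">public final class UserLookupContract {

  // The single request/response pair this consumer relies on.
  static final String REQUEST = "GET /users/123";
  static final String EXPECTED_RESPONSE = "{\"id\":\"123\",\"name\":\"Ada\"}";

  // Whatever implements the user-lookup API, real or stubbed.
  interface UserService {
    String handle(String request);
  }

  // Consumer side: unit tests stub the dependency with exactly this pair.
  static UserService consumerStub() {
    return request -> REQUEST.equals(request) ? EXPECTED_RESPONSE : "HTTP 404";
  }

  // Provider side: the same pair is replayed against the real implementation,
  // so the stub the consumer tests against cannot silently drift from reality.
  static void providerHonorsContract(UserService realService) {
    String actual = realService.handle(REQUEST);
    if (!EXPECTED_RESPONSE.equals(actual)) {
      throw new AssertionError("provider no longer satisfies the consumer contract");
    }
  }
}</pre>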
<p>At Google, we do something a little bit different. <a href="https://oreil.ly/-wvYi">Our most popular approach</a> (for which there is a public API) is to use a larger test to generate a smaller one by recording the traffic to those external services when running the larger test and replaying it when running smaller tests. The larger, or "Record Mode" test runs continuously on post-submit, but its primary purpose is to generate these traffic logs (it must pass, however, for the logs to be generated). The smaller, or "Replay Mode" test is used during development and presubmit testing.</p>
<p>One of the interesting aspects of how record/replay works is that, because of nondeterminism, requests must be matched via a matcher to determine which response to replay. This makes them very similar to stubs and mocks in that argument matching is used to determine the resulting behavior.</p>
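<p>Stripped to its essentials, such a proxy looks roughly like the sketch below; none of these names are the API of the public tooling linked above. In Record mode it forwards each request to the real backend and logs the exchange; in Replay mode it answers from the log, using a matcher (exact string equality here, which real systems must relax) to pick the recorded response.</p>

<pre data-type="programlisting">import java.util.Properties;

public final class RecordReplayProxy {

  interface Backend {
    String call(String request);
  }

  private final Properties trafficLog = new Properties(); // request -> recorded response
  private final Backend realBackend;                      // only used in Record mode
  private final boolean recordMode;

  RecordReplayProxy(Backend realBackend, boolean recordMode) {
    this.realBackend = realBackend;
    this.recordMode = recordMode;
  }

  String call(String request) {
    if (recordMode) {
      String response = realBackend.call(request); // hit the real dependency...
      trafficLog.setProperty(request, response);   // ...and remember the exchange
      return response;
    }
    // Replay mode: real matchers must be fuzzier than exact equality because
    // requests contain nondeterministic fields (timestamps, request IDs, ...).
    String recorded = trafficLog.getProperty(request);
    if (recorded == null) {
      throw new IllegalStateException("no recorded traffic matches; re-run in Record mode");
    }
    return recorded;
  }
}</pre>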
<p>What happens for new tests or tests where the client behavior changes significantly? In these cases, a request might no longer match what is in the recorded traffic file, so the test cannot pass in Replay mode. In that circumstance, the engineer must run the test in Record mode to generate new traffic, so it is important to make running Record tests easy, fast, and stable.<a contenteditable="false" data-primary="larger testing" data-secondary="structure of a large test" data-startref="ix_lrgtststrctSUT" data-tertiary="systems under test (SUTs)" data-type="indexterm" id="id-w4H5IXCEHmcbTr"> </a><a contenteditable="false" data-primary="systems under test (SUTs)" data-startref="ix_SUTs" data-type="indexterm" id="id-qWHNh4CyHLcNTo"> </a></p>
</section>

</section>

<section data-type="sect2" id="test_data">

<h2>Test Data</h2>
<p>A test needs data,<a contenteditable="false" data-primary="test data for larger tests" data-type="indexterm" id="id-gKHYIMhPfzTR"> </a> and a large test needs two <a contenteditable="false" data-primary="larger testing" data-secondary="structure of a large test" data-tertiary="test data" data-type="indexterm" id="id-NvHJhkhrf3Td"> </a>different kinds of data:</p>
<dl>
<dt>Seeded data</dt>
<dd>Data preinitialized into the system under test reflecting the state of the SUT at the inception of the test<a contenteditable="false" data-primary="seeded data" data-type="indexterm" id="id-ddH5IQh6t8f7T7"> </a></dd>
<dt>Test traffic</dt>
<dd>Data sent to the system under test <a contenteditable="false" data-primary="test traffic" data-type="indexterm" id="id-8GHOIbcVt4fOT7"> </a>by the test itself during its execution</dd>
</dl>
<p>Because of<a contenteditable="false" data-primary="systems under test (SUTs)" data-secondary="seeding the SUT state" data-type="indexterm" id="id-ddH5INcJfDTV"> </a> the notion of the separate and larger SUT, the work to seed the SUT state is often orders of magnitude more complex than the setup work done in a unit test. For example:</p>
<dl>
<dt>Domain data</dt>
<dd>Some databases contain data prepopulated into tables and used as configuration for the environment. Actual service binaries using such a database may fail on startup if domain data is not provided.</dd>
<dt>Realistic baseline</dt>
<dd>For an SUT to be perceived as realistic, it might require a realistic set of base data at startup, both in terms of quality and quantity. For example, large tests of a social network likely need a realistic social graph as the base state for tests: enough test users with realistic profiles as well as enough interconnections between those users must exist for the testing to be accepted.</dd>
<dt>Seeding APIs</dt>
<dd>The APIs by which data is seeded may be complex. It might be possible to directly write to a datastore, but doing so might bypass triggers and checks performed by the actual binaries that perform the writes.</dd>
</dl>
<p>Data can be generated in different ways, such as the following:</p>
<dl>
<dt>Handcrafted data</dt>
<dd>Like for smaller tests, we can create test data for larger tests by hand. But it might require more work to set up data for multiple services in a large SUT, and we might need to create a lot of data for larger tests.</dd>
<dt>Copied data</dt>
<dd>We can copy data, typically from production. For example, we might test a map of Earth by starting with a copy of our production map data to provide a baseline and then test our changes to it.</dd>
<dt>Sampled data</dt>
<dd>Copying data can provide too much data to reasonably work with. Sampling data can reduce the volume, thus reducing test time and making it easier to reason about. "Smart sampling" consists of techniques to copy the minimum data necessary to achieve maximum coverage. A small sketch of this idea follows the list.</dd>
</dl>
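<p>Here is a toy illustration of "smart sampling" framed as greedy set cover: summarize each candidate production record by the set of features it exercises, then repeatedly keep the record that covers the most still-uncovered features. Real sampling pipelines are far more sophisticated; the representation here is invented for the sketch.</p>

<pre data-type="programlisting">import java.util.Arrays;
import java.util.BitSet;

public final class SmartSampler {

  /** Returns indices of a small subset of records that still covers every observed feature. */
  static int[] sample(BitSet[] recordFeatures) {
    BitSet uncovered = new BitSet();
    for (BitSet record : recordFeatures) {
      uncovered.or(record);                      // union of all features seen in production
    }

    int[] chosen = new int[recordFeatures.length];
    int count = 0;
    while (!uncovered.isEmpty()) {
      int best = -1, bestGain = 0, index = 0;
      for (BitSet record : recordFeatures) {
        BitSet gain = (BitSet) record.clone();
        gain.and(uncovered);                     // features this record would newly cover
        if (gain.cardinality() > bestGain) {
          bestGain = gain.cardinality();
          best = index;
        }
        index++;
      }
      chosen[count++] = best;
      uncovered.andNot(recordFeatures[best]);    // mark those features as covered
    }
    return Arrays.copyOf(chosen, count);         // usually far smaller than the input
  }
}</pre>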
</section>

<section data-type="sect2" id="verification">

<h2>Verification</h2>
<p>After an SUT is running and traffic is sent to it, we must still verify the behavior. There are<a contenteditable="false" data-primary="systems under test (SUTs)" data-secondary="verification of behavior" data-type="indexterm" id="id-NvHAIkh2C3Td"> </a> a few different<a contenteditable="false" data-primary="larger testing" data-secondary="structure of a large test" data-tertiary="verification" data-type="indexterm" id="id-ddHrhQh7CDTV"> </a> ways to do this:</p>
<dl>
<dt>Manual</dt>
<dd><p>Much like when you try out your binary locally, manual verification uses humans to interact with an SUT to determine whether it functions correctly. This verification can consist of testing for regressions by performing actions as defined on a consistent test plan or it can be exploratory, working a way through different interaction paths to identify possible new failures.</p>
<p>Note that manual regression testing does not scale sublinearly: the larger a system grows and the more journeys through it there are, the more human time is needed to manually test.</p></dd>
<dt>Assertions</dt>
<dd>
<p>Much<a contenteditable="false" data-primary="assertions" data-secondary="verifying behavior of SUTs" data-type="indexterm" id="id-KWHkIrIdc9t6CoTV"> </a> like with unit tests, these are explicit checks about the intended behavior of the system. For example, for an integration test of Google search of <code>xyzzy</code>, an assertion might be as follows:</p>
<pre data-type="programlisting">assertThat(response.Contains("Colossal Cave"))</pre>
</dd>
<dt>A/B comparison (differential)</dt>
<dd>Instead of defining explicit assertions, A/B testing involves running two copies of the SUT, sending the same data, and comparing the output.<a contenteditable="false" data-primary="A/B diff tests" data-secondary="of SUT behaviors" data-secondary-sortas="SUT" data-type="indexterm" id="id-DzHAIRCLtJCXTY"> </a> The intended behavior is not explicitly defined: a human must manually go through the differences to ensure any changes are intended.<a contenteditable="false" data-primary="larger testing" data-secondary="structure of a large test" data-startref="ix_lrgtststrct" data-type="indexterm" id="id-zoHKhwC8trCPT3"> </a></dd>
</dl>

</section>

</section>

<section data-type="sect1" id="types_of_larger_tests">

<h1>Types of Larger Tests</h1>
<p>We can now combine these different approaches to the SUT, data, and assertions to create different kinds of large tests.<a contenteditable="false" data-primary="larger testing" data-secondary="types of large tests" data-type="indexterm" id="ix_lrgtsttyp"> </a> Each test then has different properties as to which risks it mitigates; how much toil is required to write, maintain, and debug it; and how much it costs in terms of resources to run.</p>
<p>What follows is a list of different kinds of large tests that we use at Google, how they are composed, what purpose they serve, and what their limitations are:</p>
<ul>
<li>
<p>Functional testing of one or more binaries</p>
</li>
<li>
<p>Browser and device testing</p>
</li>
<li>
<p>Performance, load, and stress testing</p>
</li>
<li>
<p>Deployment configuration testing</p>
</li>
<li>
<p>Exploratory testing</p>
</li>
<li>
<p>A/B diff (regression) testing</p>
</li>
<li>
<p>User acceptance testing (UAT)</p>
</li>
<li>
<p>Probers and canary analysis</p>
</li>
<li>
<p>Disaster recovery and chaos engineering</p>
</li>
<li>
<p>User evaluation</p>
</li>
</ul>
<p>Given such a wide number of combinations and thus a wide range of tests, how do we manage what to do and when? Part of designing software is drafting the test plan, and a key part of the test plan is a strategic outline of what types of testing are needed and how much of each. This test strategy identifies the primary risk vectors and the necessary testing approaches to mitigate those risk vectors.</p>
<p>At Google, we have a specialized engineering role of "Test Engineer," and one of the things we look for in a good test engineer is the ability to outline a test strategy for our products.</p>
<section class="pagebreak-before" data-type="sect2" id="functional_testing_of_one_or_more_inter">

<h2 class="less_space">Functional Testing of One or More Interacting Binaries</h2>
<p>Tests of this <a contenteditable="false" data-primary="functional tests" data-secondary="testing of one or more interacting binaries" data-type="indexterm" id="id-5XHOILhgTwSb"> </a>type have the <a contenteditable="false" data-primary="larger testing" data-secondary="types of large tests" data-tertiary="functional testing of interacting binaries" data-type="indexterm" id="id-8GHMh7h8TRSk"> </a>following <a contenteditable="false" data-primary="binaries, interacting, functional testing of" data-type="indexterm" id="id-KWHgtahRTdSL"> </a>characteristics:</p>
<ul>
<li>
<p>SUT: single-machine hermetic<a contenteditable="false" data-primary="systems under test (SUTs)" data-secondary="in functional test of interacting binaries" data-type="indexterm" id="id-8GHOI5IXImtOTbSw"> </a> or cloud-deployed isolated</p>
</li>
<li>
<p>Data: handcrafted</p>
</li>
<li>
<p>Verification: assertions</p>
</li>
</ul>
<p>As we have seen so far, unit tests are not capable of testing a complex system with true fidelity, simply because they are packaged in a different way than the real code is packaged. Many functional testing scenarios interact with a given binary differently than with classes inside that binary, and these functional tests require separate SUTs and thus are canonical, larger tests.</p>
<p>Testing the interactions of multiple binaries is, unsurprisingly, even more complicated than testing a single binary. A common use case is within microservices environments when services are deployed as many separate binaries. In this case, a functional test can cover the real interactions between the binaries by bringing up an SUT composed of all the relevant binaries and by interacting with it through a published API.</p>
</section>

<section data-type="sect2" id="browser_and_device_testing">

<h2>Browser and Device Testing</h2>
<p>Testing web UIs<a contenteditable="false" data-primary="larger testing" data-secondary="types of large tests" data-tertiary="browser and device testing" data-type="indexterm" id="id-8GHOI7heSRSk"> </a> and mobile applications is a special <a contenteditable="false" data-primary="mobile devices, browser and device testing" data-type="indexterm" id="id-KWHXhahoSdSL"> </a>case of functional testing of one or more interacting binaries. <a contenteditable="false" data-primary="browser and device testing" data-type="indexterm" id="id-MnHotRhoSwSQ"> </a>It is possible to unit test the underlying code, but for the end users, the public API is the application itself. Having tests that interact with the application as a third party through its frontend provides an extra layer of coverage.</p>
</section>

<section data-type="sect2" id="performancecomma_loadcomma_and_stress_t">

<h2>Performance, Load, and Stress Testing</h2>
<p>Tests of this type have the <a contenteditable="false" data-primary="larger testing" data-secondary="types of large tests" data-tertiary="performance, load, and stress testing" data-type="indexterm" id="id-KWHkIahyUdSL"> </a>following characteristics:</p>
<ul>
<li>
<p>SUT: cloud-deployed isolated</p>
</li>
<li>
<p>Data: handcrafted or multiplexed from production</p>
</li>
<li>
<p>Verification: diff (performance metrics)</p>
</li>
</ul>
<p>Although it is <a contenteditable="false" data-primary="performance" data-secondary="testing" data-type="indexterm" id="id-DzHAIKcbURSb"> </a>possible to <a contenteditable="false" data-primary="stress testing" data-type="indexterm" id="id-zoHKhGc1U9SY"> </a>test a small unit in terms <a contenteditable="false" data-primary="load, testing" data-type="indexterm" id="id-aOHBtPcLUbS3"> </a>of performance, load, and stress, often such tests require sending simultaneous traffic to an external API. That definition implies that such tests are multithreaded tests that usually test at the scope of a binary under test. However, these tests are critical for ensuring that there is no degradation in performance between versions and that the system can handle expected spikes in traffic.</p>
<p>As the scale of the load test grows, the scope of the input data also grows, and it eventually becomes difficult to generate the scale of load required to trigger bugs under load. Load and stress handling are "highly emergent" properties of a system; that is, these complex behaviors belong to the overall system but not the individual members. Therefore, it is important to make these tests look as close to production as possible. Each SUT requires resources akin to what production requires, and it becomes difficult to mitigate noise from the production topology.</p>
<p>One area of research for eliminating noise in performance tests is in modifying the deployment topology—how the various binaries are distributed across a network of machines. The machine running a binary can affect the performance characteristics; thus, if in a performance diff test, the base version runs on a fast machine (or one with a fast network) and the new version on a slow one, it can appear like a performance regression. This characteristic implies that the optimal deployment is to run both versions on the same machine. If a single machine cannot fit both versions of the binary, an alternative is to calibrate by performing multiple runs and removing peaks and valleys.</p>
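<p>The "removing peaks and valleys" step can be as simple as a trimmed mean over repeated runs, sketched below with invented names; the interesting engineering is in everything around it, such as scheduling both versions onto comparable machines and choosing the regression threshold that matters.</p>

<pre data-type="programlisting">public final class LatencyCalibration {

  /** Mean latency after dropping the single fastest and slowest run (assumes 3+ runs). */
  static double trimmedMeanMillis(double[] runs) {
    double sum = 0;
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    for (double r : runs) {
      sum += r;
      min = Math.min(min, r);
      max = Math.max(max, r);
    }
    return (sum - min - max) / (runs.length - 2);
  }

  /** True if the candidate's trimmed mean exceeds the baseline's by more than maxRatio. */
  static boolean isRegression(double[] baselineRuns, double[] candidateRuns, double maxRatio) {
    return trimmedMeanMillis(candidateRuns) > trimmedMeanMillis(baselineRuns) * maxRatio;
  }
}</pre>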
</section>

<section data-type="sect2" id="deployment_configuration_testing">

<h2>Deployment Configuration Testing</h2>
<p>Tests of this <a contenteditable="false" data-primary="deployment configuration testing" data-type="indexterm" id="id-MnHzIRhrswSQ"> </a>type have the <a contenteditable="false" data-primary="larger testing" data-secondary="types of large tests" data-tertiary="deployment configuration testing" data-type="indexterm" id="id-DzHGhrhlsRSb"> </a>following characteristics:</p>
<ul>
<li>
<p>SUT: single-machine hermetic or cloud-deployed isolated</p>
</li>
<li>
<p>Data: none</p>
</li>
<li>
<p>Verification: assertions (doesn’t crash)</p>
</li>
</ul>
<p>Many times, it is not the code that is the source of defects but instead configuration: data files, databases, option definitions, and so on. Larger tests can test the integration of the SUT with its configuration files because these configuration files are read during the launch of the given binary.</p>
<p>Such a test is really a smoke test of the SUT without needing much in the way of additional data or verification. If the SUT starts successfully, the test passes. If not, the test fails.</p>
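<p>A deployment configuration test can therefore be as small as the sketch below: launch the real binary with the candidate configuration and pass if it comes up healthy. The binary path, flag, and health endpoint are placeholders.</p>

<pre data-type="programlisting">import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public final class ConfigSmokeTest {
  public static void main(String[] args) throws Exception {
    Process server = new ProcessBuilder(
            "bazel-bin/frontend_server", "--config=prod_candidate.textproto")
        .inheritIO()
        .start();
    try {
      Thread.sleep(5_000);  // crude "wait for startup"; real tests poll with a deadline
      var health = HttpClient.newHttpClient().send(
          HttpRequest.newBuilder(URI.create("http://localhost:8080/healthz"))
              .timeout(Duration.ofSeconds(5))
              .build(),
          HttpResponse.BodyHandlers.ofString());
      if (health.statusCode() != 200) {
        throw new AssertionError("binary started but is unhealthy: " + health.statusCode());
      }
    } finally {
      server.destroy();  // the test owns the process it launched
    }
  }
}</pre>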
</section>

<section data-type="sect2" id="exploratory_testing">

<h2>Exploratory Testing</h2>
<p>Tests of this type<a contenteditable="false" data-primary="larger testing" data-secondary="types of large tests" data-tertiary="exploratory testing" data-type="indexterm" id="id-DzHAIrhJHRSb"> </a> have the following<a contenteditable="false" data-primary="exploratory testing" data-type="indexterm" id="id-zoHKhBhEH9SY"> </a> characteristics:</p>
<ul>
<li>
<p>SUT: production or shared staging</p>
</li>
<li>
<p>Data: production or a known test universe</p>
</li>
<li>
<p>Verification: manual</p>
</li>
</ul>
<p>Exploratory testing<sup><a data-type="noteref" id="ch01fn143-marker" href="ch14.html#ch01fn143">2</a></sup> is a form of manual testing that focuses not on looking for behavioral regressions by repeating known test flows, but on looking for questionable behavior by trying out new user scenarios. Trained users/testers interact with a product through its public APIs, looking for new paths through the system and for which behavior deviates from either expected or intuitive behavior, or if there are security vulnerabilities.</p>
<p>Exploratory testing is useful for both new and launched systems to uncover unanticipated behaviors and side effects. By having testers follow different reachable paths through the system, we can increase the system coverage and, when these testers identify bugs, capture new automated functional tests. In a sense, this is a bit like a manual "fuzz testing" version of functional integration testing.</p>
<section data-type="sect3" id="limitations-id00059">

<h3>Limitations</h3>
<p>Manual testing does not scale sublinearly; that is, it requires human time to perform the manual tests. Any defects found by exploratory tests should be replicated with an automated test that can run much more frequently.</p>
</section>

<section data-type="sect3" id="bug_bashes">

<h3>Bug bashes</h3>
<p>One common approach we use for manual<a contenteditable="false" data-primary="bug bashes" data-type="indexterm" id="id-w4H5IPh7TAHQSr"> </a> exploratory testing is the <a href="https://oreil.ly/zRLyA">bug bash</a>. A team of engineers and related personnel (managers, product managers, test engineers, anyone with familiarity with the product) schedules a "meeting," but at this session, everyone involved manually tests the product. There can be some published guidelines as to particular focus areas for the bug bash and/or starting points for using the system, but the goal is to provide enough interaction variety to document questionable product behaviors and outright bugs.</p>
</section>

</section>

<section data-type="sect2" id="asolidusb_diff_regression_testing">

<h2>A/B Diff Regression Testing</h2>
<p>Tests of <a contenteditable="false" data-primary="larger testing" data-secondary="types of large tests" data-tertiary="A/B diff (regression)" data-type="indexterm" id="id-zoH4IBhgu9SY"> </a>this type have <a contenteditable="false" data-primary="A/B diff tests" data-type="indexterm" id="id-aOHrhvhoubS3"> </a>the following <a contenteditable="false" data-primary="regression tests" data-seealso="A/B diff tests" data-type="indexterm" id="id-10H5t3hwu7S5"> </a>characteristics:</p>
<ul>
<li>
<p>SUT: two cloud-deployed isolated environments</p>
</li>
<li>
<p>Data: usually multiplexed from production or sampled</p>
</li>
<li>
<p>Verification: A/B diff comparison</p>
</li>
</ul>
<p>Unit tests cover expected behavior paths for a small section of code. But it is impossible to predict many of the possible failure modes for a given publicly facing product. Additionally, as Hyrum’s Law states, the actual public API is not the declared one but all user-visible aspects of a product. Given those two properties, it is no surprise that A/B diff tests are possibly the most common form of larger testing at Google. This approach conceptually dates back to 1998. At Google, we have been running tests based on this model since 2001 for most of our products, starting with Ads, Search, and Maps.</p>
<p>A/B diff tests operate by sending traffic to a public API and comparing the responses between old and new versions (especially during migrations). Any deviations in behavior must be reconciled as either anticipated or unanticipated (regressions). In this case, the SUT is composed of two sets of real binaries: one running at the candidate version and the other running at the base version. A third binary sends traffic and compares the results.</p>
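<p>A bare-bones version of that third binary might look like the sketch below; the <code>Environment</code> interface and the request list stand in for whatever traffic multiplexing and response canonicalization a real diff system needs.</p>

<pre data-type="programlisting">public final class AbDiffDriver {

  // One deployed copy of the SUT (base or candidate) reachable over its public API.
  interface Environment {
    String call(String request);
  }

  /** Replays the same requests against both versions and counts differing responses. */
  static int diff(String[] sampledRequests, Environment base, Environment candidate) {
    int differences = 0;
    for (String request : sampledRequests) {
      String baseResponse = base.call(request);
      String candidateResponse = candidate.call(request);
      if (!baseResponse.equals(candidateResponse)) {
        differences++;
        System.out.println("DIFF for " + request
            + "\n  base:      " + baseResponse
            + "\n  candidate: " + candidateResponse);
      }
    }
    return differences;  // a human still has to triage whether each diff was intended
  }
}</pre>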
<p>There are other variants. We use A-A testing (comparing a system to itself) to identify nondeterministic behavior, noise, and flakiness, and to help remove those from A-B diffs. We also occasionally use A-B-C testing, comparing the last production version, the baseline build, and a pending change, to make it easy at one glance to see not only the impact of an immediate change, but also the accumulated impacts of what would be the next-to-release version.</p>
<p>A/B diff tests are a cheap but automatable way to detect unanticipated side effects for any launched system.</p>
<section data-type="sect3" id="limitations-id00060">

<h3>Limitations</h3>
<p>Diff testing does introduce a few <a contenteditable="false" data-primary="A/B diff tests" data-secondary="limitations of" data-type="indexterm" id="id-W0HOIXhqS9u7Sa"> </a>challenges to solve:</p>
<dl>
<dt>Approval</dt>
<dd>Someone must understand the results enough to know whether any differences are expected. Unlike a typical test, it is not clear whether diffs are a good or bad thing (or whether the baseline version is actually even valid), and so there is often a manual step in the process.</dd>
<dt>Noise</dt>
<dd>For a diff test, anything that introduces unanticipated noise into the results leads to more manual investigation of the results. It becomes necessary to remediate noise, and this is a large source of complexity in building a good diff test.</dd>
<dt>Coverage</dt>
<dd>Generating enough useful traffic for a diff test can be a challenging problem. The test data must cover enough scenarios to identify corner-case differences, but it is difficult to manually curate such data.</dd>
<dt>Setup</dt>
<dd>Configuring and maintaining one SUT is fairly challenging. Creating two at a time can double the complexity, especially if these share interdependencies.</dd>
</dl>

</section>

</section>

<section data-type="sect2" id="uat">

<h2>UAT</h2>
<p>Tests of this type have the<a contenteditable="false" data-primary="larger testing" data-secondary="types of large tests" data-tertiary="UAT" data-type="indexterm" id="id-aOHzIvhaFbS3"> </a> following <a contenteditable="false" data-primary="UAT (user acceptance testing)" data-type="indexterm" id="id-10H4h3hYF7S5"> </a>characteristics:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>SUT: machine-hermetic or cloud-deployed isolated</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Data: handcrafted</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Verification: assertions</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>A key aspect of unit tests is that they are written by the developer writing the code under test. But that makes it quite likely that misunderstandings about the <em>intended</em> behavior of a product are reflected not only in the code, but also in the unit tests. Such unit tests verify that code is "working as implemented" instead of "working as intended."</p>
|
||||
|
||||
<p>For cases in which there is either a specific end customer or a customer proxy (a customer committee or even a product manager), UATs are automated tests that exercise the product through public APIs to ensure the overall behavior for specific <a href="https://oreil.ly/lOaOq">user journeys</a> is as intended. Multiple public frameworks exist (e.g., Cucumber and RSpec) to make such tests writable/readable in a user-friendly language, often in the context of "runnable specifications."</p>
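<p>As a rough illustration, a UAT for one user journey can be written directly against a public API with an ordinary assertion framework. In the sketch below, the <code>docs_client</code> module and its methods are hypothetical stand-ins for a product’s public client library:</p>

<pre data-type="programlisting" data-code-language="python">
"""Sketch of a user acceptance test for a "create and share a document"
journey. The docs_client API is a hypothetical placeholder."""
import unittest

import docs_client  # hypothetical public client library for the product


class CreateAndShareDocumentJourney(unittest.TestCase):

    def test_new_document_is_visible_to_invited_user(self):
        owner = docs_client.login("owner@example.com")
        guest = docs_client.login("guest@example.com")

        doc = owner.create_document(title="Quarterly plan")
        owner.share(doc, with_user="guest@example.com")

        # The acceptance criterion: the invited user can see the document.
        titles = [d.title for d in guest.list_shared_documents()]
        self.assertIn("Quarterly plan", titles)


if __name__ == "__main__":
    unittest.main()
</pre>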
|
||||
|
||||
<p>Google does not actually do a lot of automated UAT and does not use specification languages very much. Many of Google’s products historically have been created by the software engineers themselves. There has been little need for runnable specification languages because those defining the intended product behavior are often fluent in the actual coding languages themselves.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="probers_and_canary_analysis">
|
||||
<h2>Probers and Canary Analysis</h2>
|
||||
|
||||
<p>Tests of this type have the <a contenteditable="false" data-primary="larger testing" data-secondary="types of large tests" data-tertiary="probers and canary analysis" data-type="indexterm" id="id-10H1I3hLi7S5"> </a>following characteristics:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>SUT: production</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Data: production</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Verification: assertions and A/B diff (of metrics)</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Probers and canary analysis are ways to ensure<a contenteditable="false" data-primary="canary analysis" data-type="indexterm" id="id-BMHGIPcniXSn"> </a> that the production environment itself is healthy.<a contenteditable="false" data-primary="probers" data-type="indexterm" id="id-w4HdhLczieSJ"> </a> In these respects, they are a form of production monitoring, but they are structurally very similar to other large tests.</p>
|
||||
|
||||
<p>Probers are functional tests that run encoded assertions against the production environment. Usually these tests perform well-known and deterministic read-only actions so that the assertions hold even though the production data changes over time. For example, a prober might perform a Google search at <a href="https://www.google.com">www.google.com</a> and verify that a result is returned, but not actually verify the contents of the result. In that respect, they are "smoke tests" of the production system, but they provide early detection of major issues.</p>
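<p>A prober can be as small as a scheduled script that issues a well-known, read-only request and asserts only on coarse, stable properties of the response. The query and checks in this sketch are illustrative, and a real prober would report into a monitoring and alerting system rather than simply raising:</p>

<pre data-type="programlisting" data-code-language="python">
"""Minimal prober sketch: a deterministic, read-only smoke check against
production. The query and the alerting hook are illustrative placeholders."""
import urllib.request


def probe_search(timeout_seconds=5):
    """Issue a well-known query and assert only coarse, stable properties."""
    url = "https://www.google.com/search?q=weather"
    with urllib.request.urlopen(url, timeout=timeout_seconds) as resp:
        body = resp.read()
        assert resp.status == 200, "search frontend returned status %d" % resp.status
    assert len(body) > 0, "search frontend returned an empty response"
    # Deliberately do not assert on the *contents* of the results:
    # production data changes over time, but these properties should not.


if __name__ == "__main__":
    probe_search()  # in practice, run on a schedule and wired into alerting
</pre>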
|
||||
|
||||
<p>Canary analysis is similar, except that it focuses on when a release is being pushed to the production environment. If the release is staged over time, we can both run prober assertions targeting the upgraded (canary) services and compare health metrics of the canary and baseline parts of production to make sure that they are not out of line.</p>
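<p>The comparison step at the heart of canary analysis can be sketched as follows. The metric names and the fixed tolerance are placeholders; a production-grade system would compare distributions statistically rather than against a single threshold:</p>

<pre data-type="programlisting" data-code-language="python">
"""Sketch of the comparison step in canary analysis (placeholder metrics)."""


def canary_is_healthy(baseline_metrics, canary_metrics, tolerance=0.10):
    """Return True if the canary's health metrics stay in line with baseline."""
    for name, baseline_value in baseline_metrics.items():
        canary_value = canary_metrics[name]
        # Allow the canary to deviate from the baseline by at most 10%.
        if abs(canary_value - baseline_value) > tolerance * baseline_value:
            return False
    return True


baseline = {"error_rate": 0.0020, "p99_latency_ms": 180.0}
canary = {"error_rate": 0.0021, "p99_latency_ms": 184.0}
assert canary_is_healthy(baseline, canary)
</pre>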
|
||||
|
||||
<p>Probers should be used in any live system. If the production rollout process includes a phase in which the binary is deployed to a limited subset of the production machines (a canary phase), canary analysis should be used during that procedure.</p>
|
||||
|
||||
<section data-type="sect3" id="limitations-id00062">
|
||||
<h3>Limitations</h3>
|
||||
|
||||
<p>Any issues caught at this point in time (in production) are already affecting end users.</p>
|
||||
|
||||
<p>If a prober performs a mutable (write) action, it will modify the state of production. This could lead to one of three outcomes: nondeterminism and failure of the assertions, failure of the ability to write in the future, or user-visible side effects.</p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="disaster_recovery_and_chaos_engineering">
|
||||
<h2>Disaster Recovery and Chaos Engineering</h2>
|
||||
|
||||
<p>Tests of this type have the<a contenteditable="false" data-primary="larger testing" data-secondary="types of large tests" data-tertiary="disaster recovery and chaos engineering" data-type="indexterm" id="id-AKHrIMhnIXS2"> </a> following characteristics:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>SUT: production</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Data: production and user-crafted (fault injection)</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Verification: manual and A/B diff (metrics)</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>These test how well your systems will react to unexpected changes or failures.</p>
|
||||
|
||||
<p>For years, Google has run <a contenteditable="false" data-primary="disaster recovery testing" data-type="indexterm" id="id-qWHaImfwI4S0"> </a>an annual war game called <a href="https://oreil.ly/17ffL">DiRT</a> (Disaster Recovery Testing) during which faults are injected into our infrastructure at a nearly planetary scale. We simulate everything from datacenter fires to malicious attacks. In one memorable case, we simulated an earthquake that completely isolated our headquarters in Mountain View, California, from the rest of the company. Doing so exposed not only technical shortcomings but also revealed the challenge of running a company when all the key decision makers were unreachable.<sup><a data-type="noteref" id="ch01fn144-marker" href="ch14.html#ch01fn144">3</a></sup></p>
|
||||
|
||||
<p>The impacts of DiRT tests require a lot of coordination across the company; by contrast, chaos engineering is more of a “continuous testing” for your technical infrastructure.<a contenteditable="false" data-primary="chaos engineering" data-type="indexterm" id="id-W0HOI4CgIAS9"> </a> <a href="https://oreil.ly/BCwdM">Made popular by Netflix</a>, chaos engineering involves writing programs that continuously introduce a background level of faults into your systems and seeing what happens. Some of the faults can be quite large, but in most cases, chaos testing tools are designed to restore functionality before things get out of hand. The goal of chaos engineering is to help teams break assumptions of stability and reliability and help them grapple with the challenges of building resiliency in. Today, teams at Google perform thousands of chaos tests each week using our own home-grown system called Catzilla.</p>
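<p>In spirit, a chaos-testing loop is little more than the following sketch: continuously pick a small fault, inject it, and watch the system’s health signals. The fault actions and the health check here are toy placeholders, not Catzilla or any other real tool:</p>

<pre data-type="programlisting" data-code-language="python">
"""Toy chaos-testing loop; fault actions and the health check are placeholders."""
import random
import time


def kill_random_replica():
    print("fault: terminating one replica of a stateless service")


def add_network_latency():
    print("fault: injecting 200 ms of latency between two services")


def service_is_healthy():
    return True  # placeholder: a real loop would query monitoring here


def chaos_loop(faults, interval_seconds, iterations):
    for _ in range(iterations):       # a real deployment runs this continuously
        random.choice(faults)()       # inject one small background fault
        time.sleep(interval_seconds)  # give the system time to react (or not)
        if not service_is_healthy():
            print("resilience gap found; restoring service and filing a bug")
            break


chaos_loop([kill_random_replica, add_network_latency],
           interval_seconds=1, iterations=3)
</pre>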
|
||||
|
||||
<p>These kinds of fault and negative tests make sense for live production systems that have enough theoretical fault tolerance to support them and for which the costs and risks of the tests themselves are affordable.</p>
|
||||
|
||||
<section data-type="sect3" id="limitations-id00069">
|
||||
<h3>Limitations</h3>
|
||||
|
||||
<p>Any issues caught at this point in time (in production) are already affecting end users.</p>
|
||||
|
||||
<p>DiRT is quite expensive to run, and therefore we run coordinated exercises only infrequently. When we create this level of outage, we actually cause pain and negatively impact employee performance.</p>
|
||||
|
||||
<p>As with a prober, if a DiRT test performs a mutable (write) action, it will modify the state of production. This could lead to one of three outcomes: nondeterminism and failure of the assertions, failure of the ability to write in the future, or user-visible side effects.</p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="user_evaluation">
|
||||
<h2>User Evaluation</h2>
|
||||
|
||||
<p>Tests of this<a contenteditable="false" data-primary="larger testing" data-secondary="types of large tests" data-tertiary="user evaluation" data-type="indexterm" id="id-BMHGIghQhXSn"> </a> type have the <a contenteditable="false" data-primary="user evaluation tests" data-type="indexterm" id="id-w4HdhPhdheSJ"> </a>following characteristics:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>SUT: production</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Data: production</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Verification: manual and A/B diffs (of metrics)</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Production-based testing makes it possible to collect a lot of data about user behavior. We have a few different ways to collect metrics about the popularity of and issues with upcoming features, which provides us with an alternative to UAT:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Dogfooding</dt>
|
||||
<dd>Using limited rollouts and experiments, it’s possible to make features in production available to a subset of users. We sometimes do this with our own staff (eating our own dogfood), and they give us valuable feedback in the real deployment environment.</dd>
|
||||
<dt>Experimentation</dt>
|
||||
<dd><p>A new behavior is made available as an experiment to a subset of users without their knowing. Then, the experiment group is compared to the control group at an aggregate level in terms of some desired metric. For example, in YouTube, we had a limited experiment changing the way video upvotes worked (eliminating the downvote), and only a portion of the user base saw this change.</p>
|
||||
<p>This is a <a href="https://oreil.ly/OAvqF">massively important approach for Google</a>. One of the first stories a Noogler hears upon joining the company is about the time Google launched an experiment changing the background shading color for AdWords ads in Google Search and noticed a significant increase in ad clicks for users in the experimental group versus the control group.</p></dd>
|
||||
<dt>Rater evaluation</dt>
|
||||
<dd>Human raters are presented with results for a given operation and choose which one is "better" and why. This feedback is then used to determine whether a given change is positive, neutral, or negative. For example, Google has historically used rater evaluation for search queries (we have published the guidelines we give our raters). In some cases, the feedback from this ratings data can help determine launch go/no-go for algorithm changes. Rater evaluation is critical for nondeterministic systems like machine learning systems for which there is no clear correct answer, only a notion of better or worse.<a contenteditable="false" data-primary="larger testing" data-secondary="types of large tests" data-startref="ix_lrgtsttyp" data-type="indexterm" id="id-e5H6IGCBfMhVS1"> </a></dd>
|
||||
</dl>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="large_tests_and_the_developer_workflow">
|
||||
<h1>Large Tests and the Developer Workflow</h1>
|
||||
|
||||
<p>We’ve talked about what <a contenteditable="false" data-primary="developer workflow, large tests and" data-type="indexterm" id="ix_devwk"> </a>large tests are, why to have<a contenteditable="false" data-primary="larger testing" data-secondary="large tests and developer workflow" data-type="indexterm" id="ix_lrgtstdev"> </a> them, when to have them, and how much to have, but we have not said much about the who. Who writes the tests? Who runs the tests and investigates the failures? Who owns the tests? And how do we make this tolerable?</p>
|
||||
|
||||
<p>Although standard unit test infrastructure might not apply, it is still critical to integrate larger tests into the developer workflow. One way of doing this is to ensure that automated mechanisms for presubmit and post-submit execution exist, even if these are different mechanisms than the unit test ones. At Google, many of these large tests do not belong in TAP. They are nonhermetic, too flaky, and/or too resource intensive. But we still need to keep them from breaking or else they provide no signal and become too difficult to triage. What we do, then, is to have a separate post-submit continuous build for these. We also encourage running these tests presubmit, because that provides feedback directly to the author.</p>
|
||||
|
||||
<p>A/B diff tests that require manual blessing of diffs can also be incorporated into such a workflow.<a contenteditable="false" data-primary="A/B diff tests" data-secondary="running presubmit" data-type="indexterm" id="id-gKHYIVcrUN"> </a> For presubmit, it can be a code-review requirement to approve any diffs in the UI before approving the change. One such test that we have automatically files release-blocking bugs if code is submitted with unresolved diffs.</p>
|
||||
|
||||
<p>In some cases, tests are so large or painful that presubmit execution adds too much developer friction. These tests still run post-submit and are also run as part of the release process. The drawback to not running these presubmit is that the taint makes it into the monorepo and we need to identify the culprit change to roll it back. But we need to weigh the trade-off among developer pain, the incurred change latency, and the reliability of the continuous build.</p>
|
||||
|
||||
<section data-type="sect2" id="authoring_large_tests">
|
||||
<h2>Authoring Large Tests</h2>
|
||||
|
||||
<p>Although the<a contenteditable="false" data-primary="developer workflow, large tests and" data-secondary="authoring large tests" data-type="indexterm" id="id-5XHOILhLCvUb"> </a> structure of large tests is fairly standard, there<a contenteditable="false" data-primary="authoring large tests" data-type="indexterm" id="id-8GHMh7hgCVUk"> </a> is still a challenge with creating such a test, especially <a contenteditable="false" data-primary="larger testing" data-secondary="large tests and developer workflow" data-tertiary="authoring large tests" data-type="indexterm" id="id-KWHgtahzCqUL"> </a>if it is the first time someone on the team has done so.</p>
|
||||
|
||||
<p>The best way to make it possible to write such tests is to have clear libraries, documentation, and examples. Unit tests are easy to write because of native language support (JUnit was once esoteric but is now mainstream).<a contenteditable="false" data-primary="JUnit" data-type="indexterm" id="id-8GHOIKtgCVUk"> </a> We reuse these assertion libraries for functional integration tests, but we also have created over time libraries for interacting with SUTs, for running A/B diffs, for seeding test data, and for orchestrating test workflows. </p>
|
||||
|
||||
<p>Larger tests are more expensive to maintain, in both resources and human time, but not all large tests are created equal. One reason that A/B diff tests are popular is that they have less human cost in maintaining the verification step. Similarly, production SUTs have less maintenance cost than isolated hermetic SUTs.<a contenteditable="false" data-primary="systems under test (SUTs)" data-secondary="production vs. isolated hermetic SUTs" data-type="indexterm" id="id-KWHkI7czCqUL"> </a> And because all of this authored infrastructure and code must be maintained, the cost savings can <span class="keep-together">compound.</span></p>
|
||||
|
||||
<p>However, this cost must be looked at holistically. If the cost of manually reconciling diffs or of supporting and safeguarding production testing outweighs the savings, it becomes ineffective.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="running_large_tests">
|
||||
<h2>Running Large Tests</h2>
|
||||
|
||||
<p>We mentioned above how our larger tests don’t fit in TAP and so we have alternate continuous builds and presubmits for them.<a contenteditable="false" data-primary="developer workflow, large tests and" data-secondary="running large tests" data-type="indexterm" id="ix_devwkrun"> </a><a contenteditable="false" data-primary="larger testing" data-secondary="large tests and developer workflow" data-tertiary="running large tests" data-type="indexterm" id="ix_lrgtstdevrun"> </a><a contenteditable="false" data-primary="presubmits" data-secondary="infrastructure for large tests" data-type="indexterm" id="id-MnHotRh0TrUQ"> </a> One of the initial challenges for our engineers is how to even run nonstandard tests and how to iterate on them.</p>
|
||||
|
||||
<p>As much as possible, we have tried to make our larger tests run in ways familiar for our engineers. Our presubmit infrastructure puts a common API in front of both running these tests and running TAP tests, and our code review infrastructure shows both sets of results. But many large tests are bespoke and thus need specific documentation for how to run them on demand. This can be a source of frustration for unfamiliar engineers.</p>
|
||||
|
||||
<section data-type="sect3" id="speeding_up_tests">
|
||||
<h3>Speeding up tests</h3>
|
||||
|
||||
<p>Engineers<a contenteditable="false" data-primary="developer workflow, large tests and" data-secondary="running large tests" data-tertiary="speeding up tests" data-type="indexterm" id="id-DzHAIrhBcnTmUY"> </a> don’t wait for slow tests.<a contenteditable="false" data-primary="tests" data-secondary="speeding up" data-type="indexterm" id="id-zoHKhBhMcLTlU3"> </a><a contenteditable="false" data-primary="speeding up tests" data-type="indexterm" id="id-aOHBtvhOc2ToUD"> </a><a contenteditable="false" data-primary="execution time for tests" data-secondary="speeding up tests" data-type="indexterm" id="id-10Hec3hlc4TDUW"> </a> The slower a test is, the less frequently an engineer will run it, and the longer the wait after a failure until it is passing again.</p>
|
||||
|
||||
<p>The best way to speed up a test is often to reduce its scope or to split a large test into two smaller tests that can run in parallel. But there are some other tricks that you can do to speed up larger tests.</p>
|
||||
|
||||
<p>Some naive tests will use time-based sleeps to wait for nondeterministic actions to occur, and this is quite common in larger tests. However, these tests do not have thread limitations, and real production users want to wait as little as possible, so it is best for tests to react the way real production users would. Approaches include the following (a minimal sketch follows the list):</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Polling repeatedly for a state transition over a time window, checking at a frequency closer to microseconds than to seconds, until the event completes. You can combine this with a timeout value in case a test fails to reach a stable state.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Implementing an event handler.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Subscribing to a notification system for an event completion.</p>
|
||||
</li>
|
||||
</ul>
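<p>The polling approach, for example, can be wrapped in a small helper like the following sketch; the names and the default timeout are illustrative:</p>

<pre data-type="programlisting" data-code-language="python">
"""Sketch of polling for a state transition instead of using a fixed sleep."""
import time


def wait_for(condition, timeout_seconds=30.0, poll_interval_seconds=0.001):
    """Poll `condition` at a fine granularity until it holds or we time out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if condition():
            return
        if time.monotonic() > deadline:
            raise TimeoutError("condition not met within %.1fs" % timeout_seconds)
        time.sleep(poll_interval_seconds)  # far finer-grained than a fixed sleep


# Hypothetical usage in a test: react as soon as the SUT reaches the state.
# wait_for(lambda: job_client.status(job_id) == "DONE", timeout_seconds=60)
</pre>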
|
||||
|
||||
<p>Note that tests that rely on sleeps and timeouts will all start failing when the fleet running those tests becomes overloaded, which spirals because those tests need to be rerun more often, increasing the load further.</p>
|
||||
|
||||
<dl>
|
||||
<dt>Lower internal system timeouts and delays</dt>
|
||||
<dd>
|
||||
<p>A production system is usually configured assuming a distributed deployment topology, but an SUT might be deployed on a single machine (or at least a cluster of colocated machines). If there are hardcoded timeouts or (especially) sleep statements in the production code to account for production system delay, these should be made tunable and reduced when running tests. A minimal sketch of such a tunable delay follows this list.</p>
|
||||
</dd>
|
||||
<dt>Optimize test build time</dt>
|
||||
<dd>
|
||||
<p>One downside of our monorepo is that all of the dependencies for a large test are built and provided as inputs, but this might not be necessary for some larger tests. If the SUT is composed of a core part that is truly the focus of the test and some other necessary peer binary dependencies, it might be possible to use prebuilt versions of those other binaries at a known good version. Our build system (based on the monorepo) does not support this model easily, but the approach is actually more reflective of production in which different services release at different versions.</p>
|
||||
</dd>
|
||||
</dl>
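<p>One lightweight way to make such delays tunable is to route them through a single configuration point that the test environment can override. The environment variable name in this sketch is an illustrative placeholder:</p>

<pre data-type="programlisting" data-code-language="python">
"""Sketch of making a production delay tunable for tests (placeholder name)."""
import os
import time

# Production default; a single-machine SUT can shrink this to near zero.
RPC_RETRY_BACKOFF_SECONDS = float(
    os.environ.get("RPC_RETRY_BACKOFF_SECONDS", "2.0"))


def backoff_before_retry():
    """Sleep for the configured backoff instead of a hardcoded production value."""
    time.sleep(RPC_RETRY_BACKOFF_SECONDS)


# In the test environment, for example:
#   RPC_RETRY_BACKOFF_SECONDS=0.01 ./run_large_test.sh
</pre>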
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="driving_out_flakiness">
|
||||
<h3>Driving out flakiness</h3>
|
||||
|
||||
<p>Flakiness is bad enough<a contenteditable="false" data-primary="developer workflow, large tests and" data-secondary="running large tests" data-tertiary="driving out flakiness" data-type="indexterm" id="id-zoH4IBhrfLTlU3"> </a> for unit tests, but for larger tests, it can make them unusable.<a contenteditable="false" data-primary="flaky tests" data-secondary="driving out flakiness in large tests" data-type="indexterm" id="id-aOHrhvhKf2ToUD"> </a> A team should view eliminating flakiness of such tests as a high priority. But how can flakiness be removed from such tests?</p>
|
||||
|
||||
<p class="pagebreak-before">Minimizing flakiness starts with reducing the scope of the test—a hermetic SUT will not be at risk of the kinds of multiuser and real-world flakiness of production or a shared staging environment, and a single-machine hermetic SUT will not have the network and deployment flakiness issues of a distributed SUT. But you can mitigate other flakiness issues through test design and implementation and other techniques. In some cases, you will need to balance these with test speed.</p>
|
||||
|
||||
<p>Just as making tests reactive or event driven can speed them up, it can also remove flakiness. Timed sleeps require timeout maintenance, and these timeouts can be embedded in the test code. Increasing internal system timeouts can reduce flakiness, whereas reducing internal timeouts can lead to flakiness if the system behaves in a nondeterministic way. The key here is to identify a trade-off that defines both a tolerable system behavior for end users (e.g., our maximum allowable timeout is <em>n</em> seconds) but handles flaky test execution behaviors well.</p>
|
||||
|
||||
<p>A bigger problem with internal system timeouts is that exceeding them can lead to errors that are difficult to triage. A production system will often try to limit end-user exposure to catastrophic failure by handling possible internal system issues gracefully. For example, if Google cannot serve an ad within a given time limit, we don’t return a 500; we just don’t serve an ad. But this looks to a test runner as if the ad-serving code might be broken when there is just a flaky timeout issue. It’s important to make the failure mode obvious in this case and to make it easy to tune such internal timeouts for test scenarios.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="making_tests_understandable">
|
||||
<h3>Making tests understandable</h3>
|
||||
|
||||
<p>A specific case for <a contenteditable="false" data-primary="developer workflow, large tests and" data-secondary="running large tests" data-tertiary="making tests understandable" data-type="indexterm" id="id-aOHzIvhlC2ToUD"> </a>which it can be difficult <a contenteditable="false" data-primary="tests" data-secondary="making understandable" data-type="indexterm" id="id-10H4h3hWC4TDUW"> </a>to integrate <a contenteditable="false" data-primary="clear tests, writing" data-secondary="making large tests understandable" data-type="indexterm" id="id-AKHRtMhNCJT5Un"> </a>tests into the developer workflow is when those tests produce results that are unintelligible to the engineer running the tests. Even unit tests can produce some confusion—if my change breaks your test, it can be difficult to understand why if I am generally unfamiliar with your code—but for larger tests, such confusion can be insurmountable. Tests that are assertive must provide a clear pass/fail signal and must provide meaningful error output to help triage the source of failure. Tests that require human investigation, like A/B diff tests, require special handling to be meaningful or else risk being skipped during presubmit.</p>
|
||||
|
||||
<p>How does this work in practice? A good<a contenteditable="false" data-primary="failures" data-secondary="large test that fails" data-type="indexterm" id="id-10H1INtWC4TDUW"> </a> large test that fails should do the following:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Have a message that clearly identifies what the failure is</dt>
|
||||
<dd>The worst-case scenario is to have an error that just says "Assertion failed" and a stack trace. A good error anticipates the test runner’s unfamiliarity with the code and provides a message that gives context: "In test_ReturnsOneFullPageOfSearchResultsForAPopularQuery, expected 10 search results but got 1." For a performance or A/B diff test that fails, there should be a clear explanation in the output of what is being measured and why the behavior is considered suspect. A minimal sketch of building such context into an assertion message follows this list.</dd>
|
||||
<dt>Minimize the effort necessary to identify the root cause of the discrepancy</dt>
|
||||
<dd>A stack trace is not useful for larger tests because the call chain can span multiple process boundaries. Instead, it’s necessary to produce a trace across the call chain or to invest in automation that can narrow down the culprit. The test should produce some kind of artifact to this effect. For example, <a href="https://oreil.ly/FXzbv">Dapper</a> is a framework used by Google to associate a single request ID with all the requests in an RPC call chain, and all of the associated logs for that request can be correlated by that ID to facilitate tracing.</dd>
|
||||
<dt>Provide support and contact information</dt>
|
||||
<dd>It should be easy for the test runner to get help by making the owners and supporters of the test easy to contact.<a contenteditable="false" data-primary="larger testing" data-secondary="large tests and developer workflow" data-startref="ix_lrgtstdevrun" data-tertiary="running large tests" data-type="indexterm" id="id-JoHMIDC0caC9TRUk"> </a><a contenteditable="false" data-primary="developer workflow, large tests and" data-secondary="running large tests" data-startref="ix_devwkrun" data-type="indexterm" id="id-nBHMhWCdcAC4TNUD"> </a></dd>
|
||||
</dl>
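<p>For assertive tests, much of this comes down to building triage context directly into the assertion. The function and field names in this sketch are hypothetical:</p>

<pre data-type="programlisting" data-code-language="python">
"""Sketch: build scenario, expectation, and a trace handle into the failure
message itself. Names are hypothetical."""


def check_search_results(query, results, expected_count, request_id):
    """Fail with context rather than a bare "assertion failed"."""
    assert len(results) == expected_count, (
        "expected %d search results for query %r but got %d "
        "(request ID %s for cross-process trace correlation)"
        % (expected_count, query, len(results), request_id)
    )


# Hypothetical usage inside a larger test:
# check_search_results("a popular query", page.results, 10, page.request_id)
</pre>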
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="owning_large_tests">
|
||||
<h2>Owning Large Tests</h2>
|
||||
|
||||
<p>Larger tests must have <a contenteditable="false" data-primary="developer workflow, large tests and" data-secondary="running large tests" data-tertiary="owning large tests" data-type="indexterm" id="id-KWHkIahoSqUL"> </a>documented <a contenteditable="false" data-primary="ownership of code" data-secondary="owning large tests" data-type="indexterm" id="id-MnH6hRhoSrUQ"> </a>owners—engineers who can adequately review changes to the test and who can be counted on to provide support in the case of test failures. Without proper ownership, a test can fall victim to the following:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>It becomes more difficult for contributors to modify and update the test</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>It takes longer to resolve test failures</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>And the test rots.</p>
|
||||
|
||||
<p>Integration tests of components within a particular project should be owned by the project lead. Feature-focused tests (tests that cover a particular business feature across a set of services) should be owned by a "feature owner"; in some cases, this owner might be a software engineer responsible for the feature implementation end to end; in other cases it might be a product manager or a "test engineer" who owns the description of the business scenario. Whoever owns the test must be empowered to ensure its overall health and must have both the ability to support its maintenance and the incentives to do so.</p>
|
||||
|
||||
<p>It is possible to build automation around test owners if this information is recorded in a structured way. Some approaches that we use include the following:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Regular code ownership</dt>
|
||||
<dd>In many cases, a larger test is a standalone code artifact that lives in a particular location in our codebase. In that case, we can use the OWNERS (<a data-type="xref" href="ch09.html#code_review-id00002">Code Review</a>) information already present in the monorepo to hint to automation that the owner(s) of a particular test are the owners of the test code.</dd>
|
||||
<dt>Per-test annotations</dt>
|
||||
<dd>In some cases, multiple test methods <a contenteditable="false" data-primary="annotations, per-test, documenting ownership" data-type="indexterm" id="id-w4H5ILc7TeSMUr"> </a>can be added to a single test class or module, and each of these test methods can have a different feature owner. We use per-language structured annotations to document the test owner in each of these cases so that if a particular test method fails, we can identify the owner to contact. A minimal sketch of such an annotation follows this list.<a contenteditable="false" data-primary="developer workflow, large tests and" data-startref="ix_devwk" data-type="indexterm" id="id-qWHNhec9T4SDUo"> </a><a contenteditable="false" data-primary="larger testing" data-secondary="large tests and developer workflow" data-startref="ix_lrgtstdev" data-type="indexterm" id="id-W0HDtgczTASgUa"> </a></dd>
|
||||
</dl>
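<p>In Python, for instance, such a structured annotation can be as simple as a decorator that records contact metadata on the test method so that tooling can route failures. The decorator and team addresses below are an illustrative sketch, not Google’s internal mechanism:</p>

<pre data-type="programlisting" data-code-language="python">
"""Sketch of a per-test ownership annotation (illustrative only)."""
import unittest


def owner(contact):
    """Attach a structured owner record that failure tooling can read."""
    def decorate(test_method):
        test_method.__test_owner__ = contact
        return test_method
    return decorate


class CheckoutFlowTest(unittest.TestCase):

    @owner("payments-team@example.com")
    def test_checkout_applies_discount_codes(self):
        self.assertTrue(True)  # placeholder body

    @owner("search-quality@example.com")
    def test_results_page_renders_for_long_queries(self):
        self.assertTrue(True)  # placeholder body
</pre>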
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="conclusion-id00018">
|
||||
<h1>Conclusion</h1>
|
||||
|
||||
<p>A comprehensive test suite requires larger tests, both to ensure that tests match the fidelity of the system under test and to address issues that unit tests cannot adequately cover. Because such tests are necessarily more complex and slower to run, care must be taken to ensure such larger tests are properly owned, well maintained, and run when necessary (such as before deployments to production). Overall, such larger tests must still be made as small as possible (while still retaining fidelity) to avoid developer friction. A comprehensive test strategy that identifies the risks of a system, and the larger tests that address them, is necessary for most software projects.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="tlsemicolondrs-id00118">
|
||||
<h1>TL;DRs</h1>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Larger tests cover things unit tests cannot.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Large tests are composed of a System Under Test, Data, Action, and Verification.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>A good design includes a test strategy that identifies risks and larger tests that mitigate them.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Extra effort must be made with larger tests to keep them from creating friction in the developer workflow.<a contenteditable="false" data-primary="larger testing" data-startref="ix_lrgtst" data-type="indexterm" id="id-5XHOIaIlc0hXHO"> </a></p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
<div data-type="footnotes"><p data-type="footnote" id="ch01fn140"><sup><a href="ch14.html#ch01fn140-marker">1</a></sup>See <a data-type="xref" href="ch23.html#continuous_delivery">Continuous Delivery</a> and <a data-type="xref" href="ch25.html#compute_as_a_service">Compute as a Service</a> for more information.</p><p data-type="footnote" id="ch01fn143"><sup><a href="ch14.html#ch01fn143-marker">2</a></sup>James A. Whittaker, <em>Exploratory Software Testing: Tips, Tricks, Tours, and Techniques to Guide Test Design</em> (New York: Addison-Wesley Professional, 2009).</p><p data-type="footnote" id="ch01fn144"><sup><a href="ch14.html#ch01fn144-marker">3</a></sup>During this test, almost no one could get anything done, so many people gave up on work and went to one of our many cafes, and in doing so, we ended up creating a DDoS attack on our cafe teams!</p></div></section>
|
||||
|
||||
</body>
|
||||
</html>
|
237
clones/abseil.io/resources/swe-book/html/ch15.html
Normal file
|
@ -0,0 +1,237 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="deprecation">
|
||||
<h1>Deprecation</h1>
|
||||
|
||||
<p class="byline">Written by Hyrum Wright</p>
|
||||
|
||||
<p class="byline">Edited by Tom Manshreck</p>
|
||||
|
||||
<blockquote data-type="epigraph">
|
||||
<p>I love deadlines. I like the whooshing sound they make as they fly by.</p>
|
||||
|
||||
<p data-type="attribution">Douglas Adams</p>
|
||||
</blockquote>
|
||||
|
||||
<p>All systems age. <a contenteditable="false" data-primary="deprecation" data-type="indexterm" id="ix_depr"> </a>Even though software is a digital asset and the physical bits themselves don’t degrade, new technologies, libraries, techniques, languages, and other environmental changes over time render existing systems obsolete. Old systems require continued maintenance, esoteric expertise, and generally more work as they diverge from the surrounding ecosystem. It’s often better to invest effort in turning off obsolete systems, rather than letting them lumber along indefinitely alongside the systems that replace them. But the number of obsolete systems still running suggests that, in practice, doing so is not trivial. We refer to the process of orderly migration away from and eventual removal of obsolete systems as <em>deprecation</em>.</p>
|
||||
|
||||
<p>Deprecation is yet another topic that more accurately<a contenteditable="false" data-primary="software engineering" data-secondary="deprecation and" data-type="indexterm" id="id-owfkHeSn"> </a> belongs to the discipline of software engineering than programming because it requires thinking about how to manage a system over time. For long-running software ecosystems, planning for and executing deprecation correctly reduces resource costs and improves velocity by removing the redundancy and complexity that builds up in a system over time. On the other hand, poorly deprecated systems may cost more than leaving them alone. While deprecating systems requires additional effort, it’s possible to plan for deprecation during the design of the system so that it’s easier to eventually decommission and remove it. Deprecations can affect systems ranging from individual function calls to entire software stacks. For concreteness, much of what follows focuses on code-level deprecations.</p>
|
||||
|
||||
<p>Unlike with most of the other topics we have discussed in this book, Google is still learning how best to deprecate and remove software systems. This chapter describes the lessons we’ve learned as we’ve deprecated large and heavily used internal systems. Sometimes, it works as expected, and sometimes it doesn’t, but the general problem of removing obsolete systems remains a difficult and evolving concern in the industry.</p>
|
||||
|
||||
<p>This chapter primarily deals with deprecating technical systems, not end-user products. The distinction is somewhat arbitrary given that an external-facing API is just another sort of product, and an internal API may have consumers that consider themselves end users. Although many of the principles apply to turning down a public product, we concern ourselves here with the technical and policy aspects of deprecating and removing obsolete systems where the system owner has visibility into its use.</p>
|
||||
|
||||
<section data-type="sect1" id="why_deprecatequestion_mark">
|
||||
<h1>Why Deprecate?</h1>
|
||||
|
||||
<p>Our discussion of deprecation <a contenteditable="false" data-primary="deprecation" data-secondary="reasons for" data-type="indexterm" id="id-AEfLHXCvsB"> </a>begins from the fundamental premise that <em>code is a liability, not an asset</em>. After all, if code were an asset, why should we even bother spending time trying to turn down and remove obsolete systems? Code has costs, some of which are borne in the process of creating a system, but many other costs are borne as a system is maintained across its lifetime. These ongoing costs, such as the operational resources required to keep a system running or the effort to continually update its codebase as surrounding ecosystems evolve, mean that it’s worth evaluating the trade-offs between keeping an aging system running or working to turn it down.</p>
|
||||
|
||||
<p>The age of a system alone doesn’t justify its deprecation. A system could be finely crafted over several years to be the epitome of software form and function. Some software systems, such as the LaTeX typesetting system, have been improved over the course of decades, and even though changes still happen, they are few and far between. Just because something is old, it does not follow that it is obsolete.</p>
|
||||
|
||||
<p>Deprecation is best suited for systems that are demonstrably obsolete and a replacement exists that provides comparable functionality. The new system might use resources more efficiently, have better security properties, be built in a more sustainable fashion, or just fix bugs. Having two systems to accomplish the same thing might not seem like a pressing problem, but over time, the costs of maintaining them both can grow substantially. Users may need to use the new system, but still have dependencies that use the obsolete one.</p>
|
||||
|
||||
<p>The two systems might need to interface with each other, requiring complicated transformation code. As both systems evolve, they may come to depend on each other, making eventual removal of either more difficult. In the long run, we’ve discovered that having multiple systems performing the same function also impedes the evolution of the newer system because it is still expected to maintain compatibility with the old one. Spending the effort to remove the old system can pay off as the replacement system can now evolve more quickly.</p>
|
||||
|
||||
<aside data-type="sidebar" id="earlier_we_made_the_assertion_that_quot">
|
||||
<p>Earlier we made <a contenteditable="false" data-primary="code" data-secondary="code as a liability, not an asset" data-type="indexterm" id="id-kPfwHGH6SwsY"> </a>the assertion that “code is a liability, not an asset.” If that is true, why have we spent most of this book discussing the most efficient way to build software systems that can live for decades? Why put all that effort into creating more code when it’s simply going to end up on the liability side of the balance sheet?</p>
|
||||
|
||||
<p>Code <em>itself</em> doesn’t bring value: it is the <em>functionality</em> that it provides that brings value. That functionality is an asset if it meets a user need: the code that implements this functionality is simply a means to that end. If we could get the same functionality from a single line of maintainable, understandable code as 10,000 lines of convoluted spaghetti code, we would prefer the former. Code itself carries a cost—the simpler the code is, while maintaining the same amount of functionality, the better.</p>
|
||||
|
||||
<p>Instead of focusing on how much code we can produce, or how large our codebase is, we should focus on how much functionality it can deliver per unit of code and try to maximize that metric. One of the easiest ways to do so isn’t writing more code and hoping to get more functionality; it’s removing excess code and systems that are no longer needed. Deprecation policies and procedures make this possible.</p>
|
||||
</aside>
|
||||
|
||||
<p>Even though deprecation is useful, we’ve learned at Google that organizations have a limit on the amount of deprecation work that is reasonable to undergo simultaneously, from the aspect of the teams doing the deprecation as well as the customers of those teams. For example, although everybody appreciates having freshly paved roads, if the public works department decided to close down <em>every</em> road for paving simultaneously, nobody would go anywhere. By focusing their efforts, paving crews can get specific jobs done faster while also allowing other traffic to make progress. Likewise, it’s important to choose deprecation projects with care and then commit to following through on finishing them.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="why_is_deprecation_so_hardquestion_mark">
|
||||
<h1>Why Is Deprecation So Hard?</h1>
|
||||
|
||||
<p>We’ve mentioned Hyrum’s Law <a contenteditable="false" data-primary="deprecation" data-secondary="difficulty of" data-type="indexterm" id="ix_deprdiff"> </a>elsewhere in this book, but it’s worth repeating its applicability here: the more users of a system, the higher <a contenteditable="false" data-primary="Hyrum's Law" data-secondary="deprecation and" data-type="indexterm" id="id-NmfeCGCytx"> </a>the probability that users are using it in unexpected and unforeseen ways, and the harder it will be to deprecate and remove such a system. Their usage just “happens to work” instead of being “guaranteed to work.” In this context, removing a system can be thought of as the ultimate change: we aren’t just changing behavior, we are removing that behavior completely! This kind of radical alteration will shake loose a number of unexpected dependents.</p>
|
||||
|
||||
<p>To further complicate matters, deprecation usually isn’t an option until a newer system is available that provides the same (or better!) functionality. The new system might be better, but it is also different: after all, if it were exactly the same as the obsolete system, it wouldn’t provide any benefit to users who migrate to it (though it might benefit the team operating it). This functional difference means a one-to-one match between the old system and the new system is rare, and every use of the old system must be evaluated in the context of the new one.</p>
|
||||
|
||||
<p>Another surprising source of reluctance to deprecate is emotional attachment to old systems, particularly those that the deprecator had a hand in creating. An example of this change aversion happens when systematically removing old code at Google: we’ve occasionally encountered resistance of the form “I like this code!” It can be difficult to convince engineers to tear down something they’ve spent years building. This is an understandable response, but ultimately self-defeating: if a system is obsolete, it has a net cost on the organization and should be removed. One of the ways we’ve addressed concerns about keeping old code within Google is by ensuring that the source code repository isn’t just searchable at trunk, but also historically. Even code that has been removed can be found again (see <a data-type="xref" href="ch17.html#code_search">Code Search</a>).</p>
|
||||
|
||||
<aside data-type="sidebar" id="thereapostrophes_an_old_joke_within_goo">
|
||||
<p>There’s an old joke within Google that there are two ways of doing things: the one that’s deprecated, and the one that’s not-yet-ready. This is usually the result of a new solution being “almost” done and is the unfortunate reality of working in a technological environment that is complex and fast-paced.</p>
|
||||
|
||||
<p>Google engineers have become used to working in this environment, but it can still be disconcerting. Good documentation, plenty of signposts, and teams of experts helping with the deprecation and migration process all make it easier to know whether you should be using the old thing, with all its warts, or the new one, with all its <span class="keep-together">uncertainties.</span></p>
|
||||
</aside>
|
||||
|
||||
<p>Finally, funding and executing deprecation efforts can be difficult politically; staffing a team and spending time removing obsolete systems costs real money, whereas the costs of doing nothing and letting the system lumber along unattended are not readily observable. It can be difficult to convince the relevant stakeholders that deprecation efforts are worthwhile, particularly if they negatively impact new feature development. Research techniques, such as those described in <a data-type="xref" href="ch07.html#measuring_engineering_productivity">Measuring Engineering Productivity</a>, can provide concrete evidence that a deprecation is worthwhile.</p>
|
||||
|
||||
<p>Given the difficulty in deprecating and removing obsolete software systems, it is often easier for users to evolve a system <em>in situ</em>, rather than completely replacing it. Incrementality doesn’t avoid the deprecation process altogether, but it does break it down into smaller, more manageable chunks that can yield incremental benefits. Within Google, we’ve observed that migrating to entirely new systems is <em>extremely</em> expensive, and the costs are frequently underestimated. Incremental deprecation efforts <span class="keep-together">accomplished</span> by in-place refactoring can keep existing systems running while making it easier to deliver value to users.<a contenteditable="false" data-primary="deprecation" data-secondary="difficulty of" data-startref="ix_deprdiff" data-type="indexterm" id="id-LrfQIwTyt0"> </a></p>
|
||||
|
||||
<section data-type="sect2" id="deprecation_during_design">
|
||||
<h2>Deprecation During Design</h2>
|
||||
|
||||
<p>Like many engineering activities, deprecation of a software system can be planned as those systems are first built.<a contenteditable="false" data-primary="designing systems to eventually be deprecated" data-type="indexterm" id="id-EPf0HQCecntR"> </a><a contenteditable="false" data-primary="deprecation" data-secondary="during design" data-type="indexterm" id="id-LrfBCKCbckt3"> </a> Choices of programming language, software architecture, team composition, and even company policy and culture all impact how easy it will be to eventually remove a system after it has reached the end of its useful life.</p>
|
||||
|
||||
<p>The concept of designing systems so that they can eventually be deprecated might be radical in software engineering, but it is common in other engineering disciplines. Consider the example of a nuclear power plant, which is an extremely complex piece of engineering. As part of the design of a nuclear power station, its eventual decommissioning after a lifetime of productive service must be taken into account, even going so far as to allocate funds for this purpose.<sup><a data-type="noteref" id="ch01fn146-marker" href="ch15.html#ch01fn146">1</a></sup> Many of the design choices in building a nuclear power plant are affected when engineers know that it will eventually need to be decommissioned.</p>
|
||||
|
||||
<p>Unfortunately, software systems are rarely so thoughtfully designed. Many software engineers are attracted to the task of building and launching new systems, not maintaining existing ones. The corporate culture of many companies, including Google, emphasizes building and shipping new products quickly, which often provides a disincentive for designing with deprecation in mind from the beginning. And in spite of the popular notion of software engineers as data-driven automata, it can be psychologically difficult to plan for the eventual demise of the creations we are working so hard to build.</p>
|
||||
|
||||
<p>So, what kinds of considerations should we think about when designing systems that we can more easily deprecate in the future? Here are a couple of the questions we encourage engineering teams at Google to ask:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>How easy will it be for my consumers to migrate from my product to a potential replacement?</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>How can I replace parts of my system incrementally?</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Many of these questions relate to how a system provides and consumes dependencies. For a more thorough discussion of how we manage these dependencies, see <span class="keep-together"><a data-type="xref" href="ch16.html#version_control_and_branch_management">Version Control and Branch Management</a></span>.</p>
|
||||
|
||||
<p>Finally, we should point out that the decision as to whether to support a project long term is made when an organization first decides to build the project. After a software system exists, the only remaining options are support it, carefully deprecate it, or let it stop functioning when some external event causes it to break. These are all valid options, and the trade-offs between them will be organization specific. A new startup with a single project will unceremoniously kill it when the company goes bankrupt, but a large company will need to think more closely about the impact across its portfolio and reputation as they consider removing old projects. As mentioned earlier, Google is still learning how best to make these trade-offs with our own internal and external products.</p>
|
||||
|
||||
<p>In short, don’t start projects that your organization isn’t committed to support for the expected lifespan of the organization. Even if the organization chooses to deprecate and remove the project, there will still be costs, but they can be mitigated through planning and investments in tools and policy.</p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="types_of_deprecation">
|
||||
<h1>Types of Deprecation</h1>
|
||||
|
||||
<p>Deprecation isn’t a<a contenteditable="false" data-primary="deprecation" data-secondary="types of" data-type="indexterm" id="ix_deprtyp"> </a> single kind of process, but a continuum of them, ranging from “we’ll turn this off someday, we hope” to “this system is going away tomorrow, customers better be ready for that.” Broadly speaking, we divide this continuum into two separate areas: advisory and compulsory.</p>
|
||||
|
||||
<section data-type="sect2" id="advisory_deprecation">
|
||||
<h2>Advisory Deprecation</h2>
|
||||
|
||||
<p><em>Advisory</em> deprecations are those that don’t <a contenteditable="false" data-primary="deprecation" data-secondary="types of" data-tertiary="advisory deprecation" data-type="indexterm" id="id-JgfeCjCRUgf8"> </a>have a deadline<a contenteditable="false" data-primary="advisory deprecations" data-type="indexterm" id="id-bafmUJC5UpfO"> </a> and aren’t high priority for the organization (and for which the company isn’t willing to dedicate resources). These could also be labeled <em>aspirational</em> deprecations: the team knows the system has been replaced, and although they hope clients will eventually migrate to the new system, they don’t have imminent plans to either provide support to help move clients or delete the old system. This kind of deprecation often lacks enforcement: we hope that clients move, but can’t force them to. As our friends in SRE will readily tell you: “Hope is not a strategy.”</p>
|
||||
|
||||
<p>Advisory deprecations are a good tool for advertising the existence of a new system and encouraging early adopting users to start trying it out. Such a new system should <em>not</em> be considered in a beta period: it should be ready for production uses and loads and should be prepared to support new users indefinitely. Of course, any new system is going to experience growing pains, but after the old system has been deprecated in any way, the new system will become a critical piece of the organization’s <span class="keep-together">infrastructure.</span></p>
|
||||
|
||||
<p>One scenario we’ve seen at Google in which advisory deprecations have strong benefits is when the new system offers compelling benefits to its users. In these cases, <span class="keep-together">simply</span> notifying users of this new system and providing them self-service tools to migrate to it often encourages adoption. However, the benefits cannot be simply incremental: they must be transformative. Users will be hesitant to migrate on their own for marginal benefits, and even new systems with vast improvements will not gain full adoption using only advisory deprecation efforts.</p>
|
||||
|
||||
<p>Advisory deprecation allows system authors to nudge users in the desired direction, but they should not be counted on to do the majority of migration work. It is often tempting to simply put a deprecation warning on an old system and walk away without any further effort. Our experience at Google has been that this can lead to (slightly) fewer new uses of an obsolete system, but it rarely leads to teams actively migrating away from it. Existing uses of the old system exert a sort of conceptual (or technical) pull toward it: comparatively many uses of the old system will tend to pick up a large share of new uses, no matter how much we say, “Please use the new system.” The old system will continue to require maintenance and other resources unless its users are more actively encouraged to migrate.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="compulsory_deprecation">
|
||||
<h2>Compulsory Deprecation</h2>
|
||||
|
||||
<p>This active encouragement comes in the form of <em>compulsory</em> deprecation. <a contenteditable="false" data-primary="deprecation" data-secondary="types of" data-tertiary="compulsory deprecation" data-type="indexterm" id="id-bafwCJCaIpfO"> </a><a contenteditable="false" data-primary="compulsory deprecation" data-type="indexterm" id="id-y9feUlCaIlfz"> </a>This kind of deprecation usually comes with a deadline for removal of the obsolete system: if users continue to depend on it beyond that date, they will find their own systems no longer work.</p>
|
||||
|
||||
<p>Counterintuitively, the best way for compulsory deprecation efforts to scale is by localizing the expertise of migrating <a contenteditable="false" data-primary="migrations" data-secondary="migrating users from an obsolete system" data-type="indexterm" id="id-bafEHOUaIpfO"> </a>users to within a single team of experts—usually the team responsible for removing the old system entirely. This team has incentives to help others migrate from the obsolete system and can develop experience and tools that can then be used across the organization. Many of these migrations can be effected using the same tools discussed in <a data-type="xref" href="ch22.html#large-scale_changes">Large-Scale Changes</a>.</p>
|
||||
|
||||
<p>For compulsory deprecation to actually work, its schedule needs to have an enforcement mechanism. This does not imply that the schedule can’t change, but it does mean empowering the team running the deprecation process to break noncompliant users after they have been sufficiently warned through efforts to migrate them. Without this power, it becomes easy for customer teams to ignore deprecation work in favor of features or other more pressing work.</p>
|
||||
|
||||
<p>At the same time, compulsory deprecations without staffing to do the work can come across to customer teams as mean spirited, which usually impedes completing the deprecation. Customers simply see such deprecation work as an unfunded mandate, requiring them to push aside their own priorities to do work just to keep their services running. This feels much like the “running to stay in place” phenomenon and creates friction between infrastructure maintainers and their customers. It’s for this reason that we strongly advocate that compulsory deprecations are actively staffed by a specialized team through completion.</p>
|
||||
|
||||
<p>It’s also worth noting that even with the force of policy behind them, compulsory deprecations can still face political hurdles. Imagine trying to enforce a compulsory deprecation effort when the last remaining user of the old system is a critical piece of infrastructure your entire organization depends on. How willing would you be to break that infrastructure—and, transitively, everybody that depends on it—just for the sake of making an arbitrary deadline? It is hard to believe the deprecation is really compulsory if that team can veto its progress.</p>
|
||||
|
||||
<p>Google’s monolithic repository and dependency graph gives us tremendous insight into how systems are used across our ecosystem. Even so, some teams might not even know they have a dependency on an obsolete system, and it can be difficult to discover these dependencies analytically. It’s also possible to find them dynamically through tests of increasing frequency and duration during which the old system is turned off temporarily. These intentional changes provide a mechanism for discovering unintended dependencies by seeing what breaks, thus alerting teams to a need to prepare for the upcoming deadline. Within Google, we occasionally change the name of implementation-only symbols to see which users are depending on them unaware.</p>
|
||||
|
||||
<p>Frequently at Google, when a system is slated for deprecation and removal, the team will announce planned outages of increasing duration in the months and weeks prior to the turndown. Similar to Google’s Disaster Recovery Testing (DiRT) exercises, these events often discover unknown dependencies between running systems. <a contenteditable="false" data-primary="dependencies" data-secondary="unknown, discovering during deprecation" data-type="indexterm" id="id-Qef9HqcDI3fz"> </a>This incremental approach allows those dependent teams to discover and then plan for the system’s eventual removal, or even work with the deprecating team to adjust their timeline. (The same principles also apply for static code dependencies, but the semantic information provided by static analysis tools is often sufficient to detect all the dependencies of the obsolete system.)</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="deprecation_warnings">
|
||||
<h2>Deprecation Warnings</h2>
|
||||
|
||||
<p>For both advisory and compulsory deprecations, it is often useful to have a programmatic way <a contenteditable="false" data-primary="deprecation" data-secondary="types of" data-tertiary="deprecation warnings" data-type="indexterm" id="id-bafEHJCAhpfO"> </a>of marking systems as deprecated so that users are warned about their use and encouraged to move away. It’s often tempting to just mark something as deprecated and hope its uses eventually disappear, but remember: “hope is not a strategy.” Deprecation warnings can help prevent new uses, but rarely lead to migration of existing systems.</p>
|
||||
|
||||
<p>What usually happens in practice is that these warnings accumulate over time. If they are used in a transitive context (for example, library A depends on library B, which depends on library C, and C issues a warning, which shows up when A is built), these warnings can soon overwhelm users of a system to the point where they ignore them altogether.<a contenteditable="false" data-primary="alert fatigue" data-type="indexterm" id="id-y9f8HAUlhlfz"> </a> In health care, this phenomenon is known as “<a href="https://oreil.ly/uYYef">alert fatigue</a>.”</p>
|
||||
|
||||
<p>Any deprecation warning issued to a user needs to have two properties: actionability and relevance. A warning is <em>actionable</em> if the user can use the warning to actually perform some relevant action, not just in theory, but in practical terms, given the expertise in that problem area that we expect for an average engineer. For example, a tool might warn that a call to a given function should be replaced with a call to its updated counterpart, or an email might outline the steps required to move data from an old system to a new one. In each case, the warning provided the next steps that an engineer can perform to no longer depend on the deprecated system.<sup><a data-type="noteref" id="ch01fn147-marker" href="ch15.html#ch01fn147">2</a></sup></p>
|
||||
|
||||
<p>A warning can be actionable, but still be annoying. To be useful, a deprecation warning should also be <em>relevant</em>. A warning is relevant if it surfaces at a time when a user actually performs the indicated action. Warning about the use of a deprecated function is best done while the engineer is writing code that uses that function, not after it has been checked into the repository for several weeks. Likewise, an email for data migration is best sent several months before the old system is removed rather than as an afterthought a weekend before the removal occurs.</p>
|
||||
|
||||
<p>It’s important to resist the urge to put deprecation warnings on everything possible. Warnings themselves are not bad, but naive tooling often produces a quantity of warning messages that can overwhelm the unsuspecting engineer. Within Google, we are very liberal with marking old functions as deprecated but leverage tooling such as <a href="https://errorprone.info">ErrorProne</a> or clang-tidy to ensure that warnings are surfaced in targeted ways. As discussed in <a data-type="xref" href="ch20.html#static_analysis-id00082">Static Analysis</a>, we limit these warnings to newly changed lines as a way to warn people about new uses of the deprecated symbol. Much more intrusive warnings, such as for deprecated targets in the dependency graph, are added only for compulsory deprecations in which the team is actively moving users away. In either case, tooling plays an important role in surfacing the appropriate information to the appropriate people at the proper time, allowing more warnings to be added without fatiguing the user.<a contenteditable="false" data-primary="deprecation" data-secondary="types of" data-startref="ix_deprtyp" data-type="indexterm" id="id-PqfLU3SmhVfm"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="managing_the_deprecation_process">
|
||||
<h1>Managing the Deprecation Process</h1>
|
||||
|
||||
<p>Although they can feel like different kinds of projects<a contenteditable="false" data-primary="deprecation" data-secondary="managing the process" data-type="indexterm" id="ix_deprmg"> </a> because we’re deconstructing a system rather than building it, deprecation projects are similar to other software engineering projects in the way they are managed and run. We won’t spend too much effort going over similarities between those management efforts, but it’s worth pointing out the ways in which they differ.</p>
|
||||
|
||||
<section data-type="sect2" id="process_owners">
|
||||
<h2>Process Owners</h2>
|
||||
|
||||
<p>We’ve learned at Google that <a contenteditable="false" data-primary="deprecation" data-secondary="managing the process" data-tertiary="process owners" data-type="indexterm" id="id-JgfAHjCRURu8"> </a>without explicit <a contenteditable="false" data-primary="ownership of code" data-secondary="deprecation process owners" data-type="indexterm" id="id-bafwCJC5URuO"> </a>owners, a deprecation process is unlikely to make meaningful progress, no matter how many warnings and alerts a system might generate. Having explicit project owners who are tasked with managing and running the deprecation process might seem like a poor use of resources, but the alternatives are even worse: don’t ever deprecate anything, or delegate deprecation efforts to the users of the system. The second case becomes simply an advisory deprecation, which will never organically finish, and the first is a commitment to maintain every old system ad infinitum. Centralizing deprecation efforts helps better assure that expertise actually <em>reduces</em> costs by making them more transparent.</p>
|
||||
|
||||
<p>Abandoned projects often present a problem when establishing ownership and aligning incentives. Every organization of reasonable size has projects that are still actively used but that nobody clearly owns or maintains, and Google is no exception. Projects sometimes enter this state because they are deprecated: the original owners have moved on to a successor project, leaving the obsolete one chugging along in the basement, still a dependency of a critical project, and hoping it just fades away eventually.</p>
|
||||
|
||||
<p>Such projects are unlikely to fade away on their own. In spite of our best hopes, we’ve found that these projects still require deprecation experts to remove them and prevent their failure at inopportune times. These teams should have removal as their primary goal, not just a side project of some other work. In the case of competing priorities, deprecation work will almost always be perceived as having a lower priority and rarely receive the attention it needs. These sorts of important-not-urgent cleanup tasks are a great use of 20% time and provide engineers exposure to other parts of the codebase.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="milestones">
|
||||
<h2>Milestones</h2>
|
||||
|
||||
<p>When building<a contenteditable="false" data-primary="milestones of a deprecation process" data-type="indexterm" id="id-bafEHJCaIRuO"> </a> a new system, project milestones<a contenteditable="false" data-primary="deprecation" data-secondary="managing the process" data-tertiary="milestones" data-type="indexterm" id="id-y9fEClCaI7uz"> </a> are generally pretty clear: “Launch the frobnazzer features by next quarter.” Following incremental development practices, teams build and deliver functionality incrementally to users, who get a win whenever they take advantage of a new feature. The end goal might be to launch the entire system, but incremental milestones help give the team a sense of progress and ensure they don’t need to wait until the end of the process to generate value for the <span class="keep-together">organization.</span></p>
|
||||
|
||||
<p>In contrast, it can often feel that the only milestone of a deprecation process is removing the obsolete system entirely. The team can feel they haven’t made any progress until they’ve turned out the lights and gone home. Although this might be the most meaningful step for the team, if it has done its job correctly, it’s often the least noticed by anyone external to the team, because by that point, the obsolete system no longer has any users. Deprecation project managers should resist the temptation to make this the only measurable milestone, particularly given that it might not even happen in all deprecation projects.</p>
|
||||
|
||||
<p>Similar to building a new system, managing a team working on deprecation should involve concrete incremental milestones, which are measurable and deliver value to users. The metrics used to evaluate the progress of the deprecation will be different, but it is still good for morale to celebrate incremental achievements in the deprecation process. We have found it useful to recognize appropriate incremental milestones, such as deleting a key subcomponent, just as we’d recognize accomplishments in building a new product.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="deprecation_tooling">
|
||||
<h2>Deprecation Tooling</h2>
|
||||
|
||||
<p>Much of the tooling used<a contenteditable="false" data-primary="deprecation" data-secondary="managing the process" data-tertiary="deprecation tooling" data-type="indexterm" id="ix_deprmgtool"> </a> to manage the deprecation process is discussed in depth elsewhere in this book, such as the large-scale change (LSC) process (<a data-type="xref" href="ch22.html#large-scale_changes">Large-Scale Changes</a>) or our code review tools (<a data-type="xref" href="ch19.html#critique_googleapostrophes_code_review">Critique: Google’s Code Review Tool</a>). Rather than talk about the specifics of the tools, we’ll briefly outline how those tools are useful when managing the deprecation of an obsolete system. These tools can be categorized as discovery, migration, and backsliding prevention tooling.</p>
|
||||
|
||||
<section data-type="sect3" id="discovery">
|
||||
<h3>Discovery</h3>
|
||||
|
||||
<p>During the<a contenteditable="false" data-primary="discovery (in deprecation)" data-type="indexterm" id="id-LrfNHKCaUnhDuD"> </a> early stages of a deprecation process, and in fact during the entire process, it is useful to know <em>how</em> and <em>by whom</em> an obsolete system is being used. Much of the initial work of deprecation is determining who is using the old system—and in which unanticipated ways. Depending on the kinds of use, this process may require revisiting the deprecation decision once new information is learned. We also use these tools throughout the deprecation process to understand how the effort is progressing.</p>
|
||||
|
||||
<p>Within Google, we use tools like Code Search (see <a data-type="xref" href="ch17.html#code_search">Code Search</a>) and Kythe (see <a data-type="xref" href="ch23.html#continuous_integration">Continuous Integration</a>) to statically determine which customers use a given library, and often to sample existing usage to see what sorts of behaviors customers are unexpectedly depending on. Because runtime dependencies generally require some static library or thin client use, this technique yields much of the information needed to start and run a deprecation process. Logging and runtime sampling in production help discover issues with dynamic dependencies.</p>
|
||||
|
||||
<p>Finally, we treat our global test suite as an oracle to determine whether all references to an old symbol have been removed. As discussed in <a data-type="xref" href="ch11.html#testing_overview">Testing Overview</a>, tests are a mechanism of preventing unwanted behavioral changes to a system as the ecosystem evolves. Deprecation is a large part of that evolution, and customers are responsible for having sufficient testing to ensure that the removal of an obsolete system will not harm them.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="migration">
|
||||
<h3>Migration</h3>
|
||||
|
||||
<p>Much of the work of <a contenteditable="false" data-primary="migrations" data-secondary="in the deprecation process" data-type="indexterm" id="id-X3f8H7CQIVhmuK"> </a>doing deprecation efforts at Google is achieved by using the same set of code generation and review tooling we mentioned earlier. The LSC process and tooling are particularly useful in managing the large effort of actually updating the codebase to refer to new libraries or runtime services.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="preventing_backsliding">
|
||||
<h3>Preventing backsliding</h3>
|
||||
|
||||
<p>Finally, an often overlooked piece of<a contenteditable="false" data-primary="backsliding, preventing in deprecation process" data-type="indexterm" id="id-Qef9H8CNh8hdua"> </a> deprecation infrastructure is tooling for preventing the addition of new uses of the very thing being actively removed. Even for advisory deprecations, it is useful to warn users to shy away from a deprecated system in favor of a new one when they are writing new code. Without backsliding prevention, deprecation can become a game of whack-a-mole in which users constantly add new uses of a system with which they are familiar (or find examples of elsewhere in the codebase), and the deprecation team constantly migrates these new uses. This process is both counterproductive and demoralizing.</p>
|
||||
|
||||
<p>To prevent deprecation backsliding on a micro level, we use the Tricorder static analysis framework <a contenteditable="false" data-primary="Tricorder static analysis platform" data-type="indexterm" id="id-PqfOHjUmhphzuz"> </a>to notify users that they are adding calls into a deprecated system and give them feedback on the appropriate replacement. <a contenteditable="false" data-primary="@deprecated annotation" data-type="indexterm" id="id-DQfVCEULhKhBur"> </a>Owners of deprecated systems can add compiler annotations to deprecated symbols (such as the <code>@deprecated</code> Java annotation), and Tricorder surfaces new uses of these symbols at review time. These annotations give control over messaging to the teams that own the deprecated system, while at the same time automatically alerting the change author. In limited cases, the tooling also suggests a push-button fix to migrate to the suggested replacement.</p>
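<p>As a rough illustration, an owner might mark an old entry point like this. The class and method names here are hypothetical, and the exact messaging and tooling integration will differ between organizations; this is a sketch of the annotation pattern, not Google’s exact setup:</p>

<pre data-code-language="java" data-type="programlisting">/** Hypothetical replacement API that callers should migrate to. */
final class NewStorageClient {
  void write(String key, byte[] value) {
    // ... new implementation ...
  }
}

/** Old entry point, kept only until existing callers have migrated. */
final class OldStorageClient {
  /**
   * @deprecated Use {@link NewStorageClient#write(String, byte[])} instead. Analysis
   *     tooling can surface this message whenever a new call site appears in review.
   */
  @Deprecated
  void write(String key, byte[] value) {
    // Delegate to the replacement so behavior stays identical during the migration.
    new NewStorageClient().write(key, value);
  }
}</pre>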
|
||||
|
||||
<p>On a macro level, we use visibility whitelists in our build system to ensure that new dependencies <a contenteditable="false" data-primary="dependencies" data-secondary="new, preventing introduction into deprecated system" data-type="indexterm" id="id-DQfJH8ILhKhBur"> </a>are not introduced to the deprecated system. Automated tooling periodically examines these whitelists and prunes them as dependent systems are migrated away from<a contenteditable="false" data-primary="deprecation" data-secondary="managing the process" data-startref="ix_deprmgtool" data-tertiary="deprecation tooling" data-type="indexterm" id="id-KofQCdIdh7hBuk"> </a> the obsolete system.<a contenteditable="false" data-primary="deprecation" data-secondary="managing the process" data-startref="ix_deprmg" data-type="indexterm" id="id-5nf8UlIah8hJud"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="conclusion-id00019">
|
||||
<h1>Conclusion</h1>
|
||||
|
||||
<p>Deprecation can feel like the dirty work of cleaning up the street after the circus parade has just passed through town, yet these efforts improve the overall software ecosystem by reducing maintenance overhead and the cognitive burden on engineers. Scalably maintaining complex software systems over time is more than just building and running software: we must also be able to remove systems that are obsolete or otherwise unused.</p>
|
||||
|
||||
<p>A complete deprecation process involves successfully managing social and technical challenges through policy and tooling. Deprecating in an organized and well-managed fashion is often overlooked as a source of benefit to an organization, but is essential for its long-term sustainability.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="tlsemicolondrs-id00120">
|
||||
<h1>TL;DRs</h1>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Software systems have continuing maintenance costs that should be weighed against the costs of removing them.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Removing things is often more difficult than building them to begin with because existing users are often using the system beyond its original design.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Evolving a system in place is usually cheaper than replacing it with a new one, when turndown costs are included.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>It is difficult to honestly evaluate the costs involved in deciding whether to deprecate: aside from the direct maintenance costs involved in keeping the old system around, there are ecosystem costs involved in having multiple similar systems to choose between and that might need to interoperate. The old system might implicitly be a drag on feature development for the new. These ecosystem costs are diffuse and difficult to measure. Deprecation and removal costs are often similarly diffuse.<a contenteditable="false" data-primary="deprecation" data-startref="ix_depr" data-type="indexterm" id="id-EPf0H5HMIYCPir"> </a></p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
<div data-type="footnotes"><p data-type="footnote" id="ch01fn146"><sup><a href="ch15.html#ch01fn146-marker">1</a></sup><a href="https://oreil.ly/heo5Q">"Design and Construction of Nuclear Power Plants to Facilitate Decommissioning,"</a> Technical Reports Series No. 382, IAEA, Vienna (1997).</p><p data-type="footnote" id="ch01fn147"><sup><a href="ch15.html#ch01fn147-marker">2</a></sup>See <a href="https://abseil.io/docs/cpp/tools/api-upgrades"><em class="hyperlink">https://abseil.io/docs/cpp/tools/api-upgrades</em></a> for an example.</p></div></section>
|
||||
|
||||
</body>
|
||||
</html>
|
348
clones/abseil.io/resources/swe-book/html/ch16.html
Normal file
354
clones/abseil.io/resources/swe-book/html/ch17.html
Normal file
589
clones/abseil.io/resources/swe-book/html/ch18.html
Normal file
|
@ -0,0 +1,589 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="build_systems_and_build_philosophy">
|
||||
<h1>Build Systems and Build Philosophy</h1>
|
||||
|
||||
<p class="byline">Written by Erik Kuefler</p>
|
||||
|
||||
<p class="byline">Edited by Lisa Carey</p>
|
||||
|
||||
<p>If you ask Google engineers what they<a contenteditable="false" data-primary="build systems" data-type="indexterm" id="ix_bldsys"> </a> like most about working at Google (besides the free food and cool products), you might hear something surprising: engineers love the build system.<sup><a data-type="noteref" id="ch01fn184-marker" href="ch18.html#ch01fn184">1</a></sup> Google has spent a tremendous amount of engineering effort over its lifetime in creating its own build system from the ground up, with the goal of ensuring that our engineers are able to quickly and reliably build code.<a contenteditable="false" data-primary="Bazel" data-type="indexterm" id="id-oKCAcBs2"> </a><a contenteditable="false" data-primary="Blaze" data-type="indexterm" id="id-mrC4s9sX"> </a> The effort has been so successful that Blaze, the main component of the build system, has been reimplemented several different times by ex-Googlers who have left the company.<sup><a data-type="noteref" id="ch01fn185-marker" href="ch18.html#ch01fn185">2</a></sup> In 2015, Google finally open sourced an implementation of Blaze named <a href="https://bazel.build">Bazel</a>.</p>
|
||||
|
||||
<section data-type="sect1" id="purpose_of_a_build_system">
|
||||
<h1>Purpose of a Build System</h1>
|
||||
|
||||
<p>Fundamentally, all build systems have a straightforward purpose: they transform the source code <a contenteditable="false" data-primary="build systems" data-secondary="purpose of" data-type="indexterm" id="id-oKC9HXtKSK"> </a>written by engineers into executable binaries that can be read by machines. A good build system will generally try to optimize for two important <span class="keep-together">properties:</span></p>
|
||||
|
||||
<dl>
|
||||
<dt>Fast</dt>
|
||||
<dd>A developer should be able to type a single command to run the build and get back the resulting binary, often in as little as a few seconds.<a contenteditable="false" data-primary="speed in build systems" data-type="indexterm" id="id-JECKHBtvcrSN"> </a></dd>
|
||||
<dt>Correct</dt>
|
||||
<dd>Every time any developer runs a<a contenteditable="false" data-primary="correctness in build systems" data-type="indexterm" id="id-MzCgHzs6cNSK"> </a> build on any machine, they should get the same result (assuming that the source files and other inputs are the same).</dd>
|
||||
</dl>
|
||||
|
||||
<p>Many older build systems attempt to make trade-offs between speed and correctness by taking shortcuts that can lead to inconsistent builds.<a contenteditable="false" data-primary="Bazel" data-secondary="speed and correctness" data-type="indexterm" id="id-JECKHqsOSE"> </a> Bazel’s main objective is to avoid having to choose between speed and correctness, providing a build system structured to ensure that it’s always possible to build code efficiently and consistently.</p>
|
||||
|
||||
<p>Build systems aren’t just for humans; they also allow machines to create builds automatically, whether for testing or for releases to production.<a contenteditable="false" data-primary="automated build system" data-type="indexterm" id="id-BdCWHKSYSj"> </a> In fact, the large majority of builds at Google are triggered automatically rather than directly by engineers. Nearly all of our development tools tie into the build system in some way, giving huge amounts of value to everyone working on our codebase. Here’s a small sample of workflows that take advantage of our automated build system:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Code is automatically built, tested, and pushed to production without any human intervention. Different teams do this at different rates: some teams push weekly, others daily, and others as fast as the system can create and validate new builds (see <a data-type="xref" href="ch24.html#continuous_delivery-id00035">Continuous Delivery</a>).</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Developer changes are automatically tested when they’re sent for code review (see <a data-type="xref" href="ch19.html#critique_googleapostrophes_code_review">Critique: Google’s Code Review Tool</a>) so that both the author and reviewer can immediately see any build or test issues caused by the change.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Changes are tested again immediately before merging them into the trunk, making it much more difficult to submit breaking changes.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Authors of low-level libraries are able to test their changes across the entire codebase, ensuring that their changes are safe across millions of tests and binaries.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Engineers are<a contenteditable="false" data-primary="large-scale changes" data-type="indexterm" id="id-l1CZHNHnSph2Sm"> </a> able to create large-scale changes (LSCs) that touch tens of thousands of source files at a time (e.g., renaming a common symbol) while still being able to safely submit and test those changes. We discuss LSCs in greater detail in <a data-type="xref" href="ch22.html#large-scale_changes">Large-Scale Changes</a>.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>All of this is possible only because of Google’s investment in its build system. Although Google might be unique in its scale, any organization of any size can realize similar benefits by making proper use of a modern build system. This chapter describes what Google considers to be a "modern build system" and how to use such systems.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="what_happens_without_a_build_systemques">
|
||||
<h1>What Happens Without a Build System?</h1>
|
||||
|
||||
<p>Build systems allow your development to scale. <a contenteditable="false" data-primary="build systems" data-secondary="using other tools instead of" data-type="indexterm" id="ix_bldsysno"> </a>As we’ll illustrate in the next section, we run into problems of scaling without a proper build environment.</p>
|
||||
|
||||
<section data-type="sect2" id="but_all_i_need_is_a_compilerexclamation">
|
||||
<h2>But All I Need Is a Compiler!</h2>
|
||||
|
||||
<p>The need for a build <a contenteditable="false" data-primary="build systems" data-secondary="using other tools instead of" data-tertiary="compilers" data-type="indexterm" id="id-BdCWHdtXcghy"> </a>system might not be immediately obvious. <a contenteditable="false" data-primary="compilers, using instead of build systems" data-type="indexterm" id="id-MzCpt8t6cdhK"> </a>After all, most of us probably didn’t use a build system when we were first learning to code—we probably started by invoking tools like <code>gcc</code> or <code>javac</code> directly from the command line, or the equivalent in an integrated development environment (IDE).<a contenteditable="false" data-primary="Java" data-secondary="javac compiler" data-type="indexterm" id="id-AqClSrt9cMhK"> </a> As long as all of our source code is in the same directory, a command like this works fine:</p>
|
||||
|
||||
<pre data-type="programlisting">javac *.java</pre>
|
||||
|
||||
<p>This instructs the Java compiler to take every Java source file in the current directory and turn it into a binary class file. In the simplest case, this is all that we need.</p>
|
||||
|
||||
<p>However, things become more complicated quickly as soon as our code expands. <code>javac</code> is smart enough to look in subdirectories of our current directory to find code that we import. But it has no way of finding code stored in other parts of the filesystem (perhaps a library shared by several of our projects). It also obviously only knows how to build Java code. Large systems often involve different pieces written in a variety of programming languages with webs of dependencies among those pieces, meaning no compiler for a single language can possibly build the entire system.</p>
|
||||
|
||||
<p>As soon as we end up having to deal with code from multiple languages or multiple compilation units, building code is no longer a one-step process. We now need to think about what our code depends on and build those pieces in the proper order, possibly using a different set of tools for each piece. If we change any of the dependencies, we need to repeat this process to avoid depending on stale binaries. For a codebase of even moderate size, this process quickly becomes tedious and <span class="keep-together">error-prone.</span></p>
|
||||
|
||||
<p>The compiler also doesn’t know anything about how to handle external dependencies, such as third-party JAR files in Java.<a contenteditable="false" data-primary="dependencies" data-secondary="external, compilers and" data-type="indexterm" id="id-l1CZH4fdcphz"> </a><a contenteditable="false" data-primary="Java" data-secondary="third-party JAR files" data-type="indexterm" id="id-gqCOt9fyc9hj"> </a> Often the best we can do without a build system is to download the dependency from the internet, stick it in a <code>lib</code> folder on the hard drive, and configure the compiler to read libraries from that directory. <a contenteditable="false" data-primary="libraries, compilers and" data-type="indexterm" id="id-6MCNspfbcnhz"> </a>Over time, it’s easy to forget what libraries we put in there, where they came from, and whether they’re still in use. And good luck keeping them up to date as the library maintainers release new versions.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="shell_scripts_to_the_rescuequestion_mar">
|
||||
<h2>Shell Scripts to the Rescue?</h2>
|
||||
|
||||
<p>Suppose that your<a contenteditable="false" data-primary="shell scripts, using for builds" data-type="indexterm" id="id-MzCgH8tnsdhK"> </a> hobby project starts<a contenteditable="false" data-primary="build systems" data-secondary="using other tools instead of" data-tertiary="shell scripts" data-type="indexterm" id="id-NoCVt6tmsNhO"> </a> out simple enough that you can build it using just a compiler, but you begin running into some of the problems described previously. Maybe you still don’t think you need a real build system and can automate away the tedious parts using some simple shell scripts that take care of building things in the correct order. This helps out for a while, but pretty soon you start running into even more problems:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>It becomes tedious. As your system grows more complex, you begin spending almost as much time working on your build scripts as on real code. Debugging shell scripts is painful, with more and more hacks being layered on top of one another.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>It’s slow. To make sure you weren’t accidentally relying on stale libraries, you have your build script build every dependency in order every time you run it. You think about adding some logic to detect which parts need to be rebuilt, but that sounds awfully complex and error prone for a script. Or you think about specifying which parts need to be rebuilt each time, but then you’re back to square one.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Good news: it’s time for a release! Better go figure out all the arguments you need to pass to the <code>jar</code> command to <a href="https://xkcd.com/1168">make your final build</a>. And remember how to upload it and push it out to the central repository. And build and push the documentation updates, and send out a notification to users. Hmm, maybe this calls for another script...</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Disaster! Your hard drive crashes, and now you need to recreate your entire system. You were smart enough to keep all of your source files in version control, but what about those libraries you downloaded? Can you find them all again and make sure they were the same version as when you first downloaded them? Your scripts probably depended on particular tools being installed in particular places—can you restore that same environment so that the scripts work again? What about all those environment variables you set a long time ago to get the compiler working just right and then forgot about?</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Despite the problems, your project is successful enough that you’re able to begin hiring more engineers. Now you realize that it doesn’t take a disaster for the previous problems to arise—you need to go through the same painful bootstrapping process every time a new developer joins your team. And despite your best efforts, there are still small differences in each person’s system. Frequently, what works on one person’s machine doesn’t work on another’s, and each time it takes a few hours of debugging tool paths or library versions to figure out where the difference is.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>You decide that you need to automate your build system. In theory, this is as simple as getting a new computer and setting it up to run your build script every night using cron. You still need to go through the painful setup process, but now you don’t have the benefit of a human brain being able to detect and resolve minor problems. Now, every morning when you get in, you see that last night’s build failed because yesterday a developer made a change that worked on their system but didn’t work on the automated build system. Each time it’s a simple fix, but it happens so often that you end up spending a lot of time each day discovering and applying these simple fixes.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Builds become slower and slower as the project grows. One day, while waiting for a build to complete, you gaze mournfully at the idle desktop of your coworker, who is on vacation, and wish there were a way to take advantage of all that wasted computational power.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>You’ve run into a classic problem of scale. For a single developer working on at most a couple hundred lines of code for at most a week or two (which might have been the entire experience thus far of a junior developer who just graduated university), a compiler is all you need. Scripts can maybe take you a little bit farther. But as soon as you need to coordinate across multiple developers and their machines, even a perfect build script isn’t enough because it becomes very difficult to account for the minor differences in those machines. At this point, this simple approach breaks down and it’s time to invest in a real build system.<a contenteditable="false" data-primary="build systems" data-secondary="using other tools instead of" data-startref="ix_bldsysno" data-type="indexterm" id="id-4JCNHVspsKhB"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="modern_build_systems">
|
||||
<h1>Modern Build Systems</h1>
|
||||
|
||||
<p>Fortunately, all of the problems we started running into have already been solved many times over by existing general-purpose build systems.<a contenteditable="false" data-primary="build systems" data-secondary="modern" data-type="indexterm" id="ix_bldsysmod"> </a> Fundamentally, they aren’t that different from the aforementioned script-based DIY approach we were working on: they run the same compilers under the hood, and you need to understand those underlying tools to be able to know what the build system is really doing. But these existing systems have gone through many years of development, making them far more robust and flexible than the scripts you might try hacking together yourself.</p>
|
||||
|
||||
<section data-type="sect2" id="itapostrophes_all_about_dependencies">
|
||||
<h2>It’s All About Dependencies</h2>
|
||||
|
||||
<p>In looking<a contenteditable="false" data-primary="dependencies" data-secondary="build systems and" data-type="indexterm" id="id-MzCgH8t6cJfK"> </a> through <a contenteditable="false" data-primary="build systems" data-secondary="modern" data-tertiary="dependencies and" data-type="indexterm" id="id-NoCVt6tgclfO"> </a>the previously described problems, one theme repeats over and over: managing your own code is fairly straightforward, but managing its dependencies is much more difficult (and <a data-type="xref" href="ch21.html#dependency_management">Dependency Management</a> is devoted to covering this problem in detail). There are all sorts of dependencies: sometimes there’s a dependency on a task (e.g., "push the documentation before I mark a release as complete"), and sometimes there’s a dependency on an artifact (e.g., "I need to have the latest version of the computer vision library to build my code"). Sometimes, you have internal dependencies on another part of your codebase, and sometimes you have external dependencies on code or data owned by another team (either in your organization or a third party). But in any case, the idea of "I need that before I can have this" is something that recurs repeatedly in the design of build systems, and managing dependencies is perhaps the most fundamental job of a build system.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="task-based_build_systems">
|
||||
<h2>Task-Based Build Systems</h2>
|
||||
|
||||
<p>The shell scripts we started developing in the <a contenteditable="false" data-primary="build systems" data-secondary="modern" data-tertiary="task-based" data-type="indexterm" id="ix_bldsysmodtsk"> </a>previous <a contenteditable="false" data-primary="task-based build systems" data-type="indexterm" id="ix_tskbld"> </a>section were an example of a primitive <em>task-based build system</em>. In a task-based build system, the fundamental unit of work is the task. Each task is a script of some sort that can execute any sort of logic, and tasks specify other tasks as dependencies that must run before them. Most major build systems in use today, such as Ant, Maven, Gradle, Grunt, and Rake, are task based.<a contenteditable="false" data-primary="Rake" data-type="indexterm" id="id-l1C2s6tMsvfz"> </a><a contenteditable="false" data-primary="Grunt" data-type="indexterm" id="id-gqC4S2tZsefj"> </a><a contenteditable="false" data-primary="Gradle" data-type="indexterm" id="id-9ACDh0t2s8fA"> </a><a contenteditable="false" data-primary="Maven" data-type="indexterm" id="id-6MCwfEtwsOfz"> </a><a contenteditable="false" data-primary="Ant" data-type="indexterm" id="id-agC5Uwtpsyfp"> </a></p>
|
||||
|
||||
<p>Instead of shell scripts, most modern build systems require engineers to create <em>buildfiles</em> that describe how to perform the build.<a contenteditable="false" data-primary="buildfiles" data-type="indexterm" id="id-AqCntjcWsdfK"> </a> Take this example from the <a href="https://oreil.ly/WL9ry">Ant manual</a>:</p>
|
||||
|
||||
<div data-type="example" id="id-47TjsVsgf2">
|
||||
<pre data-code-language="xml" data-type="programlisting"><project name="MyProject" default="dist" basedir=".">
|
||||
<description>
|
||||
simple example build file
|
||||
</description>
|
||||
<!-- set global properties for this build -->
|
||||
<property name="src" location="src"/>
|
||||
<property name="build" location="build"/>
|
||||
<property name="dist" location="dist"/>
|
||||
|
||||
<target name="init">
|
||||
<!-- Create the time stamp -->
|
||||
<tstamp/>
|
||||
<!-- Create the build directory structure used by compile -->
|
||||
<mkdir dir="${build}"/>
|
||||
</target>
|
||||
|
||||
<target name="compile" depends="init"
|
||||
description="compile the source">
|
||||
<!-- Compile the Java code from ${src} into ${build} -->
|
||||
<javac srcdir="${src}" destdir="${build}"/>
|
||||
</target>
|
||||
|
||||
<target name="dist" depends="compile"
|
||||
description="generate the distribution">
|
||||
<!-- Create the distribution directory -->
|
||||
<mkdir dir="${dist}/lib"/>
|
||||
|
||||
<!-- Put everything in ${build} into the MyProject-${DSTAMP}.jar file -->
|
||||
<jar jarfile="${dist}/lib/MyProject-${DSTAMP}.jar" basedir="${build}"/>
|
||||
</target>
|
||||
|
||||
<target name="clean"
|
||||
description="clean up">
|
||||
<!-- Delete the ${build} and ${dist} directory trees -->
|
||||
<delete dir="${build}"/>
|
||||
<delete dir="${dist}"/>
|
||||
</target>
|
||||
</project></pre>
|
||||
</div>
|
||||
|
||||
<p>The buildfile is written in XML and defines some simple metadata about the build along with a list of tasks (the <code><target></code> tags in the XML<sup><a data-type="noteref" id="ch01fn187-marker" href="ch18.html#ch01fn187">3</a></sup>). Each task executes a list of possible commands defined by Ant, which here include creating and deleting directories, running <code>javac</code>, and creating a JAR file. <a contenteditable="false" data-primary="dependencies" data-secondary="in task-based build systems" data-type="indexterm" id="id-6MCNs4SwsOfz"> </a>This set of commands can be extended by user-provided plug-ins to cover any sort of logic. Each task can also define the tasks it depends on via the <code>depends</code> attribute. These dependencies form an acyclic graph (see <a data-type="xref" href="ch18.html#an_acyclic_graph_showing_dependencies">Figure 18-1</a>).</p>
|
||||
|
||||
<figure id="an_acyclic_graph_showing_dependencies"><img alt="An acyclic graph showing dependencies" src="images/seag_1801.png">
|
||||
<figcaption><span class="label">Figure 18-1. </span>An acyclic graph showing dependencies</figcaption>
|
||||
</figure>
|
||||
|
||||
<p>Users perform builds by providing tasks to Ant’s command-line tool. <a contenteditable="false" data-primary="Ant" data-secondary="performing builds by providing tasks to command line" data-type="indexterm" id="id-9ACYH9f2s8fA"> </a>For example, when a user types <code>ant dist</code>, Ant takes the following steps:</p>
|
||||
|
||||
<ol>
|
||||
<li>
|
||||
<p>Loads a file named <em>build.xml</em> in the current directory and parses it to create the graph structure shown in <a data-type="xref" href="ch18.html#an_acyclic_graph_showing_dependencies">Figure 18-1</a>.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Looks for the task named <code>dist</code> that was provided on the command line and discovers that it has a dependency on the task named <code>compile</code>.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Looks for the task named <code>compile</code> and discovers that it has a dependency on the task named <code>init</code>.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Looks for the task named <code>init</code> and discovers that it has no dependencies.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Executes the commands defined in the <code>init</code> task.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Executes the commands defined in the <code>compile</code> task given that all of that task’s dependencies have been run.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Executes the commands defined in the <code>dist</code> task given that all of that task’s dependencies have been run.</p>
|
||||
</li>
|
||||
</ol>
|
||||
|
||||
<p>In the end, the code executed by Ant when running the <code>dist</code> task is equivalent to the following shell script:</p>
|
||||
|
||||
<div data-type="example" id="id-ayT4Tjs4fd">
|
||||
<pre data-code-language="bash" data-type="programlisting">./createTimestamp.sh
|
||||
mkdir build/
|
||||
javac src/* -d build/
|
||||
mkdir -p dist/lib/
|
||||
jar cf dist/lib/MyProject-$(date --iso-8601).jar build/*</pre>
|
||||
</div>
|
||||
|
||||
<p>When the syntax is stripped away, the buildfile and the build script actually aren’t too different.<a contenteditable="false" data-primary="buildfiles" data-secondary="build scripts and" data-type="indexterm" id="id-X3CoHZCLsOfa"> </a> But we’ve already gained a lot by doing this. We can create new buildfiles in other directories and link them together. We can easily add new tasks that depend on existing tasks in arbitrary and complex ways. We need only pass the name of a single task to the <code>ant</code> command-line tool, and it will take care of determining everything that needs to be run.</p>
|
||||
|
||||
<p>Ant is a very old piece of software, originally released in 2000—not what many people would consider a "modern" build system today! <a contenteditable="false" data-primary="Ant" data-secondary="replacement by more modern build systems" data-type="indexterm" id="id-qYCeH5u8sBfv"> </a><a contenteditable="false" data-primary="Maven" data-secondary="improvements on Ant" data-type="indexterm" id="id-daCot1uLsZfw"> </a>Other tools like Maven and Gradle have improved on Ant in the intervening years and essentially <a contenteditable="false" data-primary="Gradle" data-secondary="improvements on Ant" data-type="indexterm" id="id-5BCecruRsDfr"> </a>replaced it by adding features like automatic management of external dependencies and a cleaner syntax without any XML. But the nature of these newer systems remains the same: they<a contenteditable="false" data-primary="build scripts" data-secondary="writing as tasks" data-type="indexterm" id="id-ZJC8sYu6sjfb"> </a> allow engineers to write build scripts in a principled and modular way as tasks and provide tools for executing those tasks and managing dependencies among them.</p>
|
||||
|
||||
<section data-type="sect3" id="the_dark_side_of_task-based_build_syste">
|
||||
<h3>The dark side of task-based build systems</h3>
|
||||
|
||||
<p>Because these tools <a contenteditable="false" data-primary="task-based build systems" data-secondary="dark side of" data-type="indexterm" id="id-5BC3HjtYF2sJf3"> </a>essentially let engineers define any script as a task, they are extremely powerful, allowing you to do pretty much anything you can imagine with them. But that power comes with drawbacks, and task-based build systems can become difficult to work with as their build scripts grow more complex. The problem with such systems is that they actually end up giving <em>too much power to engineers and not enough power to the system</em>. Because the system has no idea what the scripts are doing, performance suffers, as it must be very conservative in how it schedules and executes build steps. And there’s no way for the system to confirm that each script is doing what it should, so scripts tend to grow in complexity and end up being another thing that needs debugging.</p>
|
||||
|
||||
<section data-type="sect4" id="difficulty_of_parallelizing_build_steps">
|
||||
<h4>Difficulty of parallelizing build steps</h4>
|
||||
|
||||
<p>Modern development <a contenteditable="false" data-primary="parallelization of build steps" data-secondary="difficulty in task-based systems" data-type="indexterm" id="id-84CrHKt6cKF4sVfV"> </a>workstations are typically<a contenteditable="false" data-primary="task-based build systems" data-secondary="difficulty of parallelizing build steps" data-type="indexterm" id="id-12CQt1t1coFws1fB"> </a> quite powerful, with multiple cores that should theoretically be capable of executing several build steps in parallel. But task-based systems are often unable to parallelize task execution even when it seems like they should be able to. Suppose that task A depends on tasks B and C. Because tasks B and C have no dependency on each other, is it safe to run them at the same time so that the system can more quickly get to task A? Maybe, if they don’t touch any of the same resources. But maybe not—perhaps both use the same file to track their statuses and running them at the same time will cause a conflict. There’s no way in general for the system to know, so either it has to risk these conflicts (leading to rare but very difficult-to-debug build problems), or it has to restrict the entire build to running on a single thread in a single process. This can be a huge waste of a powerful developer machine, and it completely rules out the possibility of distributing the build across multiple machines.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect4" id="difficulty_performing_incremental_build">
|
||||
<h4>Difficulty performing incremental builds</h4>
|
||||
|
||||
<p>A good build system will allow<a contenteditable="false" data-primary="task-based build systems" data-secondary="difficulty of performing incremental builds" data-type="indexterm" id="id-12C7H1tysoFws1fB"> </a> engineers to perform reliable incremental builds such that a small change doesn’t require the entire codebase to be rebuilt from scratch.<a contenteditable="false" data-primary="incremental builds, difficulty in task-based build systems" data-type="indexterm" id="id-rpCNt0tpsWFVsVfV"> </a> This is especially important if the build system is slow and unable to parallelize build steps for the aforementioned reasons. But unfortunately, task-based build systems struggle here, too. Because tasks can do anything, there’s no way in general to check whether they’ve already been done. Many tasks simply take a set of source files and run a compiler to create a set of binaries; thus, they don’t need to be rerun if the underlying source files haven’t changed. But without additional information, the system can’t say this for sure—maybe the task downloads a file that could have changed, or maybe it writes a timestamp that could be different on each run. To guarantee correctness, the system typically must rerun every task during each build.</p>
|
||||
|
||||
<p>Some build systems try to enable incremental builds by letting engineers specify the conditions under which a task needs to be rerun. Sometimes this is feasible, but often it’s a much trickier problem than it appears. For example, in languages like C++ that allow files to be included directly by other files, it’s impossible to determine the entire set of files that must be watched for changes without parsing the input sources. Engineers will often end up taking shortcuts, and these shortcuts can lead to rare and frustrating problems where a task result is reused even when it shouldn’t be. When this happens frequently, engineers get into the habit of running <code>clean</code> before every build to get a fresh state, completely defeating the purpose of having an incremental build in the first place. Figuring out when a task needs to be rerun is surprisingly subtle, and is a job better handled by machines than humans.</p>
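<p>To make the contrast concrete, the following minimal Java sketch shows the kind of bookkeeping a build system can do on the engineer’s behalf: hash every declared input of a step and rerun the step only when the combined digest changes. The class and method names are illustrative and not part of any real build tool, and the approach works only because the full set of inputs is declared up front, which is exactly the information a free-form task is not required to provide:</p>

<pre data-code-language="java" data-type="programlisting">import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;

/** Illustrative sketch: skip a build step when its declared inputs are unchanged. */
public final class StalenessCheck {

  /** Combines the names and contents of all declared inputs into one stable digest. */
  static String digestInputs(List&lt;Path&gt; inputs) throws IOException, NoSuchAlgorithmException {
    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    for (Path input : inputs.stream().sorted().toList()) {
      digest.update(input.toString().getBytes());  // which file it is...
      digest.update(Files.readAllBytes(input));    // ...and what it currently contains
    }
    return HexFormat.of().formatHex(digest.digest());
  }

  /** The step is up to date only if its inputs hash to the digest recorded last time. */
  static boolean isUpToDate(List&lt;Path&gt; inputs, String lastRecordedDigest)
      throws IOException, NoSuchAlgorithmException {
    return digestInputs(inputs).equals(lastRecordedDigest);
  }
}</pre>

<p>Hashing content rather than comparing timestamps also sidesteps problems such as clock skew or files being touched without actually changing, which is one reason artifact-based systems tend to prefer content digests for this check.</p>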
|
||||
</section>
|
||||
|
||||
<section data-type="sect4" id="difficulty_maintaining_and_debugging_sc">
|
||||
<h4>Difficulty maintaining and debugging scripts</h4>
|
||||
|
||||
<p>Finally, the build scripts imposed by task-based build systems are often just difficult to work with.<a contenteditable="false" data-primary="build scripts" data-secondary="difficulties of task-based build systems with" data-type="indexterm" id="id-rpCYH0tKSWFVsVfV"> </a><a contenteditable="false" data-primary="task-based build systems" data-secondary="difficulty maintaining and debugging build scripts" data-type="indexterm" id="id-O7Cyt9t0S6F1snf9"> </a> Though they often receive less scrutiny, build scripts are code just like the system being built, and are easy places for bugs to hide. Here are some examples of bugs that are very common when working with a task-based build system:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Task A depends on task B to produce a particular file as output. The owner of task B doesn’t realize that other tasks rely on it, so they change it to produce output in a different location. This can’t be detected until someone tries to run task A and finds that it fails.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Task A depends on task B, which depends on task C, which is producing a particular file as output that’s needed by task A. The owner of task B decides that it doesn’t need to depend on task C any more, which causes task A to fail even though task B doesn’t care about task C at all!</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>The developer of a new task accidentally makes an assumption about the machine running the task, such as the location of a tool or the value of particular environment variables. The task works on their machine, but fails whenever another developer tries it.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>A task contains a nondeterministic component, such as downloading a file from the internet or adding a timestamp to a build. Now, people will get potentially different results each time they run the build, meaning that engineers won’t always be able to reproduce and fix one another’s failures or failures that occur on an automated build system.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Tasks with multiple dependencies can create race conditions. If task A depends on both task B and task C, and task B and C both modify the same file, task A will get a different result depending on which one of tasks B and C finishes first.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>There’s no general-purpose way to solve these performance, correctness, or maintainability problems within the task-based framework laid out here. So long as engineers can write arbitrary code that runs during the build, the system can’t have enough information to always be able to run builds quickly and correctly. To solve the problem, we need to take some power out of the hands of engineers and put it back in the hands of the system and reconceptualize the role of the system not as running tasks, but as producing artifacts. This is the approach that Google takes with Blaze and Bazel, and it will be described in the next section.<a contenteditable="false" data-primary="task-based build systems" data-startref="ix_tskbld" data-type="indexterm" id="id-KYCpH1sZSbFasvf9"> </a><a contenteditable="false" data-primary="build systems" data-secondary="modern" data-startref="ix_bldsysmodtsk" data-tertiary="task-based" data-type="indexterm" id="id-E8CytKsESwFmsvfm"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="artifact-based_build_systems">
|
||||
<h2>Artifact-Based Build Systems</h2>
|
||||
|
||||
<p>To design a better build system, we need to take a step back. <a contenteditable="false" data-primary="build systems" data-secondary="modern" data-tertiary="artifact-based" data-type="indexterm" id="ix_bldsysmodart"> </a><a contenteditable="false" data-primary="artifact-based build systems" data-type="indexterm" id="ix_artibld"> </a>The problem with the earlier systems is that they gave too much power to individual engineers by letting them define their own tasks. Maybe instead of letting engineers define tasks, we can have a small number of tasks defined by the system that engineers can configure in a limited way. We could probably deduce the name of the most important task from the name of this chapter: a build system’s primary task should be to <em>build</em> code. Engineers would still need to tell the system <em>what</em> to build, but the <em>how</em> of doing the build would be left to the system.</p>
|
||||
|
||||
<p>This is exactly the approach taken by Blaze and the other <em>artifact-based</em> build systems descended from it (which include Bazel, Pants, and Buck). <a contenteditable="false" data-primary="Buck" data-type="indexterm" id="id-l1CEtDcnSvfz"> </a><a contenteditable="false" data-primary="Pants" data-type="indexterm" id="id-gqCwcocOSefj"> </a><a contenteditable="false" data-primary="Bazel" data-type="indexterm" id="id-9AC1svcVS8fA"> </a><a contenteditable="false" data-primary="Blaze" data-type="indexterm" id="id-6MCQSAcmSOfz"> </a>Like with task-based build systems, we still have buildfiles, but the contents of those buildfiles are very different. <a contenteditable="false" data-primary="buildfiles" data-secondary="in artifact-based build systems" data-type="indexterm" id="id-agCbhqcnSyfp"> </a>Rather than being an imperative set of commands in a Turing-complete scripting language describing how to produce an output, buildfiles in Blaze are a <em>declarative manifest</em> describing a set of artifacts to build, their dependencies, and a limited set of options that affect how they’re built. When engineers run <code>blaze</code> on the command line, they specify a set of targets to build (the "what"), and Blaze is responsible for configuring, running, and scheduling the compilation steps (the "how"). Because the build system now has full control over what tools are being run when, it can make much stronger guarantees that allow it to be far more efficient while still guaranteeing correctness.</p>
|
||||
|
||||
<section data-type="sect3" id="a_functional_perspective">
|
||||
<h3>A functional perspective</h3>
|
||||
|
||||
<p>It’s easy to make an analogy between artifact-based build systems and functional programming. <a contenteditable="false" data-primary="artifact-based build systems" data-secondary="functional perspective" data-type="indexterm" id="id-gqCzH2tZs8Sofy"> </a>Traditional imperative<a contenteditable="false" data-primary="imperative programming languages" data-type="indexterm" id="id-9ACBt0t2soSEf6"> </a> programming <a contenteditable="false" data-primary="programming languages" data-secondary="imperative and functional" data-type="indexterm" id="id-6MCecEtwsySEfq"> </a>languages (e.g., Java, C, and Python) specify lists of statements to be executed one after another, in the same way that task-based build systems let programmers define a series of steps to execute. Functional programming<a contenteditable="false" data-primary="functional programming languages" data-type="indexterm" id="id-agCeswtps3S3fe"> </a> languages (e.g., Haskell and ML), in contrast, are structured more like a series of mathematical equations. In functional languages, the programmer describes a computation to perform, but leaves the details of when and exactly how that computation is executed to the compiler. This maps to the idea of declaring a manifest in an artifact-based build system and letting the system figure out how to execute the build.</p>
<p>Many problems cannot be easily expressed using functional programming, but the ones that can be expressed this way benefit greatly from it: the language is often able to trivially parallelize such programs and make strong guarantees about their correctness that would be impossible in an imperative language. The easiest problems to express using functional programming are the ones that simply involve transforming one piece of data into another using a series of rules or functions. And that’s exactly what a build system is: the whole system is effectively a mathematical function that takes source files (and tools like the compiler) as inputs and produces binaries as outputs. So, it’s not surprising that it works well to base a build system around the tenets of functional programming.</p>
<section data-type="sect4" id="getting_concrete_with_bazel">
<h4>Getting concrete with Bazel</h4>
<p>Bazel is the open source version of Google’s internal build tool, Blaze, and is a good example of an artifact-based build system.<a contenteditable="false" data-primary="Bazel" data-secondary="getting concrete with" data-type="indexterm" id="id-agCwHwtps6sBSdf4"> </a><a contenteditable="false" data-primary="artifact-based build systems" data-secondary="getting concrete with Bazel" data-type="indexterm" id="id-QjCAtmt0sZsySnfa"> </a> Here’s what <a contenteditable="false" data-primary="buildfiles" data-secondary="in artifact-based build systems" data-tertiary="Bazel" data-type="indexterm" id="id-X3C3cLtLsds5SEfl"> </a>a buildfile (normally named BUILD) looks like in Bazel:</p>
<div data-type="example" id="id-ayTNcjsps3S3fe">
<pre data-type="programlisting">java_binary(
    name = "MyBinary",
    srcs = ["MyBinary.java"],
    deps = [
        ":mylib",
    ],
)

java_library(
    name = "mylib",
    srcs = ["MyLibrary.java", "MyHelper.java"],
    visibility = ["//java/com/example/myproduct:__subpackages__"],
    deps = [
        "//java/com/example/common",
        "//java/com/example/myproduct/otherlib",
        "@com_google_common_guava_guava//jar",
    ],
)</pre>
</div>
<p>In Bazel, <em>BUILD</em> files define <em>targets—</em>the two types of targets here are <code>java_binary</code> and <code>java_library</code>. Every target corresponds to an artifact that can be created by the system: <code>binary</code> targets produce binaries that can be executed directly, and <code>library</code> targets produce libraries that can be used by binaries or other libraries. Every target has a <em>name</em> (which defines how it is referenced on the command line and by other targets), <em>srcs</em> (which define the source files that must be compiled to create the artifact for the target), and <em>deps</em> (which define other targets that must be built before this target and linked into it). Dependencies can either be within the same package (e.g., <code>MyBinary</code>’s dependency on <code>":mylib"</code>), on a different package in the same source hierarchy (e.g., <code>mylib</code>’s dependency on <code>"//java/com/example/common"</code>), or on a third-party artifact outside of the source hierarchy (e.g., <code>mylib</code>’s dependency on <code>"@com_google_common_guava_guava//jar"</code>). Each source hierarchy is called a <em>workspace</em> and is identified by the presence of a special <em>WORKSPACE</em> file at the root.</p>
<p>Like with Ant, users perform builds using Bazel’s command-line tool.<a contenteditable="false" data-primary="Bazel" data-secondary="getting concrete with" data-tertiary="performing builds with command line" data-type="indexterm" id="id-qYCeHdS8sJsqSqfn"> </a> To build the <code>MyBinary</code> target, a user would run <code>bazel build :MyBinary</code>. Upon entering that command for the first time in a clean repository, Bazel would do the following:</p>
<ol>
<li>
<p>Parse every <em>BUILD</em> file in the workspace to create a graph of dependencies among artifacts.</p>
</li>
<li>
<p>Use the graph to determine the <em>transitive dependencies</em> of <code>MyBinary</code>; that is, every target that <code>MyBinary</code> depends on and every target that those targets depend on, recursively.</p>
</li>
<li>
<p>Build (or download for external dependencies) each of those dependencies, in order. Bazel starts by building each target that has no other dependencies and keeps track of which dependencies still need to be built for each target. As soon as all of a target’s dependencies are built, Bazel starts building that target. This process continues until every one of <code>MyBinary</code>’s transitive dependencies has been built.</p>
</li>
<li>
<p>Build <code>MyBinary</code> to produce a final executable binary that links in all of the dependencies that were built in step 3.</p>
</li>
</ol>
<p>Fundamentally, it might not seem like what’s happening here is that much different than what happened when using a task-based build system. Indeed, the end result is the same binary, and the process for producing it involved analyzing a bunch of steps to find dependencies among them, and then running those steps in order. But there are critical differences.<a contenteditable="false" data-primary="Bazel" data-secondary="getting concrete with" data-tertiary="parallelization of build steps" data-type="indexterm" id="id-5BC3HgfRs2sQSwfX"> </a> The first one appears in step 3: because Bazel knows that each target will only produce a Java library, it knows that all it has to do is run the Java compiler rather than an arbitrary user-defined script, so it knows that it’s safe to run these steps in parallel.<a contenteditable="false" data-primary="parallelization of build steps" data-secondary="in Bazel" data-type="indexterm" id="id-ZJCdt4f6sVseSJfO"> </a> This can produce an order of magnitude performance improvement over building targets one at a time on a multicore machine, and is only possible because the artifact-based approach leaves the build system in charge of its own execution strategy so that it can make stronger guarantees about parallelism.</p>
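<p>To make that scheduling idea concrete, here is a minimal Python sketch, not Bazel’s actual implementation, of the steps above: collect the transitive dependencies of the requested target, then build every target whose dependencies have already finished, in parallel waves, until the goal is reached. The <code>GRAPH</code> contents and the <code>compile_target</code> function are hypothetical stand-ins for parsed <em>BUILD</em> files and real compiler invocations.</p>

<pre data-type="programlisting">import concurrent.futures

# Toy dependency graph mapping each target to its direct dependencies.
# The names are hypothetical and mirror the BUILD example earlier.
GRAPH = {
    "MyBinary": [":mylib"],
    ":mylib": ["//java/com/example/common", "//java/com/example/myproduct/otherlib"],
    "//java/com/example/common": [],
    "//java/com/example/myproduct/otherlib": [],
}

def compile_target(name):
    # Stand-in for running the real compiler for this target.
    return name + ".jar"

def build(goal):
    # Steps 1 and 2: collect the transitive dependencies of the goal.
    needed = set()
    def visit(target):
        if target not in needed:
            needed.add(target)
            for dep in GRAPH[target]:
                visit(dep)
    visit(goal)

    # Step 3: build targets in parallel waves as their dependencies complete.
    done = {}
    with concurrent.futures.ThreadPoolExecutor() as pool:
        while len(done) != len(needed):
            ready = [t for t in needed
                     if t not in done and all(d in done for d in GRAPH[t])]
            for t, future in [(t, pool.submit(compile_target, t)) for t in ready]:
                done[t] = future.result()

    # Step 4: the final artifact for the requested target.
    return done[goal]

print(build("MyBinary"))</pre>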
<p>The benefits extend beyond parallelism, though. The next thing that this approach gives us becomes apparent when the developer types <code>bazel build :MyBinary</code> a second time without making any changes: Bazel will exit in less than a second with a message saying that the target is up to date.<a contenteditable="false" data-primary="Bazel" data-secondary="getting concrete with" data-tertiary="rebuilding only minimum set of targets each time" data-type="indexterm" id="id-84C0tyUls6sbSVfV"> </a> This is possible due to the functional programming paradigm we talked about earlier—Bazel knows that each target is the result only of running a Java compiler, and it knows that the output from the Java compiler depends only on its inputs, so as long as the inputs haven’t changed, the output can be reused. And this analysis works at every level; if <code>MyBinary.java</code> changes, Bazel knows to rebuild <code>MyBinary</code> but reuse <code>mylib</code>. If a source file for <code>//java/com/example/common</code> changes, Bazel knows to rebuild that library, <code>mylib</code>, and <code>MyBinary</code>, but reuse <code>//java/com/example/myproduct/otherlib</code>. Because Bazel knows about the properties of the tools it runs at every step, it’s able to rebuild only the minimum set of artifacts each time while guaranteeing that it won’t produce stale builds.</p>
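<p>The up-to-date check can be sketched as a pure lookup: hash a target’s declared inputs (its sources plus the digests of the artifacts produced by its dependencies) and reuse the previous output whenever that digest is unchanged. The following is an illustrative Python sketch with a hypothetical <code>compile_fn</code>, not how Bazel actually stores its results.</p>

<pre data-type="programlisting">import hashlib

def input_digest(source_paths, dep_digests):
    # Hash the target's sources together with the digests of its dependencies.
    # If none of these change, the previous output can be reused as-is.
    h = hashlib.sha256()
    for path in sorted(source_paths):
        with open(path, "rb") as f:
            h.update(f.read())
    for digest in sorted(dep_digests):
        h.update(digest.encode())
    return h.hexdigest()

# Maps (target, input digest) to the artifact produced for those exact inputs.
previous_outputs = {}

def build_or_reuse(target, source_paths, dep_digests, compile_fn):
    key = (target, input_digest(source_paths, dep_digests))
    if key not in previous_outputs:
        previous_outputs[key] = compile_fn(source_paths)  # rebuild only on a miss
    return previous_outputs[key]</pre>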
<p>Reframing the build process in terms of artifacts rather than tasks is subtle but powerful. By reducing the flexibility exposed to the programmer, the build system can know more about what is being done at every step of the build. It can use this knowledge to make the build far more efficient by parallelizing build processes and reusing their outputs. But this is really just the first step, and these building blocks of parallelism and reuse will form the basis for a distributed and highly scalable build system that will be discussed later.</p>
</section>
</section>
<section data-type="sect3" id="other_nifty_bazel_tricks">
<h3>Other nifty Bazel tricks</h3>
<p>Artifact-based build systems fundamentally solve the problems with parallelism and reuse that are inherent in task-based build systems.<a contenteditable="false" data-primary="artifact-based build systems" data-secondary="other nifty Bazel tricks" data-type="indexterm" id="ix_artibldBzl"> </a> But there are still a few problems that came up earlier that we haven’t addressed. Bazel has clever ways of solving each of these, and we should discuss them before moving on.</p>
<section data-type="sect4" id="tools_as_dependencies">
<h4>Tools as dependencies</h4>
<p>One problem we ran into <a contenteditable="false" data-primary="Bazel" data-secondary="tools as dependencies" data-type="indexterm" id="id-agCwHwtVc3SBSdf4"> </a>earlier was that builds depended on the tools installed on our machine, and reproducing builds across systems could be difficult due to different tool versions or locations. The problem becomes even more difficult when your project uses languages that require different tools based on which platform they’re being built on or compiled for (e.g., Windows versus Linux), and each of those platforms requires a slightly different set of tools to do the same job.</p>
<p>Bazel solves the first part of this problem by treating tools as dependencies to each target.<a contenteditable="false" data-primary="dependencies" data-secondary="Bazel treating tools as dependencies to each target" data-type="indexterm" id="id-QjCwH8cVczSySnfa"> </a> Every <code>java_library</code> in the workspace implicitly depends on a Java compiler, which defaults to a well-known compiler but can be configured globally at the <span class="keep-together">workspace</span> level. Whenever Bazel builds a <code>java_library</code>, it checks to make sure that the specified compiler is available at a known location and downloads it if not. Just like any other dependency, if the Java compiler changes, every artifact that was dependent upon it will need to be rebuilt. Every type of target defined in Bazel uses this same strategy of declaring the tools it needs to run, ensuring that Bazel is able to bootstrap them no matter what exists on the system where it runs.</p>
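<p>Treating a tool as just another input can be sketched the same way as the source-hashing example earlier: if the compiler binary is hashed into the key for an action, upgrading the compiler changes the key and forces a rebuild. The paths below are hypothetical; Bazel tracks its tools through declared dependencies rather than raw filesystem paths.</p>

<pre data-type="programlisting">import hashlib

def file_digest(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def action_key(target, sources, tools):
    # The compiler is hashed exactly like a source file, so a new compiler
    # version invalidates every artifact that was produced with the old one.
    h = hashlib.sha256()
    for path in sorted(sources) + sorted(tools):
        h.update(file_digest(path).encode())
    return (target, h.hexdigest())

key = action_key("mylib",
                 sources=["MyLibrary.java", "MyHelper.java"],
                 tools=["/opt/jdk/bin/javac"])  # hypothetical compiler location</pre>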
<p>Bazel solves the second part of <a contenteditable="false" data-primary="Bazel" data-secondary="platform independence using toolchains" data-type="indexterm" id="id-X3CoH0sDcXS5SEfl"> </a>the problem, platform independence, by using <a href="https://oreil.ly/ldiv8">toolchains</a>. Rather than having targets depend directly on their tools, they actually depend on types of toolchains. <a contenteditable="false" data-primary="toolchains, use by Bazel" data-type="indexterm" id="id-daCZc0sjclSOSrf7"> </a>A toolchain contains a set of tools and other properties defining how a type of target is built on a particular platform. The workspace can define the particular toolchain to use for a toolchain type based on the host and target platform. For more details, see the Bazel manual.</p>
</section>
<section data-type="sect4" id="extending_the_build_system">
<h4>Extending the build system</h4>
<p>Bazel comes with targets for several popular programming languages out of the box, but engineers will always<a contenteditable="false" data-primary="Bazel" data-secondary="extending the build system" data-type="indexterm" id="id-QjCwHmt0szSySnfa"> </a> want to do more—part of the benefit of task-based systems is their flexibility in supporting any kind of build process, and it would be better not to give that up in an artifact-based build system. <a contenteditable="false" data-primary="rules, defining in Bazel" data-type="indexterm" id="id-X3CAtLtLsXS5SEfl"> </a>Fortunately, Bazel allows its supported target types to be extended by <a href="https://oreil.ly/Vvg5D">adding custom rules</a>.</p>
<p>To define a rule in Bazel, the rule author declares the inputs that the rule requires (in the form of attributes passed in the <em>BUILD</em> file) and the fixed set of outputs that the rule produces. The author also defines the <em>actions</em> that will be generated by that rule. Each action declares its inputs and outputs, runs a particular executable or writes a particular string to a file, and can be connected to other actions via its inputs and outputs. This means that actions are the lowest-level composable unit in the build system—an action can do whatever it wants so long as it uses only its declared inputs and outputs, and Bazel will take care of scheduling actions and caching their results as appropriate.</p>
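<p>The action graph itself is easy to picture in a few lines of Python. Each action declares its inputs and outputs, and actions connect wherever one action’s output is another action’s input; the build system can then order, parallelize, and cache them however it likes. The <code>Action</code> class and the <code>javac</code> commands below are illustrative only, not Bazel’s internal representation.</p>

<pre data-type="programlisting">from dataclasses import dataclass, field

@dataclass
class Action:
    name: str
    inputs: set = field(default_factory=set)     # files the action may read
    outputs: set = field(default_factory=set)    # files the action promises to write
    command: list = field(default_factory=list)  # executable plus its arguments

def order_actions(actions):
    # Connect actions by matching declared outputs to declared inputs, then
    # emit them so that every producer comes before its consumers.
    producer = {out: action for action in actions for out in action.outputs}
    ordered, seen = [], set()
    def visit(action):
        if action.name in seen:
            return
        seen.add(action.name)
        for inp in action.inputs:
            if inp in producer:
                visit(producer[inp])
        ordered.append(action)
    for action in actions:
        visit(action)
    return ordered

compile_lib = Action("compile_lib", {"MyLibrary.java"}, {"mylib.jar"}, ["javac"])
link_bin = Action("link_bin", {"MyBinary.java", "mylib.jar"}, {"MyBinary_deploy.jar"}, ["javac"])
print([a.name for a in order_actions([link_bin, compile_lib])])  # compile_lib first</pre>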
<p>The system isn’t foolproof given that there’s no way to stop an action developer from doing something like introducing a nondeterministic process as part of their action. But this doesn’t happen very often in practice, and pushing the possibilities for abuse all the way down to the action level greatly decreases opportunities for errors. Rules supporting many common languages and tools are widely available online, and most projects will never need to define their own rules. Even for those that do, rule definitions only need to be defined in one central place in the repository, meaning most engineers will be able to use those rules without ever having to worry about their implementation.</p>
</section>
<section data-type="sect4" id="isolating_the_environment">
<h4>Isolating the environment</h4>
<p>Actions sound like they might run<a contenteditable="false" data-primary="Bazel" data-secondary="isolating the environment" data-type="indexterm" id="id-X3CoHLt6SXS5SEfl"> </a> into the same problems as tasks in other systems—isn’t it still possible to write actions that both write to the same file and end up conflicting with one another? <a contenteditable="false" data-primary="sandboxing" data-secondary="use by Bazel" data-type="indexterm" id="id-qYCYt0tOSjSqSqfn"> </a>Actually, Bazel makes these <span class="keep-together">conflicts</span> impossible by using <a href="https://oreil.ly/lP5Y9"><em>sandboxing</em></a>. On supported systems, every action is isolated from every other action via a filesystem sandbox. Effectively, each action can see only a restricted view of the filesystem that includes the inputs it has declared and any outputs it has produced. This is enforced by systems such as LXC on Linux, the same technology behind Docker. This means that it’s impossible for actions to conflict with one another because they are unable to read any files they don’t declare, and any files that they write but don’t declare will be thrown away when the action finishes. Bazel also uses sandboxes to restrict actions from communicating via the network.</p>
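<p>Real sandboxing relies on operating system facilities such as LXC, but the contract it enforces can be illustrated with a much weaker sketch: run each action in a scratch directory containing only its declared inputs, keep only its declared outputs, and throw everything else away. This is an illustration of the idea, not a substitute for an actual sandbox.</p>

<pre data-type="programlisting">import os, shutil, subprocess, tempfile

def run_isolated(command, inputs, outputs, dest_dir):
    # The action sees only the inputs copied into its scratch directory, and
    # only its declared outputs survive; undeclared files vanish with the
    # directory. (A real sandbox also blocks reads outside the directory.)
    with tempfile.TemporaryDirectory() as scratch:
        for path in inputs:
            shutil.copy(path, os.path.join(scratch, os.path.basename(path)))
        subprocess.run(command, cwd=scratch, check=True)
        for name in outputs:
            shutil.copy(os.path.join(scratch, name), os.path.join(dest_dir, name))</pre>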
</section>
<section data-type="sect4" id="making_external_dependencies_determinis">
<h4>Making external dependencies deterministic</h4>
<p>There’s still one problem remaining: build systems often <a contenteditable="false" data-primary="Bazel" data-secondary="making external dependencies deterministic" data-type="indexterm" id="id-qYCeH0twhjSqSqfn"> </a>need to download dependencies (whether tools or libraries) from external sources rather than directly building them.<a contenteditable="false" data-primary="dependencies" data-secondary="making external dependencies deterministic in Bazel" data-type="indexterm" id="id-daCotgt0hlSOSrf7"> </a> This can be seen in the example via the <code>@com_google_common_guava_guava//jar</code> dependency, which downloads a JAR file from Maven.</p>
<p>Depending on files outside of the current workspace is risky. Those files could change at any time, potentially requiring the build system to constantly check whether they’re fresh.<a contenteditable="false" data-primary="unreproducable builds" data-type="indexterm" id="id-daCYHAc0hlSOSrf7"> </a> If a remote file changes without a corresponding change in the workspace source code, it can also lead to unreproducible builds—a build might work one day and fail the next for no obvious reason due to an unnoticed dependency change.<a contenteditable="false" data-primary="security" data-secondary="risks introduced by external dependencies" data-type="indexterm" id="id-5BCJtMcnh8SQSwfX"> </a> Finally, an external dependency can introduce a huge security risk when it is owned by a third party:<sup><a data-type="noteref" id="ch01fn188-marker" href="ch18.html#ch01fn188">4</a></sup> if an attacker is able to infiltrate that third-party server, they can replace the dependency file with something of their own design, potentially giving them full control over your build environment and its output.</p>
<p>The fundamental problem is that we want the build system to be aware of these files without having to check them into source control. Updating a dependency should be a conscious choice, but that choice should be made once in a central place rather than managed by individual engineers or automatically by the system. This is because even with a "Live at Head" model, we still want builds to be deterministic, which implies that if you check out a commit from last week, you should see your dependencies as they were then rather than as they are now.</p>
<p>Bazel and some other build systems address this problem by requiring a workspace-wide manifest file that lists a <em>cryptographic hash</em> for every external<a contenteditable="false" data-primary="cryptographic hashes" data-type="indexterm" id="id-84C0t0SghBSbSVfV"> </a> dependency in the workspace.<sup><a data-type="noteref" id="ch01fn189-marker" href="ch18.html#ch01fn189">5</a></sup> The hash is a concise way to uniquely represent the file without checking the entire file into source control. Whenever a new external dependency is referenced from a workspace, that dependency’s hash is added to the manifest, either manually or automatically. When Bazel runs a build, it checks the actual hash of its cached dependency against the expected hash defined in the manifest and redownloads the file only if the hash differs.</p>
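<p>A sketch of that verification step looks like the following. The manifest entry, URL, and hash below are placeholders; a real workspace would record one entry per external dependency and keep the manifest in source control.</p>

<pre data-type="programlisting">import hashlib

# Checked into source control alongside the workspace; values are placeholders.
MANIFEST = {
    "com_google_common_guava_guava": {
        "url": "https://example.com/guava.jar",
        "sha256": "0000000000000000000000000000000000000000000000000000000000000000",
    },
}

def verify(dep_name, downloaded_path):
    expected = MANIFEST[dep_name]["sha256"]
    with open(downloaded_path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    if actual != expected:
        raise RuntimeError("hash mismatch for " + dep_name +
                           "; update the manifest to accept the new artifact")
    return downloaded_path</pre>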
<p>If the artifact we download has a different hash than the one declared in the manifest, the build will fail unless the hash in the manifest is updated. This can be done automatically, but that change must be approved and checked into source control before the build will accept the new dependency. This means that there’s always a record of when a dependency was updated, and an external dependency can’t change without a corresponding change in the workspace source. It also means that, when checking out an older version of the source code, the build is guaranteed to use the same dependencies that it was using at the point when that version was checked in (or else it will fail if those dependencies are no longer available).</p>
<p>Of course, it can still be a problem if a remote server becomes unavailable or starts serving corrupt data—this can cause all of your builds to begin failing if you don’t have another copy of that dependency available. To avoid this problem, we recommend that, for any nontrivial project, you mirror all of its dependencies onto servers or services that you trust and control. Otherwise you will always be at the mercy of a third party for your build system’s availability, even if the checked-in hashes guarantee its security.<a contenteditable="false" data-primary="artifact-based build systems" data-secondary="other nifty Bazel tricks" data-startref="ix_artibldBzl" data-type="indexterm" id="id-12C7HjfKhNS1S1fB"> </a><a contenteditable="false" data-primary="build systems" data-secondary="modern" data-startref="ix_bldsysmodart" data-tertiary="artifact-based" data-type="indexterm" id="id-rpCNtrfmhXSdSVfV"> </a><a contenteditable="false" data-primary="artifact-based build systems" data-startref="ix_artibld" data-type="indexterm" id="id-O7CLcQfDhVSWSnf9"> </a></p>
</section>
</section>
</section>
<section data-type="sect2" id="distributed_builds">
<h2>Distributed Builds</h2>
<p>Google’s codebase is enormous—with more than two<a contenteditable="false" data-primary="build systems" data-secondary="modern" data-tertiary="distributed builds" data-type="indexterm" id="ix_bldsysmoddst"> </a> billion lines of code, chains of dependencies can become very deep.<a contenteditable="false" data-primary="distributed builds" data-type="indexterm" id="ix_dstbld"> </a> Even simple binaries at Google often depend on tens of thousands of build targets. At this scale, it’s simply impossible to complete a build in a reasonable amount of time on a single machine: no build system can get around the fundamental laws of physics imposed on a machine’s hardware. The only way to make this work is with a build system that supports <em>distributed builds</em> wherein the units of work being done by the system are spread across an arbitrary and scalable number of machines. Assuming we’ve broken the system’s work into small enough units (more on this later), this would allow us to complete any build of any size as quickly as we’re willing to pay for. This scalability is the holy grail we’ve been working toward by defining an artifact-based build system.</p>
<section data-type="sect3" id="remote_caching">
<h3>Remote caching</h3>
<p>The simplest type<a contenteditable="false" data-primary="remote caching in distributed builds" data-type="indexterm" id="id-gqCzH2tyc9hofy"> </a> of distributed build is one<a contenteditable="false" data-primary="distributed builds" data-secondary="remote caching" data-type="indexterm" id="id-9ACBt0tbcJhEf6"> </a> that only leverages <em>remote caching</em>, which is shown in <a data-type="xref" href="ch18.html#a_distributed_build_showing_remote_cach">Figure 18-2</a>.</p>
<figure id="a_distributed_build_showing_remote_cach"><img alt="A distributed build showing remote caching" src="images/seag_1802.png">
<figcaption><span class="label">Figure 18-2. </span>A distributed build showing remote caching</figcaption>
</figure>
<p>Every system that performs builds, including both developer workstations and continuous integration systems, shares a reference to a common remote cache service. This service might be a fast and local short-term storage system like Redis or a cloud service like Google Cloud Storage. Whenever a user needs to build an artifact, whether directly or as a dependency, the system first checks with the remote cache to see if that artifact already exists there. If so, it can download the artifact instead of building it. If not, the system builds the artifact itself and uploads the result back to the cache. This means that low-level dependencies that don’t change very often can be built once and shared across users rather than having to be rebuilt by each user. At Google, many artifacts are served from a cache rather than built from scratch, vastly reducing the cost of running our build system.</p>
<p>For a remote caching system to work, the build system must guarantee that builds are completely reproducible. That is, for any build target, it must be possible to determine the set of inputs to that target such that the same set of inputs will produce exactly the same output on any machine. This is the only way to ensure that the results of downloading an artifact are the same as the results of building it oneself. <a contenteditable="false" data-primary="Bazel" data-secondary="remote caching and reproducible builds" data-type="indexterm" id="id-agCwHlSVcKh3fe"> </a>Fortunately, Bazel provides this guarantee and so supports <a href="https://oreil.ly/D9doX">remote caching</a>. Note that this requires that each artifact in the cache be keyed on both its target and a hash of its inputs—that way, different engineers could make different modifications to the same target at the same time, and the remote cache would store all of the resulting artifacts and serve them appropriately without conflict.</p>
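<p>A minimal sketch of that keying scheme follows. The <code>RemoteCache</code> class and <code>compile_fn</code> are hypothetical stand-ins for a real networked service and compiler; the important detail is that the key combines the target with a digest of its inputs, so concurrent edits by different engineers land in different entries rather than overwriting each other.</p>

<pre data-type="programlisting">import hashlib

class RemoteCache:
    # Stand-in for a shared service; a real one lives on the network.
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)
    def put(self, key, artifact):
        self._store[key] = artifact

def cache_key(target, input_blobs):
    h = hashlib.sha256()
    for blob in input_blobs:
        h.update(blob)
    return target + ":" + h.hexdigest()

def build_with_cache(cache, target, input_blobs, compile_fn):
    key = cache_key(target, input_blobs)
    artifact = cache.get(key)
    if artifact is None:
        artifact = compile_fn(input_blobs)  # cache miss: build it ourselves
        cache.put(key, artifact)            # and share the result with others
    return artifact</pre>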
<p>Of course, for there to be any benefit from a remote cache, downloading an artifact needs to be faster than building it. This is not always the case, especially if the cache server is far from the machine doing the build. Google’s network and build system is carefully tuned to be able to quickly share build results. When configuring remote caching in your organization, take care to consider network latencies and perform experiments to ensure that the cache is actually improving performance.</p>
</section>
<section data-type="sect3" id="remote_execution">
<h3>Remote execution</h3>
<p>Remote caching isn’t a true distributed build.<a contenteditable="false" data-primary="distributed builds" data-secondary="remote execution" data-type="indexterm" id="id-9ACYH0t2sJhEf6"> </a> If the cache is lost or if you make a low-level change <a contenteditable="false" data-primary="remote execution of distributed builds" data-type="indexterm" id="id-6MCDtEtwsnhEfq"> </a>that requires everything to be rebuilt, you still need to perform the entire build locally on your machine. The true goal is to support <em>remote execution</em>, in which the actual work of doing the build can be spread across any number of workers. <a data-type="xref" href="ch18.html#a_remote_execution_system">Figure 18-3</a> depicts a remote execution system.</p>
<figure id="a_remote_execution_system"><img alt="A remote execution system" src="images/seag_1803.png">
<figcaption><span class="label">Figure 18-3. </span>A remote execution system</figcaption>
</figure>
<p>The build tool running on each user’s machine (where users are either human engineers or automated build systems) sends requests to a central build master. The build master breaks the requests into their component actions and schedules the execution of those actions over a scalable pool of workers. Each worker performs the actions asked of it with the inputs specified by the user and writes out the resulting artifacts. These artifacts are shared across the other machines executing actions that require them until the final output can be produced and sent to the user.</p>
<p>The trickiest part of implementing such a system is managing the communication between the workers, the master, and the user’s local machine. Workers might depend on intermediate artifacts produced by other workers, and the final output needs to be sent back to the user’s local machine. To do this, we can build on top of the distributed cache described previously by having each worker write its results to and read its dependencies from the cache. The master blocks workers from proceeding until everything they depend on has finished, in which case they’ll be able to read their inputs from the cache. The final product is also cached, allowing the local machine to download it. Note that we also need a separate means of exporting the local changes in the user’s source tree so that workers can apply those changes before building.</p>
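<p>A worker in such a system can be sketched as a loop over a queue of actions: take an action, put it back if its inputs are not yet in the shared cache, otherwise run it and publish its outputs for the next workers (and ultimately the user’s machine) to fetch. This is a simplification of the coordination just described, not Forge’s interface; the <code>cache</code> object is assumed to provide <code>get</code> and <code>put</code> as in the earlier remote-cache sketch, and <code>execute_fn</code> is a hypothetical hook that runs one action.</p>

<pre data-type="programlisting">def worker_loop(action_queue, cache, execute_fn):
    # action_queue: e.g., a queue.Queue of dicts with "command" and "inputs"
    # keys, filled by the master as it breaks a build request into actions.
    while True:
        action = action_queue.get()
        if action is None:          # sentinel from the master: nothing left
            return
        if not all(cache.get(name) is not None for name in action["inputs"]):
            action_queue.put(action)    # inputs not produced yet; try again later
            continue
        inputs = {name: cache.get(name) for name in action["inputs"]}
        for name, blob in execute_fn(action["command"], inputs).items():
            cache.put(name, blob)       # make outputs visible to other workers</pre>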
<p>For this to work, all of the parts of the artifact-based build systems described earlier need to come together. Build environments must be completely self-describing so that we can spin up workers without human intervention. Build processes themselves must be completely self-contained because each step might be executed on a different machine. Outputs must be completely deterministic so that each worker can trust the results it receives from other workers. Such guarantees are extremely difficult for a task-based system to provide, which makes it nigh-impossible to build a reliable remote execution system on top of one.</p>
<section data-type="sect4" id="distributed_builds_at_google">
<h4>Distributed builds at Google</h4>
<p>Since 2008, Google has been using a distributed <a contenteditable="false" data-primary="distributed builds" data-secondary="at Google" data-type="indexterm" id="id-daCYHgtVfvsEhrf7"> </a>build system that employs both remote caching and remote execution, which is illustrated in <a data-type="xref" href="ch18.html#googleapostrophes_distributed_build_sys">Figure 18-4</a>.</p>
<figure id="googleapostrophes_distributed_build_sys"><img alt="Google’s distributed build system" src="images/seag_1804.png">
<figcaption><span class="label">Figure 18-4. </span>Google’s distributed build system</figcaption>
</figure>
<p>Google’s remote cache is called ObjFS.<a contenteditable="false" data-primary="remote caching in distributed builds" data-secondary="Google's remote cache" data-type="indexterm" id="id-ZJCDHnsYfVsjhJfO"> </a> It consists of a backend that stores build outputs in <a href="https://oreil.ly/S_N-D">Bigtables</a> distributed throughout our fleet of production machines and a frontend FUSE daemon named objfsd that runs on each developer’s machine. The FUSE daemon allows engineers to browse build outputs as if they were normal files stored on the workstation, but with the file content downloaded on-demand only for the few files that are directly requested by the user. Serving file contents on-demand greatly reduces both network and disk usage, and the system is able to <a href="https://oreil.ly/NZxSp">build twice as fast</a> compared to when we stored all build output on the developer’s local disk.</p>
<p>Google’s remote execution system is called Forge.<a contenteditable="false" data-primary="Forge" data-type="indexterm" id="id-84CrH0Syf6slhVfV"> </a><a contenteditable="false" data-primary="remote execution of distributed builds" data-secondary="Google remote execution system, Forge" data-type="indexterm" id="id-12CQt6SDfOsYh1fB"> </a> A Forge client in Blaze called the Distributor sends requests for each action to a job running in our datacenters called the Scheduler. The Scheduler maintains a cache of action results, allowing it to return a response immediately if the action has already been created by any other user of the system. If not, it places the action into a queue. A large pool of Executor jobs continually read actions from this queue, execute them, and store the results directly in the ObjFS Bigtables. These results are available to the executors for future actions, or to be downloaded by the end user via objfsd.</p>
<p>The end result is a system that scales to efficiently support all builds performed at Google. And the scale of Google’s builds is truly massive: Google runs millions of builds executing millions of test cases and producing petabytes of build outputs from billions of lines of source code every <em>day</em>. Not only does such a system let our engineers build complex codebases quickly, it also allows us to implement a huge number of automated tools and systems that rely on our build. We put many years of effort into developing this system, but nowadays open source tools are readily available such that any organization can implement a similar system. Though it can take time and energy to deploy such a build system, the end result can be truly magical for engineers and is often well worth the effort.<a contenteditable="false" data-primary="distributed builds" data-startref="ix_dstbld" data-type="indexterm" id="id-rpCNtQh4fqs0hVfV"> </a><a contenteditable="false" data-primary="build systems" data-secondary="modern" data-startref="ix_bldsysmoddst" data-tertiary="distributed builds" data-type="indexterm" id="id-O7CLcmhgfmsehnf9"> </a></p>
</section>
</section>
</section>
<section data-type="sect2" id="timecomma_scalecomma_trade-offs">
<h2>Time, Scale, Trade-Offs</h2>
<p>Build systems are all about making code easier to work with at scale and over time. And like everything in <a contenteditable="false" data-primary="build systems" data-secondary="modern" data-tertiary="time, scale, and trade-offs" data-type="indexterm" id="id-l1CZH6tOfvfz"> </a>software engineering, there are trade-offs in choosing which sort of build system to use. The DIY approach using shell scripts or direct invocations of tools works only for the smallest projects that don’t need to deal with code changing over a long period of time, or for languages like Go that have a built-in build <span class="keep-together">system.</span></p>
<p>Choosing a task-based build system<a contenteditable="false" data-primary="task-based build systems" data-secondary="time, scale, and trade-offs" data-type="indexterm" id="id-gqCzHoc2fefj"> </a> instead of relying on DIY scripts greatly improves your project’s ability to scale, allowing you to automate complex builds and more easily reproduce those builds across machines. The trade-off is that you need to actually start putting some thought into how your build is structured and deal with the overhead of writing build files (though automated tools can often help with this). This trade-off tends to be worth it for most projects, but for particularly trivial projects (e.g., those contained in a single source file), the overhead might not buy you much.</p>
<p>Task-based build systems begin to run into some fundamental problems as the project scales further, and these issues can be remedied by using an artifact-based build system instead.<a contenteditable="false" data-primary="artifact-based build systems" data-secondary="time, scale, and trade-offs" data-type="indexterm" id="id-9ACYHysrf8fA"> </a> Such build systems unlock a whole new level of scale because huge builds can now be distributed across many machines, and thousands of engineers can be more certain that their builds are consistent and reproducible. As with so many other topics in this book, the trade-off here is a lack of flexibility: artifact-based systems don’t let you write generic tasks in a real programming language, but require you to work within the constraints of the system. This is usually not a problem for projects that are designed to work with artifact-based systems from the start, but migration from an existing task-based system can be difficult and is not always worth it if the build isn’t already showing problems in terms of speed or correctness.</p>
<p>Changes to a project’s build system can be expensive, and that cost increases as the project becomes larger. This is why Google believes that almost every new project benefits from incorporating an artifact-based build system like Bazel right from the start. Within Google, essentially all code from tiny experimental projects up to Google Search is built using Blaze.<a contenteditable="false" data-primary="build systems" data-secondary="modern" data-startref="ix_bldsysmod" data-type="indexterm" id="id-6MCbH4S6fOfz"> </a></p>
</section>
</section>
<section data-type="sect1" id="dealing_with_modules_and_dependencies">
<h1>Dealing with Modules and Dependencies</h1>
<p>Projects that use <a contenteditable="false" data-primary="build systems" data-secondary="dealing with modules and dependencies" data-type="indexterm" id="ix_blssysMD"> </a>artifact-based build systems like Bazel are broken into a set of modules, with<a contenteditable="false" data-primary="modules, dealing with in build systems" data-type="indexterm" id="ix_mods"> </a> modules expressing dependencies on one another via <em>BUILD</em> files. Proper organization of these modules and dependencies can have a huge effect on both the performance of the build system and how much work it takes to maintain.</p>
<section data-type="sect2" id="using_fine-grained_modules_and_the_oneo">
<h2>Using Fine-Grained Modules and the 1:1:1 Rule</h2>
<p>The first question that comes up when structuring an artifact-based build is deciding how much functionality<a contenteditable="false" data-primary="build systems" data-secondary="dealing with modules and dependencies" data-tertiary="using fine-grained modules and 1:1:1 rule" data-type="indexterm" id="id-NoC0H6tgcwUO"> </a> an individual module<a contenteditable="false" data-primary="modules, dealing with in build systems" data-secondary="using fine-grained modules and 1:1:1 rule" data-type="indexterm" id="id-4JC7trtec6UB"> </a> should encompass. In Bazel, a "module" is represented by a target specifying a buildable unit like a <code>java_library</code> or a <code>go_binary</code>. At one extreme, the entire project could be contained in a single module by putting one BUILD file at the root and recursively globbing together all of that project’s source files. At the other extreme, nearly every source file could be made into its own module, effectively requiring each file to list in a <em>BUILD</em> file every other file it depends on.</p>
<p>Most projects fall somewhere between these extremes, and the choice involves a trade-off between performance and maintainability. Using a single module for the entire project might mean that you never need to touch the <em>BUILD</em> file except when adding an external dependency, but it means that the build system will always need to build the entire project all at once. This means that it won’t be able to parallelize or distribute parts of the build, nor will it be able to cache parts that it’s already built. One-module-per-file is the opposite: the build system has the maximum flexibility in caching and scheduling steps of the build, but engineers need to expend more effort maintaining lists of dependencies whenever they change which files reference which.</p>
<p>Though the exact granularity varies by language (and often even within language), Google tends to favor significantly smaller modules than one might typically write in a task-based build system. A typical production binary at Google will likely depend on tens of thousands of targets, and even a moderate-sized team can own several hundred targets within its codebase. <a contenteditable="false" data-primary="1:1:1 rule" data-primary-sortas="one one one" data-type="indexterm" id="id-AqCMHos9czUK"> </a>For languages like Java that have a strong built-in notion of packaging, each directory usually contains a single package, target, and <em>BUILD</em> file (Pants, another build system based on Blaze, calls this the <a href="https://oreil.ly/lSKbW">1:1:1 rule</a>). Languages with weaker packaging conventions will frequently define multiple targets per <em>BUILD</em> file.</p>
<p>The benefits of smaller build targets really begin to show at scale because they lead to faster distributed builds and a less frequent need to rebuild targets. The advantages become even more compelling after testing enters the picture, as finer-grained targets mean that the build system can be much smarter about running only a limited subset of tests that could be affected by any given change. Because Google believes in the systemic benefits of using smaller targets, we’ve made some strides in mitigating the downside by investing in tooling to automatically manage <em>BUILD</em> files to avoid burdening developers. Many of <a href="https://oreil.ly/r0wO7">these tools</a> are now open source.</p>
</section>
<section data-type="sect2" id="minimizing_module_visibility">
<h2>Minimizing Module Visibility</h2>
<p>Bazel and other build systems allow each target to specify a visibility: a property that specifies which other targets may depend on it.<a contenteditable="false" data-primary="build systems" data-secondary="dealing with modules and dependencies" data-tertiary="minimizing module visibility" data-type="indexterm" id="id-4JCNHrtps6UB"> </a><a contenteditable="false" data-primary="visibility, minimizing for modules in build systems" data-type="indexterm" id="id-AqCntrtWszUK"> </a><a contenteditable="false" data-primary="modules, dealing with in build systems" data-secondary="minimizing module visibility" data-type="indexterm" id="id-l1C8c6tMsjUz"> </a> Targets can be <code>public</code>, in which case they can be referenced by any other target in the workspace; <code>private</code>, in which case they can be referenced only from within the same <em>BUILD</em> file; or visible to only an explicitly defined list of other targets. A visibility is essentially the opposite of a dependency: if target A wants to depend on target B, target B must make itself visible to target A.</p>
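<p>The check the build system performs can be stated as a small predicate: a dependency from target A on target B is allowed only if B is public, or A lives in B’s own package when B is private, or A’s package appears in B’s visibility list. The sketch below is deliberately simplified and ignores the fuller label forms Bazel supports, such as package groups and <code>__subpackages__</code>.</p>

<pre data-type="programlisting">def may_depend(consumer_package, producer):
    # producer example: {"package": "//java/com/example/myproduct",
    #                    "visibility": ["//visibility:private"]}
    visibility = producer["visibility"]
    if "//visibility:public" in visibility:
        return True
    if "//visibility:private" in visibility:
        return consumer_package == producer["package"]
    return consumer_package in visibility  # explicit list of allowed packages</pre>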
<p>Just like in most programming languages, it is usually best to minimize visibility as much as possible. Generally, teams at Google will make targets public only if those targets represent widely used libraries available to any team at Google. Teams that require others to coordinate with them before using their code will maintain a whitelist of customer targets as their target’s visibility. Each team’s internal implementation targets will be restricted to only directories owned by the team, and most <em>BUILD</em> files will have only one target that isn’t private.</p>
</section>
<section data-type="sect2" id="managing_dependencies">
<h2>Managing Dependencies</h2>
<p>Modules need to be able to<a contenteditable="false" data-primary="build systems" data-secondary="dealing with modules and dependencies" data-tertiary="managing dependencies" data-type="indexterm" id="ix_bldsysMDdep"> </a> refer to one another.<a contenteditable="false" data-primary="modules, dealing with in build systems" data-secondary="managing dependencies" data-type="indexterm" id="ix_modsdep"> </a> The downside of breaking a codebase into fine-grained modules is that you need to manage <a contenteditable="false" data-primary="dependencies" data-secondary="managing for modules in build systems" data-type="indexterm" id="ix_depmg"> </a>the dependencies among those modules (though tools can help automate this). Expressing these dependencies usually ends up being the bulk of the content in a <em>BUILD</em> file.</p>
<section data-type="sect3" id="internal_dependencies">
<h3>Internal dependencies</h3>
<p>In a large project broken<a contenteditable="false" data-primary="dependencies" data-secondary="managing for modules in build systems" data-tertiary="internal dependencies" data-type="indexterm" id="id-gqCzH2tyc8S2Uy"> </a> into fine-grained modules, most dependencies are likely to be internal; that is, on another target defined and built in the same source repository. Internal dependencies differ from external dependencies in that they are built from source rather than downloaded as a prebuilt artifact while running the build. This also means that there’s no notion of "version" for internal dependencies—a target and all of its internal dependencies are always built at the same commit/revision in the repository.</p>
<p>One issue that should be handled carefully with regard to internal dependencies is how to <a contenteditable="false" data-primary="transitive dependencies" data-type="indexterm" id="id-9ACYHvcbcoSyU6"> </a>treat <em>transitive dependencies</em> (<a data-type="xref" href="ch18.html#transitive_dependencies">Figure 18-5</a>). Suppose target A depends on target B, which depends on a common library target C. Should target A be able to use classes defined in target C?</p>
<figure id="transitive_dependencies"><img alt="Transitive dependencies" src="images/seag_1805.png">
<figcaption><span class="label">Figure 18-5. </span>Transitive dependencies</figcaption>
</figure>
<p>As far as the underlying tools are concerned, there’s no problem with this; both B and C will be linked into target A when it is built, so any symbols defined in C are known to A. Blaze allowed this for many years, but as Google grew, we began to see problems. Suppose that B was refactored such that it no longer needed to depend on C. If B’s dependency on C was then removed, A and any other target that used C via a dependency on B would break. Effectively, a target’s dependencies became part of its public contract and could never be safely changed. This meant that dependencies accumulated over time and builds at Google started to slow down.<a contenteditable="false" data-primary="transitive dependencies" data-secondary="strict, enforcing" data-type="indexterm" id="id-agCwHlSVc3SRUe"> </a></p>
<p>Google eventually solved this issue by introducing a "strict transitive dependency mode" in Blaze. In this mode, Blaze detects whether a target tries to reference a symbol without depending on it directly and, if so, fails with an error and a shell command that can be used to automatically insert the dependency. Rolling this change out across Google’s entire codebase and refactoring every one of our millions of build targets to explicitly list their dependencies was a multiyear effort, but it was well worth it. Our builds are now much faster given that targets have fewer unnecessary dependencies,<sup><a data-type="noteref" id="ch01fn190-marker" href="ch18.html#ch01fn190">6</a></sup> and engineers are empowered to remove dependencies they don’t need without worrying about breaking targets that depend on them.</p>
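<p>The heart of that check is a comparison between the targets a rule actually uses, as reported by the compiler, and the targets it declares in <code>deps</code>; anything used but not declared is an error with an actionable fix. A minimal sketch with hypothetical labels:</p>

<pre data-type="programlisting">def check_strict_deps(target, declared_deps, used_targets):
    # used_targets: targets whose symbols the compiler saw this target reference.
    missing = sorted(set(used_targets) - set(declared_deps))
    if missing:
        raise RuntimeError(target + " uses symbols from undeclared dependencies; "
                           "add " + ", ".join(missing) + " to its deps")

# A lists only B, but the compiler also saw symbols from C via B.
try:
    check_strict_deps("//app:A", ["//lib:B"], ["//lib:B", "//lib:C"])
except RuntimeError as error:
    print(error)   # tells the engineer to add //lib:C directly</pre>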
<p>As usual, enforcing strict transitive dependencies involved a trade-off. It made build files more verbose, as frequently used libraries now need to be listed explicitly in many places rather than pulled in incidentally, and engineers needed to spend more effort adding dependencies to <em>BUILD</em> files. We’ve since developed tools that reduce this toil by automatically detecting many missing dependencies and adding them to <em>BUILD</em> files without any developer intervention. But even without such tools, we’ve found the trade-off to be well worth it as the codebase scales: explicitly adding a dependency to a <em>BUILD</em> file is a one-time cost, but dealing with implicit transitive dependencies can cause ongoing problems as long as the build target exists. <a href="https://oreil.ly/Z-CqD">Bazel enforces strict transitive dependencies</a> on Java code by default.</p>
</section>
<section data-type="sect3" id="external_dependencies">
<h3>External dependencies</h3>
<p>If a dependency<a contenteditable="false" data-primary="dependencies" data-secondary="managing for modules in build systems" data-tertiary="external dependencies" data-type="indexterm" id="id-9ACYH0t2soSyU6"> </a> isn’t internal, it must be external. External dependencies are those on artifacts that are built and stored outside of the build system. The dependency is imported directly from an <em>artifact repository</em> (typically accessed over the internet) and used as-is rather than being built from source. One of the biggest differences between external and internal dependencies is that external dependencies have <em>versions</em>, and those versions exist independently of the project’s source code.</p>
<section data-type="sect4" id="automatic_versus_manual_dependency_mana">
<h4>Automatic versus manual dependency management</h4>
<p>Build systems can allow the versions of external dependencies to be managed either manually or automatically. <a contenteditable="false" data-primary="dependencies" data-secondary="managing for modules in build systems" data-tertiary="automatic vs. manual management" data-type="indexterm" id="id-agCwHwtVc6sBS0U4"> </a>When managed manually, the buildfile explicitly lists the version it wants to download from <a contenteditable="false" data-primary="semantic version strings" data-type="indexterm" id="id-QjCAtmtVcZsySrUa"> </a>the artifact repository, often using <a href="https://semver.org">a semantic version string</a> such as "1.1.4". When managed automatically, the source file specifies a range of acceptable versions, and the build system always downloads the latest one.<a contenteditable="false" data-primary="Gradle" data-secondary="dependency versions" data-type="indexterm" id="id-qYCvs0tocJsqS9Un"> </a> For example, Gradle allows a dependency version to be declared as "1.+" to specify that any minor or patch version of a dependency is acceptable so long as the major version is 1.</p>
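<p>The difference between the two modes can be captured in a few lines: a pinned version resolves to itself, whereas a range such as "1.+" resolves to whatever happens to be the newest matching release at the moment of the build. The sketch below is simplified; real version ordering is more involved than a string sort.</p>

<pre data-type="programlisting">def resolve(requested, available_versions):
    # "1.1.4" pins an exact version; "1.+" floats to the newest 1.x release.
    if requested.endswith(".+"):
        prefix = requested[:-1]               # "1.+" becomes the prefix "1."
        matches = [v for v in available_versions if v.startswith(prefix)]
        return sorted(matches)[-1]            # changes whenever upstream publishes
    return requested

print(resolve("1.1.4", ["1.1.4", "1.2.0"]))   # always "1.1.4"
print(resolve("1.+", ["1.1.4", "1.2.0"]))     # "1.2.0" today, maybe "1.3.0" tomorrow</pre>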
<p>Automatically managed dependencies can be convenient for small projects, but they’re usually a recipe for disaster on projects of nontrivial size or that are being worked on by more than one engineer. The problem with automatically managed dependencies is that you have no control over when the version is updated. There’s no way to guarantee that external parties won’t make breaking updates (even when they claim to use semantic versioning), so a build that worked one day might be broken the next with no easy way to detect what changed or to roll it back to a working state. Even if the build doesn’t break, there can be subtle behavior or performance changes that are impossible to track down.</p>
<p>In contrast, because manually managed dependencies require a change in source control, they can be easily discovered and rolled back, and it’s possible to check out an older version of the repository to build with older dependencies.<a contenteditable="false" data-primary="Bazel" data-secondary="dependency versions" data-type="indexterm" id="id-X3CoH0sDcds5SYUl"> </a> Bazel requires that versions of all dependencies be specified manually. At even moderate scales, the overhead of manual version management is well worth it for the stability it provides.</p>
</section>
<section data-type="sect4" id="the_one-version_rule">
<h4>The One-Version Rule</h4>
<p>Different versions of a library are usually represented by different artifacts, so in theory there’s no reason that different<a contenteditable="false" data-primary="One-Version Rule" data-type="indexterm" id="id-QjCwHmt0sZsySrUa"> </a><a contenteditable="false" data-primary="dependencies" data-secondary="managing for modules in build systems" data-tertiary="One-Version Rule" data-type="indexterm" id="id-X3CAtLtLsds5SYUl"> </a> versions of the same external dependency couldn’t both be declared in the build system under different names. That way, each target could choose which version of the dependency it wanted to use. Google has found this to cause a lot of problems in practice, so we enforce a strict <a href="https://oreil.ly/OFa9V"><em>One-Version Rule</em></a> for all third-party dependencies in our internal codebase.</p>
<p>The biggest problem with allowing multiple versions is the <em>diamond dependency</em> issue. <a contenteditable="false" data-primary="diamond dependency issue" data-type="indexterm" id="id-qYCYtwc8sJsqS9Un"> </a>Suppose that target A depends on target B and on v1 of an external library. If target B is later refactored to add a dependency on v2 of the same external library, target A will break because it now depends implicitly on two different versions of the same library. Effectively, it’s never safe to add a new dependency from a target to any third-party library with multiple versions, because any of that target’s users could already be depending on a different version. Following the One-Version Rule makes this conflict impossible—if a target adds a dependency on a third-party library, any existing dependencies will already be on that same version, so they can happily <span class="keep-together">coexist.</span></p>
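<p>Detecting the diamond is mechanical: walk the transitive closure of a target and reject it the moment the same library appears under two versions. Under the One-Version Rule the second version can never be introduced in the first place. A minimal sketch with hypothetical version labels:</p>

<pre data-type="programlisting">def check_one_version(transitive_deps):
    # transitive_deps: (library, version) pairs from the whole closure of a target.
    chosen = {}
    for library, version in transitive_deps:
        if library in chosen and chosen[library] != version:
            raise RuntimeError("diamond dependency on " + library + ": both " +
                               chosen[library] + " and " + version +
                               " are in the build")
        chosen[library] = version
    return chosen

# Target A depends on guava v1 directly and on v2 through target B: rejected.
try:
    check_one_version([("guava", "v1"), ("guava", "v2")])
except RuntimeError as error:
    print(error)</pre>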
<p>We’ll examine this further in the context of a large monorepo in <a data-type="xref" href="ch21.html#dependency_management">Dependency Management</a>.</p>
</section>
<section data-type="sect4" id="transitive_external_dependencies">
<h4>Transitive external dependencies</h4>
<p>Dealing with the transitive dependencies of an external dependency can be particularly difficult.<a contenteditable="false" data-primary="transitive dependencies" data-secondary="external" data-type="indexterm" id="id-X3CoHLt6Sds5SYUl"> </a><a contenteditable="false" data-primary="dependencies" data-secondary="managing for modules in build systems" data-tertiary="transitive external dependencies" data-type="indexterm" id="id-qYCYt0tOSJsqS9Un"> </a> Many artifact repositories such as Maven Central allow artifacts to specify dependencies on particular versions of other artifacts in the repository. Build tools like Maven or Gradle will often recursively download each transitive dependency by default, meaning that adding a single dependency in your project could potentially cause dozens of artifacts to be downloaded in total.</p>
<p>This is very convenient: when adding a dependency on a new library, it would be a big pain to have to track down each of that library’s transitive dependencies and add them all manually. But there’s also a huge downside: because different libraries can depend on different versions of the same third-party library, this strategy necessarily violates the One-Version Rule and leads to the diamond dependency problem. If your target depends on two external libraries that use different versions of the same dependency, there’s no telling which one you’ll get. This also means that updating an external dependency could cause seemingly unrelated failures throughout the codebase if the new version begins pulling in conflicting versions of some of its <span class="keep-together">dependencies.</span></p>
<p>For this reason, Bazel does not automatically download transitive dependencies. And, unfortunately, there’s no silver bullet—Bazel’s alternative is to require a global file that lists every single one of the repository’s external dependencies and an explicit version used for that dependency throughout the repository. Fortunately, <a href="https://oreil.ly/kejfX">Bazel provides tools</a> that are able to automatically generate such a file containing the transitive dependencies of a set of Maven artifacts. This tool can be run once to generate the initial <em>WORKSPACE</em> file for a project, and that file can then be manually updated to adjust the versions of each dependency.</p>
<p>Yet again, the choice here is one between convenience and scalability. Small projects might prefer not having to worry about managing transitive dependencies themselves and might be able to get away with using automatic transitive dependencies. This strategy becomes less and less appealing as the organization and codebase grows, and conflicts and unexpected results become more and more frequent. At larger scales, the cost of manually managing dependencies is much less than the cost of dealing with issues caused by automatic dependency management.</p>
</section>
<section data-type="sect4" id="caching_build_results_using_external_de">
<h4>Caching build results using external dependencies</h4>
<p>External dependencies are <a contenteditable="false" data-primary="dependencies" data-secondary="managing for modules in build systems" data-tertiary="caching build results using external dependencies" data-type="indexterm" id="id-qYCeH0twhJsqS9Un"> </a>most often provided by third parties that release stable versions of libraries, perhaps without providing source code. <a contenteditable="false" data-primary="caching build results using external dependencies" data-type="indexterm" id="id-daCotgt0hvsOS8U7"> </a>Some organizations might also choose to make some of their own code available as artifacts, allowing other pieces of code to depend on them as third-party rather than internal dependencies. This can theoretically speed up builds if artifacts are slow to build but quick to download.</p>
<p>However, this also introduces a lot of overhead and complexity: someone needs to be responsible for building each of those artifacts and uploading them to the artifact repository, and clients need to ensure that they stay up to date with the latest version. Debugging also becomes much more difficult because different parts of the system will have been built from different points in the repository, and there is no longer a consistent view of the source tree.</p>
|
||||
|
||||
<p>A better way to solve the problem of artifacts taking a long time to build is to use a build system that supports remote caching, as described earlier. Such a build system will save the resulting artifacts from every build to a location that is shared across engineers, so if a developer depends on an artifact that was recently built by someone else, the build system will automatically download it instead of building it. This provides all of the performance benefits of depending directly on artifacts while still ensuring that builds are as consistent as if they were always built from the same source. This is the strategy used internally by Google, and Bazel can be configured to use a remote cache.</p>
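<p>For example, a minimal <em>.bazelrc</em> fragment along the following lines is enough to make every build consult a shared cache before compiling anything (the cache endpoint is a placeholder, but the flags themselves are standard Bazel options):</p>

<pre data-type="programlisting"># Hypothetical .bazelrc: reuse artifacts already built by colleagues or CI.
build --remote_cache=grpcs://remote-cache.example.com
# Optionally keep a local on-disk cache as a second layer.
build --disk_cache=~/.cache/bazel-disk-cache</pre>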
|
||||
</section>
|
||||
|
||||
<section data-type="sect4" id="security_and_reliability_of_external_de">
|
||||
<h4>Security and reliability of external dependencies</h4>
|
||||
|
||||
<p>Depending on artifacts from third-party<a contenteditable="false" data-primary="dependencies" data-secondary="managing for modules in build systems" data-tertiary="security and reliability of external dependencies" data-type="indexterm" id="id-daCYHgtVfvsOS8U7"> </a> sources is inherently risky.<a contenteditable="false" data-primary="security" data-secondary="of external dependencies" data-type="indexterm" id="id-5BCJtjtWf2sQS8UX"> </a> There’s an availability risk<a contenteditable="false" data-primary="reliability of external dependencies" data-type="indexterm" id="id-ZJCocvtYfVseSMUO"> </a> if the third-party source (e.g., an artifact repository) goes down, because your entire build might grind to a halt if it’s unable to download an external dependency. There’s also a security risk: if the third-party system is compromised by an attacker, the attacker could replace the referenced artifact with one of their own design, allowing them to inject arbitrary code into your build.</p>
|
||||
|
||||
<p>Both problems can be mitigated by mirroring any artifacts you depend on onto servers you control and blocking your build system from accessing third-party artifact repositories like Maven Central. The trade-off is that these mirrors take effort and resources to maintain, so the choice of whether to use them often depends on the scale of the project. The security issue can also be completely prevented with little overhead by requiring the hash of each third-party artifact to be specified in the source repository, causing the build to fail if the artifact is tampered with.</p>
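<p>Sketched below is what a mirrored, hash-pinned external dependency might look like in a <em>WORKSPACE</em> file (the dependency, mirror URL, and digest placeholder are illustrative only):</p>

<pre data-type="programlisting" data-code-language="python">load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "zlib",
    # Served from a mirror under your own control rather than a third party.
    urls = ["https://mirror.example.com/third_party/zlib-1.2.13.tar.gz"],
    strip_prefix = "zlib-1.2.13",
    # The build fails if the downloaded bytes do not match this digest,
    # so a tampered artifact cannot slip in unnoticed.
    sha256 = "FILL_IN_EXPECTED_SHA256",
)</pre>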
|
||||
|
||||
<p>Another alternative that completely sidesteps the issue is to <em>vendor</em> your project’s dependencies.<a contenteditable="false" data-primary="vendoring your project's dependencies" data-type="indexterm" id="id-84C0tRsyf6sbSqUV"> </a> When a project vendors its dependencies, it checks them into source control alongside the project’s source code, either as source or as binaries. This effectively means that all of the project’s external dependencies are converted to internal dependencies. Google uses this approach internally, checking every third-party library referenced throughout Google into a <em>third_party</em> directory at the root of Google’s source tree. However, this works at Google only because Google’s source control system is custom built to handle an extremely large monorepo, so vendoring might not be an option for other organizations.<a contenteditable="false" data-primary="dependencies" data-secondary="managing for modules in build systems" data-startref="ix_depmg" data-type="indexterm" id="id-rpCWsas4fqsdS0UV"> </a><a contenteditable="false" data-primary="build systems" data-secondary="dealing with modules and dependencies" data-startref="ix_bldsysMDdep" data-tertiary="managing dependencies" data-type="indexterm" id="id-O7CRSqsgfmsWSLU9"> </a><a contenteditable="false" data-primary="modules, dealing with in build systems" data-secondary="managing dependencies" data-startref="ix_modsdep" data-type="indexterm" id="id-KYCEh1sdf1sWSaU9"> </a><a contenteditable="false" data-primary="modules, dealing with in build systems" data-startref="ix_mods" data-type="indexterm" id="id-E8CZfKsVfvsVSlUm"> </a><a contenteditable="false" data-primary="build systems" data-secondary="dealing with modules and dependencies" data-startref="ix_blssysMD" data-type="indexterm" id="id-YmCwUmsrfWsWSLUA"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="conclusion-id00022">
|
||||
<h1>Conclusion</h1>
|
||||
|
||||
<p>A build system is one of the most important parts of an engineering organization. Each developer will interact with it potentially dozens or hundreds of times per day, and in many situations, it can be the rate-limiting step in determining their productivity. This means that it’s worth investing time and thought into getting things right.</p>
|
||||
|
||||
<p>As discussed in this chapter, one of the more surprising lessons that Google has learned is that <em>limiting engineers’ power and flexibility can improve their productivity</em>. We were able to develop a build system that meets our needs not by giving engineers free rein in defining how builds are performed, but by developing a highly structured framework that limits individual choice and leaves most interesting decisions in the hands of automated tools. And despite what you might think, engineers don’t resent this: Googlers love that this system mostly works on its own and lets them focus on the interesting parts of writing their applications instead of grappling with build logic. Being able to trust the build is powerful—incremental builds just work, and there is almost never a need to clear build caches or run a “clean” step.</p>
|
||||
|
||||
<p>We took this insight and used it to create a whole new type of <em>artifact-based</em> build system, contrasting with traditional <em>task-based</em> build systems. This reframing of the build as centering around artifacts instead of tasks is what allows our builds to scale to an organization the size of Google. At the extreme end, it allows for a <em>distributed build system</em> that is able to leverage the resources of an entire compute cluster to accelerate engineers’ productivity. Though your organization might not be large enough to benefit from such an investment, we believe that artifact-based build systems scale down as well as they scale up: even for small projects, build systems like Bazel can bring significant benefits in terms of speed and correctness.</p>
|
||||
|
||||
<p>The latter part of this chapter explored how to manage dependencies in an artifact-based world. We came to the conclusion that <em>fine-grained modules scale better than coarse-grained modules</em>. We also discussed the difficulties of managing dependency versions, describing the <em>One-Version Rule</em> and the observation that all dependencies should be <em>versioned manually and explicitly</em>. Such practices avoid common pitfalls like the diamond dependency issue and allow a codebase to achieve Google’s scale of billions of lines of code in a single repository with a unified build system.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="tlsemicolondrs-id00124">
|
||||
<h1>TL;DRs</h1>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>A fully featured build system is necessary to keep developers productive as an organization scales.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Power and flexibility come at a cost. Restricting the build system appropriately makes it easier on developers.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Build systems organized around artifacts tend to scale better and be more reliable than build systems organized around tasks.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>When defining artifacts and dependencies, it’s better to aim for fine-grained modules. Fine-grained modules are better able to take advantage of parallelism and incremental builds.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>External dependencies should be versioned explicitly under source control. Relying on "latest" versions is a recipe<a contenteditable="false" data-primary="build systems" data-startref="ix_bldsys" data-type="indexterm" id="id-gqCzHVHOSltnTy"> </a> for disaster and unreproducible builds.</p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
<div data-type="footnotes"><p data-type="footnote" id="ch01fn184"><sup><a href="ch18.html#ch01fn184-marker">1</a></sup>In an internal survey, 83% of Googlers reported being satisfied with the build system, making it the fourth most satisfying tool of the 19 surveyed. The average tool had a satisfaction rating of 69%.</p><p data-type="footnote" id="ch01fn185"><sup><a href="ch18.html#ch01fn185-marker">2</a></sup>See <a href="https://buck.build/"><em class="hyperlink">https://buck.build/</em></a> and <a href="https://www.pantsbuild.org/index.html"><em class="hyperlink">https://www.pantsbuild.org/index.html</em></a>.</p><p data-type="footnote" id="ch01fn187"><sup><a href="ch18.html#ch01fn187-marker">3</a></sup>Ant uses the word "target" to represent what we call a "task" in this chapter, and it uses the word "task" to refer to what we call "commands."</p><p data-type="footnote" id="ch01fn188"><sup><a href="ch18.html#ch01fn188-marker">4</a></sup>Such "<a href="https://oreil.ly/bfC05">software supply chain</a>" attacks are becoming more common.</p><p data-type="footnote" id="ch01fn189"><sup><a href="ch18.html#ch01fn189-marker">5</a></sup>Go recently added <a href="https://oreil.ly/lHGjt">preliminary support for modules using the exact same system</a>.</p><p data-type="footnote" id="ch01fn190"><sup><a href="ch18.html#ch01fn190-marker">6</a></sup>Of course, actually <em>removing</em> these dependencies was a whole separate process. But requiring each target to explicitly declare what it used was a critical first step. See <a data-type="xref" href="ch22.html#large-scale_changes">Large-Scale Changes</a> for more information about how Google makes large-scale changes like this.</p></div></section>
|
||||
|
||||
</body>
|
||||
</html>
|
387
clones/abseil.io/resources/swe-book/html/ch19.html
Normal file
|
@ -0,0 +1,387 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="critique_googleapostrophes_code_review">
|
||||
<h1>Critique: Google’s Code Review Tool</h1>
|
||||
|
||||
<p class="byline">Written by Caitlin Sadowski, Ilham Kurnia, and Ben Rohlfs</p>
|
||||
|
||||
<p class="byline">Edited by Lisa Carey</p>
|
||||
|
||||
<p>As you saw in <a data-type="xref" href="ch09.html#code_review-id00002">Code Review</a>, code review is a vital part of software development, particularly when working at scale.<a contenteditable="false" data-primary="Critique code review tool" data-type="indexterm" id="ix_Crit"> </a> The main goal of code review is to improve the readability and maintainability of the code base, and this is supported fundamentally by the review process. However, having a well-defined code review process is only one part of the code review story. Tooling that supports that process also plays an important part in its success.</p>
|
||||
|
||||
<p>In this chapter, we’ll look at what makes successful code review tooling via Google’s well-loved in-house system, <em>Critique</em>. Critique has explicit support for the primary motivations of code review, providing reviewers and authors with a view of the review and ability to comment on the change. Critique also has support for gatekeeping what code is checked into the codebase, discussed in the section on "scoring" changes. Code review information from Critique also can be useful when doing code archaeology, following some technical decisions that are explained in code review interactions (e.g., when inline comments are lacking). Although Critique is not the only code review tool used at Google, it is the most popular one by a large margin.</p>
|
||||
|
||||
<section data-type="sect1" id="code_review_tooling_principles">
|
||||
<h1>Code Review Tooling Principles</h1>
|
||||
|
||||
<p>We mentioned above that Critique provides functionality to support the goals of code review (we look at this functionality in more detail later in this chapter), but why is it so successful?<a contenteditable="false" data-primary="Critique code review tool" data-secondary="code review tooling principles" data-type="indexterm" id="id-AjH2tjTxU0"> </a> Critique has been shaped by Google’s development culture, which includes code review as a core part of the workflow. This cultural influence translates into a set of guiding principles that Critique was designed to emphasize:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Simplicity</dt>
|
||||
<dd>Critique’s user interface (UI) is based around making it easy to do code review without a lot of unnecessary choices, and with a smooth interface. The UI loads fast, navigation is easy and hotkey supported, and there are clear visual markers for the overall state of whether a change has been reviewed.</dd>
|
||||
<dt>Foundation of trust</dt>
|
||||
<dd>Code review is not for slowing others down; instead, it is for empowering others. Trusting colleagues as much as possible makes it work.<a contenteditable="false" data-primary="trust" data-secondary="code reviews and" data-type="indexterm" id="id-b8HVtxCEfXU3"> </a> This might mean, for example, trusting authors to make changes and not requiring an additional review phase to double check that minor comments are actually addressed. Trust also plays out by making changes openly accessible (for viewing and reviewing) across Google.</dd>
|
||||
<dt>Generic communication</dt>
|
||||
<dd>Communication problems are rarely solved through tooling. Critique prioritizes generic ways for users to comment on the code changes, instead of complicated protocols. Critique encourages users to spell out what they want in their comments or even suggests some edits instead of making the data model and process more complex. Communication can go wrong even with the best code review tool because the users are humans.</dd>
|
||||
<dt>Workflow integration</dt>
|
||||
<dd>Critique has a number of integration points with other core software development tools. Developers can easily navigate to view the code under review in our code search and browsing tool, edit code in our web-based code editing tool, or view test results associated with a code change.</dd>
|
||||
</dl>
|
||||
|
||||
<p>Across these guiding principles, simplicity has probably had the most impact on the tool. There were many interesting features we considered adding, but we decided not to make the model more complicated to support a small set of users.</p>
|
||||
|
||||
<p>Simplicity also has an interesting tension with workflow integration. We considered but ultimately decided against creating a “Code Central” tool with code editing, reviewing, and searching in one tool. Although Critique has many touchpoints with other tools, we consciously decided to keep code review as the primary focus. Features are linked from Critique but implemented in different subsystems.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="code_review_flow">
|
||||
<h1>Code Review Flow</h1>
|
||||
|
||||
<p>Code reviews can be executed at many stages <a contenteditable="false" data-primary="Critique code review tool" data-secondary="code review flow" data-type="indexterm" id="id-QkHMtlTLhM"> </a>of software <a contenteditable="false" data-primary="code reviews" data-secondary="flow" data-type="indexterm" id="id-ZZHXT2Txhz"> </a>development, as illustrated in <a data-type="xref" href="ch19.html#the_code-review_flow">Figure 19-1</a>. Critique reviews typically take place before a change can be committed to the codebase, also <a contenteditable="false" data-primary="precommit reviews" data-type="indexterm" id="id-b8HACMTZhV"> </a>known as <em>precommit reviews</em>. Although <a data-type="xref" href="ch09.html#code_review-id00002">Code Review</a> contains a brief description of the code review flow, here we expand it to describe key aspects of Critique that help at each stage. We’ll look at each stage in more detail in the following sections.</p>
|
||||
|
||||
<figure id="the_code-review_flow"><img alt="The code-review flow" src="images/seag_1901.png">
|
||||
<figcaption><span class="label">Figure 19-1. </span>The code review flow</figcaption>
|
||||
</figure>
|
||||
|
||||
<p>Typical review steps go as follows:</p>
|
||||
|
||||
<ol>
|
||||
<li>
|
||||
<p><strong>Create a change.</strong> A user authors a change to the codebase in their workspace. This <em>author</em> then uploads a <em>snapshot</em> (showing a patch at a particular point in time) to Critique, which triggers the run of automatic code analyzers (see <span class="keep-together"><a data-type="xref" href="ch20.html#static_analysis-id00082">Static Analysis</a></span>).</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Request review.</strong> After the author is satisfied with the diff of the change and the result of the analyzers shown in Critique, they mail the change to one or more reviewers.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Comment.</strong> <em>Reviewers</em> open the change in Critique and draft comments on the diff. Comments are by default marked as <em>unresolved,</em> meaning they are crucial for the author to address. Additionally, reviewers can add <em>resolved</em> comments that are optional or informational. Results from automatic code analyzers, if present, are also visible to reviewers. Once a reviewer has drafted a set of comments, they need to <em>publish</em> them in order for the author to see them; this has the advantage of allowing a reviewer to provide a complete thought on a change atomically, after having reviewed the entire change. Anyone can comment on changes, providing a “drive-by review” as they see it necessary.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Modify change and reply to comments.</strong> The author modifies the change, uploads new snapshots based on the feedback, and replies back to the reviewers. The author addresses (at least) all unresolved comments, either by changing the code or just replying to the comment and changing the comment type to be <em>resolved</em>. The author and reviewers can look at diffs between any pairs of snapshots to see what changed. Steps 3 and 4 might be repeated multiple times.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Change approval.</strong> When the reviewers are happy with the latest state of the change, they approve the change and mark it as “looks good to me” (LGTM).<a contenteditable="false" data-primary="LGTM (looks good to me) stamp from reviewers" data-secondary="change approval with" data-type="indexterm" id="id-6bHWT3tgSQSLh8"> </a> They can optionally include comments to address. After a change is deemed good for submission, it is clearly marked green in the UI to show this state.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Commit a change.</strong> Provided the change is approved (which we’ll discuss shortly), the author can trigger the commit process of the change. If automatic analyzers and other precommit hooks (called “presubmits”) don’t find any problems, the change is committed to the codebase.</p>
|
||||
</li>
|
||||
</ol>
|
||||
|
||||
<p>Even after the review process is started, the entire system provides significant flexibility to deviate from the regular review flow. For example, reviewers can un-assign themselves from the change or explicitly assign it to someone else, and the author can postpone the review altogether. In emergency cases, the author can forcefully commit their change and have it reviewed after commit.</p>
|
||||
|
||||
<section data-type="sect2" id="notifications">
|
||||
<h2>Notifications</h2>
|
||||
|
||||
<p>As a change moves through the stages outlined earlier, Critique publishes event notifications that might be used by other supporting tools. <a contenteditable="false" data-primary="notifications from Critique" data-type="indexterm" id="id-3QHYtXTJhvhW"> </a>This notification model allows Critique to focus on being a primary code review tool instead of a general purpose tool, while still being integrated into the developer workflow. Notifications enable a separation of concerns such that Critique can just emit events and other systems build off of those events.</p>
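<p>The shape of this separation can be illustrated with a small, purely hypothetical sketch; none of the names below correspond to real Critique APIs, but the point stands that the review tool only emits events and knows nothing about its consumers:</p>

<pre data-type="programlisting" data-code-language="python">from dataclasses import dataclass


@dataclass
class ReviewEvent:
    change_id: str
    kind: str        # e.g., "snapshot_uploaded" or "comments_published"
    recipient: str


class EventBus:
    """Toy publish/subscribe bus standing in for the real notification system."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        for handler in self._subscribers:
            handler(event)


bus = EventBus()
# A notifier (say, a browser extension backend) subscribes without the
# review tool needing to know it exists.
bus.subscribe(lambda e: print(f"notify {e.recipient}: {e.kind} on {e.change_id}"))
bus.publish(ReviewEvent("change-12345", "comments_published", "author"))</pre>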
|
||||
|
||||
<p>For example, users can install a Chrome extension that consumes these event notifications. When a change needs the user’s attention—for example, because it is their turn to review the change or some presubmit fails—the extension displays a Chrome notification with a button to go directly to the change or silence the notification. We have found that some developers really like immediate notification of change updates, but others choose not to use this extension because they find it is too disruptive to their flow.</p>
|
||||
|
||||
<p>Critique also manages emails related to a change; important Critique events trigger email notifications. In addition to being displayed in the Critique UI, some analyzer findings are configured to also send the results out by email. Critique also processes email replies and translates them to comments, supporting users who prefer an email-based flow. Note that for many users, emails are not a key feature of code review; they use Critique’s dashboard view (discussed later) to manage reviews.</p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="stage_one_creating_a_change">
|
||||
<h1>Stage 1: Create a Change</h1>
|
||||
|
||||
<p>A code review tool should provide support at all stages of the review process and should not be the bottleneck for committing changes.<a contenteditable="false" data-primary="changes to code" data-secondary="creating" data-type="indexterm" id="ix_chgcr"> </a><a contenteditable="false" data-primary="Critique code review tool" data-secondary="creating a change" data-type="indexterm" id="ix_Critcrch"> </a> In the prereview step, making it easier for change authors to polish a change before sending it out for review helps reduce the time taken by the reviewers to inspect the change. Critique displays change diffs with knobs to ignore whitespace changes and highlight move-only changes. Critique also surfaces the results from builds, tests, and static analyzers, including style checks (as discussed in <a data-type="xref" href="ch09.html#code_review-id00002">Code Review</a>).</p>
|
||||
|
||||
<p>Showing an author the diff of a change gives them the opportunity to wear a different hat: that of a code reviewer. Critique lets a change author see the diff of their changes as their reviewer will, and also see the automatic analysis results. Critique also supports making lightweight modifications to the change from within the review tool and suggests appropriate reviewers. When sending out the request, the author can also include preliminary comments on the change, providing the opportunity to ask reviewers directly about any open questions. Giving authors the chance to see a change just as their reviewers do prevents misunderstanding.</p>
|
||||
|
||||
<p>To provide further context for the reviewers, the author can also link the change to a specific bug. Critique uses an autocomplete service to show relevant bugs, prioritizing bugs that are assigned to the author.</p>
|
||||
|
||||
<section data-type="sect2" id="diffing">
|
||||
<h2>Diffing</h2>
|
||||
|
||||
<p>The core of the code review process is understanding the code change itself.<a contenteditable="false" data-primary="Critique code review tool" data-secondary="creating a change" data-tertiary="diffing" data-type="indexterm" id="id-XAHat8TDSVsa"> </a> Larger changes are typically more difficult to understand than smaller ones. <a contenteditable="false" data-primary="diffing code changes" data-type="indexterm" id="id-3QHJTXTVS0sW"> </a>Optimizing the diff of a change is thus a core requirement for a good code review tool.</p>
|
||||
|
||||
<p>In Critique, this principle translates onto multiple layers (see <a data-type="xref" href="ch19.html#intraline_diffing_showing_character-lev">Figure 19-2</a>). The diffing component, starting from an optimized longest common subsequence algorithm, is enhanced with the following:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Syntax highlighting</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Cross-references (powered by Kythe; see <a data-type="xref" href="ch17.html#code_search">Code Search</a>)</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Intraline diffing that shows differences at the character level,<a contenteditable="false" data-primary="intraline diffing showing character-level differences" data-type="indexterm" id="id-8XHatxt5f8CoSDsA"> </a> factoring in word boundaries (<a data-type="xref" href="ch19.html#intraline_diffing_showing_character-lev">Figure 19-2</a>)</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>An option to ignore whitespace differences to a varying degree</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Move detection, in which <a contenteditable="false" data-primary="move detection for code chunks" data-type="indexterm" id="id-OoHMtXtWSeC7S5s4"> </a>chunks of code that are moved from one place to another are marked as being moved (as opposed to being marked as removed here and added there, as a naive diff algorithm would)</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<figure id="intraline_diffing_showing_character-lev"><img alt="Intraline diffing showing character-level differences" src="images/seag_1902.png">
|
||||
<figcaption><span class="label">Figure 19-2. </span>Intraline diffing showing character-level differences</figcaption>
|
||||
</figure>
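<p>The diffing pipeline itself is internal to Critique, but the general idea behind these features, a line-level diff refined down to character-level ("intraline") differences, can be sketched with Python’s standard <code>difflib</code> module. This is only a rough illustration; the real implementation also handles move detection, whitespace options, and much more:</p>

<pre data-type="programlisting" data-code-language="python">import difflib


def intraline_diff(old_lines, new_lines):
    """Print a line-level diff, refining replaced lines to character runs."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace":
            # Pair up replaced lines and diff each pair character by character.
            for old, new in zip(old_lines[i1:i2], new_lines[j1:j2]):
                chars = difflib.SequenceMatcher(None, old, new)
                runs = [(t, old[a:b], new[c:d])
                        for t, a, b, c, d in chars.get_opcodes()
                        if t != "equal"]
                print(f"~ {old!r} -> {new!r}: {runs}")
        elif tag == "delete":
            for line in old_lines[i1:i2]:
                print(f"- {line!r}")
        elif tag == "insert":
            for line in new_lines[j1:j2]:
                print(f"+ {line!r}")


intraline_diff(["int total = 0;", "return total;"],
               ["int total_count = 0;", "return total_count;"])</pre>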
|
||||
|
||||
<p>Users can also view the diff in various modes, such as overlay and side by side. When developing Critique, we decided that it was important to have side-by-side diffs to make the review process easier. Side-by-side diffs take a lot of space: to make them a reality, we had to simplify the diff view structure, so there is no border, no padding—just the diff and line numbers. We also had to play around with a variety of fonts and sizes until we had a diff view that accommodates even Java’s 100-character line limit at the typical screen-width resolution when Critique launched (1,440 pixels).</p>
|
||||
|
||||
<p>Critique further supports a variety of custom tools that provide diffs of artifacts produced by a change, such as a screenshot diff of the UI modified by a change or configuration files generated by a change.</p>
|
||||
|
||||
<p>To make the process of navigating diffs smooth, we were careful not to waste space and spent significant effort ensuring that diffs load quickly, even for images and large files and/or changes. We also provide keyboard shortcuts to quickly navigate through files while visiting only modified sections.</p>
|
||||
|
||||
<p>When users drill down to the file level, Critique provides a UI widget with a compact display of the chain of snapshot versions of a file; users can drag and drop to select which versions to compare. This widget automatically collapses similar snapshots, drawing focus to important snapshots. It helps the user understand the evolution of a file within a change; for example, which snapshots have test coverage, have already been reviewed, or have comments. To address concerns of scale, Critique prefetches everything, so loading different snapshots is very quick.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="analysis_results-id00001">
|
||||
<h2>Analysis Results</h2>
|
||||
|
||||
<p>Uploading a snapshot of the change <a contenteditable="false" data-primary="Critique code review tool" data-secondary="creating a change" data-tertiary="analysis results" data-type="indexterm" id="id-3QHYtXTxU0sW"> </a>triggers code analyzers (see <a data-type="xref" href="ch20.html#static_analysis-id00082">Static Analysis</a>). Critique displays<a contenteditable="false" data-primary="analysis results from code analyzers" data-type="indexterm" id="id-6bHJfpTLUes9"> </a> the analysis results on <a contenteditable="false" data-primary="diffing code changes" data-secondary="change summary and diff view" data-type="indexterm" id="id-8XH7CDTbUOsO"> </a>the change page, summarized by analyzer status chips shown below the change description, as depicted in <a data-type="xref" href="ch19.html#change_summary_and_diff_view">Figure 19-3</a>, and detailed in the Analysis tab, as illustrated in <a data-type="xref" href="ch19.html#analysis_results">Figure 19-4</a>.</p>
|
||||
|
||||
<p>Analyzers can mark specific findings to highlight in red for increased visibility. Analyzers that are still in progress are represented by yellow chips, and gray chips are displayed otherwise. For the sake of simplicity, Critique offers no other options to mark or highlight findings—actionability is a binary option. If an analyzer produces some results (“findings”), clicking the chip opens up the findings. Like comments, findings can be displayed inside the diff but styled differently to make them easily distinguishable. Sometimes, the findings also include fix suggestions, which the author can preview and choose to apply from Critique.</p>
|
||||
|
||||
<figure id="change_summary_and_diff_view"><img alt="Change summary and diff view" src="images/seag_1903.png">
|
||||
<figcaption><span class="label">Figure 19-3. </span>Change summary and diff view</figcaption>
|
||||
</figure>
|
||||
|
||||
<figure id="analysis_results"><img alt="Analysis results" src="images/seag_1904.png">
|
||||
<figcaption><span class="label">Figure 19-4. </span>Analysis results</figcaption>
|
||||
</figure>
|
||||
|
||||
<p>For example, suppose that a linter finds a style violation of extra spaces at the end of the line. The change page will display a chip for that linter. From the chip, the author can quickly go to the diff showing the offending code to understand the style violation with two clicks. Most linter violations also include fix suggestions. With a click, the author can preview the fix suggestion (for example, remove the extra spaces), and with another click, apply the fix on the change.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="tight_tool_integration">
|
||||
<h2>Tight Tool Integration</h2>
|
||||
|
||||
<p>Google has tools built on top of Piper, its<a contenteditable="false" data-primary="Piper" data-secondary="tools built on top of" data-type="indexterm" id="id-93HOtBTehlsK"> </a> monolithic source code<a contenteditable="false" data-primary="Critique code review tool" data-secondary="creating a change" data-tertiary="tight tool integration" data-type="indexterm" id="id-6bHWTpTMhes9"> </a> repository (see <a data-type="xref" href="ch16.html#version_control_and_branch_management">Version Control and Branch Management</a>), such as the following:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Cider, an online IDE for editing source code stored in the cloud</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Code Search, a tool for searching code in the codebase</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Tricorder, a tool for displaying static analysis results (mentioned earlier)</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Rapid, a release tool that packages and deploys binaries containing a series of changes</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Zapfhahn, a test coverage calculation tool</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Additionally, there are services that provide context on change metadata (for example, about users involved in a change or linked bugs). Critique is a natural melting pot for a quick one-click/hover access or even embedded UI support to these systems, although we need to be careful not to sacrifice simplicity. For example, from a change page in Critique, the author needs to click only once to start editing the change further in Cider.<a contenteditable="false" data-primary="Kythe" data-secondary="navigating cross-references with" data-type="indexterm" id="id-8XHatACrhOsO"> </a> There is support to navigate between cross-references using Kythe or view the mainline state of the code in Code Search (see <a data-type="xref" href="ch17.html#code_search">Code Search</a>). Critique links out to the release tool so that users can see whether a submitted change is in a specific release. For these tools, Critique favors links rather than embedding so as not to distract from the core review experience. One exception here is test coverage: the information of whether a line of code is covered by a test is shown by different background colors on the line gutter in the file’s diff view (not all projects use this coverage tool).</p>
|
||||
|
||||
<p>Note that <a contenteditable="false" data-primary="workspaces" data-secondary="tight integration between Critique and" data-type="indexterm" id="id-GvHXtMSDhGsE"> </a>tight integration between Critique and a developer’s workspace is possible because workspaces are stored in a FUSE-based filesystem, accessible beyond a particular developer’s computer. The Source of Truth is hosted in the cloud and accessible to all of these tools.<a contenteditable="false" data-primary="changes to code" data-secondary="creating" data-startref="ix_chgcr" data-type="indexterm" id="id-OoHVToSYhysx"> </a><a contenteditable="false" data-primary="Critique code review tool" data-secondary="creating a change" data-startref="ix_Critcrch" data-type="indexterm" id="id-5LHdfeS8hBsM"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="stage_two_request_review">
|
||||
<h1>Stage 2: Request Review</h1>
|
||||
|
||||
<p>After the author is happy with the state of the change, they<a contenteditable="false" data-primary="Critique code review tool" data-secondary="request review" data-type="indexterm" id="ix_Critreqrev"> </a> can send it for review, as depicted in <a data-type="xref" href="ch19.html#requesting_reviewers">Figure 19-5</a>. This requires the author to pick the reviewers. Within a small team, finding a reviewer might seem simple, but even there it is useful to distribute reviews evenly across team members and consider situations like who is on vacation. To address this, teams can provide an email alias for incoming code reviews. The alias is used by a tool called <em>GwsQ</em> (named after the initial team that used this technique: Google Web Server) that assigns specific reviewers based on the configuration linked to the alias. For example, a change author can assign a review to some-team-list-alias, and GwsQ will pick a specific member of some-team-list-alias to perform the review.</p>
|
||||
|
||||
<figure id="requesting_reviewers"><img alt="Requesting reviewers" src="images/seag_1905.png">
|
||||
<figcaption><span class="label">Figure 19-5. </span>Requesting reviewers</figcaption>
|
||||
</figure>
|
||||
|
||||
<p>Given the size of Google’s codebase and the number of people modifying it, it can be difficult to find out who is best qualified to review a change outside your own project. Finding reviewers becomes a real problem at that scale, so Critique offers the functionality to propose sets of reviewers that are sufficient to approve the change. The reviewer selection utility takes into account the following factors (a rough sketch of how such factors might be combined appears after the list):</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Who owns the code that is being changed (see the next section)</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Who is most familiar with the code (i.e., who recently changed it)</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Who is available for review (i.e., not out of office and preferably in the same time zone)</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>The GwsQ team alias setup</p>
|
||||
</li>
|
||||
</ul>
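<p>As a purely hypothetical sketch (Critique’s actual reviewer-selection logic is internal and its weights are not public), these factors might be combined into a simple score per candidate, with availability acting as a hard filter:</p>

<pre data-type="programlisting" data-code-language="python">from dataclasses import dataclass


@dataclass
class Candidate:
    username: str
    owns_code: bool        # listed as an owner of the touched files
    recent_edits: int      # how recently/often they changed these files
    available: bool        # not out of office, workable time zone
    from_team_alias: bool  # configured via the team's GwsQ alias


def score(c: Candidate) -> float:
    """Illustrative weighting only."""
    if not c.available:
        return 0.0
    return 3.0 * c.owns_code + 0.5 * min(c.recent_edits, 5) + 1.0 * c.from_team_alias


candidates = [
    Candidate("alice", owns_code=True, recent_edits=4, available=True,
              from_team_alias=False),
    Candidate("bob", owns_code=False, recent_edits=9, available=True,
              from_team_alias=True),
]
print(max(candidates, key=score).username)  # "alice"</pre>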
|
||||
|
||||
<p class="pagebreak-before">Assigning a reviewer to a change triggers a review request. This request runs “presubmits” or precommit hooks applicable to the change; teams can configure the presubmits related to their projects in many ways. The most common hooks include the following:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Automatically adding email lists to changes to raise awareness and transparency</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Running automated test suites for the project</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Enforcing project-specific invariants on both code (to enforce local code style restrictions) and change descriptions (to allow generation of release notes or other forms of tracking)</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>As running tests is resource intensive, at Google they are part of presubmits (run when requesting review and when committing changes) rather than for every snapshot like Tricorder checks. Critique surfaces the result of running the hooks in a similar way to how analyzer results are displayed, with an extra distinction to highlight the fact that a failed result blocks the change from being sent for review or committed. Critique notifies the author via email if presubmits fail.<a contenteditable="false" data-primary="Critique code review tool" data-secondary="request review" data-startref="ix_Critreqrev" data-type="indexterm" id="id-6bHQtKs7IJ"> </a></p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="stage_three_and_four_understanding_and">
|
||||
<h1>Stages 3 and 4: Understanding and Commenting on a Change</h1>
|
||||
|
||||
<p>After the review process starts, the author and the reviewers work in tandem to reach the goal of committing changes of high quality.<a contenteditable="false" data-primary="Critique code review tool" data-secondary="understanding and commenting on a change" data-type="indexterm" id="ix_Critundcomm"> </a></p>
|
||||
|
||||
<section data-type="sect2" id="commenting">
|
||||
<h2>Commenting</h2>
|
||||
|
||||
<p>Making<a contenteditable="false" data-primary="changes to code" data-secondary="commenting on" data-type="indexterm" id="id-XAHat8TVfYca"> </a> comments is the second <a contenteditable="false" data-primary="commenting on changes in Critique" data-type="indexterm" id="id-3QHJTXTafocW"> </a>most common action that users make in Critique after viewing changes (<a data-type="xref" href="ch19.html#commenting_on_the_diff_view">Figure 19-6</a>). Commenting in Critique is free for all. Anyone—not only the change author and the assigned reviewers—can comment on a change.</p>
|
||||
|
||||
<p>Critique also offers the ability to track review progress via per-person state. Reviewers have checkboxes to mark individual files at the latest snapshot as reviewed, helping the reviewer keep track of what they have already looked at. When the author modifies a file, the “reviewed” checkbox for that file is cleared for all reviewers because the latest snapshot has been updated.</p>
|
||||
|
||||
<figure id="commenting_on_the_diff_view"><img alt="Commenting on the diff view" src="images/seag_1906.png">
|
||||
<figcaption><span class="label">Figure 19-6. </span>Commenting on the diff view</figcaption>
|
||||
</figure>
|
||||
|
||||
<p>When a reviewer sees a relevant analyzer finding, they can click a "Please fix" button to create an unresolved comment asking the author to address the finding. Reviewers can also suggest a fix to a change by inline editing the latest version of the file. Critique transforms this suggestion into a comment with a fix attached that can be applied by the author.</p>
|
||||
|
||||
<p>Critique does not dictate what comments users should create, but for some common comments, Critique provides quick shortcuts. The change author can click the "Done" button on the comment panel to indicate when a reviewer’s comment has been addressed, or the "Ack" button to acknowledge that the comment has been read, typically used for informational or optional comments. Both have the effect of resolving the comment thread if it is unresolved. These shortcuts simplify the workflow and reduce the time needed to respond to review comments.</p>
|
||||
|
||||
<p>As mentioned earlier, comments are drafted as-you-go, but then “published” atomically, as shown in <a data-type="xref" href="ch19.html#preparing_comments_to_the_author">Figure 19-7</a>. This allows authors and reviewers to ensure that they are happy with their comments before sending them out.</p>
|
||||
|
||||
<figure id="preparing_comments_to_the_author"><img alt="Preparing comments to the author" src="images/seag_1907.png">
|
||||
<figcaption><span class="label">Figure 19-7. </span>Preparing comments to the author</figcaption>
|
||||
</figure>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="understanding_the_state_of_a_change">
|
||||
<h2>Understanding the State of a Change</h2>
|
||||
|
||||
<p>Critique provides <a contenteditable="false" data-primary="changes to code" data-secondary="understanding the state of" data-type="indexterm" id="ix_chgund"> </a>a number of mechanisms to make it clear where in the comment-and-iterate phase a change is currently located. These include a feature for determining who needs to take action next, and a dashboard view of review/author status for all of the changes with which a particular developer is involved.</p>
|
||||
|
||||
<section data-type="sect3" id="quotation_marksemicolonwhose_turnquotat">
|
||||
<h3>"Whose turn" feature</h3>
|
||||
|
||||
<p>One important factor in accelerating the review process is understanding when it’s your turn to act, especially when there are multiple reviewers assigned to a change. This might be the case if the author wants to have their change reviewed by a software engineer and the user-experience person responsible for the feature, or the SRE carrying the pager for the service. Critique helps define who is expected to look at the change next by managing an <em>attention set</em> for each change.</p>
|
||||
|
||||
<p>The attention set comprises the set of people on which a change is currently blocked. When a reviewer or author is in the attention set, they are expected to respond in a timely manner. Critique tries to be smart about updating the attention set when a user publishes their comments, but users can also manage the attention set themselves. Its usefulness increases even more when there are more reviewers in the change. The attention set is surfaced in Critique by rendering the relevant usernames in bold.</p>
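<p>A toy model of this bookkeeping (hypothetical code, and far simpler than what Critique actually does) is that publishing comments hands the turn to the other side of the conversation:</p>

<pre data-type="programlisting" data-code-language="python">def update_attention(attention, author, reviewers, actor):
    """Return the new attention set after 'actor' publishes comments."""
    attention = set(attention)
    attention.discard(actor)          # the actor has taken their turn
    if actor == author:
        attention.update(reviewers)   # now it is the reviewers' turn
    else:
        attention.add(author)         # a reviewer acted; back to the author
    return attention


print(update_attention({"author"}, "author", {"rev1", "rev2"}, "author"))
# {'rev1', 'rev2'}</pre>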
|
||||
|
||||
<p>After we implemented this feature, our users had a difficult time imagining the previous state. The prevailing opinion is: how did we get along without this? The alternative before we implemented this feature was chatting between reviewers and authors to understand who was dealing with a change. This feature also emphasizes the turn-based nature of code review; it is always at least one person’s turn to take action.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="dashboard_and_search_system">
|
||||
<h3>Dashboard and search system</h3>
|
||||
|
||||
<p>Critique’s landing page is the user’s dashboard<a contenteditable="false" data-primary="dashboard and search system (Critique)" data-type="indexterm" id="id-8XHatDTKC8Cwc2"> </a> page, as depicted in <a data-type="xref" href="ch19.html#dashboard_view">Figure 19-8</a>. The dashboard page is divided into user-customizable sections, each of them containing a list of change summaries.</p>
|
||||
|
||||
<figure id="dashboard_view"><img alt="Dashboard view" src="images/seag_1908.png">
|
||||
<figcaption><span class="label">Figure 19-8. </span>Dashboard view</figcaption>
|
||||
</figure>
|
||||
|
||||
<p>The dashboard page is powered by <a contenteditable="false" data-primary="Changelist Search" data-type="indexterm" id="id-OoHMtDC7CeCEcQ"> </a>a search system called <em>Changelist Search</em>. Changelist Search indexes the latest state of all available changes (both pre- and post-submit) across all users at Google and allows its users to look up relevant changes by regular expression–based queries. Each dashboard section is defined by a query to Changelist Search. We have spent time ensuring Changelist Search is fast enough for interactive use; everything is indexed quickly so that authors and reviewers are not slowed down, despite the extremely large number of changes in flight at Google at any given time.</p>
|
||||
|
||||
<p>To optimize the user experience (UX), Critique’s default dashboard setting is to have the first section display the changes that need a user’s attention, although this is customizable. There is also a search bar for making custom queries over all changes and browsing the results. As a reviewer, you mostly just need the attention set. As an author, you mostly just need to take a look at what is still waiting for review to see if you need to ping any changes. Although we have shied away from customizability in some other parts of the Critique UI, we found that users like to set up their dashboards differently without detracting from the fundamental experience, similar to<a contenteditable="false" data-primary="changes to code" data-secondary="understanding the state of" data-startref="ix_chgund" data-type="indexterm" id="id-5LHmteSeCKCocw"> </a> the way <a contenteditable="false" data-primary="Critique code review tool" data-secondary="understanding and commenting on a change" data-startref="ix_Critundcomm" data-type="indexterm" id="id-YQHmTrSQCaCWc5"> </a>everyone organizes their emails differently.<sup><a data-type="noteref" id="ch01fn191-marker" href="ch19.html#ch01fn191">1</a></sup></p>
|
||||
</section>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="stage_five_change_approvals_left_parent">
|
||||
<h1>Stage 5: Change Approvals (Scoring a Change)</h1>
|
||||
|
||||
<p>Showing whether a reviewer<a contenteditable="false" data-primary="Critique code review tool" data-secondary="change approvals" data-type="indexterm" id="id-04HatBTDHK"> </a> thinks a change <a contenteditable="false" data-primary="changes to code" data-secondary="change approvals or scoring a change" data-type="indexterm" id="id-XAHDT8TzHl"> </a>is good boils down to providing concerns and suggestions via comments. There also needs to be some mechanism for providing a high-level “OK” on a change. At Google, the scoring for a change is divided into three parts:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>LGTM ("looks good to me")</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Approval</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>The number of unresolved comments</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>An LGTM stamp <a contenteditable="false" data-primary="LGTM (looks good to me) stamp from reviewers" data-secondary="meaning of" data-type="indexterm" id="id-3QHYtyCyHY"> </a>from a reviewer means that "I have reviewed this change, believe that it meets our standards, and I think it is okay to commit it after addressing unresolved comments." An Approval stamp<a contenteditable="false" data-primary="Approval stamp from reviewers" data-type="indexterm" id="id-93HDTyCNHk"> </a> from a reviewer means that "as a gatekeeper, I allow this change to be committed to the codebase." A reviewer can mark comments as unresolved, meaning that the author will need to act upon them. When the change has at least one LGTM, sufficient approvals and no unresolved comments, the author can then commit the change. Note that every change requires an LGTM regardless of approval status, ensuring that at least two pairs of eyes viewed the change. This simple scoring rule allows Critique to inform the author when a change is ready to commit (shown prominently as a green page header).</p>
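<p>The rule in the preceding paragraph is simple enough to state as a small predicate. The sketch below is hypothetical code rather than Critique’s implementation, but it captures the same logic: at least one LGTM, every required approval granted, and no unresolved comments:</p>

<pre data-type="programlisting" data-code-language="python">from dataclasses import dataclass, field


@dataclass
class ReviewState:
    lgtms: set = field(default_factory=set)               # who said LGTM
    approvals: set = field(default_factory=set)           # approvals granted
    required_approvals: set = field(default_factory=set)  # approvals needed
    unresolved_comments: int = 0


def ready_to_commit(state: ReviewState) -> bool:
    return (bool(state.lgtms)
            and state.required_approvals.issubset(state.approvals)
            and state.unresolved_comments == 0)


state = ReviewState(lgtms={"reviewer1"},
                    approvals={"owner-approval"},
                    required_approvals={"owner-approval"},
                    unresolved_comments=0)
print(ready_to_commit(state))  # True</pre>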
|
||||
|
||||
<p>We made a conscious decision in the process of building Critique to simplify this rating scheme. Initially, Critique had a “Needs More Work” rating and also a <span class="keep-together">“LGTM++”</span>. The model we have moved to is to make LGTM/Approval always positive. If a change definitely needs a second review, primary reviewers can add comments but without LGTM/Approval. After a change transitions into a mostly-good state, reviewers will typically trust authors to take care of small edits—the tooling does not require repeated LGTMs regardless of change size.</p>
|
||||
|
||||
<p>This rating scheme has also had a positive influence on code review culture. Reviewers cannot just thumbs-down a change with no useful feedback; all negative feedback from reviewers must be tied to something specific to be fixed (for example, an unresolved comment). The phrasing “unresolved comment” was also chosen to sound relatively nice.</p>
|
||||
|
||||
<p>Critique includes<a contenteditable="false" data-primary="scoring a change" data-type="indexterm" id="id-8XHatVhoH9"> </a> a scoring panel, next to the analysis chips, with the following <span class="keep-together">information:</span></p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Who has LGTM’ed the change</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>What approvals are still required and why</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>How many unresolved comments are still open</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Presenting the scoring information this way helps the author quickly understand what they still need to do to get the change committed.</p>
|
||||
|
||||
<p>LGTM and Approval are <em>hard</em> requirements and can be granted only by reviewers. Reviewers can also revoke their LGTM and Approval at any time before the change is committed. Unresolved comments are <em>soft</em> requirements; the author can mark a comment “resolved” as they reply. This distinction promotes and relies on trust and communication between the author and the reviewers. For example, a reviewer can LGTM the change accompanied with unresolved comments without later on checking precisely whether the comments are truly addressed, highlighting the trust the reviewer places on the author. This trust is particularly important for saving time when there is a significant difference in time zones between the author and the reviewer. Exhibiting trust is also a good way to build trust and strengthen teams.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="stage_six_commiting_a_change">
|
||||
<h1>Stage 6: Committing a Change</h1>
|
||||
|
||||
<p>Last but not least, Critique has a button for committing the change after the review to avoid context-switching to a command-line interface.<a contenteditable="false" data-primary="changes to code" data-secondary="committing" data-type="indexterm" id="id-XAHat8TMul"> </a><a contenteditable="false" data-primary="Critique code review tool" data-secondary="committing a change" data-type="indexterm" id="id-3QHJTXTbuY"> </a></p>
|
||||
|
||||
<section class="pagebreak-before" data-type="sect2" id="after_commit_tracking_history">
|
||||
<h2 class="less_space">After Commit: Tracking History</h2>
|
||||
|
||||
<p>In addition to the core use of Critique as a tool for reviewing source code changes before they are committed to the repository, Critique is also used as a tool for change archaeology. <a contenteditable="false" data-primary="changes to code" data-secondary="tracking history of" data-type="indexterm" id="id-93HOtBTVfGuK"> </a><a contenteditable="false" data-primary="tracking history of code changes in Critique" data-type="indexterm" id="id-6bHWTpTEf6u9"> </a>For most files, developers can view a list of the past history of changes that modified a particular file in the Code Search system (see <a data-type="xref" href="ch17.html#code_search">Code Search</a>), or navigate directly to a change. Anyone at Google can browse the history of a change to generally viewable files, including the comments on and evolution of the change. This enables future auditing and is used to understand more details about why changes were made or how bugs were introduced. Developers can also use this feature to learn how changes were engineered, and code review data in aggregate is used to produce <span class="keep-together">trainings.</span></p>
|
||||
|
||||
<p>Critique also supports the ability to comment after a change is committed; for example, when a problem is discovered later or additional context might be useful for someone investigating the change at another time. Critique also supports the ability to roll back changes and see whether a particular change has already been rolled back.</p>
|
||||
|
||||
<aside data-type="sidebar" id="gerrit">
|
||||
<h5>Case Study: Gerrit</h5>
|
||||
|
||||
<p>Although Critique is the most commonly used review tool at Google, it is not the only one.<a contenteditable="false" data-primary="Gerrit code review tool" data-type="indexterm" id="id-GvHXtWTXC9fYum"> </a> Critique is not externally available due to its tight interdependencies with our large monolithic repository and other internal tools. Because of this, teams at Google that work on open source projects (including Chrome and Android) or internal projects that can’t or don’t want to be hosted in the monolithic repository use a different code review tool: Gerrit.</p>
|
||||
|
||||
<p>Gerrit is a standalone, open source code review tool that is tightly integrated with the Git version control system. As such, it offers a web UI to many Git features including code browsing, merging branches, cherry-picking commits, and, of course, code review. In addition, Gerrit has a fine-grained permission model that we can use to restrict access to repositories and branches.</p>
|
||||
|
||||
<p>Both Critique and Gerrit have the same model for code reviews in that each commit is reviewed separately. Gerrit supports stacking commits and uploading them for individual review. It also allows the chain to be committed atomically after it’s reviewed.</p>
|
||||
|
||||
<p>Being open source, Gerrit accommodates more variants and a wider range of use cases; Gerrit’s rich plug-in system enables a tight integration into custom environments. To support these use cases, Gerrit also supports a more sophisticated scoring system. A reviewer can veto a change by placing a –2 score, and the scoring system is highly configurable.</p>
|
||||
|
||||
<div data-type="note" id="id-YVcxUACGfduk"><h6>Note</h6>
|
||||
<p>You can learn more about Gerrit and see it in action at <a href="https://www.gerritcodereview.com"><em class="hyperlink">https://www.gerritcodereview.com</em></a>.</p>
|
||||
</div>
|
||||
|
||||
</aside>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="conclusion-id00023">
|
||||
<h1>Conclusion</h1>
|
||||
|
||||
<p>There are a number of implicit trade-offs when using a code review tool. Critique builds in a number of features and integrates with other tools to make the review process more seamless for its users. Time spent in code reviews is time not spent coding, so any optimization of the review process can be a productivity gain for the company. Having only two people in most cases (author and reviewer) agree on the change before it can be committed keeps velocity high. Google greatly values the educational aspects of code review, even though they are more difficult to quantify.</p>
|
||||
|
||||
<p>To minimize the time it takes for a change to be reviewed, the code review process should flow seamlessly, informing users succinctly of the changes that need their attention and identifying potential issues before human reviewers come in (issues are caught by analyzers and Continuous Integration). When possible, quick analysis results are presented before the longer-running analyses can finish.</p>
|
||||
|
||||
<p>There are several ways in which Critique needs to support questions of scale. The Critique tool must scale to the large quantity of review requests produced without suffering a degradation in performance. Because Critique is on the critical path to getting changes committed, it must load efficiently and be usable for special situations such as unusually large changes.<sup><a data-type="noteref" id="ch01fn192-marker" href="ch19.html#ch01fn192">2</a></sup> The interface must support managing user activities (such as finding relevant changes) over the large codebase and help reviewers and authors navigate the codebase. For example, Critique helps with finding appropriate reviewers for a change without having to figure out the ownership/maintainer landscape (a feature that is particularly important for large-scale changes such as API migrations that can affect many files).</p>
<p>Critique favors an opinionated process and a simple interface to improve the general review workflow. However, Critique does allow some customizability: custom analyzers and presubmits provide specific context on changes, and some team-specific policies (such as requiring LGTM from multiple reviewers) can be enforced.</p>
<p>Trust and communication are core to the code review process. A tool can enhance the experience, but it can’t replace them. Tight integration with other tools has also been a key factor in Critique’s success.</p>
</section>
<section data-type="sect1" id="tlsemicolondrs-id00125">
<h1>TL;DRs</h1>
<ul>
<li>
<p>Trust and communication are core to the code review process. A tool can enhance the experience, but it can’t replace them.</p>
</li>
<li>
<p>Tight integration with other tools is key to a great code review experience.</p>
</li>
<li>
<p>Small workflow optimizations, like the addition of an explicit “attention set,” can increase clarity and reduce friction substantially.<a contenteditable="false" data-primary="Critique code review tool" data-startref="ix_Crit" data-type="indexterm" id="id-8XHatxt5fETyi2"> </a></p>
</li>
</ul>
</section>
<div data-type="footnotes"><p data-type="footnote" id="ch01fn191"><sup><a href="ch19.html#ch01fn191-marker">1</a></sup>Centralized “global” reviewers for large-scale changes (LSCs) are particularly prone to customizing this dashboard to avoid flooding it during an LSC (see <a data-type="xref" href="ch22.html#large-scale_changes">Large-Scale Changes</a>).</p><p data-type="footnote" id="ch01fn192"><sup><a href="ch19.html#ch01fn192-marker">2</a></sup>Although most changes are small (fewer than 100 lines), Critique is sometimes used to review large refactoring changes that can touch hundreds or thousands of files, especially for LSCs that must be executed atomically (see <a data-type="xref" href="ch22.html#large-scale_changes">Large-Scale Changes</a>).</p></div></section>
</body>
</html>
213
clones/abseil.io/resources/swe-book/html/ch20.html
Normal file
@ -0,0 +1,213 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Software Engineering at Google</title>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
</head>
<body data-type="book">
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="static_analysis-id00082">
<h1>Static Analysis</h1>
<p class="byline">Written by Caitlin Sadowski</p>
<p class="byline">Edited by Lisa Carey</p>
<p>Static analysis refers to programs analyzing<a contenteditable="false" data-primary="static analysis" data-type="indexterm" id="ix_statan"> </a> source code to find potential issues such as bugs, antipatterns, and other issues that can be diagnosed <em>without executing the program</em>. The “static” part specifically refers to analyzing the source code instead of a running program (referred to as “dynamic” analysis). Static analysis can find bugs in programs early, before they are checked in as production code. For example, static analysis can identify constant expressions that overflow, tests that are never run, or invalid format strings in logging statements that would crash when executed.<sup><a data-type="noteref" id="ch01fn193-marker" href="ch20.html#ch01fn193">1</a></sup> However, static analysis is useful for more than just finding bugs. Through static analysis at Google, we codify best practices, help keep code current to modern API versions, and prevent or reduce technical debt.<a contenteditable="false" data-primary="static analysis" data-secondary="examples of" data-type="indexterm" id="id-lrhPtAtj"> </a> Examples of these analyses include verifying that naming conventions are upheld, flagging the use of deprecated APIs, or pointing out simpler but equivalent expressions that make code easier to read.<a contenteditable="false" data-primary="deprecation" data-secondary="static analysis in API deprecation" data-type="indexterm" id="id-OwhnIbtO"> </a> Static analysis is also an integral tool in the API deprecation process, where it can prevent backsliding during migration of the codebase to a new API (see <a data-type="xref" href="ch22.html#large-scale_changes">Large-Scale Changes</a>). We have also found evidence that static analysis checks can educate developers and actually prevent antipatterns from entering the codebase.<sup><a data-type="noteref" id="ch01fn194-marker" href="ch20.html#ch01fn194">2</a></sup></p>
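<p>To make two of those examples concrete, here is a minimal Java sketch (the class and names are ours, not drawn from a real codebase) showing a constant expression that silently overflows and a format string that cannot match its arguments. Both are diagnosable from the source alone, without executing the program.</p>

<pre data-type="program-listing">public class StaticAnalysisExamples {
  // Intended: the number of milliseconds in a year. The multiplication is
  // performed in 32-bit int arithmetic and overflows before being widened to
  // long, so the constant is silently wrong. An analyzer can flag this
  // without ever running the program.
  static final long MILLIS_PER_YEAR = 1000 * 60 * 60 * 24 * 365;

  // The format string asks for two arguments but only one is supplied; this
  // fails only when the line actually executes, yet the mismatch is visible
  // directly in the source.
  static String describe(String user) {
    return String.format("user=%s id=%d", user);
  }
}
</pre>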
<p>In this chapter, we’ll look at what makes effective static analysis, some of the lessons we at Google have learned about making static analysis work, and how we implemented these best practices in our static analysis tooling and processes.<sup><a data-type="noteref" id="ch01fn195-marker" href="ch20.html#ch01fn195">3</a></sup></p>
<section data-type="sect1" id="characteristics_of_effective_static_ana">
<h1>Characteristics of Effective Static Analysis</h1>
<p>Although there have been decades of static analysis<a contenteditable="false" data-primary="static analysis" data-secondary="effective, characteristics of" data-type="indexterm" id="ix_stataneff"> </a> research focused on developing new analysis techniques and specific analyses, a focus on approaches for improving <em>scalability</em> and <em>usability</em> of static analysis tools has been a relatively recent <span class="keep-together">development.</span></p>
<section data-type="sect2" id="scalability">
<h2>Scalability</h2>
<p>Because modern <a contenteditable="false" data-primary="scalability" data-secondary="of static analysis tools" data-type="indexterm" id="id-m1h1svCrHzuk"> </a>software has become <a contenteditable="false" data-primary="static analysis" data-secondary="effective, characteristics of" data-tertiary="scalability" data-type="indexterm" id="id-jGhJC7CPHNud"> </a>larger, analysis tools must explicitly address scaling in order to produce results in a timely manner, without slowing down the software development process. Static analysis tools at Google must scale to the size of Google’s multibillion-line codebase. To do this, analysis tools are shardable and incremental. Instead of analyzing entire large projects, we focus analyses on files affected by a pending code change, and typically show analysis results only for edited files or lines. Scaling also has benefits: because our codebase is so large, there is a lot of low-hanging fruit in terms of bugs to find. In addition to making sure analysis tools can run on a large codebase, we also must scale up the number and variety of analyses available. Analysis contributions are solicited from throughout the company. Another component to static analysis scalability is ensuring the <em>process</em> is scalable. To do this, Google static analysis infrastructure avoids bottlenecking analysis results by showing them directly to relevant engineers.</p>
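<p>One way to picture the “results only for edited files or lines” behavior is a filtering step along the lines of the following sketch. The <code>Finding</code> type and the method here are hypothetical stand-ins of our own; they are not Tricorder’s actual interfaces.</p>

<pre data-type="program-listing">// Hypothetical types, for illustration only.
record Finding(String file, int line, String message) {}

final class IncrementalFilter {
  // Keep only findings in files touched by the pending change; per-change cost
  // and reviewer noise stay bounded even though the codebase is enormous.
  static Finding[] onEditedFiles(Finding[] all, java.util.Set&lt;String&gt; editedFiles) {
    return java.util.Arrays.stream(all)
        .filter(f -&gt; editedFiles.contains(f.file()))
        .toArray(Finding[]::new);
  }
}
</pre>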
</section>
<section data-type="sect2" id="usability">
<h2>Usability</h2>
<p>When thinking about<a contenteditable="false" data-primary="static analysis" data-secondary="effective, characteristics of" data-tertiary="usability" data-type="indexterm" id="id-jGhOs7C9tNud"> </a> analysis usability, it is important<a contenteditable="false" data-primary="usability of static analyses" data-type="indexterm" id="id-qphyC5CgtouL"> </a> to consider the cost-benefit trade-off for static analysis tool users. This “cost” could either be in terms of developer time or code quality. Fixing a static analysis warning could introduce a bug. For code that is not being frequently modified, why “fix” code that is running fine in production? For example, fixing a dead code warning by adding a call to the previously dead code could result in untested (possibly buggy) code suddenly running. There is unclear benefit and potentially high cost. For this reason, we generally focus on newly introduced warnings; existing issues in otherwise working code are typically only worth highlighting (and fixing) if they are particularly important (security issues, significant bug fixes, etc.). Focusing on newly introduced warnings (or warnings on modified lines) also means that the developers viewing the warnings have the most relevant context on them.</p>
<p>Also, developer time is valuable! Time spent triaging analysis reports or fixing highlighted issues is weighed against the benefit provided by a particular analysis. If the analysis author can save time (e.g., by providing a fix that can be automatically applied to the code in question), the cost in the trade-off goes down. Anything that can be fixed automatically should be fixed automatically. We also try to show developers reports about issues that actually have a negative impact on code quality so that they do not waste time slogging through irrelevant results.</p>
<p>To further reduce the cost of reviewing static analysis results, we focus on smooth developer workflow integration. A further strength of homogenizing everything in one workflow is that a dedicated tools team can update tools along with workflow and code, allowing analysis tools to evolve with the source code in tandem.</p>
<p>We believe these choices and trade-offs that we have made in making static analyses scalable and usable arise organically from our focus on three core principles, which we formulate as lessons in the next section.<a contenteditable="false" data-primary="static analysis" data-secondary="effective, characteristics of" data-startref="ix_stataneff" data-type="indexterm" id="id-PEhgs1I3tqub"> </a></p>
</section>
</section>
<section data-type="sect1" id="key_lessons_in_making_static_analysis_w">
<h1>Key Lessons in Making Static Analysis Work</h1>
<p>There are three key lessons that we have learned at Google about what makes static analysis tools work well. <a contenteditable="false" data-primary="static analysis" data-secondary="making it work, key lessons in" data-type="indexterm" id="ix_statanmkwk"> </a>Let’s take a look at them in the following subsections.</p>
<section data-type="sect2" id="focus_on_developer_happiness">
<h2>Focus on Developer Happiness</h2>
<p>We mentioned some of the ways in which we try to save developer time and reduce the cost of interacting <a contenteditable="false" data-primary="static analysis" data-secondary="making it work, key lessons in" data-tertiary="focus on developer happiness" data-type="indexterm" id="id-jGhOs7CPHlSd"> </a>with the<a contenteditable="false" data-primary="developer happiness, focus on, with static analysis" data-type="indexterm" id="id-qphyC5ClHRSL"> </a> aforementioned static analysis tools; we also keep track of how well analysis tools are performing. If you don’t measure this, you can’t fix problems. We only deploy analysis tools with low false-positive rates (more on that in a minute). <a contenteditable="false" data-primary="feedback" data-secondary="soliciting from developers on static analysis" data-type="indexterm" id="id-5Ah0HpC8HAS7"> </a>We also <em>actively solicit and act on feedback</em> from developers consuming static analysis results, in real time. Nurturing this feedback loop between static analysis tool users and tool developers creates a virtuous cycle that has built up user trust and improved our tools. User trust is extremely important for the success of static analysis tools.</p>
<p>For static<a contenteditable="false" data-primary="false negatives in static analysis" data-type="indexterm" id="id-qphgs8HlHRSL"> </a> analysis, a “false negative” is when a piece of code contains an issue that the analysis tool was designed to find, but the tool misses it. A “false positive” occurs when a tool incorrectly flags code as having the issue.<a contenteditable="false" data-primary="false positives in static analysis" data-type="indexterm" id="id-5AhDCGH8HAS7"> </a> Research about static analysis tools traditionally focused on reducing false negatives; in practice, low false-positive rates are often critical for developers to actually want to use a tool—who wants to wade through hundreds of false reports in search of a few true ones?<sup><a data-type="noteref" id="ch01fn196-marker" href="ch20.html#ch01fn196">4</a></sup></p>
<p>Furthermore, <em>perception</em> is a key aspect of the false-positive rate. If a static analysis tool is producing warnings that are technically correct but misinterpreted by users as false positives (e.g., due to confusing messages), users will react the same as if those warnings were in fact false positives. Similarly, warnings that are technically correct but unimportant in the grand scheme of things provoke the same reaction. We call the user-perceived false-positive rate the “effective false positive” rate. An issue is an “effective false positive” if developers did not take some positive action after seeing the issue. This means that if an analysis incorrectly reports an issue, yet the developer happily makes the fix anyway to improve code readability or maintainability, that is not an effective false positive. For example, we have a Java analysis that flags cases in which a developer calls the <code>contains</code> method on a hash table (which is equivalent to <code>containsValue</code>) when they actually meant to call <code>containsKey</code>—even if the developer correctly meant to check for the value, calling <code>containsValue</code> instead is clearer. Similarly, if an analysis reports an actual fault, yet the developer did not understand the fault and therefore took no action, that is an effective false positive.</p>
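<p>For instance, here is a minimal reconstruction of that pattern (the class is our own toy example, not the check’s actual test case):</p>

<pre data-type="program-listing">import java.util.Hashtable;

class UserDirectory {
  private final Hashtable&lt;String, String&gt; emailById = new Hashtable&lt;&gt;();

  boolean isKnownUser(String id) {
    // Flagged: Hashtable.contains(Object) searches values (it is equivalent
    // to containsValue), so this checks stored emails rather than ids.
    return emailById.contains(id);
    // Likely intent: return emailById.containsKey(id);
  }
}
</pre>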
</section>
<section data-type="sect2" id="make_static_analysis_a_part_of_the_core">
<h2>Make Static Analysis a Part of the Core Developer Workflow</h2>
<p>At Google, we integrate <a contenteditable="false" data-primary="developer workflow, making static analysis part of" data-type="indexterm" id="id-qphgs5CgtRSL"> </a>static analysis into the core <a contenteditable="false" data-primary="static analysis" data-secondary="making it work, key lessons in" data-tertiary="making static analysis part of core developer workflow" data-type="indexterm" id="id-5AhDCpCXtAS7"> </a>workflow via integration with code review tooling. Essentially all code committed at Google is reviewed before being committed; because developers are already in a change mindset when they send code for review, improvements suggested by static analysis tools can be made without too much disruption. There are other benefits to code review integration. Developers typically context switch after sending code for review, and are blocked on reviewers—there is time for analyses to run, even if they take several minutes to do so. There is also peer pressure from reviewers to address static analysis warnings. Furthermore, static analysis can save reviewer time by highlighting common issues automatically; static analysis tools help the code review process (and the reviewers) scale. Code review is a sweet spot for analysis results.<sup><a data-type="noteref" id="ch01fn197-marker" href="ch20.html#ch01fn197">5</a></sup></p>
</section>
<section data-type="sect2" id="empower_users_to_contribute">
<h2>Empower Users to Contribute</h2>
<p>There are many domain experts at Google whose knowledge could improve code produced. <a contenteditable="false" data-primary="static analysis" data-secondary="making it work, key lessons in" data-tertiary="empowering users to contribute" data-type="indexterm" id="id-5AhkspCLIAS7"> </a>Static analysis is an opportunity to leverage expertise and apply it at scale by having domain experts write new analysis tools or individual checks within a tool. For example, experts who know the context for a particular kind of configuration file can write an analyzer that checks properties of those files. In addition to domain experts, analyses are contributed by developers who discover a bug and would like to prevent the same kind of bug from reappearing anywhere else in the codebase. We focus on building a static analysis ecosystem that is easy to plug into instead of integrating a small set of existing tools. We have focused on developing simple APIs that can be used by engineers throughout Google—not just analysis or language experts—to create analyses; for example, Refaster<sup><a data-type="noteref" id="ch01fn198-marker" href="ch20.html#ch01fn198">6</a></sup> enables writing an analyzer by specifying pre- and post-code snippets demonstrating what transformations are expected by that analyzer.<a contenteditable="false" data-primary="static analysis" data-secondary="making it work, key lessons in" data-startref="ix_statanmkwk" data-type="indexterm" id="id-yQh8HXCPIoS7"> </a></p>
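<p>As a sketch of what such a pre- and post-snippet specification can look like, the publicly documented Refaster annotations express templates like the following (the template itself is a toy example of ours):</p>

<pre data-type="program-listing">import com.google.errorprone.refaster.annotation.AfterTemplate;
import com.google.errorprone.refaster.annotation.BeforeTemplate;

// Any expression matching the "before" method body is reported and can be
// rewritten into the "after" form.
class StringIsEmptyTemplate {
  @BeforeTemplate
  boolean lengthEqualsZero(String s) {
    return s.length() == 0;
  }

  @AfterTemplate
  boolean isEmpty(String s) {
    return s.isEmpty();
  }
}
</pre>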
</section>
</section>
<section data-type="sect1" id="tricorder_googleapostrophes_static_anal">
<h1>Tricorder: Google’s Static Analysis Platform</h1>
<p>Tricorder, our static analysis<a contenteditable="false" data-primary="static analysis" data-secondary="Tricorder platform" data-type="indexterm" id="ix_statanTri"> </a> platform, is a core part<a contenteditable="false" data-primary="Tricorder static analysis platform" data-type="indexterm" id="ix_Tric"> </a> of static analysis at Google.<sup><a data-type="noteref" id="ch01fn199-marker" href="ch20.html#ch01fn199">7</a></sup> Tricorder came out of several failed attempts to integrate static analysis with the developer workflow at Google;<sup><a data-type="noteref" id="ch01fn200-marker" href="ch20.html#ch01fn200">8</a></sup> the key difference between Tricorder and previous attempts was our relentless focus on having Tricorder deliver only valuable results to its users. Tricorder is integrated with the main code review tool at Google, Critique. <a contenteditable="false" data-primary="Critique code review tool" data-secondary="diff viewer, Tricorder warnings on" data-type="indexterm" id="id-PEhDIVCqUO"> </a>Tricorder warnings show up on Critique’s diff viewer as gray comment boxes, as demonstrated in <a data-type="xref" href="ch20.html#critiqueapostrophes_diff_viewingcomma_s">Figure 20-1</a>.</p>
<figure id="critiqueapostrophes_diff_viewingcomma_s"><img alt="Critique’s diff viewing, showing a static analysis warning from Tricorder in gray" src="images/seag_2001.png">
<figcaption><span class="label">Figure 20-1. </span>Critique’s diff viewing, showing a static analysis warning from Tricorder in gray</figcaption>
</figure>
<p>To scale, Tricorder uses a microservices architecture. The Tricorder system sends analyze requests to analysis servers along with metadata about a code change. These servers can use that metadata to read the versions of the source code files in the change via a FUSE-based filesystem and can access cached build inputs and outputs. The analysis server then starts running each individual analyzer and writes the output to a storage layer; the most recent results for each category are then displayed in Critique. Because analyses sometimes take a few minutes to run, analysis servers also post status updates to let change authors and reviewers know that analyzers are running and post a completed status when they have finished. Tricorder analyzes more than 50,000 code review changes per day and is often running several analyses per second.</p>
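<p>The shapes below are entirely hypothetical and are included only to make that request flow concrete; they are not Tricorder’s real API.</p>

<pre data-type="program-listing">// Hypothetical request/response shapes, ours for illustration only.
record ChangeMetadata(String changeId, String snapshotId, java.util.List&lt;String&gt; files) {}
record AnalyzerResult(String category, String file, int line, String message) {}

interface AnalyzerService {
  // Invoked for each pending change; an implementation reads the change's
  // file versions, runs its checks, and returns findings for its category,
  // which the platform then stores and surfaces in the review tool.
  java.util.List&lt;AnalyzerResult&gt; analyze(ChangeMetadata change);
}
</pre>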
<p>Developers throughout Google write Tricorder analyses (called “analyzers”) or contribute individual “checks” to existing analyses.<a contenteditable="false" data-primary="Tricorder static analysis platform" data-secondary="criteria for new checks" data-type="indexterm" id="id-5AhksOIeUv"> </a> There are four criteria for new Tricorder checks:</p>
<dl>
<dt>Be understandable</dt>
<dd>The output should be easy for any engineer to understand.</dd>
<dt>Be actionable and easy to fix</dt>
<dd>The fix might require more time, thought, or effort than a compiler check, and the result should include guidance as to how the issue might indeed be fixed.</dd>
<dt>Produce less than 10% effective false positives</dt>
<dd>Developers should feel the check is pointing out an actual issue <a href="https://oreil.ly/ARSzt">at least 90% of the time</a>.</dd>
<dt>Have the potential for significant impact on code quality</dt>
<dd>The issues might not affect correctness, but developers should take them seriously and deliberately choose to fix them.</dd>
</dl>
<p>Tricorder analyzers report results for more than 30 languages and support a variety of analysis types. Tricorder includes more than 100 analyzers, with most being contributed from outside the Tricorder team. Seven of these analyzers are themselves plug-in systems that have hundreds of additional checks, again contributed from developers across Google. The overall effective false-positive rate is just below 5%.</p>
<section data-type="sect2" id="integrated_tools">
<h2>Integrated Tools</h2>
<p>There are many <a contenteditable="false" data-primary="Tricorder static analysis platform" data-secondary="integrated tools" data-type="indexterm" id="id-zghysGCnUPUL"> </a>different <a contenteditable="false" data-primary="static analysis" data-secondary="Tricorder platform" data-tertiary="integrated tools" data-type="indexterm" id="id-GohLCXCZUaUk"> </a>types of static analysis tools integrated with Tricorder.</p>
<p><a href="http://errorprone.info">Error Prone</a> and <a href="https://oreil.ly/DAMiv">clang-tidy</a> extend the compiler <a contenteditable="false" data-primary="clang-tidy" data-secondary="integration with Tricorder" data-type="indexterm" id="id-rkhXHzHXUZUw"> </a>to identify AST antipatterns for Java and C++, respectively.<a contenteditable="false" data-primary="Error Prone tool (Java)" data-secondary="integration with Tricorder" data-type="indexterm" id="id-1JhotwHRUvUV"> </a> These antipatterns could represent real bugs. For example, consider the following code snippet hashing a field <code>f</code> of type <code>long</code>:</p>
<pre data-type="program-listing">result = 31 * result + (int) (f ^ (f >>> 32));
</pre>
<p>Now consider the case in which the type of <code>f</code> is <code>int</code>. The code will still compile, but the right shift by 32 is a no-op so that <code>f</code> is XORed with itself and no longer affects the value produced. We fixed 31 occurrences of this bug in Google’s codebase while enabling the check as a compiler error in Error Prone. There are <a href="https://errorprone.info/bugpatterns">many more such examples</a>. AST antipatterns can also result in code readability improvements, such as removing a redundant call to <code>.get()</code> on a smart pointer.</p>
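<p>A sketch of the corrected pattern for both field widths might look like this (the class is our own example, not taken from the actual codebase fix):</p>

<pre data-type="program-listing">class Point {
  private final long id;    // a long field: folding the halves is meaningful
  private final int flags;  // an int field: no folding is needed

  Point(long id, int flags) {
    this.id = id;
    this.flags = flags;
  }

  @Override
  public int hashCode() {
    int result = 17;
    // Equivalent to Long.hashCode(id): xor the high and low 32 bits.
    result = 31 * result + (int) (id ^ (id &gt;&gt;&gt; 32));
    // Use the int value directly; applying the long recipe here would
    // reintroduce the no-op "&gt;&gt;&gt; 32" bug the check catches.
    result = 31 * result + flags;
    return result;
  }
}
</pre>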
<p>Other analyzers showcase relationships between disparate files in a corpus. The Deleted Artifact Analyzer warns if a source file is deleted that is referenced by other non-code places in the codebase (such as inside checked-in documentation). IfThisThenThat allows developers to specify that portions of two different files must be changed in tandem (and warns if they are not). Chrome’s Finch analyzer runs on configuration files for A/B experiments in Chrome, highlighting common problems including not having the right approvals to launch an experiment or crosstalk with other currently running experiments that affect the same population. The Finch analyzer makes Remote Procedure Calls (RPCs) to other services in order to provide this information.</p>
<p>In addition to the source code itself, some analyzers run on other artifacts produced by that source code; many projects have enabled a binary size checker that warns when changes significantly affect a binary size.</p>
<p>Almost all analyzers are intraprocedural, meaning that the analysis results are based on code within a procedure (function). Compositional or incremental interprocedural analysis techniques are technically feasible but would require additional infrastructure investment (e.g., analyzing and storing method summaries as analyzers run).</p>
</section>
<section data-type="sect2" id="integrated_feedback_channels">
<h2>Integrated Feedback Channels</h2>
<p>As mentioned earlier, establishing <a contenteditable="false" data-primary="static analysis" data-secondary="Tricorder platform" data-tertiary="integrated feedback channels" data-type="indexterm" id="id-GohnsXC0caUk"> </a>a feedback<a contenteditable="false" data-primary="Tricorder static analysis platform" data-secondary="integrated feedback channels" data-type="indexterm" id="id-dqhKCaCycJUR"> </a> loop between <a contenteditable="false" data-primary="feedback" data-secondary="integrated feedback channels in Tricorder" data-type="indexterm" id="id-rkhXHQCxcZUw"> </a>analysis consumers and analysis writers is critical to track and maintain developer happiness. With Tricorder, we display the option to click a “Not useful” button on an analysis result; this click provides the option to file a bug <em>directly against the analyzer writer</em> about why the result is not useful with information about analysis result prepopulated. Code reviewers can also ask change authors to address analysis results by clicking a “Please fix” button. The Tricorder team tracks analyzers with high “Not useful” click rates, particularly relative to how often reviewers ask to fix analysis results, and will disable analyzers if they don’t work to address problems and improve the “not useful” rate. Establishing and tuning this feedback loop took a lot of work, but has paid dividends many times over in improved analysis results and a better user experience (UX)—before we established clear feedback channels, many developers would just ignore analysis results they did not understand.</p>
<p>And sometimes the fix is pretty simple—such as updating the text of the message an analyzer outputs! For example, we once rolled out an Error Prone check that flagged when too many arguments were being passed to a <code>printf</code>-like function in Guava that accepted only <code>%s</code> (and no other <code>printf</code> specifiers). The Error Prone team received weekly "Not useful" bug reports claiming the analysis was incorrect because the number of format specifiers matched the number of arguments—all due to users trying to pass specifiers other than <code>%s</code>. After the team changed the diagnostic text to state directly that the function accepts only the <code>%s</code> placeholder, the influx of bug reports stopped. Improving the message produced by an analysis provides an explanation of what is wrong, why, and how to fix it exactly at the point where that is most relevant and can make the difference for developers learning something when they read the message.</p>
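<p>Guava’s <code>Preconditions</code> methods are one such family: their message templates accept only the <code>%s</code> placeholder. A call like the following minimal example of ours is therefore flagged, because the analyzer sees an argument with no <code>%s</code> to consume it:</p>

<pre data-type="program-listing">import static com.google.common.base.Preconditions.checkArgument;

class Pagination {
  static void validatePageSize(int pageSize) {
    // Flagged: the template contains no %s, so the argument count looks wrong;
    // %d is not interpreted by Preconditions at all.
    checkArgument(pageSize &gt; 0, "page size must be positive, got %d", pageSize);
    // Accepted: %s is the only supported placeholder.
    // checkArgument(pageSize &gt; 0, "page size must be positive, got %s", pageSize);
  }
}
</pre>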
</section>
<section data-type="sect2" id="suggested_fixes">
<h2>Suggested Fixes</h2>
<p>Tricorder<a contenteditable="false" data-primary="static analysis" data-secondary="Tricorder platform" data-tertiary="suggested fixes" data-type="indexterm" id="id-dqhasaCvfJUR"> </a> checks also, <a contenteditable="false" data-primary="Critique code review tool" data-secondary="view of static analysis fix" data-type="indexterm" id="id-rkhJCQCpfZUw"> </a>when possible, <em>provide fixes</em>, as<a contenteditable="false" data-primary="Tricorder static analysis platform" data-secondary="suggested fixes" data-type="indexterm" id="id-DGhjteCgf2U1"> </a> shown in <a data-type="xref" href="ch20.html#view_of_an_example_static_analysis_fix">Figure 20-2</a>.</p>
<figure id="view_of_an_example_static_analysis_fix"><img alt="View of an example static analysis fix in Critique" src="images/seag_2002.png">
<figcaption><span class="label">Figure 20-2. </span>View of an example static analysis fix in Critique</figcaption>
</figure>
<p>Automated fixes serve as an additional documentation source when the message is unclear and, as mentioned earlier, reduce the cost of addressing static analysis issues. Fixes can be applied directly from within Critique, or over an entire code change via a command-line tool. Although not all analyzers provide fixes, many do. We take the approach that <em>style</em> issues in particular should be fixed automatically; for example, by formatters that automatically reformat source code files. Google has style guides for each language that specify formatting issues; pointing out formatting errors is not a good use of a human reviewer’s time. Reviewers click “Please Fix” thousands of times per day, and authors apply the automated fixes approximately 3,000 times per day. And Tricorder analyzers receive “Not useful” clicks 250 times per day.</p>
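<p>To make “a fix that can be automatically applied” concrete, a machine-applyable fix can be as simple as the hypothetical representation below; the names are ours, not the format analyzers actually emit.</p>

<pre data-type="program-listing">// Hypothetical representation of a suggested fix, for illustration only.
record Replacement(String file, int startOffset, int endOffset, String newText) {}
record SuggestedFix(String description, java.util.List&lt;Replacement&gt; replacements) {}

final class FixApplier {
  // Applying a replacement is just splicing new text into the file contents;
  // a review tool or a command-line tool can do this on the author's behalf.
  static String apply(String contents, Replacement r) {
    return contents.substring(0, r.startOffset())
        + r.newText()
        + contents.substring(r.endOffset());
  }
}
</pre>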
</section>
<section data-type="sect2" id="per-project_customization">
<h2>Per-Project Customization</h2>
<p>After we had <a contenteditable="false" data-primary="Tricorder static analysis platform" data-secondary="per-project customization" data-type="indexterm" id="id-rkh2sQCnhZUw"> </a>built up a foundation of user <a contenteditable="false" data-primary="static analysis" data-secondary="Tricorder platform" data-tertiary="per-project customization" data-type="indexterm" id="id-1JhKCDCrhvUV"> </a>trust by showing only high-confidence analysis results, we added the ability to run additional “optional” analyzers to specific projects in addition to the on-by-default ones.<a contenteditable="false" data-primary="Proto Best Practices analyzer" data-type="indexterm" id="id-DGhgHeCDh2U1"> </a> The <em>Proto Best Practices</em> analyzer is an example of an optional analyzer. This analyzer highlights potentially breaking data format changes to <a href="https://developers.google.com/protocol-buffers">protocol buffers</a>—Google’s language-independent data serialization format.<a contenteditable="false" data-primary="protocol buffers static analysis of" data-type="indexterm" id="id-pMh6u5CVhqUk"> </a> These changes are only breaking when serialized data is stored somewhere (e.g., in server logs); protocol buffers for projects that do not have stored serialized data do not need to enable the check. We have also added the ability to customize existing analyzers, although typically this customization is limited, and many checks are applied by default uniformly across the codebase.</p>
<p>Some analyzers have even started as optional, improved based on user feedback, built up a large userbase, and then graduated into on-by-default status as soon as we could capitalize on the user trust we had built up. For example, we have an analyzer that suggests Java code readability improvements that typically do not actually change code behavior. Tricorder users initially worried about this analysis being too “noisy,” but eventually wanted more analysis results available.</p>
<p>The key insight to making<a contenteditable="false" data-primary="project-level customization in Tricorder" data-type="indexterm" id="id-DGhpsztDh2U1"> </a> this customization successful was to focus on <em>project-level customization, not user-level customization</em>. Project-level customization ensures that all team members have a consistent view of analysis results for their project and prevents situations in which one developer is trying to fix an issue while another developer introduces it.</p>
<p>Early on in the development of Tricorder, a set of relatively straightforward style checkers (“linters”) displayed results in Critique, and Critique provided user settings to choose the confidence level of results to display and suppress results from specific analyses. We removed all of this user customizability from Critique and immediately started getting complaints from users about annoying analysis results.<a contenteditable="false" data-primary="linters in Tricorder" data-type="indexterm" id="id-N8hKsPI1hjUy"> </a> Instead of reenabling customizability, we asked users why they were annoyed and found all kinds of bugs and false positives with the linters. For example, the C++ linter also ran on Objective-C files but produced incorrect, useless results. We fixed the linting infrastructure so that this would no longer happen. The HTML linter had an extremely high false-positive rate with very little useful signal and was typically suppressed from view by developers writing HTML. Because the linter was so rarely helpful, we just disabled this linter. In short, user customization resulted in hidden bugs and suppressing feedback.</p>
</section>
<section data-type="sect2" id="presubmits">
<h2>Presubmits</h2>
<p>In addition to code review, there are also <a contenteditable="false" data-primary="Tricorder static analysis platform" data-secondary="presubmit checks" data-type="indexterm" id="id-1JhwsDCnTvUV"> </a>other workflow <a contenteditable="false" data-primary="static analysis" data-secondary="Tricorder platform" data-tertiary="presubmits" data-type="indexterm" id="id-DGh3CeCqT2U1"> </a>integration points for static analysis at Google. <a contenteditable="false" data-primary="presubmits" data-secondary="checks in Tricorder" data-type="indexterm" id="id-N8h8HECPTjUy"> </a>Because developers can choose to ignore static analysis warnings displayed in code review, Google additionally has the ability to add an analysis that blocks committing a pending code change, which we call a <em>presubmit check</em>. Presubmit checks include very simple customizable built-in checks on the contents or metadata of a change, such as ensuring that the commit message does not say “DO NOT SUBMIT” or that test files are always included with corresponding code files. Teams can also specify a suite of tests that must pass or verify that there are no Tricorder issues for a particular category. Presubmits also check that code is well formatted. Presubmit checks are typically run when a developer mails out a change for review and again during the commit process, but they can be triggered on an ad hoc basis in between those points. See <a data-type="xref" href="ch23.html#continuous_integration">Continuous Integration</a> for more details on presubmits at Google.</p>
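<p>In the same spirit, a team-written presubmit check might look roughly like the following sketch; the <code>Change</code> type and the way such checks are hooked in are hypothetical here, not Google’s actual presubmit API.</p>

<pre data-type="program-listing">import java.util.ArrayList;
import java.util.List;

// Hypothetical change description, ours for illustration only.
record Change(String description, List&lt;String&gt; changedFiles) {}

final class SimplePresubmitChecks {
  // Returns human-readable errors; a nonempty list blocks the commit.
  static List&lt;String&gt; run(Change change) {
    List&lt;String&gt; errors = new ArrayList&lt;&gt;();
    if (change.description().contains("DO NOT SUBMIT")) {
      errors.add("Change description still contains DO NOT SUBMIT");
    }
    for (String file : change.changedFiles()) {
      if (file.endsWith(".java") &amp;&amp; !file.endsWith("Test.java")) {
        String test = file.replace(".java", "Test.java");
        if (!change.changedFiles().contains(test)) {
          errors.add("No corresponding test file updated for " + file);
        }
      }
    }
    return errors;
  }
}
</pre>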
<p>Some teams have written their own custom presubmits. These are additional checks on top of the base presubmit set that add the ability to enforce higher best-practice standards than the company as a whole and add project-specific analysis. This enables new projects to have stricter best-practice guidelines than projects with large amounts of legacy code (for example). Team-specific presubmits can make the large-scale change (LSC) process (see <a data-type="xref" href="ch22.html#large-scale_changes">Large-Scale Changes</a>) more difficult, so some are skipped for changes with “CLEANUP=” in the change description.</p>
</section>
<section data-type="sect2" id="compiler_integration">
<h2>Compiler Integration</h2>
<p>Although blocking commits with static analysis is great, it is even better to notify developers of problems even earlier in the workflow.<a contenteditable="false" data-primary="static analysis" data-secondary="Tricorder platform" data-tertiary="compiler integration" data-type="indexterm" id="id-DGhpseCoF2U1"> </a><a contenteditable="false" data-primary="Tricorder static analysis platform" data-secondary="compiler integration" data-type="indexterm" id="id-N8hdCECDFjUy"> </a> When possible, we try to push static analysis into the compiler. <a contenteditable="false" data-primary="compiler integration with static analysis" data-type="indexterm" id="id-M2hPHLC3FNUb"> </a>Breaking the build is a warning that is not possible to ignore, but is infeasible in many cases. However, some analyses are highly mechanical and have no effective false positives. An example is <a href="https://errorprone.info/bugpatterns">Error Prone “ERROR” checks</a>. These checks are all enabled in Google’s Java compiler, preventing instances of the error from ever being introduced again into our codebase. Compiler checks need to be fast so that they don’t slow down the build. In addition, we enforce these three criteria (similar criteria exist for the C++ compiler):</p>
<ul>
<li>
<p>Actionable and easy to fix (whenever possible, the error should include a suggested fix that can be applied mechanically)</p>
</li>
<li>
<p>Produce no effective false positives (the analysis should never stop the build for correct code)</p>
</li>
<li>
<p>Report issues affecting only correctness rather than style or best practices</p>
</li>
</ul>
<p>To enable a new check, we first need to clean up all instances of that problem in the codebase so that we don’t break the build for existing projects just because the compiler has evolved. This also implies that the value in deploying a new compiler-based check must be high enough to warrant fixing all existing instances of it. Google has infrastructure in place for running various compilers (such as clang and javac) over the entire codebase in parallel via a cluster—as a MapReduce operation. When compilers are run in this MapReduce fashion, the static analysis checks run must produce fixes in order to automate the cleanup. After a pending code change is prepared and tested that applies the fixes across the entire codebase, we commit that change and remove all existing instances of the problem. We then turn the check on in the compiler so that no new instances of the problem can be committed without breaking the build. Build breakages are caught after commit by our Continuous Integration (CI) system, or before commit by presubmit checks (see the earlier discussion).</p>
<p>We also aim to never issue compiler warnings. We have found repeatedly that developers ignore compiler warnings. We either enable a compiler check as an error (and break the build) or don’t show it in compiler output. Because the same compiler flags are used throughout the codebase, this decision is made globally. Checks that can’t be made to break the build are either suppressed or shown in code review (e.g., through Tricorder). Although not every language at Google has this policy, the most frequently used ones do. Both the Java and C++ compilers have been configured to avoid displaying compiler warnings. The Go compiler takes this to an extreme; some things that other languages would consider warnings (such as unused variables or package imports) are errors in Go.</p>
</section>
<section data-type="sect2" id="analysis_while_editing_and_browsing_cod">
<h2>Analysis While Editing and Browsing Code</h2>
<p>Another potential integration point for static analysis is<a contenteditable="false" data-primary="static analysis" data-secondary="Tricorder platform" data-tertiary="analysis while editing and browsing code" data-type="indexterm" id="id-N8hKsEC0ijUy"> </a> in an integrated development environment (IDE). <a contenteditable="false" data-primary="Tricorder static analysis platform" data-secondary="analysis while editing and browsing code" data-type="indexterm" id="id-M2hbCLCMiNUb"> </a><a contenteditable="false" data-primary="IDEs (integrated development environments)" data-secondary="static analysis and" data-type="indexterm" id="id-pMhXH5CGiqUk"> </a>However, IDE analyses require quick analysis times (typically less than 1 second and ideally less than 100 ms), and so some tools are not suitable to integrate here. In addition, there is the problem of making sure the same analysis runs identically in multiple IDEs. We also note that IDEs can rise and fall in popularity (we don’t mandate a single IDE); hence IDE integration tends to be messier than plugging into the review process. Code review also has specific benefits for displaying analysis results. Analyses can take into account the entire context of the change; some analyses can be inaccurate on partial code (such as a dead code analysis when a function is implemented before adding callsites). Showing analysis results in code review also means that code authors have to convince reviewers as well if they want to ignore analysis results. That said, IDE integration for suitable analyses is another great place to display static analysis results.</p>
<p>Although we mostly focus on showing newly introduced static analysis warnings, or warnings on edited code, for some analyses, developers actually do want the ability to view analysis results over the entire codebase during code browsing. Security analyses are one example: specific security teams at Google want to see a holistic view of all instances of a problem. Developers also like viewing analysis results over the codebase when planning a cleanup. In other words, there are times when showing results while browsing code is the right choice.<a contenteditable="false" data-primary="Tricorder static analysis platform" data-startref="ix_Tric" data-type="indexterm" id="id-M2hDsrHMiNUb"> </a><a contenteditable="false" data-primary="static analysis" data-secondary="Tricorder platform" data-startref="ix_statanTri" data-type="indexterm" id="id-pMhlCKHGiqUk"> </a></p>
</section>
</section>
<section class="pagebreak-before" data-type="sect1" id="conclusion-id00024">
<h1 class="less_space">Conclusion</h1>
<p>Static analysis can be a great tool to improve a codebase, find bugs early, and allow more expensive processes (such as human review and testing) to focus on issues that are not mechanically verifiable. By improving the scalability and usability of our static analysis infrastructure, we have made static analysis an effective component of software development at Google.</p>
</section>
<section data-type="sect1" id="tlsemicolondrs-id00126">
<h1>TL;DRs</h1>
<ul>
<li>
<p><em>Focus on developer happiness</em>. We have invested considerable effort in building feedback channels between analysis users and analysis writers in our tools, and aggressively tune analyses to reduce the number of false positives.</p>
</li>
<li>
<p><em>Make static analysis part of the core developer workflow</em>. The main integration point for static analysis at Google is through code review, where analysis tools provide fixes and involve reviewers. However, we also integrate analyses at additional points (via compiler checks, gating code commits, in IDEs, and when browsing code).</p>
</li>
<li>
<p><em>Empower users to contribute</em>. We can scale the work we do building and maintaining analysis tools and platforms by leveraging the expertise of domain experts. Developers are continuously adding new analyses and checks that make their lives easier and our codebase better.<a contenteditable="false" data-primary="static analysis" data-startref="ix_statan" data-type="indexterm" id="id-yQhGCPsGH6Clf9"> </a></p>
</li>
</ul>
</section>
<div data-type="footnotes"><p data-type="footnote" id="ch01fn193"><sup><a href="ch20.html#ch01fn193-marker">1</a></sup>See <a href="http://errorprone.info/bugpatterns"><em class="hyperlink">http://errorprone.info/bugpatterns</em></a>.</p><p data-type="footnote" id="ch01fn194"><sup><a href="ch20.html#ch01fn194-marker">2</a></sup>Caitlin Sadowski et al. <a href="https://oreil.ly/9Y-tP">Tricorder: Building a Program Analysis Ecosystem</a>, International Conference on Software Engineering (ICSE), May 2015.</p><p data-type="footnote" id="ch01fn195"><sup><a href="ch20.html#ch01fn195-marker">3</a></sup>A good academic reference for static analysis theory is: Flemming Nielson et al. <em>Principles of Program Analysis</em> (Germany: Springer, 2004).</p><p data-type="footnote" id="ch01fn196"><sup><a href="ch20.html#ch01fn196-marker">4</a></sup>Note that there are some specific analyses for which reviewers might be willing to tolerate a much higher false-positive rate: one example is security analyses that identify critical problems.</p><p data-type="footnote" id="ch01fn197"><sup><a href="ch20.html#ch01fn197-marker">5</a></sup>See later in this chapter for more information on additional integration points when editing and browsing code.</p><p data-type="footnote" id="ch01fn198"><sup><a href="ch20.html#ch01fn198-marker">6</a></sup>Louis Wasserman, “<a href="https://oreil.ly/XUkFp">Scalable, Example-Based Refactorings with Refaster</a>.” Workshop on Refactoring Tools, 2013.</p><p data-type="footnote" id="ch01fn199"><sup><a href="ch20.html#ch01fn199-marker">7</a></sup>Caitlin Sadowski, Jeffrey van Gogh, Ciera Jaspan, Emma Söderberg, and Collin Winter, <a href="https://oreil.ly/mJXTD">Tricorder: Building a Program Analysis Ecosystem</a>, International Conference on Software Engineering (ICSE), May 2015.</p><p data-type="footnote" id="ch01fn200"><sup><a href="ch20.html#ch01fn200-marker">8</a></sup>Caitlin Sadowski, Edward Aftandilian, Alex Eagle, Liam Miller-Cushon, and Ciera Jaspan, “Lessons from Building Static Analysis Tools at Google”, <em>Communications of the ACM</em>, 61 No. 4 (April 2018): 58–66, <a href="https://cacm.acm.org/magazines/2018/4/226371-lessons-from-building-static-analysis-tools-at-google/fulltext"><em>https://cacm.acm.org/magazines/2018/4/226371-lessons-from-building-static-analysis-tools-at-google/fulltext</em></a>.</p></div></section>
</body>
</html>
471
clones/abseil.io/resources/swe-book/html/ch21.html
Normal file
@ -0,0 +1,471 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Software Engineering at Google</title>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
</head>
<body data-type="book">
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="dependency_management">
<h1>Dependency Management</h1>
<p class="byline">Written by Titus Winters</p>
<p class="byline">Edited by Lisa Carey</p>
<p>Dependency management—the management <a contenteditable="false" data-primary="dependency management" data-type="indexterm" id="ix_depmgt"> </a>of networks of libraries, packages, and dependencies that we don’t control—is one of the least understood and most challenging problems in software engineering. Dependency management focuses on questions like: how do we update between versions of external dependencies? How do we describe versions, for that matter? What types of changes are allowed or expected in our dependencies? How do we decide when it is wise to depend on code produced by other organizations?</p>
<p>For comparison, the most closely related topic here is source control. Both areas describe how we work with source code. Source control covers the easier part: where do we check things in?<a contenteditable="false" data-primary="trunk-based development" data-secondary="source control questions and" data-type="indexterm" id="id-RQsvC8UG"> </a> How do we get things into the build? After we accept the value of trunk-based development, most of the day-to-day source control questions for an organization are fairly mundane: “I’ve got a new thing, what directory do I add it to?”</p>
<p>Dependency management adds additional complexity in both time and scale. In a trunk-based source control problem, it’s fairly clear when you make a change that you need to run the tests and not break existing code. That’s predicated on the idea that you’re working in a shared codebase, have visibility into how things are being used, and can trigger the build and run the tests. Dependency management focuses on the problems that arise when changes are being made outside of your organization, without full access or visibility. Because your upstream dependencies can’t coordinate with your private code, they are more likely to break your build and cause your tests to fail. How do we manage that? Should we not take external dependencies? Should we ask for greater consistency between releases of external dependencies? When do we update to a new version?</p>
<p>Scale makes all of these questions more complex, with the realization that we aren’t really talking about single dependency imports, and in the general case that we’re depending on an entire network of external dependencies. When we begin dealing with a network, it is easy to construct scenarios in which your organization’s use of two dependencies becomes unsatisfiable at some point in time. Generally, this happens because one dependency stops working without some requirement,<sup><a data-type="noteref" id="ch01fn201-marker" href="ch21.html#ch01fn201">1</a></sup> whereas the other is incompatible with the same requirement. Simple solutions about how to manage a single outside dependency usually fail to account for the realities of managing a large network. We’ll spend much of this chapter discussing various forms of these conflicting requirement problems.</p>
<p>Source control and dependency management <a contenteditable="false" data-primary="source control" data-secondary="dependency management and" data-type="indexterm" id="id-1XsLC7I2"> </a>are related issues separated by the question: “Does our organization control the development/update/management of this subproject?” For example, if every team in your company has separate repositories, goals, and development practices, the interaction and management of code produced by those teams is going to have more to do with dependency management than source control. On the other hand, a large organization with a (virtual?) single repository (monorepo) can scale up significantly farther with source control policies—this is Google’s approach. Separate open source projects certainly count as separate organizations: interdependencies between unknown and not-necessarily-collaborating projects are a dependency management problem. Perhaps our strongest single piece of advice on this topic is this: <em>All else being equal, prefer source control problems over dependency-management problems.</em> If you have the option to redefine “organization” more broadly (your entire company rather than just one team), that’s very often a good trade-off. Source control problems are a lot easier to think about and a lot cheaper to deal with than dependency-management ones.</p>
<p>As the Open Source Software (OSS) model continues<a contenteditable="false" data-primary="Open Source Software (OSS)" data-secondary="dependency management and" data-type="indexterm" id="id-6XsvCRfl"> </a> to grow and expand into new domains, and the dependency graph for many popular projects continues to expand over time, dependency management is perhaps becoming the most important problem in software engineering policy. We are no longer disconnected islands built on one or two layers outside an API. Modern software is built on towering pillars of dependencies; but just because we can build those pillars doesn’t mean we’ve yet figured out how to keep them standing and stable over time.</p>
<p>In this chapter, we’ll look at the particular challenges of dependency management, explore solutions (common and novel) and their limitations, and look at the realities of working with dependencies, including how we’ve handled things in Google. It is important to preface all of this with an admission: we’ve invested a lot of <em>thought</em> into this problem and have extensive experience with refactoring and maintenance issues that show the practical shortcomings with existing approaches. We don’t have firsthand evidence of solutions that work well across organizations at scale. To some extent, this chapter is a summary of what we know does not work (or at least might not work at larger scales) and where we think there is the potential for better outcomes. We definitely cannot claim to have all the answers here; if we could, we wouldn’t be calling this one of the most important problems in software engineering.</p>
<section data-type="sect1" id="why_is_dependency_management_so_difficu">
<h1>Why Is Dependency Management So Difficult?</h1>
<p>Even defining the dependency-management problem presents some unusual challenges. <a contenteditable="false" data-primary="dependency management" data-secondary="difficulty of, reasons for" data-type="indexterm" id="ix_depmgtdiff"> </a>Many half-baked solutions in this space focus on a too-narrow problem formulation: “How do we import a package that our locally developed code can depend upon?” This is a necessary-but-not-sufficient formulation. The trick isn’t just finding a way to manage one dependency—the trick is how to manage a <em>network</em> of dependencies and their changes over time. Some subset of this network is directly necessary for your first-party code, some of it is only pulled in by transitive dependencies. Over a long enough period, all of the nodes in that dependency network will have new versions, and some of those updates will be important.<sup><a data-type="noteref" id="ch01fn202-marker" href="ch21.html#ch01fn202">2</a></sup> How do we manage the resulting cascade of upgrades for the rest of the dependency network? Or, specifically, how do we make it easy to find mutually compatible versions of all of our dependencies given that we do not control those dependencies? How do we analyze our dependency network? How do we manage that network, especially in the face of an ever-growing graph of dependencies?</p>
<section data-type="sect2" id="conflicting_requirements_and_diamond_de">
<h2>Conflicting Requirements and Diamond Dependencies</h2>
<p>The central problem in dependency<a contenteditable="false" data-primary="dependency management" data-secondary="difficulty of, reasons for" data-tertiary="conflicting requirements and diamond dependencies" data-type="indexterm" id="ix_depmgtdiffCRDD"> </a> management highlights the importance of thinking in terms of dependency networks, not individual dependencies. Much of the difficulty stems from one problem: what happens when two nodes in the dependency network have conflicting requirements, and your organization depends on them both? This can arise for many reasons, ranging from platform considerations (operating system [OS], language version, compiler version, etc.) to the much more mundane issue of version incompatibility. The canonical example of version incompatibility as an unsatisfiable version<a contenteditable="false" data-primary="diamond dependency issue" data-type="indexterm" id="ix_diadep"> </a> requirement is the <em>diamond dependency</em> problem. Although we don’t generally include things like “what version of the compiler” are you using in a dependency graph, most of these conflicting requirements problems are isomorphic to “add a (hidden) node to the dependency graph representing this requirement.” As such, we’ll primarily discuss conflicting requirements in terms of diamond dependencies, but keep in mind that <code>libbase</code> might actually be absolutely any piece of software involved in the construction of two or more nodes in your dependency network.</p>
<p>The diamond dependency problem, and other forms of conflicting requirements, require at least three layers of dependency, as demonstrated in <a data-type="xref" href="ch21.html#the_diamond_dependency_problem">Figure 21-1</a>.</p>
<figure id="the_diamond_dependency_problem"><img alt="The diamond dependency problem" src="images/seag_2101.png">
<figcaption><span class="label">Figure 21-1. </span>The diamond dependency problem</figcaption>
</figure>
<p>In this simplified model, <code>libbase</code> is used by both <code>liba</code> and <code>libb</code>, and <code>liba</code> and <code>libb</code> are both used by a higher-level component <code>libuser</code>. If <code>libbase</code> ever introduces an incompatible change, there is a chance that <code>liba</code> and <code>libb</code>, as products of separate organizations, don’t update simultaneously. If <code>liba</code> depends on the new <code>libbase</code> version and <code>libb</code> depends on the old version, there’s no general way for <code>libuser</code> (aka your code) to put everything together. This diamond can form at any scale: in the entire network of your dependencies, if there is ever a low-level node that is required to be in two incompatible versions at the same time (by virtue of there being two paths from some higher level node to those two versions), there will be a problem.</p>
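<p>To make that conflict concrete, the following minimal sketch (Python, purely illustrative) models the diamond in Figure 21-1: <code>liba</code> has moved to the new major version of <code>libbase</code>, <code>libb</code> has not, and the intersection of acceptable versions is empty. The version numbers and the solving logic are invented for this example; no real package manager is this simple.</p>

<pre data-type="programlisting" data-code-language="python">
# Hypothetical illustration of the diamond in Figure 21-1: liba and libb both
# depend on libbase but disagree about which major version they can accept.
# Names and version numbers are stand-ins; this is not a real resolver.

requirements = {
    "liba": {2},   # liba has upgraded to the new, incompatible libbase
    "libb": {1},   # libb still builds only against the old libbase
}

available_libbase_majors = {1, 2}

def versions_satisfying_everyone(reqs, available):
    """Return the libbase major versions acceptable to every consumer."""
    acceptable = set(available)
    for wanted in reqs.values():
        acceptable = acceptable.intersection(wanted)
    return acceptable

if __name__ == "__main__":
    ok = versions_satisfying_everyone(requirements, available_libbase_majors)
    if ok:
        print("libuser can build against libbase major version(s)", sorted(ok))
    else:
        # The diamond dependency problem: no single libbase version satisfies
        # both liba and libb, so libuser cannot assemble a consistent build.
        print("no libbase version satisfies both liba and libb")
</pre>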
|
||||
|
||||
<p>Different programming languages tolerate the diamond dependency problem to different degrees. For some languages, it is possible to embed multiple (isolated) versions of a dependency within a build: a call into <code>libbase</code> from <code>liba</code> might call a different version of the same API as a call into <code>libbase</code> from <code>libb</code>. For example, Java provides fairly well-established mechanisms to rename the symbols provided by such a dependency.<sup><a data-type="noteref" id="ch01fn203-marker" href="ch21.html#ch01fn203">3</a></sup> Meanwhile, C++ has nearly zero tolerance for diamond dependencies in a normal build, and they are very likely to trigger arbitrary bugs and undefined behavior (UB) as a result of a clear violation of C++’s <a href="https://oreil.ly/VTZe5">One Definition Rule</a>. You can at best use a similar idea as Java’s shading to hide some symbols in a dynamic-link library (DLL) or in cases in which you’re building and linking separately. However, in all programming languages that we’re aware of, these workarounds are partial solutions at best: embedding multiple versions can be made to work by tweaking the names of <em>functions</em>, but if there are <em>types</em> that are passed around between dependencies, all bets are off. For example, there is simply no way for a <code>map</code> defined in <code>libbase</code> v1 to be passed through some libraries to an API provided by <code>libbase</code> v2 in a semantically consistent fashion. Language-specific hacks to hide or rename entities in separately compiled libraries can provide some cushion for diamond dependency problems, but are not a solution in the general case.<a contenteditable="false" data-primary="diamond dependency issue" data-startref="ix_diadep" data-type="indexterm" id="id-aRsyTlupcLsm"> </a></p>
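<p>The point about types is worth a small illustration. The sketch below is an analogy in Python with hypothetical class names, not a depiction of any language’s actual linking model: even when two copies of <code>libbase</code> are embedded side by side under different names, a value constructed by the v1 copy is not acceptable to an API that was built against the v2 copy.</p>

<pre data-type="programlisting" data-code-language="python">
# An analogy for types crossing version boundaries. The classes stand in for
# the "same" map type from libbase v1 and an incompatible libbase v2; libb is
# assumed to have been built against v2. All names are hypothetical.

class LibbaseV1Map:
    """The map type as shipped in libbase v1."""
    def __init__(self):
        self.data = {}

class LibbaseV2Map:
    """The rewritten map type in libbase v2: not a drop-in replacement."""
    def __init__(self):
        self.entries = []

def libb_store(mapping, key, value):
    """An API in libb, which only understands the v2 type it was built against."""
    if not isinstance(mapping, LibbaseV2Map):
        raise TypeError("libb requires a libbase v2 map")
    mapping.entries.append((key, value))

if __name__ == "__main__":
    old_style = LibbaseV1Map()   # created by code still working against v1
    try:
        libb_store(old_style, "key", "value")
    except TypeError as err:
        print("diamond bites:", err)
</pre>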
|
||||
|
||||
<p>If you encounter a conflicting requirement problem, the only easy answer is to skip forward or backward in versions for those dependencies to find something compatible. When that isn’t possible, we must resort to locally patching the dependencies in question, which is particularly challenging because the cause of the incompatibility in both provider and consumer is probably not known to the engineer that first discovers the incompatibility. This is inherent: <code>liba</code> developers are still working in a compatible fashion with <code>libbase</code> v1, and <code>libb</code> devs have already upgraded to v2. Only a dev who is pulling in both of those projects has the chance to discover the issue, and it’s certainly not guaranteed that they are familiar enough with <code>libbase</code> and <code>liba</code> to work through the upgrade. The easier answer is to downgrade <code>libbase</code> and <code>libb</code>, although that is not an option if the upgrade was originally forced because of security issues.</p>
|
||||
|
||||
<p>Systems of policy and technology for dependency management largely boil down to the question, “How do we avoid conflicting requirements while still allowing change among noncoordinating groups?” If you have a solution for the general form of the diamond dependency problem that allows for the reality of continuously changing requirements (both dependencies and platform requirements) at all levels of the network, you’ve described the interesting part of a<a contenteditable="false" data-primary="dependency management" data-secondary="difficulty of, reasons for" data-startref="ix_depmgtdiffCRDD" data-tertiary="conflicting requirements and diamond dependencies" data-type="indexterm" id="id-GDs0CgIncjs2"> </a> dependency-management solution.<a contenteditable="false" data-primary="dependency management" data-secondary="difficulty of, reasons for" data-startref="ix_depmgtdiff" data-type="indexterm" id="id-DysyHVIjcPsX"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="importing_dependencies">
|
||||
<h1>Importing Dependencies</h1>
|
||||
|
||||
<p>In programming terms, it’s clearly better to reuse some existing infrastructure rather than build it yourself. <a contenteditable="false" data-primary="dependency management" data-secondary="importing dependencies" data-type="indexterm" id="ix_depmgtimp"> </a>This is obvious, and part of the fundamental march of technology: if every novice had to reimplement their own JSON parser and regular expression engine, we’d never get anywhere. Reuse is healthy, especially compared to the cost of redeveloping quality software from scratch. So long as you aren’t downloading trojaned software, if your external dependency satisfies the requirements for your programming task, you should use it.</p>
|
||||
|
||||
<section data-type="sect2" id="compatibility_promises">
|
||||
<h2>Compatibility Promises</h2>
|
||||
|
||||
<p>When we start <a contenteditable="false" data-primary="dependency management" data-secondary="importing dependencies" data-tertiary="compatibility promises" data-type="indexterm" id="ix_depmgtimpcmp"> </a>considering time, the situation gains some complicated trade-offs. Just because you get to avoid a <em>development</em> cost doesn’t mean importing a dependency is the correct choice. In a software engineering organization that is aware of time and change, we need to also be mindful of its ongoing maintenance costs. Even if we import a dependency with no intent of upgrading it, discovered security vulnerabilities, changing platforms, and evolving dependency networks can conspire to force that upgrade, regardless of our intent. When that day comes, how expensive is it going to be? Some dependencies are more explicit than others about the expected maintenance cost for merely using that dependency: how much compatibility is assumed? How much evolution is assumed? How are changes handled? For how long are releases supported?</p>
|
||||
|
||||
<p>We suggest that a dependency provider should be clearer about the answers to these questions. Consider the example set by large infrastructure projects with millions of users and their compatibility promises.</p>
|
||||
|
||||
<section data-type="sect3" id="cplusplus">
|
||||
<h3>C++</h3>
|
||||
|
||||
<p>For the C++ standard library, the model is one of nearly indefinite backward compatibility. <a contenteditable="false" data-primary="C++" data-secondary="compatibility promises" data-type="indexterm" id="id-yRsqC7H9tRc9TB"> </a>Binaries built against an older version of the standard library are expected to build and link with the newer standard: the standard provides not only API compatibility, but ongoing backward compatibility<a contenteditable="false" data-primary="ABI compatibility" data-type="indexterm" id="id-bRsDHwHytkcATk"> </a> for the binary artifacts, known as <em>ABI compatibility</em>. The extent to which this has been upheld varies from platform to platform. For users of gcc on Linux, it’s likely that most code works fine over a range of roughly a decade. The standard doesn’t explicitly call out its commitment to ABI compatibility—there are no public-facing policy documents on that point. However, the standard does publish <a href="https://oreil.ly/LoJq8">Standing Document 8</a> (SD-8), which calls out a small set of types of change that the standard library can make between versions, defining implicitly what type of changes to be prepared for. Java is similar: source is compatible between language versions, and JAR files from older releases will readily work with newer versions.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="go">
|
||||
<h3>Go</h3>
|
||||
|
||||
<p>Not all languages prioritize the same amount of compatibility.<a contenteditable="false" data-primary="Go programming language" data-secondary="compatibility promises" data-type="indexterm" id="id-bRszCwHzUkcATk"> </a> The Go programming language explicitly promises source compatibility between most releases, but no binary compatibility. You cannot build a library in Go with one version of the language and link that library into a Go program built with a different version of the language.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="abseil">
|
||||
<h3>Abseil</h3>
|
||||
|
||||
<p>Google’s Abseil <a contenteditable="false" data-primary="Abseil, compatibility promises" data-type="indexterm" id="id-GDs0C0H1u3cjTB"> </a>project is much like Go, with an important caveat about time. We are unwilling to commit to compatibility <em>indefinitely</em>: Abseil lies at the foundation of most of our most computationally heavy services internally, which we believe are likely to be in use for many years to come. This means we’re careful to reserve the right to make changes, especially in implementation details and ABI, in order to allow better performance. We have experienced far too many instances of an API turning out to be confusing and error prone after the fact; publishing such known faults to tens of thousands of developers for the indefinite future feels wrong. Internally, we already have roughly 250 million lines of C++ code that depend on this library—we aren’t going to make API changes lightly, but it must be possible. To that end, Abseil explicitly does not promise ABI compatibility, but does promise a slightly limited form of API compatibility: we won’t make a breaking API change without also <span class="keep-together">providing</span> an automated refactoring tool that will transform code from the old API to the new transparently. We feel that shifts the risk of unexpected costs significantly in favor of users: no matter what version a dependency was written against, a user of that dependency and Abseil should be able to use the most current version. The highest cost should be “run this tool,” and presumably send the resulting patch for review in the mid-level dependency (<code>liba</code> or <code>libb</code>, continuing our example from earlier). In practice, the project is new enough that we haven’t had to make any significant API breaking changes. We can’t say how well this will work for the ecosystem as a whole, but in theory, it seems like a good balance for stability versus ease of upgrade.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="boost">
|
||||
<h3>Boost</h3>
|
||||
|
||||
<p>By comparison, the Boost C++ library makes no<a contenteditable="false" data-primary="Boost C++ library, compatibility promises" data-type="indexterm" id="id-DysmCrH5SqcdTW"> </a> promises <a contenteditable="false" data-primary="C++" data-secondary="Boost library, compatibility promises" data-type="indexterm" id="id-9xsjH2HmSlcPTp"> </a>of <a href="https://www.boost.org/users/faq.html">compatibility between versions</a>. Most code doesn’t change, of course, but “many of the Boost libraries are actively maintained and improved, so backward compatibility with prior version isn’t always possible.” Users are advised to upgrade only at a period in their project life cycle in which some change will not cause problems. The goal for Boost is fundamentally different than the standard library or Abseil: Boost is an experimental proving ground. A particular release from the Boost stream is probably perfectly stable and appropriate for use in many projects, but Boost’s project goals do not prioritize compatibility between versions—other long-lived projects might experience some friction keeping up to date. The Boost developers are every bit as expert as the developers for the standard library<sup><a data-type="noteref" id="ch01fn205-marker" href="ch21.html#ch01fn205">4</a></sup>—none of this is about technical expertise: this is purely a matter of what a project does or does not promise and prioritize.</p>
|
||||
|
||||
<p>Looking at the libraries in this discussion, it’s important to recognize that these compatibility issues are <em>software engineering</em> issues, not <em>programming</em> issues. You can download something like Boost with no compatibility promise and embed it deeply in the most critical, long-lived systems in your organization; it will <em>work</em> just fine. All of the concerns here are about how those dependencies will change over time, keeping up with updates, and the difficulty of getting developers to worry about maintenance instead of just getting features working. Within Google, there is a constant stream of guidance directed to our engineers to help them consider this difference between “I got it to work” and “this is working in a supported fashion.” That’s unsurprising: it’s basic application of Hyrum’s Law, after all.</p>
|
||||
|
||||
<p>Put more broadly: it is important to realize that dependency management has a wholly different nature in a programming task versus a software engineering task. If you’re in a problem space for which maintenance over time is relevant, dependency management is difficult. If you’re purely developing a solution for today with no need to ever update anything, it is perfectly reasonable to grab as many readily available dependencies as you like with no thought of how to use them responsibly or plan for upgrades. Getting your program to work today by violating everything in SD-8 and also relying on binary compatibility from Boost and Abseil works fine…so long as you never upgrade the standard library, Boost, or Abseil, and neither does anything that depends on you.<a contenteditable="false" data-primary="dependency management" data-secondary="importing dependencies" data-startref="ix_depmgtimpcmp" data-tertiary="compatibility promises" data-type="indexterm" id="id-rRsnCLtwSRcbTp"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="considerations_when_importing">
|
||||
<h2>Considerations When Importing</h2>
|
||||
|
||||
<p>Importing a dependency<a contenteditable="false" data-primary="dependency management" data-secondary="importing dependencies" data-tertiary="considerations in" data-type="indexterm" id="id-OasXCzHAt6TR"> </a> for use in a programming project is nearly free: assuming that you’ve taken the time to ensure that it does what you need and isn’t secretly a security hole, it is almost always cheaper to reuse than to reimplement functionality. Even if that dependency has taken the step of clarifying what compatibility promise it will make, so long as we aren’t ever upgrading, anything you build on top of that snapshot of your dependency is fine, no matter how many rules you violate in consuming that API. But when we move from programming to software engineering, those dependencies become subtly more expensive, and there are a host of hidden costs and questions that need to be answered. Hopefully, you consider these costs before importing, and, hopefully, you know when you’re working on a programming project versus working on a software engineering project.</p>
|
||||
|
||||
<p>When engineers at Google try to import dependencies, we encourage them to ask this (incomplete) list of questions first:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Does the project have tests that you can run?</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Do those tests pass?</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Who is providing that dependency? Even among “No warranty implied” OSS projects, there is a significant range of experience and skill set—it’s a very different thing to depend on compatibility from the C++ standard library or Java’s Guava library than it is to select a random project from GitHub or npm. Reputation isn’t everything, but it is worth investigating.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>What sort of compatibility is the project aspiring to?</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Does the project detail what sort of usage is expected to be supported?</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>How popular is the project?</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>How long will we be depending on this project?</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>How often does the project make breaking changes?</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Add to this a short selection of internally focused questions:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>How complicated would it be to implement that functionality within Google?</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>What incentives will we have to keep this dependency up to date?</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Who will perform an upgrade?</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>How difficult do we expect it to be to perform an upgrade?</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Our own Russ Cox has <a href="https://research.swtch.com/deps">written about this more extensively</a>. We can’t give a perfect formula for deciding when it’s cheaper in the long term to import versus reimplement; we fail at this ourselves, more often than not.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="how_google_handles_importing_dependende">
|
||||
<h2>How Google Handles Importing Dependencies</h2>
|
||||
|
||||
<p>In short: we could do better.<a contenteditable="false" data-primary="dependency management" data-secondary="importing dependencies" data-tertiary="Google's handling of" data-type="indexterm" id="ix_depmgtimpG"> </a></p>
|
||||
|
||||
<p>The overwhelming majority of dependencies in any given Google project are internally developed. This means that the vast majority of our internal dependency-management story isn’t really dependency management, it’s just source control—by design. As we have mentioned, it is a far easier thing to manage and control the complexities and risks involved in adding dependencies when the providers and consumers are part of the same organization and have proper visibility and Continuous Integration (CI; see <a data-type="xref" href="ch23.html#continuous_integration">Continuous Integration</a>) available. Most problems in dependency management stop being problems when you can see exactly how your code is being used and know exactly the impact of any given change. Source control (when you control the projects in question) is far easier than dependency management (when you don’t).</p>
|
||||
|
||||
<p>That ease of use begins failing when it comes to our handling of external projects. For projects that we are importing from the OSS ecosystem or commercial partners, those dependencies are <a contenteditable="false" data-primary="third_party directory" data-type="indexterm" id="id-bRszCMtzUPTP"> </a>added into a separate directory of our monorepo, labeled <em>third_party</em>. Let’s examine how a new OSS project is added to <em>third_party</em>.</p>
|
||||
|
||||
<p>Alice, a software engineer at Google, is working on a project and realizes that there is an open source solution available. She would really like to have this project completed and demo’ed soon, to get it out of the way before going on vacation. The choice then is whether to reimplement that functionality from scratch or download the OSS package and get it added to <em>third_party</em>. It’s very likely that Alice decides that the faster development solution makes sense: she downloads the package and follows a few steps in our <em>third_party</em> policies. This is a fairly simple checklist: make sure it builds with our build system, make sure there isn’t an existing version of that package, and make sure at least two engineers are signed up as OWNERS to maintain the package in the event that any maintenance is necessary. Alice gets her teammate Bob to say, “Yes, I’ll help.” Neither of them need to have any experience maintaining a <em>third_party</em> package, and they have conveniently avoided the need to understand anything about the <em>implementation</em> of this package. At most, they have gained a little experience with its interface as part of using it to solve the prevacation demo problem.</p>
|
||||
|
||||
<p>From this point on, the package is usually available to other Google teams to use in their own projects. The act of adding additional dependencies is completely <span class="keep-together">transparent</span> to Alice and Bob: they might be completely unaware that the package they downloaded and promised to maintain has become popular. Subtly, even if they are monitoring for new direct usage of their package, they might not necessarily notice growth in the <em>transitive</em> usage of their package. If they use it for a demo, while Charlie adds a dependency from within the guts of our Search infrastructure, the package will have suddenly moved from fairly innocuous to being in the critical infrastructure for important Google systems. However, we don’t have any particular signals surfaced to Charlie when he is considering whether to add this dependency.</p>
|
||||
|
||||
<p>Now, it’s possible that this scenario is perfectly fine. Perhaps that dependency is well written, has no security bugs, and isn’t depended upon by other OSS projects. It might be <em>possible</em> for it to go quite a few years without being updated. It’s not necessarily <em>wise</em> for that to happen: changes externally might have optimized it or added important new functionality, or cleaned up security holes before CVEs<sup><a data-type="noteref" id="ch01fn206-marker" href="ch21.html#ch01fn206">5</a></sup> were discovered. The longer that the package exists, the more dependencies (direct and indirect) are likely to accrue. The more that the package remains stable, the more that we are likely to accrete Hyrum’s Law reliance on the particulars of the version that is checked into <em>third_party</em>.</p>
|
||||
|
||||
<p>One day, Alice and Bob are informed that an upgrade is critical. It could be the disclosure of a security vulnerability in the package itself or in an OSS project that depends upon it that forces an upgrade. Bob has transitioned to management and hasn’t touched the codebase in a while. Alice has moved to another team since the demo and hasn’t used this package again. Nobody changed the OWNERS file. Thousands of projects depend on this indirectly—we can’t just delete it without breaking the build for Search and a dozen other big teams. Nobody has any experience with the implementation details of this package. Alice isn’t necessarily on a team that has a lot of experience undoing Hyrum’s Law subtleties that have accrued over time.</p>
|
||||
|
||||
<p>All of which is to say: Alice and the other users of this package are in for a costly and difficult upgrade, with the security team exerting pressure to get this resolved immediately. Nobody in this scenario has practice in performing the upgrade, and the upgrade is extra difficult because it covers many smaller releases spanning the entire period between the initial introduction of the package into <em>third_party</em> and the security disclosure.</p>
|
||||
|
||||
<p>Our <em>third_party</em> policies don’t work for these unfortunately common scenarios. We roughly understand that we need a higher bar for ownership; we need to make it easier (and more rewarding) to update regularly, and more difficult for <em>third_party</em> packages to be orphaned and important at the same time. The hard part is that it is difficult for codebase maintainers and <em>third_party</em> leads to say, “No, you can’t use this thing that solves your development problem perfectly because we don’t have resources to update everyone with new versions constantly.” Projects that are popular and have no compatibility promise (like Boost) are particularly risky: our developers might be very familiar with using that dependency to solve programming problems outside of Google, but allowing it to become ingrained into the fabric of our codebase is a big risk. Our codebase has an expected lifespan of decades at this point: upstream projects that are not explicitly prioritizing stability are a risk.<a contenteditable="false" data-primary="dependency management" data-secondary="importing dependencies" data-startref="ix_depmgtimpG" data-tertiary="Google's handling of" data-type="indexterm" id="id-zRsktdhRUyTq"> </a><a contenteditable="false" data-primary="dependency management" data-secondary="importing dependencies" data-startref="ix_depmgtimp" data-type="indexterm" id="id-aRsWUohGUrTm"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="dependency_managementcomma_in_theory">
|
||||
<h1>Dependency Management, In Theory</h1>
|
||||
|
||||
<p>Having looked at the ways that dependency management is difficult<a contenteditable="false" data-primary="dependency management" data-secondary="in theory" data-type="indexterm" id="ix_depmgtthe"> </a> and how it can go wrong, let’s discuss more specifically the problems we’re trying to solve and how we might go about solving them. Throughout this chapter, we call back to the formulation, “How do we manage code that comes from outside our organization (or that we don’t perfectly control): how do we update it, how do we manage the things it depends upon over time?” We need to be clear that any good solution here avoids conflicting requirements of any form, including diamond dependency version conflicts, even in a dynamic ecosystem in which new dependencies or other requirements might be added (at any point in the network). We also need to be aware of the impact of time: all software has bugs, some of those will be security critical, and some fraction of our dependencies will therefore be <em>critical</em> to update over a long enough period of time.</p>
|
||||
|
||||
<p>A stable dependency-management scheme must therefore be flexible with time and scale: we can’t assume indefinite stability of any particular node in the dependency graph, nor can we assume that no new dependencies are added (either in code we control or in code we depend upon). If a solution to dependency management prevents conflicting requirement problems among your dependencies, it’s a good solution. If it does so without assuming stability in dependency version or dependency fan-out, coordination or visibility between organizations, or significant compute resources, it’s a great solution.</p>
|
||||
|
||||
<p>When proposing solutions to dependency management, there are four common options that we know of that exhibit at least some of the appropriate properties: nothing ever changes, semantic versioning, bundle everything that you need (coordinating not per project, but per distribution), or Live at Head.</p>
|
||||
|
||||
<section data-type="sect2" id="nothing_changes_left_parenthesisaka_the">
|
||||
<h2>Nothing Changes (aka The Static Dependency Model)</h2>
|
||||
|
||||
<p>The simplest way to ensure stable<a contenteditable="false" data-primary="dependency management" data-secondary="in theory" data-tertiary="nothing changes (static dependency model)" data-type="indexterm" id="id-yRsqC7HWUbF6"> </a> dependencies is to never change them: no API changes, no behavioral changes, nothing.<a contenteditable="false" data-primary="static dependency model" data-type="indexterm" id="id-bRsDHwHzUnFP"> </a> Bug fixes are allowed only if no user code could be broken. This prioritizes compatibility and stability over all else. Clearly, such a scheme is not ideal due to the assumption of indefinite stability. If, somehow, we get to a world in which security issues and bug fixes are a nonissue and dependencies aren’t changing, the Nothing Changes model is very appealing: if we start with satisfiable constraints, we’ll be able to maintain that property indefinitely.</p>
|
||||
|
||||
<p>Although not sustainable in the long term, practically speaking, this is where every organization starts: up until you’ve demonstrated that the expected lifespan of your project is long enough that change becomes necessary, it’s really easy to live in a world where we assume that nothing changes. It’s also important to note: this is probably the right model for most new organizations. It is comparatively rare to know that you’re starting a project that is going to live for decades and have a <em>need</em> to be able to update dependencies smoothly. It’s much more reasonable to hope that stability is a real option and pretend that dependencies are perfectly stable for the first few years of a project.</p>
|
||||
|
||||
<p>The downside to this model is that, over a long enough time period, it <em>is</em> false, and there isn’t a clear indication of exactly how long you can pretend that it is legitimate. We don’t have long-term early warning systems for security bugs or other critical issues that might force you to upgrade a dependency—and because of chains of dependencies, a single upgrade can in theory become a forced update to your entire dependency network.</p>
|
||||
|
||||
<p>In this model, version selection is simple: there are no decisions to be made, because there are no versions.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="semantic_versioning">
|
||||
<h2>Semantic Versioning</h2>
|
||||
|
||||
<p>The de <a contenteditable="false" data-primary="dependency management" data-secondary="in theory" data-tertiary="semantic versioning" data-type="indexterm" id="id-bRszCwH2unFP"> </a>facto <a contenteditable="false" data-primary="semantic versioning" data-type="indexterm" id="id-GDsRH0H1uOF2"> </a>standard for “how do we manage a network of dependencies today?” is semantic versioning (SemVer).<sup><a data-type="noteref" id="ch01fn207-marker" href="ch21.html#ch01fn207">6</a></sup> SemVer is the nearly ubiquitous practice of representing a version number for some dependency (especially libraries) using three decimal-separated integers, such as 2.4.72 or 1.1.4. In the most common convention, the three component numbers represent major, minor, and patch versions, with the implication that a changed major number indicates a change to an existing API that can break existing usage, a changed minor number indicates purely added functionality that should not break existing usage, and a changed patch version is reserved for non-API-impacting implementation details and bug fixes that are viewed as particularly low risk.</p>
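<p>As a concrete reference for that convention, here is a short Python sketch that parses dotted-triple version strings and names the kind of bump between two releases. The version strings are invented, and real SemVer also allows pre-release and build-metadata suffixes that we ignore here.</p>

<pre data-type="programlisting" data-code-language="python">
# A sketch of the major/minor/patch convention described above.

def parse(version):
    """Split a dotted-triple version such as "2.4.72" into (2, 4, 72)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def classify_bump(old, new):
    """Name the most significant component that changed between two releases."""
    old_v, new_v = parse(old), parse(new)
    if new_v[0] != old_v[0]:
        return "major"   # existing APIs may have changed or been removed
    if new_v[1] != old_v[1]:
        return "minor"   # functionality added; existing usage should still work
    if new_v[2] != old_v[2]:
        return "patch"   # implementation details and low-risk bug fixes only
    return "no change"

if __name__ == "__main__":
    print(classify_bump("1.1.4", "1.1.5"))   # patch
    print(classify_bump("1.1.4", "1.2.0"))   # minor
    print(classify_bump("1.5.2", "2.0.0"))   # major
</pre>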
|
||||
|
||||
<p>With the SemVer separation of major/minor/patch versions, the assumption is that a version requirement can generally be expressed as “anything newer than,” barring API-incompatible changes (major version changes). Commonly, we’ll see something like “Requires <code>libbase</code> ≥ 1.5”; that requirement would be compatible with any <code>libbase</code> in 1.5, including 1.5.1, and anything in 1.6 onward, but not <code>libbase</code> 1.4.9 (missing the API introduced in 1.5) or 2.x (some APIs in <code>libbase</code> were changed incompatibly). Major version changes are a significant incompatibility: because an existing piece of functionality has changed (or been removed), there are potential incompatibilities for all dependents. Version requirements exist (explicitly or implicitly) whenever one dependency uses another: we might see “<code>liba</code> requires <code>libbase</code> ≥ 1.5” and “<code>libb</code> requires <code>libbase</code> ≥ 1.4.7.”</p>
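<p>That “Requires <code>libbase</code> ≥ 1.5” example can be written out directly. The sketch below treats the requirement as “at least 1.5, within major version 1,” which matches the convention described above; it is an illustration, not the precise rule of any particular package manager.</p>

<pre data-type="programlisting" data-code-language="python">
# The "Requires libbase >= 1.5" example from the text, written out.

def parse(version):
    return tuple(int(part) for part in version.split("."))

def satisfies(candidate, minimum):
    """True if candidate is at least `minimum` and shares its major version."""
    cand, need = parse(candidate), parse(minimum)
    same_major = cand[0] == need[0]
    new_enough = cand >= need          # tuple comparison: (1, 5, 1) >= (1, 5)
    return same_major and new_enough

if __name__ == "__main__":
    for version in ("1.5.1", "1.6.0", "1.4.9", "2.0.0"):
        # Prints True, True, False, False, matching the text's example.
        print(version, satisfies(version, "1.5"))
</pre>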
|
||||
|
||||
<p>If we formalize these requirements, we can conceptualize a dependency network as a collection of software components (nodes) and the requirements between them (edges). Edge labels in this network change as a function of the version of the source node, either as dependencies are added (or removed) or as the SemVer requirement is updated because of a change in the source node (requiring a newly added feature in a dependency, for instance). Because this whole network is changing asynchronously over time, the process of finding a mutually compatible set of dependencies that satisfy all the transitive requirements of your application can be challenging.<sup><a data-type="noteref" id="ch01fn208-marker" href="ch21.html#ch01fn208">7</a></sup> Version-satisfiability solvers for SemVer are very much akin to SAT-solvers in logic and algorithms research: given a set of constraints (version requirements on dependency edges), can we find a set of versions for the nodes in question that satisfies all constraints? Most package management ecosystems are built on top of these sorts of graphs, governed by their SemVer SAT-solvers.</p>
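<p>To see what such a solver does, here is a deliberately tiny brute-force version: it enumerates every combination of available versions and returns the first assignment that satisfies all of the edges. The packages, versions, and constraint encoding are invented, and real resolvers are far more sophisticated than this sketch.</p>

<pre data-type="programlisting" data-code-language="python">
# A toy version-selection "solver" over a hypothetical dependency network.

from itertools import product

available = {
    "libbase": ["1.4.7", "1.5.0", "2.0.0"],
    "liba": ["1.1.0"],
    "libb": ["3.0.0"],
}

# Each constraint reads: this version of the consumer needs at least the given
# version of the dependency, within the same major version.
constraints = [
    ("liba", "1.1.0", "libbase", "1.5.0"),
    ("libb", "3.0.0", "libbase", "1.4.7"),
]

def parse(version):
    return tuple(int(part) for part in version.split("."))

def satisfies(candidate, minimum):
    return parse(candidate)[0] == parse(minimum)[0] and parse(candidate) >= parse(minimum)

def solve(available, constraints):
    names = sorted(available)
    for combo in product(*(available[name] for name in names)):
        chosen = dict(zip(names, combo))
        if all(
            chosen[consumer] != version or satisfies(chosen[dependency], minimum)
            for consumer, version, dependency, minimum in constraints
        ):
            return chosen
    return None   # no mutually compatible assignment exists: "dependency hell"

if __name__ == "__main__":
    print(solve(available, constraints))   # selects libbase 1.5.0, not 2.0.0
</pre>

<p>Raising <code>liba</code>’s requirement to <code>libbase</code> 2.0.0 while <code>libb</code> stays within major version 1 makes <code>solve</code> return <code>None</code>: the diamond dependency problem from earlier, expressed as an unsatisfiable constraint set.</p>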
|
||||
|
||||
<p>SemVer and its SAT-solvers aren’t in any way promising that there <em>exists</em> a solution to a given set of dependency constraints. Situations in which dependency constraints cannot be satisfied are created constantly, as we’ve already seen: if a lower-level component (<code>libbase</code>) makes a major-number bump, and some (but not all) of the libraries that depend on it (<code>libb</code> but not <code>liba</code>) have upgraded, we will encounter the diamond dependency issue.</p>
|
||||
|
||||
<p>SemVer solutions to dependency management are usually SAT-solver based. Version selection is a matter of running some algorithm to find an assignment of versions for dependencies in the network that satisfies all of the version-requirement constraints. When no such satisfying assignment of versions exists, we colloquially call it “dependency hell.”</p>
|
||||
|
||||
<p>We’ll look at some of the limitations of SemVer in more detail later in this chapter.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="bundled_distribution_models">
|
||||
<h2>Bundled Distribution Models</h2>
|
||||
|
||||
<p>As an industry, we’ve<a contenteditable="false" data-primary="dependency management" data-secondary="in theory" data-tertiary="bundled distribution models" data-type="indexterm" id="id-GDs0C0HbSOF2"> </a> seen the application <a contenteditable="false" data-primary="bundled distribution models" data-type="indexterm" id="id-DysyHrH5SVFX"> </a>of a powerful model of managing dependencies for decades now: an organization gathers up a collection of dependencies, finds a mutually compatible set of those, and releases the collection as a single unit. This is what happens, for instance, with Linux distributions—there’s no guarantee that the various pieces that are included in a distro are cut from the same point in time. In fact, it’s somewhat more likely that the lower-level dependencies are somewhat older than the higher-level ones, just to account for the time it takes to integrate them.</p>
|
||||
|
||||
<p>This “draw a bigger box around it all and release that collection” model introduces entirely new actors: the distributors. Although the maintainers of all of the individual dependencies may have little or no knowledge of the other dependencies, these higher-level <em>distributors</em> are involved in the process of finding, patching, and testing a mutually compatible set of versions to include. Distributors are the engineers responsible for proposing a set of versions to bundle together, testing those to find bugs in that dependency tree, and resolving any issues.</p>
|
||||
|
||||
<p>For an outside user, this works great, so long as you can properly rely on only one of these bundled distributions. This is effectively the same as changing a dependency network into a single aggregated dependency and giving that a version number. Rather than saying, “I depend on these 72 libraries at these versions,” this is, “I depend on Red Hat version N,” or, “I depend on the pieces in the NPM graph at time T.”</p>
|
||||
|
||||
<p>In the bundled distribution approach, version selection is handled by dedicated <span class="keep-together">distributors.</span></p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="live_at_head">
|
||||
<h2>Live at Head</h2>
|
||||
|
||||
<p>The model that some of us at Google<sup><a data-type="noteref" id="ch01fn209-marker" href="ch21.html#ch01fn209">8</a></sup> have been pushing for is theoretically sound, but places new and costly burdens on participants in a dependency network.<a contenteditable="false" data-primary="dependency management" data-secondary="in theory" data-tertiary="Live at Head" data-type="indexterm" id="id-9xsjH2H7I1Fw"> </a> It’s wholly unlike the models that exist in OSS ecosystems today, and it is not clear how to get from here to there as an industry. Within the boundaries of an organization like Google, it is costly but effective, and we feel that it places most of the costs and incentives into the correct places.<a contenteditable="false" data-primary="Live at Head model" data-type="indexterm" id="id-rRs7cxHgIDFP"> </a> We call this model “Live at Head.” It is viewable as the dependency-management extension of trunk-based development: where trunk-based development talks about source control policies, we’re extending that model to apply to upstream dependencies as well.<a contenteditable="false" data-primary="trunk-based development" data-secondary="Live at Head model and" data-type="indexterm" id="id-vRsQtkHQIwFP"> </a></p>
|
||||
|
||||
<p>Live at Head presupposes that we can unpin dependencies, drop SemVer, and rely on dependency providers to test changes against the entire ecosystem before committing. Live at Head is an explicit attempt to take time and choice out of the issue of dependency management: always depend on the current version of everything, and never change anything in a way in which it would be difficult for your dependents to adapt. <a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="Live at Head dependency management and" data-type="indexterm" id="id-9xs3COc7I1Fw"> </a>A change that (unintentionally) alters API or behavior will in general be caught by CI on downstream dependencies, and thus should not be committed. For cases in which such a change <em>must</em> happen (i.e., for security reasons), such a break should be made only after either the downstream dependencies are updated or an automated tool is provided to perform the update in place. (This tooling is essential for closed-source downstream consumers: the goal is to allow any user the ability to update use of a changing API without expert knowledge of the use or the API. That property significantly mitigates the “mostly bystanders” costs of breaking changes.) This philosophical shift in responsibility in the open source ecosystem is difficult to motivate initially: putting the burden on an API provider to test against and change all of its downstream customers is a significant revision to the responsibilities of an API <span class="keep-together">provider.</span></p>
|
||||
|
||||
<p>Changes in a Live at Head model are not reduced to a SemVer “I think this is safe or not.” Instead, tests and CI systems are used to test against visible dependents to determine experimentally how safe a change is. So, for a change that alters only efficiency or implementation details, all of the visible affected tests might likely pass, which demonstrates that there are no obvious ways for that change to impact users—it’s safe to commit. A change that modifies more obviously observable parts of an API (syntactically or semantically) will often yield hundreds or even thousands of test failures. It’s then up to the author of that proposed change to determine whether the work involved to resolve those failures is worth the resulting value of committing the change. Done well, that author will work with all of their dependents to resolve the test failures ahead of time (i.e., unwinding brittle assumptions in the tests) and might potentially create a tool to perform as much of the necessary refactoring as possible.</p>
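<p>The shape of that gate is easy to sketch, with everything simulated: the dependents and their pretend test outcomes below are stand-ins rather than a real CI system, but the decision is the same. The author of the change gets a green light only when no visible dependent breaks.</p>

<pre data-type="programlisting" data-code-language="python">
# A simulated Live-at-Head-style commit gate. Dependents and their "test runs"
# are stand-ins; a real system would build and test each dependent against the
# candidate change in CI.

def gate(candidate_change, dependents):
    """Return a commit decision based on which visible dependents still pass."""
    results = {name: tests_pass(candidate_change) for name, tests_pass in dependents.items()}
    broken = sorted(name for name, passed in results.items() if not passed)
    if not broken:
        return "safe to commit: no visible dependent breaks"
    # The burden falls on the change author: fix these dependents first,
    # ideally with an automated refactoring tool, then re-run the gate.
    return "blocked: fix " + ", ".join(broken) + " before committing"

if __name__ == "__main__":
    # Pretend liba tolerates the change but libb relied on the old behavior.
    dependents = {
        "liba": lambda change: True,
        "libb": lambda change: "renames Foo" not in change,
    }
    print(gate("renames Foo to Frobnicate", dependents))
</pre>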
|
||||
|
||||
<p>The incentive structures and technological assumptions here are materially different than other scenarios: we assume that there exist unit tests and CI, we assume that API providers will be bound by whether downstream dependencies will be broken, and we assume that API consumers are keeping their tests passing and relying on their dependency in supported ways. This works significantly better in an open source ecosystem (in which fixes can be distributed ahead of time) than it does in the face of hidden/closed-source dependencies. API providers are incentivized when making changes to do so in a way that can be smoothly migrated to. API consumers are incentivized to keep their tests working so as not to be labeled as a low-signal test and potentially skipped, reducing the protection provided by that test.</p>
|
||||
|
||||
<p>In the Live at Head approach, version selection is handled by asking “What is the most recent stable version of everything?” If providers have made changes responsibly, it will all work together smoothly.<a contenteditable="false" data-primary="dependency management" data-secondary="in theory" data-startref="ix_depmgtthe" data-type="indexterm" id="id-nRsyCJuOIeFw"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="the_limitations_of_semver">
|
||||
<h1>The Limitations of SemVer</h1>
|
||||
|
||||
<p>The Live at Head approach may build on recognized practices for version control (trunk-based development) but is largely unproven at scale.<a contenteditable="false" data-primary="dependency management" data-secondary="limitations of semantic versioning" data-type="indexterm" id="ix_depmgtSV"> </a><a contenteditable="false" data-primary="semantic versioning" data-secondary="limitations of" data-type="indexterm" id="ix_semverlim"> </a> SemVer is the de facto standard for dependency management today, but as we’ve suggested, it is not without its limitations. Because it is such a popular approach, it is worth looking at it in more detail and highlighting what we believe to be its potential pitfalls.<a contenteditable="false" data-primary="SemVer" data-see="semantic versioning" data-type="indexterm" id="id-lRsBcNHwir"> </a></p>
|
||||
|
||||
<p>There’s a lot to unpack in the SemVer definition of what a dotted-triple version number really means. Is this a promise? Or is the version number chosen for a release an estimate? That is, when the maintainers of <code>libbase</code> cut a new release and choose whether this is a major, minor, or patch release, what are they saying? Is it provable that an upgrade from 1.1.4 to 1.2.0 is safe and easy, because there were only API additions and bug fixes? Of course not. There’s a host of things that ill-behaved users of <code>libbase</code> could have done that could cause build breaks or behavioral changes in the face of a “simple” API addition.<sup><a data-type="noteref" id="ch01fn210-marker" href="ch21.html#ch01fn210">9</a></sup> Fundamentally, you can’t <em>prove</em> anything about compatibility when only considering the source API; you have to know <em>with which</em> things you are asking about compatibility.</p>
|
||||
|
||||
<p>However, this idea of “estimating” compatibility begins to weaken when we talk about networks of dependencies and SAT-solvers applied to those networks. The fundamental problem in this formulation is the difference between node values in traditional SAT and version values in a SemVer dependency graph. A node in a three-SAT graph <em>is</em> either True or False. A version value (1.1.14) in a dependency graph is provided by the maintainer as an <em>estimate</em> of how compatible the new version is, given code that used the previous version. We’re building all of our version-satisfaction logic on top of a shaky foundation, treating estimates and self-attestation as absolute. As we’ll see, even if that works OK in limited cases, in the aggregate, it doesn’t necessarily have enough fidelity to underpin a healthy ecosystem.</p>
|
||||
|
||||
<p>If we acknowledge that SemVer is a lossy estimate and represents only a subset of the possible scope of changes, we can begin to see it as a blunt instrument. In theory, it works fine as a shorthand. In practice, especially when we build SAT-solvers on top of it, SemVer can (and does) fail us by both overconstraining and underprotecting us.</p>
|
||||
|
||||
<section data-type="sect2" id="semver_might_overconstrain">
|
||||
<h2>SemVer Might Overconstrain</h2>
|
||||
|
||||
<p>Consider what happens when <code>libbase</code> is recognized to be more than a single monolith: there are almost always independent interfaces within a library.<a contenteditable="false" data-primary="semantic versioning" data-secondary="limitations of" data-tertiary="overconstrains" data-type="indexterm" id="id-DysyHrHOuniX"> </a> Even if there are only two functions, we can see situations in which SemVer overconstrains us.<a contenteditable="false" data-primary="dependency management" data-secondary="limitations of semantic versioning" data-tertiary="overconstrains" data-type="indexterm" id="id-9xspc2H9ukiw"> </a> Imagine that <code>libbase</code> is indeed composed of only two functions, Foo and Bar. Our mid-level dependencies <code>liba</code> and <code>libb</code> use only Foo. If the maintainer of <code>libbase</code> makes a breaking change to Bar, it is incumbent on them to bump the major version of <code>libbase</code> in a SemVer world. <code>liba</code> and <code>libb</code> are known to depend on <code>libbase</code> 1.x—SemVer dependency solvers won’t accept a 2.x version of that dependency. However, in reality these libraries would work together perfectly: only Bar changed, and that was unused. The compression inherent in “I made a breaking change; I must bump the major version number” is lossy when it doesn’t apply at the granularity of an individual atomic API unit. Although some dependencies might be fine grained enough for that to be accurate,<sup><a data-type="noteref" id="ch01fn211-marker" href="ch21.html#ch01fn211">10</a></sup> that is not the norm for a SemVer ecosystem.</p>
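<p>The finer-grained analysis this example calls for is simple to sketch: check which consumers actually reference the symbols that changed. The symbols and usage sets below are the hypothetical ones from this example.</p>

<pre data-type="programlisting" data-code-language="python">
# Symbol-level impact analysis for the Foo/Bar example above (hypothetical).

changed_symbols = {"Bar"}        # libbase made a breaking change to Bar only

symbols_used = {
    "liba": {"Foo"},
    "libb": {"Foo"},
}

def affected_consumers(changed, usage):
    """Consumers that reference at least one changed symbol."""
    return sorted(name for name, used in usage.items() if used.intersection(changed))

if __name__ == "__main__":
    affected = affected_consumers(changed_symbols, symbols_used)
    # A whole-package major-version bump would block both liba and libb from
    # accepting the new libbase, even though neither appears on this list.
    print("actually affected:", affected or "nobody")
</pre>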
|
||||
|
||||
<p>If SemVer overconstrains, either because of an unnecessarily severe version bump or insufficiently fine-grained application of SemVer numbers, automated package managers and SAT-solvers will report that your dependencies cannot be updated or installed, even if everything would work together flawlessly by ignoring the SemVer checks. Anyone who has ever been exposed to dependency hell during an upgrade might find this particularly infuriating: some large fraction of that effort was a complete waste of time.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="semver_might_overpromise">
|
||||
<h2>SemVer Might Overpromise</h2>
|
||||
|
||||
<p>On the flip side, the application of SemVer makes the <a contenteditable="false" data-primary="dependency management" data-secondary="limitations of semantic versioning" data-tertiary="overpromising compatibility" data-type="indexterm" id="id-DysmCrH5SniX"> </a>explicit assumption<a contenteditable="false" data-primary="semantic versioning" data-secondary="limitations of" data-tertiary="overpromising compatibility" data-type="indexterm" id="id-9xsjH2HmSkiw"> </a> that an API provider’s estimate of compatibility can be fully predictive and that changes fall into three buckets: breaking (by modification or removal), strictly additive, or non-API-impacting. If SemVer is a perfectly faithful representation of the risk of a change by classifying syntactic and semantic changes, how do we characterize a change that adds a one-millisecond delay to a time-sensitive API? Or, more plausibly: how do we characterize a change that alters the format of our logging output? Or that alters the order that we import external dependencies? Or that alters the order that results are returned in an “unordered” stream? Is it reasonable to assume that those changes are “safe” merely because those aren’t part of the syntax or contract of the API in question? What if the documentation said “This may change in the future”? Or the API was named “ForInternalUseByLibBaseOnlyDoNotTouchThisIReallyMeanIt?”<sup><a data-type="noteref" id="ch01fn212-marker" href="ch21.html#ch01fn212">11</a></sup></p>
|
||||
|
||||
<p>The idea that SemVer patch versions, which in theory are only changing implementation details, are “safe” changes absolutely runs afoul of Google’s experience with Hyrum’s Law—“With a sufficient number of users, every observable behavior of your system will be depended upon by someone.” Changing the order that dependencies are imported, or changing the output order for an “unordered” producer will, at scale, invariably break assumptions that some consumer was (perhaps incorrectly) relying upon. The very term “breaking change” is misleading: there are changes that are theoretically breaking but safe in practice (removing an unused API). There are also changes that are theoretically safe but break client code in practice (any of our earlier Hyrum’s Law examples). We can see this in any SemVer/dependency-management system for which the version-number requirement system allows for restrictions on the patch number: if you can say <code>liba requires libbase >1.1.14</code> rather than <code>liba requires libbase 1.1</code>, that’s clearly an admission that there are observable differences in patch versions.</p>
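<p>A toy version of that failure mode, with invented names: a patch-level release reorders the results of an API documented as unordered, and a consumer that quietly relied on the accidental order breaks, which is the kind of Hyrum’s Law breakage described above.</p>

<pre data-type="programlisting" data-code-language="python">
# Hypothetical "patch" release that changes only an incidental behavior.

def list_users_v1_1_14():
    """Documented as returning users in no particular order."""
    return ["alice", "bob", "charlie"]        # happened to be insertion order

def list_users_v1_1_15():
    """Same contract, same data; an internal change now yields another order."""
    return sorted(["alice", "bob", "charlie"], reverse=True)

def first_user(list_users):
    # A consumer that (incorrectly) assumed the first element was meaningful.
    return list_users()[0]

if __name__ == "__main__":
    print(first_user(list_users_v1_1_14))   # "alice": the accidental behavior
    print(first_user(list_users_v1_1_15))   # "charlie": the assumption breaks,
                                            # even though SemVer calls this a patch
</pre>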
|
||||
|
||||
<p><em>A change in isolation isn’t breaking or nonbreaking—</em>that statement can be evaluated only in the context of how it is being used. There is no absolute truth in the notion of “This is a breaking change”; a change can be seen to be breaking for only a (known or unknown) set of existing users and use cases. The reality of how we evaluate a change inherently relies upon information that isn’t present in the SemVer formulation of dependency management: how are downstream users consuming this dependency?</p>
|
||||
|
||||
<p>Because of this, a SemVer constraint solver might report that your dependencies work together when they don’t, either because a bump was applied incorrectly or because something in your dependency network had a Hyrum’s Law dependence on something that wasn’t considered part of the observable API surface. In these cases, you might have either build errors or runtime bugs, with no theoretical upper bound on their severity.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="motivations-id00080">
|
||||
<h2>Motivations</h2>
|
||||
|
||||
<p>There is a further argument that SemVer doesn’t always incentivize the creation of stable code.<a contenteditable="false" data-primary="dependency management" data-secondary="limitations of semantic versioning" data-tertiary="motivations" data-type="indexterm" id="id-9xs3C2H7Ikiw"> </a><a contenteditable="false" data-primary="semantic versioning" data-secondary="limitations of" data-tertiary="motivations" data-type="indexterm" id="id-rRspHxHgIgiP"> </a> For a maintainer of an arbitrary dependency, there is variable systemic incentive to <em>not</em> make breaking changes and bump major versions. Some projects care deeply about compatibility and will go to great lengths to avoid a major-version bump. Others are more aggressive, even intentionally bumping major versions on a fixed schedule. The trouble is that most users of any given dependency are indirect users—they wouldn’t have any significant reasons to be aware of an upcoming change. Even most direct users don’t subscribe to mailing lists or other release <span class="keep-together">notifications.</span></p>
|
||||
|
||||
<p>All of which combines to suggest that no matter how many users will be inconvenienced by adoption of an incompatible change to a popular API, the maintainers bear a tiny fraction of the cost of the resulting version bump. For maintainers who are also users, there can also be an incentive <em>toward</em> breaking: it’s always easier to design a better interface in the absence of legacy constraints. This is part of why we think projects should publish clear statements of intent with respect to compatibility, usage, and breaking changes. Even if those are best-effort, nonbinding, or ignored by many users, it still gives us a starting point to reason about whether a breaking change/major version bump is “worth it,” without bringing in these conflicting incentive structures.</p>
|
||||
|
||||
<p><a href="https://research.swtch.com/vgo-import">Go</a> and <a href="https://oreil.ly/Iq9f_">Clojure</a> both handle this nicely: in their standard package management ecosystems, the equivalent of a major-version bump is expected to be a fully new package. <a contenteditable="false" data-primary="Clojure package management ecosystem" data-type="indexterm" id="id-PqsAc7t8IdiV"> </a><a contenteditable="false" data-primary="Go programming language" data-secondary="standard package management ecosystem" data-type="indexterm" id="id-8XsptVtwIgiV"> </a>This has a certain sense of justice to it: if you’re willing to break backward compatibility for your package, why do we pretend this is the same set of APIs? Repackaging and renaming everything seems like a reasonable amount of work to expect from a provider in exchange for them taking the nuclear option and throwing away backward <span class="keep-together">compatibility.</span></p>
|
||||
|
||||
<p>Finally, there’s the human fallibility of the process. In general, SemVer version bumps should be applied to <em>semantic</em> changes just as much as syntactic ones; changing the behavior of an API matters just as much as changing its structure. Although it’s plausible that tooling could be developed to evaluate whether any particular release involves syntactic changes to a set of public APIs, discerning whether there are meaningful and intentional semantic changes is computationally infeasible.<sup><a data-type="noteref" id="ch01fn213-marker" href="ch21.html#ch01fn213">12</a></sup> Practically speaking, even the potential tools for identifying syntactic changes are limited. In almost all cases, it is up to the human judgement of the API provider whether to bump major, minor, or patch versions for any given change. If you’re relying on only a handful of professionally maintained dependencies, your expected exposure to this form of SemVer clerical error is probably low.<sup><a data-type="noteref" id="ch01fn214-marker" href="ch21.html#ch01fn214">13</a></sup> If you have a network of thousands of dependencies underneath your product, you should be prepared for some amount of chaos simply from human error.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="minimum_version_selection">
|
||||
<h2>Minimum Version Selection</h2>
|
||||
|
||||
<p>In 2018, as part of an essay series on building a package management system for the Go programming<a contenteditable="false" data-primary="semantic versioning" data-secondary="limitations of" data-tertiary="Minimum Version Selection" data-type="indexterm" id="id-rRsnCxH9fgiP"> </a> language, Google’s own Russ Cox described an interesting variation <a contenteditable="false" data-primary="dependency management" data-secondary="limitations of semantic versioning" data-tertiary="Minimum Version Selection" data-type="indexterm" id="id-vRsrHkHWfXiP"> </a>on SemVer dependency<a contenteditable="false" data-primary="Minimum Version Selection (MVS)" data-type="indexterm" id="id-nRsqcaHlf6iw"> </a> management: <a href="https://research.swtch.com/vgo-mvs">Minimum Version Selection</a> (MVS). When updating the version for some node in the dependency network, it is possible that its dependencies need to be updated to newer versions to satisfy an updated SemVer requirement—this can then trigger further changes transitively. In most constraint-satisfaction/version-selection formulations, the newest possible versions of those downstream dependencies are chosen: after all, you’ll need to update to those new versions eventually, right?</p>
|
||||
|
||||
<p>MVS makes the opposite choice: when <code>liba</code>’s specification requires <code>libbase</code> ≥1.7, we’ll try <code>libbase</code> 1.7 directly, even if a 1.8 is available. This “produces high-fidelity builds in which the dependencies a user builds are as close as possible to the ones the author developed against.”<sup><a data-type="noteref" id="ch01fn-marker" href="ch21.html#ch01fn">14</a></sup> There is a critically important truth revealed in this point: when <code>liba</code> says it requires <code>libbase</code> ≥1.7, that almost certainly means that the developer of <code>liba</code> had <code>libbase</code> 1.7 installed. Assuming that the maintainer performed even basic testing before publishing,<sup><a data-type="noteref" id="ch01fn215-marker" href="ch21.html#ch01fn215">15</a></sup> we have at least anecdotal evidence of interoperability testing for that version of <code>liba</code> and version 1.7 of <code>libbase</code>. It’s not CI or proof that everything has been unit tested together, but it’s something.</p>
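<p>The selection rule itself is short enough to sketch: for each module, take the largest of the minimum versions that anything in the build asks for, rather than the newest release available. The module names and version numbers below are the hypothetical ones from this discussion; this illustrates the idea rather than Go’s actual implementation.</p>

<pre data-type="programlisting" data-code-language="python">
# A sketch of Minimum Version Selection over hypothetical requirements.

def parse(version):
    return tuple(int(part) for part in version.split("."))

def minimum_version_selection(requirements, available):
    """requirements: list of (dependent, module, minimum-version string)."""
    chosen = {}
    for _dependent, module, minimum in requirements:
        if module not in chosen or parse(minimum) > parse(chosen[module]):
            chosen[module] = minimum
    for module, version in chosen.items():
        # Sanity check: the selected version must actually exist.
        assert version in available[module], (module, version)
    return chosen

if __name__ == "__main__":
    available = {"libbase": ["1.6.0", "1.7.0", "1.8.0"]}
    requirements = [
        ("liba", "libbase", "1.7.0"),
        ("libb", "libbase", "1.6.0"),
    ]
    # MVS picks libbase 1.7.0, the version liba was developed against,
    # even though 1.8.0 is available.
    print(minimum_version_selection(requirements, available))
</pre>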
|
||||
|
||||
<p>Absent accurate input constraints derived from 100% accurate prediction of the future, it’s best to make the smallest jump forward possible. Just as it’s usually safer to commit an hour of work to your project instead of dumping a year of work all at once, smaller steps forward in your dependency updates are safer. MVS just walks forward each affected dependency only as far as is required and says, “OK, I’ve walked forward far enough to get what you asked for (and not farther). Why don’t you run some tests and see if things are good?”</p>
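<p>A minimal sketch of that selection rule, using whole numbers as stand-ins for minor versions and a hardcoded requirement table rather than anything resembling Go’s actual module machinery:</p>

<pre data-type="programlisting" data-code-language="python"># Toy MVS: each module states the *minimum* version it needs of each
# dependency; the build uses, for every module, the maximum of those
# minimums, and never anything newer just because it exists.
from collections import defaultdict

# Hypothetical requirements; read ("liba", 2) as "liba 1.2".
REQUIREMENTS = {
    ("app", 1): {"liba": 2, "libb": 3},
    ("liba", 2): {"libbase": 7},   # liba was developed against libbase 1.7
    ("libb", 3): {"libbase": 8},
    ("libbase", 7): {}, ("libbase", 8): {}, ("libbase", 9): {},  # 1.9 exists
}

def mvs(root_module, root_version):
    selected = defaultdict(int)
    work = [(root_module, root_version)]
    while work:
        module, version = work.pop()
        if selected[module] >= version:
            continue                    # an equal or newer minimum already won
        selected[module] = version      # walk forward only as far as required
        work.extend(REQUIREMENTS[(module, version)].items())
    return dict(selected)

print(sorted(mvs("app", 1).items()))
# [('app', 1), ('liba', 2), ('libb', 3), ('libbase', 8)]; libbase 1.9 is ignored
</pre>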
|
||||
|
||||
<p>Inherent in the idea of MVS is the admission that a newer version might introduce an incompatibility in practice, even if the version numbers <em>in theory</em> say otherwise. This is recognizing the core concern with SemVer, using MVS or not: there is some loss of fidelity in this compression of software changes into version numbers. MVS gives some additional practical fidelity, trying to produce selected versions closest to those that have presumably been tested together. This might be enough of a boost to make a larger set of dependency networks function properly. Unfortunately, we haven’t found a good way to empirically verify that idea. The jury is still out on whether MVS makes SemVer “good enough” without fixing the basic theoretical and incentive problems with the approach, but we still believe it represents a manifest improvement in the application of SemVer constraints as they are used today.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="socomma_does_semver_workquestion_mark">
|
||||
<h2>So, Does SemVer Work?</h2>
|
||||
|
||||
<p>SemVer works well enough in limited scales.<a contenteditable="false" data-primary="semantic versioning" data-secondary="limitations of" data-tertiary="questioning if it works" data-type="indexterm" id="id-vRsnCkHnhXiP"> </a> It’s deeply important, however, to recognize what it is actually saying and what it cannot.<a contenteditable="false" data-primary="dependency management" data-secondary="limitations of semantic versioning" data-tertiary="questioning whether it works" data-type="indexterm" id="id-nRsxHaHjh6iw"> </a> SemVer will work fine provided that:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Your dependency providers are accurate and responsible (to avoid human error in SemVer bumps)</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Your dependencies are fine grained (to avoid falsely overconstraining when unused/unrelated APIs in your dependencies are updated, and the associated risk of unsatisfiable SemVer requirements)</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>All usage of all APIs is within the expected usage (to avoid being broken in surprising fashion by an assumed-compatible change, either directly or in code you depend upon transitively)</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>When you have only a few carefully chosen and well-maintained dependencies in your dependency graph, SemVer can be a perfectly suitable solution.</p>
|
||||
|
||||
<p>However, our experience at Google suggests that it is unlikely that you can have <em>any</em> of those three properties at scale and keep them working constantly over time. Scale tends to be the thing that shows the weaknesses in SemVer. As your dependency network scales up, both in the size of each dependency and the number of dependencies (as well as any monorepo effects from having multiple projects depending on the same network of external dependencies), the compounded fidelity loss in SemVer will begin to dominate. These failures manifest as both false positives (practically incompatible versions that theoretically should have worked) and false negatives (compatible versions disallowed by SAT-solvers and resulting dependency hell).<a contenteditable="false" data-primary="dependency management" data-secondary="limitations of semantic versioning" data-startref="ix_depmgtSV" data-type="indexterm" id="id-zRsKHqU1hWiq"> </a><a contenteditable="false" data-primary="semantic versioning" data-secondary="limitations of" data-startref="ix_semverlim" data-type="indexterm" id="id-aRsxcnUPhbim"> </a></p>
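<p>The false-negative case is easy to sketch with toy constraints (the libraries and ranges below are invented): two individually reasonable bounds on a shared dependency can intersect to nothing, and the resolver must then report failure even if the excluded combinations would have worked in practice.</p>

<pre data-type="programlisting" data-code-language="python"># A toy constraint check, not any real resolver. liba constrains libbase
# to [1.4, 2.0) and libb constrains it to [2.0, 3.0); the intersection is
# empty, so the solver must report "unsatisfiable" (dependency hell) even
# if a libbase 2.x would, in practice, have worked fine for liba's usage.
AVAILABLE = ["1.4", "1.7", "2.0", "2.3"]
CONSTRAINTS = {"liba": ("1.4", "2.0"), "libb": ("2.0", "3.0")}

def key(version):
    return tuple(int(part) for part in version.split("."))

def in_range(version, lo, hi):
    return key(version) >= key(lo) and key(hi) > key(version)

viable = [v for v in AVAILABLE
          if all(in_range(v, lo, hi) for lo, hi in CONSTRAINTS.values())]
print(viable)  # [] -- no single libbase satisfies both dependents
</pre>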
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="dependency_management_with_infinite_res">
|
||||
<h1>Dependency Management with Infinite Resources</h1>
|
||||
|
||||
<p>Here’s a useful thought experiment when considering <a contenteditable="false" data-primary="dependency management" data-secondary="with infinite resources" data-type="indexterm" id="ix_depmgtIR"> </a>dependency-management solutions: what would dependency management look like if we all had access to infinite compute resources? That is, what’s the best we could hope for, if we aren’t resource constrained but are limited only by visibility and weak coordination among organizations? As we see it currently, the industry relies on SemVer for three reasons:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>It requires only local information (an API provider doesn’t <em>need</em> to know the particulars of downstream users)</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>It doesn’t assume the availability of tests (not ubiquitous in the industry yet, but definitely moving that way in the next decade), compute resources to run the tests, or CI systems to monitor the test results</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>It’s the existing practice</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>The “requirement” of local information isn’t really necessary, specifically because dependency networks tend to form in only two environments:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Within a single organization</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Within the OSS ecosystem, where source is visible even if the projects are not necessarily collaborating</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>In either of those cases, significant information about downstream usage is <em>available</em>, even if it isn’t being readily exposed or acted upon today. That is, part of SemVer’s effective dominance is that we’re choosing to ignore information that is theoretically available to us. If we had access to more compute resources and that dependency information was surfaced readily, the community would probably find a use for it.</p>
|
||||
|
||||
<p>Although an OSS package can have innumerable closed-source dependents, the common case is that popular OSS packages are popular both publicly and privately. Dependency networks don’t (can’t) aggressively mix public and private dependencies: generally, there is a public subset and a separate private subgraph.<sup><a data-type="noteref" id="ch01fn216-marker" href="ch21.html#ch01fn216">16</a></sup></p>
|
||||
|
||||
<p>Next, we must remember the <em>intent</em> of SemVer: “In my estimation, this change will be easy (or not) to adopt.” Is there a better way of conveying that information? Yes, in the form of practical experience demonstrating that the change is easy to adopt. How do we get such experience? If most (or at least a representative sample) of our dependencies are publicly visible, we run the tests for those dependencies with every proposed change. With a sufficiently large number of such tests, we have at least a statistical argument that the change is safe in the practical Hyrum’s-Law sense. The tests still pass, the change is good—it doesn’t matter whether this is API impacting, bug fixing, or anything in between; there’s no need to classify or estimate.</p>
|
||||
|
||||
<p>Imagine, then, that the OSS ecosystem moved to a world in which changes were accompanied with <em>evidence</em> of whether they are safe. If we pull compute costs out of the equation, the <em>truth</em><sup><a data-type="noteref" id="ch01fn217-marker" href="ch21.html#ch01fn217">17</a></sup> of “how safe is this” comes from running affected tests in downstream dependencies.</p>
|
||||
|
||||
<p>Even without formal CI applied to the entire OSS ecosystem, we can of course use such a dependency graph and other secondary signals to do a more targeted presubmit analysis. Prioritize tests in dependencies that are heavily used. Prioritize tests in dependencies that are well maintained. Prioritize tests in dependencies that have a history of providing good signal and high-quality test results. Beyond just prioritizing tests based on the projects that are likely to give us the most information about experimental change quality, we might be able to use information from the change authors to help estimate risk and select an appropriate testing strategy. Running “all affected” tests is theoretically necessary if the goal is “nothing that anyone relies upon is changed in a breaking fashion.” If we consider the goal to be more in line with “risk mitigation,” a statistical argument becomes a more appealing (and cost-effective) approach.</p>
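<p>A sketch of what such prioritization could look like, with entirely invented field names and weights: score each downstream dependent on usage, maintenance, and the historical quality of its test signal, then spend a fixed presubmit budget on the highest-scoring ones.</p>

<pre data-type="programlisting" data-code-language="python"># Hypothetical scoring of downstream dependents for presubmit testing.
# All names, fields, and weights are invented for illustration.
DEPENDENTS = [
    {"name": "libfoo", "weekly_builds": 9000, "maintained": True,  "flake_rate": 0.01},
    {"name": "libbar", "weekly_builds": 120,  "maintained": True,  "flake_rate": 0.02},
    {"name": "libqux", "weekly_builds": 4000, "maintained": False, "flake_rate": 0.50},
]

def signal_score(dep):
    usage = dep["weekly_builds"] ** 0.5          # diminishing returns on popularity
    upkeep = 2.0 if dep["maintained"] else 0.5   # prefer well-maintained projects
    quality = 1.0 - dep["flake_rate"]            # discount historically flaky signal
    return usage * upkeep * quality

TEST_BUDGET = 2
to_test = sorted(DEPENDENTS, key=signal_score, reverse=True)[:TEST_BUDGET]
print([d["name"] for d in to_test])
# ['libfoo', 'libbar']: the unmaintained, flaky libqux loses its slot
</pre>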
|
||||
|
||||
<p>In <a data-type="xref" href="ch12.html#unit_testing">Unit Testing</a>, we identified four varieties of change, ranging from pure refactorings to modification of existing functionality. Given a CI-based model for dependency updating, we can begin to map those varieties of change onto a SemVer-like model for which the author of a change estimates the risk and applies an appropriate level of testing. For example, a pure refactoring change that modifies only internal APIs might be assumed to be low risk and justify running tests only in our own project and perhaps a sampling of important direct dependents. On the other hand, a change that removes a deprecated interface or changes observable behaviors might require as much testing as we can afford.</p>
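<p>One way to picture that mapping is a simple lookup from the author’s risk estimate to a testing scope; the scopes below are illustrative only, not a description of any real policy.</p>

<pre data-type="programlisting" data-code-language="python"># Hypothetical mapping from an author's risk estimate to a test scope.
# The change categories echo those in the Unit Testing chapter; the
# scopes and the policy itself are invented for illustration.
TEST_SCOPE = {
    "pure_refactoring": "own tests + a sample of important direct dependents",
    "new_feature":      "own tests + direct dependents",
    "bug_fix":          "own tests + dependents known to exercise the old behavior",
    "behavior_change":  "as many affected transitive dependents as we can afford",
}

def scope_for(change_kind):
    # Default to the most conservative scope when the risk is unclear.
    return TEST_SCOPE.get(change_kind, TEST_SCOPE["behavior_change"])

print(scope_for("pure_refactoring"))
</pre>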
|
||||
|
||||
<p>What changes would we need to the OSS ecosystem to apply such a model? Unfortunately, quite a few:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>All dependencies must provide unit tests. Although we are moving inexorably toward a world in which unit testing is both well accepted and ubiquitous, we are not there yet.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>The dependency network for the majority of the OSS ecosystem is understood. It is unclear that any mechanism is currently available to perform graph algorithms on that network—the information is <em>public</em> and <em>available,</em> but not actually generally indexed or usable. Many package-management systems/dependency-management ecosystems allow you to see the dependencies of a project, but not the reverse edges, the dependents.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>The availability of compute resources for executing CI is still very limited. Most developers don’t have access to build-and-test compute clusters.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Dependencies are often expressed in a pinned fashion. As a maintainer of <span class="keep-together"><code>libbase</code></span>, we can’t experimentally run a change through the tests for <code>liba</code> and <code>libb</code> if those dependencies are explicitly depending on a specific pinned version of <span class="keep-together"><code>libbase</code></span>.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>We might want to explicitly include history and reputation in CI calculations. A proposed change that breaks a project that has a longstanding history of tests continuing to pass gives us a different form of evidence than a breakage in a project that was only added recently and has a history of breaking for unrelated reasons.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Inherent in this is a scale question: against which versions of each dependency in the network do you test presubmit changes? If we test against the full combination of all historical versions, we’re going to burn a truly staggering amount of compute <span class="keep-together">resources,</span> even by Google standards. The most obvious simplification to this version-selection strategy would seem to be “test the current stable version” (trunk-based development is the goal, after all). And thus, the model of dependency management given infinite resources is effectively that of the Live at Head model. The outstanding question is whether that model can apply effectively with a more practical resource availability and whether API providers are willing to take greater responsibility for testing the practical safety of their changes. Recognizing where our existing low-cost facilities are an oversimplification of the difficult-to-compute truth that we are looking for is still a useful exercise.</p>
|
||||
|
||||
<section data-type="sect2" id="exporting_dependencies">
|
||||
<h2>Exporting Dependencies</h2>
|
||||
|
||||
<p>So far, we’ve only talked about taking on dependencies; that is, depending on software that other people have written.<a contenteditable="false" data-primary="dependency management" data-secondary="with infinite resources" data-tertiary="exporting dependencies" data-type="indexterm" id="ix_depmgtIRexp"> </a> It’s also worth thinking about how we build software that can be <em>used</em> as a dependency. This goes beyond just the mechanics of packaging software and uploading it to a repository: we need to think about the benefits, costs, and risks of providing software, for both us and our potential dependents.</p>
|
||||
|
||||
<p>There are two major ways that an innocuous and hopefully charitable act like “open sourcing a library” can become a possible loss for an organization. First, it can eventually become a drag on the reputation of your organization if implemented poorly or not maintained properly. As the Apache community saying goes, we ought to prioritize “community over code.” If you provide great code but are a poor community member, that can still be harmful to your organization and the broader community. Second, a well-intentioned release can become a tax on engineering efficiency if you can’t keep things in sync. Given time, all forks will become expensive.</p>
|
||||
|
||||
<section data-type="sect3" id="example_open_sourcing_gflags">
|
||||
<h3>Example: open sourcing gflags</h3>
|
||||
|
||||
<p>For reputation loss, consider the case of something like Google’s experience circa 2006 open sourcing our C++ command-line flag libraries.<a contenteditable="false" data-primary="C++" data-secondary="open sourcing command-line flag libraries" data-type="indexterm" id="id-koskCeHGtmCnCX"> </a><a contenteditable="false" data-primary="open sourcing gflags" data-type="indexterm" id="id-NzsrHQHRtyCzCj"> </a> Surely giving back to the open source community is a purely good act that won’t come back to haunt us, right? Sadly, no. A host of reasons conspired to make this good act into something that certainly hurt our reputation and possibly damaged the OSS community as well:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>At the time, we didn’t have the ability to execute large-scale refactorings, so everything that used that library internally had to remain exactly the same—we couldn’t move the code to a new location in the codebase.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>We segregated our repository into “code developed in-house” (which can be copied freely if it needs to be forked, so long as it is renamed properly) and “code that may have legal/licensing concerns” (which can have more nuanced usage requirements).</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>If an OSS project accepts code from outside developers, that’s generally a legal issue—the project originator doesn’t <em>own</em> that contribution, they only have rights to it.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>As a result, the gflags project was doomed to be either a “throw over the wall” release or a disconnected fork. Patches contributed to the project couldn’t be reincorporated into the original source inside of Google, and we couldn’t move the project within our monorepo because we hadn’t yet mastered that form of refactoring, nor could we make everything internally depend on the OSS version.</p>
|
||||
|
||||
<p>Further, like most organizations, our priorities have shifted and changed over time. Around the time of the original release of that flags library, we were interested in products outside of our traditional space (web applications, search), including things like Google Earth, which had a much more traditional distribution mechanism: precompiled binaries for a variety of platforms. In the late 2000s, it was unusual but not unheard of for a library in our monorepo, especially something low-level like flags, to be used on a variety of platforms. As time went on and Google grew, our focus narrowed to the point that it was extremely rare for any libraries to be built with anything other than our in-house configured toolchain, then deployed to our production fleet. The “portability” concerns for properly supporting an OSS project like <span>flags</span> were nearly impossible to maintain: our internal tools simply didn’t have support for those platforms, and our average developer didn’t have to interact with external tools. It was a constant battle to try to maintain portability.</p>
|
||||
|
||||
<p>As the original authors and OSS supporters moved on to new companies or new teams, it eventually became clear that nobody internally was really supporting our OSS <span>flags</span> project—nobody could tie that support back to the priorities for any particular team. Given that it was no specific team’s job, and nobody could say why it was important, it isn’t surprising that we basically let that project rot externally.<sup><a data-type="noteref" id="ch01fn218-marker" href="ch21.html#ch01fn218">18</a></sup> The internal and external versions diverged slowly over time, and eventually some external developers took the external version and forked it, giving it some proper <span class="keep-together">attention.</span></p>
|
||||
|
||||
<p>Other than the initial “Oh look, Google contributed something to the open source world,” no part of that made us look good, and yet every little piece of it made sense given the priorities of our engineering organization. Those of us who have been close to it have learned, “Don’t release things without a plan (and a mandate) to support it for the long term.” Whether the whole of Google engineering has learned that or not remains to be seen. It’s a big organization.</p>
|
||||
|
||||
<p class="pagebreak-before">Above and beyond the nebulous “We look bad,” there are also parts of this story that illustrate how we can be subject to technical problems stemming from poorly released/poorly maintained external dependencies. Although the <span>flags</span> library was shared but ignored, there were still some Google-backed open source projects, or projects that needed to be shareable outside of our monorepo ecosystem. Unsurprisingly, the authors of those other projects were able to identify<sup><a data-type="noteref" id="ch01fn219-marker" href="ch21.html#ch01fn219">19</a></sup> the common API subset between the internal and external forks of that library. Because that common subset stayed fairly stable between the two versions for a long period, it silently became “the way to do this” for the rare teams that had unusual portability requirements between roughly 2008 and 2017. Their code could build in both internal and external ecosystems, switching out forked versions of the <span>flags</span> library depending on environment.</p>
|
||||
|
||||
<p>Then, for unrelated reasons, C++ library teams began tweaking observable-but-not-documented pieces of the internal <span>flag</span> implementation. At that point, everyone who was depending on the stability and equivalence of an unsupported external fork started screaming that their builds and releases were suddenly broken. An optimization opportunity worth some thousands of aggregate CPUs across Google’s fleet was significantly delayed, not because it was difficult to update the API that 250 million lines of code depended upon, but because a tiny handful of projects were relying on unpromised and unexpected things. Once again, Hyrum’s Law affects software changes, in this case even for forked APIs maintained by separate organizations.</p>
|
||||
|
||||
<aside data-type="sidebar" id="example_appengine">
|
||||
<h5>Case Study: AppEngine</h5>
|
||||
|
||||
<p>A more serious example of exposing ourselves to greater risk of unexpected technical dependency comes from publishing Google’s AppEngine service.<a contenteditable="false" data-primary="AppEngine example, exporting resources" data-type="indexterm" id="id-gRs0CvHkhNteCaCo"> </a> This service allows users to write their applications on top of an existing framework in one of several popular programming languages. So long as the application is written with a proper storage/state management model, the AppEngine service allows those applications to scale up to huge usage levels: backing storage and frontend management are managed and cloned on demand by Google’s production infrastructure.</p>
|
||||
|
||||
<p>Originally, AppEngine’s support for Python was a 32-bit build running with an older version of the Python interpreter. The AppEngine system itself was (of course) implemented in our monorepo and built with the rest of our common tools, in Python and in C++ for backend support. In 2014 we started the process of doing a major update to the Python runtime alongside our C++ compiler and standard library installations, with the result being that we effectively tied “code that builds with the current C++ compiler” to “code that uses the updated Python version”—a project that upgraded one of those dependencies inherently upgraded the other at the same time. For most projects, this was a non-issue. For a few projects, because of edge cases and Hyrum’s Law, our language platform experts wound up doing some investigation and debugging to unblock the transition. In a terrifying instance of Hyrum’s Law running into business practicalities, AppEngine discovered that many of its users, our paying customers, couldn’t (or wouldn’t) update: either they didn’t want to take the change to the newer Python version, or they couldn’t afford the resource consumption changes involved in moving from 32-bit to 64-bit Python. Because there were some customers that were paying a significant amount of money for AppEngine services, AppEngine was able to make a strong business case that a forced switch to the new language and compiler versions must be delayed. This inherently meant that every piece of C++ code in the transitive closure of dependencies from AppEngine had to be compatible with the older compiler and standard library versions: any bug fixes or performance optimizations that could be made to that infrastructure had to be compatible across versions. That situation persisted for almost three years.</p>
|
||||
</aside>
|
||||
|
||||
<p>With enough users, any “observable” of your system will come to be depended upon by somebody. At Google, we constrain all of our internal users within the boundaries of our technical stack and ensure visibility into their usage with the monorepo and code indexing systems, so it is far easier to ensure that useful change remains possible. When we shift from source control to dependency management and lose visibility into how code is used or are subject to competing priorities from outside groups (especially ones that are paying you), it becomes much more difficult to make pure engineering trade-offs. Releasing APIs of any sort exposes you to the possibility of competing priorities and unforeseen constraints by outsiders. This isn’t to say that you shouldn’t release APIs; it serves only to provide the reminder: external users of an API cost a lot more to maintain than internal ones.</p>
|
||||
|
||||
<p>Sharing code with the outside world, either as an open source release or as a closed-source library release, is not a simple matter of charity (in the OSS case) or business opportunity (in the closed-source case). Dependent users that you cannot monitor, in different organizations, with different priorities, will eventually exert some form of Hyrum’s Law inertia on that code. Especially if you are working with long timescales, it is impossible to accurately predict the set of necessary or useful changes that could become valuable. When evaluating whether to release something, be aware of the long-term risks: externally shared dependencies are often much more expensive to modify over time.<a contenteditable="false" data-primary="dependency management" data-secondary="with infinite resources" data-startref="ix_depmgtIRexp" data-tertiary="exporting dependencies" data-type="indexterm" id="id-VLsXCdTAtACLC3"> </a><a contenteditable="false" data-primary="dependency management" data-secondary="with infinite resources" data-startref="ix_depmgtIR" data-type="indexterm" id="id-WMsbHeTBtKCJCa"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section class="pagebreak-before" data-type="sect1" id="conclusion-id00025">
|
||||
<h1 class="less_space">Conclusion</h1>
|
||||
|
||||
<p>Dependency management is inherently challenging—we’re looking for solutions to management of complex API surfaces and webs of dependencies, where the maintainers of those dependencies generally have little or no assumption of coordination. The de facto standard for managing a network of dependencies is semantic versioning, or SemVer, which provides a lossy summary of the perceived risk in adopting any particular change. SemVer presupposes that we can a priori predict the severity of a change, in the absence of knowledge of how the API in question is being consumed: Hyrum’s Law informs us otherwise. However, SemVer works well enough at small scale, and even better when we include the MVS approach. As the size of the dependency network grows, Hyrum’s Law issues and fidelity loss in SemVer make managing the selection of new versions increasingly difficult.</p>
|
||||
|
||||
<p>It is possible, however, that we move toward a world in which maintainer-provided estimates of compatibility (SemVer version numbers) are dropped in favor of experience-driven evidence: running the tests of affected downstream packages. If API providers take greater responsibility for testing against their users and clearly advertise what types of changes are expected, we have the possibility of higher-fidelity dependency networks at even larger scale.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="tlsemicolondrs-id00127">
|
||||
<h1>TL;DRs</h1>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Prefer source control problems to dependency management problems: if you can get more code from your organization to have better transparency and coordination, those are important simplifications.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Adding a dependency isn’t free for a software engineering project, and the complexity in establishing an “ongoing” trust relationship is challenging. Importing dependencies into your organization needs to be done carefully, with an understanding of the ongoing support costs.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>A dependency is a contract: there is a give and take, and both providers and consumers have some rights and responsibilities in that contract. Providers should be clear about what they are trying to promise over time.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>SemVer is a lossy-compression shorthand estimate for “How risky does a human think this change is?” SemVer with a SAT-solver in a package manager takes those estimates and escalates them to function as absolutes. This can result in either overconstraint (dependency hell) or underconstraint (versions that should work together that don’t).</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>By comparison, testing and CI provide actual evidence of whether a new set of versions work together.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Minimum-version update strategies in SemVer/package management are higher fidelity. This still relies on humans being able to assess incremental version risk accurately, but distinctly improves the chance that the link between API provider and consumer has been tested by an expert.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Unit testing, CI, and (cheap) compute resources have the potential to change our understanding and approach to dependency management. That phase-change requires a fundamental change in how the industry considers the problem of dependency management, and the responsibilities of providers and consumers both.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Providing a dependency isn’t free: “throw it over the wall and forget” can cost you reputation and become a challenge for compatibility. Supporting it with stability can limit your choices and pessimize internal usage. Supporting without stability can cost goodwill or expose you to risk of important external groups depending on something via Hyrum’s Law and messing up your “no stability” plan.<a contenteditable="false" data-primary="dependency management" data-startref="ix_depmgt" data-type="indexterm" id="id-nRsyC6COIkHNcD"> </a></p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
<div data-type="footnotes"><p data-type="footnote" id="ch01fn201"><sup><a href="ch21.html#ch01fn201-marker">1</a></sup>This could be any of language version, version of a lower-level library, hardware version, operating system, compiler flag, compiler version, and so on.</p><p data-type="footnote" id="ch01fn202"><sup><a href="ch21.html#ch01fn202-marker">2</a></sup>For instance, security bugs, deprecations, being in the dependency set of a higher-level dependency that has a security bug, and so on.</p><p data-type="footnote" id="ch01fn203"><sup><a href="ch21.html#ch01fn203-marker">3</a></sup>This is called <em>shading</em> or <em>versioning</em>.</p><p data-type="footnote" id="ch01fn205"><sup><a href="ch21.html#ch01fn205-marker">4</a></sup>In many cases, there is significant overlap in those populations.</p><p data-type="footnote" id="ch01fn206"><sup><a href="ch21.html#ch01fn206-marker">5</a></sup>Common Vulnerabilities and Exposures</p><p data-type="footnote" id="ch01fn207"><sup><a href="ch21.html#ch01fn207-marker">6</a></sup>Strictly speaking, SemVer refers only to the emerging practice of applying semantics to major/minor/patch version numbers, not the application of compatible version requirements among dependencies numbered in that fashion. There are numerous minor variations on those requirements among different ecosystems, but in general, the version-number-plus-constraints system described here as SemVer is representative of the practice at large.</p><p data-type="footnote" id="ch01fn208"><sup><a href="ch21.html#ch01fn208-marker">7</a></sup>In fact, it has been proven that SemVer constraints applied to a dependency network are <a href="https://research.swtch.com/version-sat">NP-complete</a>.</p><p data-type="footnote" id="ch01fn209"><sup><a href="ch21.html#ch01fn209-marker">8</a></sup>Especially the author and others in the Google C++ community.</p><p data-type="footnote" id="ch01fn210"><sup><a href="ch21.html#ch01fn210-marker">9</a></sup>For example: a poorly implemented polyfill that adds the new <code>libbase</code> API ahead of time, causing a conflicting definition. Or, use of language reflection APIs to depend upon the precise number of APIs provided by <span class="keep-together"><code>libbase</code></span>, introducing crashes if that number changes. These shouldn’t happen and are certainly rare even if they do happen by accident—the point is that the <code>libbase</code> providers can’t <em>prove</em> compatibility.</p><p data-type="footnote" id="ch01fn211"><sup><a href="ch21.html#ch01fn211-marker">10</a></sup>The Node ecosystem has noteworthy examples of dependencies that provide exactly one API.</p><p data-type="footnote" id="ch01fn212"><sup><a href="ch21.html#ch01fn212-marker">11</a></sup>It’s worth noting: in our experience, naming like this doesn’t fully solve the problem of users reaching in to access private APIs. 
Prefer languages that have good control over public/private access to APIs of all forms.</p><p data-type="footnote" id="ch01fn213"><sup><a href="ch21.html#ch01fn213-marker">12</a></sup>In a world of ubiquitous unit tests, we could identify changes that required a change in test behavior, but it would still be difficult to algorithmically separate “This is a behavioral change” from “This is a bug fix to a behavior that wasn’t intended/promised.”</p><p data-type="footnote" id="ch01fn214"><sup><a href="ch21.html#ch01fn214-marker">13</a></sup>So, when it matters in the long term, choose well-maintained dependencies.</p><p data-type="footnote" id="ch01fn"><sup><a href="ch21.html#ch01fn-marker">14</a></sup>Russ Cox, "Minimal Version Selection," February 21, 2018, <a href="https://research.swtch.com/vgo-mvs">https://research.swtch.com/vgo-mvs</a>.</p><p data-type="footnote" id="ch01fn215"><sup><a href="ch21.html#ch01fn215-marker">15</a></sup>If that assumption doesn’t hold, you should really stop depending on <code>liba</code>.</p><p data-type="footnote" id="ch01fn216"><sup><a href="ch21.html#ch01fn216-marker">16</a></sup>Because the public OSS dependency network can’t generally depend on a bunch of private nodes, graphics firmware notwithstanding.</p><p data-type="footnote" id="ch01fn217"><sup><a href="ch21.html#ch01fn217-marker">17</a></sup>Or something very close to it.</p><p data-type="footnote" id="ch01fn218"><sup><a href="ch21.html#ch01fn218-marker">18</a></sup>That isn’t to say it’s <em>right</em> or <em>wise</em>, just that as an organization we let some things slip through the cracks.</p><p data-type="footnote" id="ch01fn219"><sup><a href="ch21.html#ch01fn219-marker">19</a></sup>Often through trial and error.</p></div></section>
|
||||
|
||||
</body>
|
||||
</html>
|
414
clones/abseil.io/resources/swe-book/html/ch22.html
Normal file
|
@ -0,0 +1,414 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="large-scale_changes">
|
||||
<h1>Large-Scale Changes</h1>
|
||||
|
||||
<p class="byline">Written by Hyrum Wright</p>
|
||||
|
||||
<p class="byline">Edited by Lisa Carey</p>
|
||||
|
||||
<p>Think for a moment about your own codebase.<a contenteditable="false" data-primary="large-scale changes" data-type="indexterm" id="ix_LSC"> </a> How many files can you reliably update in a single, simultaneous commit? What are the factors that constrain that number? Have you ever tried committing a change that large? Would you be able to do it in a reasonable amount of time in an emergency? How does your largest commit size compare to the actual size of your codebase? How would you test such a change? How many people would need to review the change before it is committed? Would you be able to roll back that change if it did get committed? The answers to these questions might surprise you (both what you <em>think</em> the answers are and what they actually turn out to be for your organization).</p>
|
||||
|
||||
<p>At Google, we long ago abandoned the idea of making sweeping changes across our codebase in these types of large atomic changes. Our observation has been that, as a codebase and the number of engineers working in it grow, the largest atomic change possible counterintuitively <em>decreases—</em>running all affected presubmit checks and tests becomes difficult, to say nothing of even ensuring that every file in the change is up to date before submission. As it has become more difficult to make sweeping changes to our codebase, given our general desire to be able to continually improve underlying infrastructure, we’ve had to develop new ways of reasoning about large-scale changes and how to implement them.</p>
|
||||
|
||||
<p>In this chapter, we’ll talk about the techniques, both social and technical, that enable us to keep the large Google codebase flexible and responsive to changes in underlying infrastructure. We’ll also provide some real-life examples of how and where we’ve used these approaches. Although your codebase might not look like Google’s, understanding these principles and adapting them locally will help your development organization scale while still being able to make broad changes across your codebase.</p>
|
||||
|
||||
<section data-type="sect1" id="what_is_a_large-scale_changequestion_ma">
|
||||
<h1>What Is a Large-Scale Change?</h1>
|
||||
|
||||
<p>Before going much further, we should dig into what qualifies as a large-scale change (LSC). <a contenteditable="false" data-primary="large-scale changes" data-secondary="qualities of" data-type="indexterm" id="id-aguVHXskS6"> </a>In our experience, an LSC is any set of changes that are logically related but cannot practically be submitted as a single atomic unit.<a contenteditable="false" data-primary="LSCs" data-see="large-scale changes" data-type="indexterm" id="id-J6u6sDs8SO"> </a> This might be because it touches so many files that the underlying tooling can’t commit them all at once, or it might be because the change is so large that it would always have merge conflicts. In many cases, an LSC is dictated by your repository topology: if your organization uses a collection of distributed or federated repositories,<sup><a data-type="noteref" id="ch01fn220-marker" href="ch22.html#ch01fn220">1</a></sup> making atomic changes across them might not even be technically possible.<sup><a data-type="noteref" id="ch01fn221-marker" href="ch22.html#ch01fn221">2</a></sup> We’ll look at potential barriers to atomic changes in more detail later in this chapter.<a contenteditable="false" data-primary="changes to code" data-secondary="large-scale" data-see="large-scale changes" data-type="indexterm" id="id-wgu4cMsxSn"> </a></p>
|
||||
|
||||
<p>LSCs at Google are almost always generated using automated tooling. Reasons for making an LSC vary, but the changes themselves generally fall into a few basic <span class="keep-together">categories:</span></p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Cleaning up common antipatterns using codebase-wide analysis tooling</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Replacing uses of deprecated library features</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Enabling low-level infrastructure improvements, such as compiler upgrades</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Moving users from an old system to a newer one<sup><a data-type="noteref" id="ch01fn222-marker" href="ch22.html#ch01fn222">3</a></sup></p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>The number of engineers working on these specific tasks in a given organization might be low, but it is useful for their customers to have insight into the LSC tools and process. By their very nature, LSCs will affect a large number of customers, and the LSC tools easily scale down to teams making only a few dozen related changes.</p>
|
||||
|
||||
<p>There can be broader motivating causes behind specific LSCs. For example, a new language standard might introduce a more efficient idiom for accomplishing a given task, an internal library interface might change, or a new compiler release might require fixing existing problems that would be flagged as errors by the new release. The majority of LSCs across Google actually have near-zero functional impact: they tend to be widespread textual updates for clarity, optimization, or future compatibility. But LSCs are not theoretically limited to this behavior-preserving/refactoring class of change.</p>
|
||||
|
||||
<p>In all of these cases, on a codebase the size of Google’s, infrastructure teams might routinely need to change hundreds of thousands of individual references to the old pattern or symbol. In the largest cases so far, we’ve touched millions of references, and we expect the process to continue to scale well. Generally, we’ve found it advantageous to invest early and often in tooling to enable LSCs for the many teams doing infrastructure work. We’ve also found that efficient tooling helps engineers performing smaller changes. The same tools that make changing thousands of files efficient also scale down to tens of files reasonably well.</p>
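<p>Reduced to its smallest possible form, that tooling is “apply one mechanical edit everywhere it occurs.” The sketch below does this textually for a single invented symbol; production LSC tooling instead operates on parsed code and splits the result into independently reviewed and tested changes, but the shape of the work is the same.</p>

<pre data-type="programlisting" data-code-language="python"># A toy, text-based codemod: rewrite one deprecated symbol everywhere.
# The symbol names are invented for illustration.
import pathlib
import re

OLD_SYMBOL = re.compile(r"\bDeprecatedCounter\b")
NEW_SYMBOL = "MetricCounter"

def rewrite_tree(root):
    changed = []
    for path in pathlib.Path(root).rglob("*.cc"):
        text = path.read_text()
        new_text = OLD_SYMBOL.sub(NEW_SYMBOL, text)
        if new_text != text:
            path.write_text(new_text)
            changed.append(str(path))
    return changed  # these files then get sharded into reviewable changes

if __name__ == "__main__":
    print(rewrite_tree("."))
</pre>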
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="who_deals_with_lscsquestion_mark">
|
||||
<h1>Who Deals with LSCs?</h1>
|
||||
|
||||
<p>As just indicated, the infrastructure teams that build and manage our systems are responsible for much of the work of performing LSCs, but the tools and resources are available across the company.<a contenteditable="false" data-primary="large-scale changes" data-secondary="responsibility for" data-type="indexterm" id="ix_LSCresp"> </a> If you skipped <a data-type="xref" href="ch01.html#what_is_software_engineeringquestion_ma">What Is Software Engineering?</a>, you might wonder why infrastructure teams are the ones responsible for this work. Why can’t we just introduce a new class, function, or system and dictate that everybody who uses the old one move to the updated analogue? Although this might seem easier in practice, it turns out not to scale very well for several reasons.</p>
|
||||
|
||||
<p>First, the infrastructure teams that build and manage the underlying systems are also the ones with the domain knowledge required to fix the hundreds of thousands of references to them. Teams that consume the infrastructure are unlikely to have the context for handling many of these migrations, and it is globally inefficient to expect them to each relearn expertise that infrastructure teams already have. Centralization also allows for faster recovery when faced with errors because errors generally fall into a small set of categories, and the team running the migration can have a playbook—formal or informal—for addressing them.</p>
|
||||
|
||||
<p>Consider the amount of time it takes to do the first of a series of semi-mechanical changes that you don’t understand. You probably spend some time reading about the motivation and nature of the change, find an easy example, try to follow the provided suggestions, and then try to apply that to your local code. Repeating this for every team in an organization greatly increases the overall cost of execution. By making only a few centralized teams responsible for LSCs, Google both internalizes those costs and drives them down by making it possible for the change to happen more efficiently.</p>
|
||||
|
||||
<p>Second, nobody likes unfunded mandates.<sup><a data-type="noteref" id="ch01fn223-marker" href="ch22.html#ch01fn223">4</a></sup> Even though a new system might be categorically better than the one it replaces, those benefits are often diffused across an organization and thus unlikely to matter enough for individual teams to want to update on their own initiative. If the new system is important enough to migrate to, the costs of migration will be borne somewhere in the organization. Centralizing the migration and accounting for its costs is almost always faster and cheaper than depending on individual teams to organically migrate.</p>
|
||||
|
||||
<p>Additionally, having teams that own the systems requiring LSCs helps align incentives to ensure the change gets done. In our experience, organic migrations are unlikely to fully succeed, in part because engineers tend to use existing code as examples when writing new code. Having a team that has a vested interest in removing the old system responsible for the migration effort helps ensure that it actually gets done. Although funding and staffing a team to run these kinds of migrations can seem like an additional cost, it is actually just internalizing the externalities that an unfunded mandate creates, with the additional benefits of economies of scale.</p>
|
||||
|
||||
<aside data-type="sidebar" id="filling_potholes">
|
||||
<h5>Case Study: Filling Potholes</h5>
|
||||
|
||||
<p>Although the LSC systems at Google are used for high-priority migrations, we’ve also discovered that just having them available opens up opportunities for various small fixes across our codebase, which just wouldn’t have been possible without them.<a contenteditable="false" data-primary="small fixes across the codebase with LSCs" data-type="indexterm" id="id-Qbu4HEsRSYt7"> </a> Much like transportation infrastructure tasks consist of building new roads as well as repairing old ones, infrastructure groups at Google spend a lot of time fixing existing code, in addition to developing new systems and moving users to them.</p>
|
||||
|
||||
<p>For example, early in our history, a template library emerged to supplement the C++ Standard Template Library. Aptly named the Google Template Library, this library consisted of several header files’ worth of implementation. For reasons lost in the mists of time, one of these header files was named <em>stl_util.h</em> and another was named <em>map-util.h</em> (note the different separators in the file names). In addition to driving the consistency purists nuts, this difference also led to reduced productivity: engineers had to remember which file used which separator, and discovered a wrong guess only after a potentially lengthy compile cycle.</p>
|
||||
|
||||
<p>Although fixing this single-character change might seem pointless, particularly across a codebase the size of Google’s, the maturity of our LSC tooling and process enabled us to do it with just a couple weeks’ worth of background-task effort. Library authors could find and apply this change en masse without having to bother end users of these files, and we were able to quantitatively reduce the number of build failures caused by this specific issue. The resulting increases in productivity (and happiness) more than paid for the time to make the change.</p>
|
||||
|
||||
<p>As the ability to make changes across our entire codebase has improved, the diversity of changes has also expanded, and we can make some engineering decisions knowing that they aren’t immutable in the future. Sometimes, it’s worth the effort to fill a few potholes.<a contenteditable="false" data-primary="large-scale changes" data-secondary="responsibility for" data-startref="ix_LSCresp" data-type="indexterm" id="id-0OuJHNc5SDtO"> </a></p>
|
||||
</aside>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="barriers_to_atomic_changes">
|
||||
<h1>Barriers to Atomic Changes</h1>
|
||||
|
||||
<p>Before we<a contenteditable="false" data-primary="atomic changes, barriers to" data-type="indexterm" id="ix_atom"> </a> discuss the process<a contenteditable="false" data-primary="large-scale changes" data-secondary="barriers to atomic changes" data-type="indexterm" id="ix_LSCbarr"> </a> that Google uses to actually effect LSCs, we should talk about why many kinds of changes can’t be committed atomically. In an ideal world, all logical changes could be packaged into a single atomic commit that could be tested, reviewed, and committed independent of other changes. Unfortunately, as a repository—and the number of engineers working in it—grows, that ideal becomes less feasible. It can be completely infeasible even at small scale when using a set of distributed or federated repositories.</p>
|
||||
|
||||
<section data-type="sect2" id="technical_limitations">
|
||||
<h2>Technical Limitations</h2>
|
||||
|
||||
<p>To begin with, most Version Control Systems (VCSs) have operations that scale linearly with the size of a change. <a contenteditable="false" data-primary="large-scale changes" data-secondary="barriers to atomic changes" data-tertiary="technical limitations" data-type="indexterm" id="id-wguNHMsVfahG"> </a><a contenteditable="false" data-primary="centralized version control systems (VCSs)" data-secondary="operations scaling linearly with size of a change" data-type="indexterm" id="id-dXuZs1sXf3ha"> </a>Your system might be able to handle small commits (e.g., on the order of tens of files) just fine, but might not have sufficient memory or processing power to atomically commit thousands of files at once. In centralized VCSs, commits can block other writers (and in older systems, readers) from using the system as they process, meaning that large commits stall other users of the system.</p>
|
||||
|
||||
<p>In short, it might not be just “difficult” or “unwise” to make a large change atomically: it might simply be impossible with a given infrastructure. Splitting the large change into smaller, independent chunks gets around these limitations, although it makes the execution of the change more complex.<sup><a data-type="noteref" id="ch01fn224-marker" href="ch22.html#ch01fn224">5</a></sup></p>
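<p>The splitting step itself can be very simple; the sketch below shards purely by file count, whereas real chunking also has to respect build and logical dependencies between files.</p>

<pre data-type="programlisting" data-code-language="python"># Split one large logical change into smaller, independently committable
# chunks. Sharding by file count alone is a simplification.
def shard(files, max_files_per_change=50):
    files = sorted(files)  # deterministic shards are easier to resume and re-run
    return [files[i:i + max_files_per_change]
            for i in range(0, len(files), max_files_per_change)]

chunks = shard([f"dir{n}/file{n}.cc" for n in range(230)])
print(len(chunks), [len(c) for c in chunks])  # 5 [50, 50, 50, 50, 30]
</pre>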
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="merge_conflicts">
|
||||
<h2>Merge Conflicts</h2>
|
||||
|
||||
<p>As the size of a change grows, the<a contenteditable="false" data-primary="large-scale changes" data-secondary="barriers to atomic changes" data-tertiary="merge conflicts" data-type="indexterm" id="id-dXuvH1s8C3ha"> </a> potential for merge conflicts also increases.<a contenteditable="false" data-primary="merge conflicts, size of changes and" data-type="indexterm" id="id-Gku8sEs4Cnhq"> </a> Every version control system we know of requires updating and merging, potentially with manual resolution, if a newer version of a file exists in the central repository. As the number of files in a change increases, the probability of encountering a merge conflict also grows and is compounded by the number of engineers working in the <span class="keep-together">repository.</span></p>
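<p>A back-of-the-envelope model makes the compounding concrete. Assume (unrealistically, but usefully) that each file in a change has the same small, independent chance of being modified by someone else before the change lands:</p>

<pre data-type="programlisting" data-code-language="python"># Toy model: p is the chance that any one file is touched by someone else
# while the change is pending; treat files as independent.
def p_any_conflict(num_files, p=0.001):
    return 1 - (1 - p) ** num_files

for n in (10, 1_000, 100_000):
    print(n, round(p_any_conflict(n), 3))
# 10 0.01    1000 0.632    100000 1.0  (approximately)
</pre>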
|
||||
|
||||
<p>If your company is small, you might be able to sneak in a change that touches every file in the repository on a weekend when nobody is doing development. Or you might have an informal system of grabbing the global repository lock by passing a virtual (or even physical!) token around your development team. At a large, global company like Google, these approaches are just not feasible: somebody is always making changes to the repository.</p>
|
||||
|
||||
<p>With few files in a change, the probability of merge conflicts shrinks, so they are more likely to be committed without problems. This property also holds for the following areas as well.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="no_haunted_graveyards">
|
||||
<h2>No Haunted Graveyards</h2>
|
||||
|
||||
<p>The SREs who run Google’s production services have a mantra: “No Haunted Graveyards.” A haunted<a contenteditable="false" data-primary="large-scale changes" data-secondary="barriers to atomic changes" data-tertiary="no haunted graveyards" data-type="indexterm" id="id-Gku5HEszcnhq"> </a> graveyard in this<a contenteditable="false" data-primary="haunted graveyards" data-type="indexterm" id="id-QbubsEswczh7"> </a> sense is a system that is so ancient, obtuse, or complex that no one dares enter it. Haunted graveyards are often business-critical systems that are frozen in time because any attempt to change them could cause the system to fail in incomprehensible ways, costing the business real money. They pose a real existential risk and can consume an inordinate amount of resources.</p>
|
||||
|
||||
<p>Haunted graveyards don’t just exist in production systems, however; they can be found in codebases. Many organizations have bits of software that are old and unmaintained, written by someone long off the team, and on the critical path of some important revenue-generating functionality. These systems are also frozen in time, with layers of bureaucracy built up to prevent changes that might cause instability. Nobody wants to be the network support engineer II who flipped the wrong bit!</p>
|
||||
|
||||
<p>These parts of a codebase are anathema to the LSC process because they prevent the completion of large migrations, the decommissioning of other systems upon which they rely, or the upgrade of compilers or libraries that they use. From an LSC perspective, haunted graveyards prevent all kinds of meaningful progress.</p>
|
||||
|
||||
<p>At Google, we’ve found the counter to this to be good, ol’-fashioned testing. When software is thoroughly tested, we can make arbitrary changes to it and know with confidence whether those changes are breaking, no matter the age or complexity of the system. Writing those tests takes a lot of effort, but it allows a codebase like Google’s to evolve over long periods of time, consigning the notion of haunted software graveyards to a graveyard of its own.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="heterogeneity">
|
||||
<h2>Heterogeneity</h2>
|
||||
|
||||
<p>LSCs really work only when the bulk of the effort for them can be done by computers, not humans.<a contenteditable="false" data-primary="large-scale changes" data-secondary="barriers to atomic changes" data-tertiary="heterogeneity" data-type="indexterm" id="id-Qbu4HEsjIzh7"> </a> As good as humans can be with ambiguity, computers rely upon consistent environments to apply the proper code transformations to the correct places. If your organization has many different VCSs, Continuous Integration (CI) systems, project-specific tooling, or formatting guidelines, it is difficult to make sweeping changes across your entire codebase. Simplifying the environment to add more consistency will help both the humans who need to move around in it and the robots making automated transformations.</p>
|
||||
|
||||
<p class="pagebreak-before">For example, many projects at Google have presubmit tests configured to run before changes are made to their codebase. Those checks can be very complex, ranging from checking new dependencies against a whitelist, to running tests, to ensuring that the change has an associated bug. Many of these checks are relevant for teams writing new features, but for LSCs, they just add additional irrelevant complexity.</p>
|
||||
|
||||
<p>We’ve decided to embrace some of this complexity, such as running presubmit tests, by making it standard across our codebase. For other inconsistencies, we advise teams to omit their special checks when parts of LSCs touch their project code. Most teams are happy to help given the benefit these kinds of changes are to their projects.</p>
|
||||
|
||||
<div data-type="note" id="id-znUZcZIMhM"><h6>Note</h6>
|
||||
<p>Many of the benefits of consistency for humans mentioned in <a data-type="xref" href="ch08.html#style_guides_and_rules">Style Guides and Rules</a> also apply to automated tooling.</p>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="testing-id00096">
|
||||
<h2>Testing</h2>
|
||||
|
||||
<p>Every change should be tested (a process we’ll talk about more in just a moment), but the<a contenteditable="false" data-primary="large-scale changes" data-secondary="barriers to atomic changes" data-tertiary="testing" data-type="indexterm" id="id-AAuZHKsNSyhn"> </a> larger <a contenteditable="false" data-primary="testing" data-secondary="as barrier to atomic changes" data-type="indexterm" id="id-zgugsos9Sahy"> </a>the change, the more difficult it is to actually test it appropriately. Google’s CI system will run not only the tests immediately impacted by a change, but also any tests that transitively depend on the changed files.<sup><a data-type="noteref" id="ch01fn225-marker" href="ch22.html#ch01fn225">6</a></sup> This means a change gets broad coverage, but we’ve also observed that the farther away in the dependency graph a test is from the impacted files, the more unlikely a failure is to have been caused by the change itself.</p>
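<p>Computing that transitive set is a straightforward graph traversal once reverse dependency edges are available; the build targets below are invented for illustration.</p>

<pre data-type="programlisting" data-code-language="python"># Sketch: given reverse dependency edges (target to the targets that
# depend on it), collect every test transitively affected by a change.
from collections import deque

REVERSE_DEPS = {
    "//base:flags": ["//net:http", "//base:flags_test"],
    "//net:http": ["//app:server", "//net:http_test"],
    "//app:server": ["//app:server_test"],
}

def affected_tests(changed_targets):
    seen = set(changed_targets)
    queue = deque(changed_targets)
    while queue:
        target = queue.popleft()
        for dependent in REVERSE_DEPS.get(target, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(t for t in seen if t.endswith("_test"))

print(affected_tests({"//base:flags"}))
# ['//app:server_test', '//base:flags_test', '//net:http_test']
</pre>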
|
||||
|
||||
<p>Small, independent changes are easier to validate, because each of them affects a smaller set of tests, but also because test failures are easier to diagnose and fix. Finding the root cause of a test failure in a change of 25 files is pretty straightforward; finding 1 in a 10,000-file change is like the proverbial needle in a haystack.</p>
|
||||
|
||||
<p>The trade-off in this decision is that smaller changes will cause the same tests to be run multiple times, particularly tests that depend on large parts of the codebase. Because engineer time spent tracking down test failures is much more expensive than the compute time required to run these extra tests, we’ve made the conscious decision that this is a trade-off we’re willing to make. That same trade-off might not hold for all organizations, but it is worth examining what the proper balance is for yours.<a contenteditable="false" data-primary="atomic changes, barriers to" data-startref="ix_atom" data-type="indexterm" id="id-0OuJHbC5S9hO"> </a><a contenteditable="false" data-primary="large-scale changes" data-secondary="barriers to atomic changes" data-startref="ix_LSCbarr" data-type="indexterm" id="id-9auAsqC4SxhQ"> </a></p>
|
||||
|
||||
<aside data-type="sidebar" id="testing_lscs">
|
||||
<h5>Case Study: Testing LSCs</h5>
|
||||
|
||||
<p class="byline">Adam Bender</p>
|
||||
|
||||
<p>Today it is common<a contenteditable="false" data-primary="testing" data-secondary="of large-scale changes" data-type="indexterm" id="ix_tstLSC"> </a> for a double-digit percentage (10% to 20%) of <a contenteditable="false" data-primary="large-scale changes" data-secondary="testing" data-type="indexterm" id="ix_LSCtst"> </a>the changes in a project to be the result of LSCs, meaning a substantial amount of code is changed in projects by people whose full-time job is unrelated to those projects. Without good tests, such work would be impossible, and Google’s codebase would quickly atrophy under its own weight. LSCs enable us to systematically migrate our entire codebase to newer APIs, deprecate older APIs, change language versions, and remove popular but dangerous practices.</p>
|
||||
|
||||
<p>Even a simple one-line signature change becomes complicated when made in a thousand different places across hundreds of different products and services.<sup><a data-type="noteref" id="ch01fn226-marker" href="ch22.html#ch01fn226">7</a></sup> After the change is written, you need to coordinate code reviews across dozens of teams. Lastly, after reviews are approved, you need to run as many tests as you can to be sure the change is safe.<sup><a data-type="noteref" id="ch01fn227-marker" href="ch22.html#ch01fn227">8</a></sup> We say “as many as you can,” because a good-sized LSC could trigger a rerun of every single test at Google, and that can take a while. In fact, many LSCs have to plan time to catch downstream clients whose code backslides while the LSC makes its way through the process.</p>
|
||||
|
||||
<p>Testing an LSC can be a slow and frustrating process. When a change is sufficiently large, your local environment is almost guaranteed to be permanently out of sync with head as the codebase shifts like sand around your work. In such circumstances, it is easy to find yourself running and rerunning tests just to ensure your changes continue to be valid. When a project has flaky tests or is missing unit test coverage, it can require a lot of manual intervention and slow down the entire process. To help speed things up, we use a strategy called the TAP (Test Automation Platform) train.</p>
|
||||
|
||||
<h3>Riding the TAP Train</h3>
|
||||
|
||||
<p>The core <a contenteditable="false" data-primary="large-scale changes" data-secondary="testing" data-tertiary="riding the TAP train" data-type="indexterm" id="id-6WuYHKSqcMSkhV"> </a>insight to LSCs is that they rarely interact with one another, and most affected tests are going to pass for most LSCs.<a contenteditable="false" data-primary="Test Automation Platform (TAP)" data-secondary="train model and testing of LSCs" data-type="indexterm" id="id-nguQsWSwc8S2hY"> </a> As a result, we can test more than one change at a time and reduce the total number of tests executed. The train model has proven to be very effective for testing LSCs.</p>
|
||||
|
||||
<p>The TAP train takes advantage of two facts:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>LSCs tend to be pure refactorings and therefore very narrow in scope, preserving local semantics.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Individual changes are often simpler and highly scrutinized, so they are correct more often than not.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>The train model also has the advantage that it works for multiple changes at the same time and doesn’t require that each individual change ride in isolation.<sup><a data-type="noteref" id="ch01fn228-marker" href="ch22.html#ch01fn228">9</a></sup></p>
|
||||
|
||||
<p>The train has five steps and is started fresh every three hours (a simplified sketch of this batching logic follows the list):</p>
|
||||
|
||||
<ol>
|
||||
<li>
|
||||
<p>For each change on the train, run a sample of 1,000 randomly selected tests.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Gather up all the changes that passed their 1,000 tests and create one uber-change from all of them: “the train.”</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Run the union of all tests directly affected by the group of changes. Given a large enough (or low-level enough) LSC, this can mean running every single test in Google’s repository. This process can take more than six hours to complete.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>For each nonflaky test that fails, rerun it individually against each change that made it into the train to determine which changes caused it to fail.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>TAP generates a report for each change that boarded the train. The report describes all passing and failing targets and can be used as evidence that an LSC is safe to submit.</p>
|
||||
</li>
|
||||
</ol>
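<p>A minimal sketch of that batching logic appears below. The helper functions <code>run_tests</code> and <code>affected_tests</code> are hypothetical stand-ins for the real infrastructure; only the shape of the five steps is the point.</p>

<pre data-type="programlisting" data-code-language="python">
"""Toy model of a TAP-train run; all function stubs here are hypothetical."""
import random

def run_tests(tests, changes):
    """Stand-in for the real test runner; returns the set of failing tests."""
    return set()  # Pretend everything passes in this sketch.

def affected_tests(changes):
    """Stand-in for test selection based on the files the changes touch."""
    return {"//app:frontend_test", "//net:url_test"}

def ride_the_train(candidate_changes, all_tests):
    # 1. For each change, run a random sample of tests as a smoke check.
    boarding = []
    for change in candidate_changes:
        sample = random.sample(sorted(all_tests), min(1000, len(all_tests)))
        if not run_tests(sample, [change]):  # Empty set of failures: it boards.
            boarding.append(change)

    # 2. Combine every change that passed its sample into one "train".
    # 3. Run the union of all tests affected by the combined changes.
    failures = run_tests(sorted(affected_tests(boarding)), boarding)

    # 4. Re-run each failure against each change individually to attribute blame.
    blame = {change: {t for t in failures if run_tests([t], [change])}
             for change in boarding}

    # 5. Emit a per-change report that can serve as evidence of safety.
    return {change: {"failed": sorted(blame[change])} for change in boarding}

if __name__ == "__main__":
    print(ride_the_train(["cl/1", "cl/2"], {"//app:frontend_test", "//net:url_test"}))
</pre>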
|
||||
</aside>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="code_review">
|
||||
<h2>Code Review</h2>
|
||||
|
||||
<p>Finally, as we mentioned in <a data-type="xref" href="ch09.html#code_review-id00002">Code Review</a>, all changes need to be reviewed before submission, and this policy applies even for LSCs. <a contenteditable="false" data-primary="large-scale changes" data-secondary="testing" data-tertiary="code reviews" data-type="indexterm" id="id-0Ou2sMsdt9hO"> </a><a contenteditable="false" data-primary="code reviews" data-secondary="for large-scale changes" data-type="indexterm" id="id-9auLfRsZtxhQ"> </a>Reviewing large commits can be tedious, onerous, and even error prone, particularly if the changes are generated by hand (a process you want to avoid, as we’ll discuss shortly). In just a moment, we’ll look at how tooling can often help in this space, but for some classes of changes, we still want humans to explicitly verify they are correct. Breaking an LSC into separate shards makes this much easier.</p>
|
||||
|
||||
<aside data-type="sidebar" id="scoped_ptr_to_stdunique_ptr">
|
||||
<h5>Case Study: scoped_ptr to std::unique_ptr</h5>
|
||||
|
||||
<p>Since its<a contenteditable="false" data-primary="C++" data-secondary="scoped_ptr to std::unique_ptr" data-type="indexterm" id="id-9aupHRs3f4tPhW"> </a> earliest days, Google’s C++ codebase has had a self-destructing smart pointer for wrapping <a contenteditable="false" data-primary="scoped_ptr in C++" data-type="indexterm" id="id-YDujsds9f8tZhm"> </a>heap-allocated C++ objects and ensuring that they are destroyed when the smart pointer goes out of scope.<a contenteditable="false" data-primary="large-scale changes" data-secondary="testing" data-tertiary="scoped_ptr to std::unique_ptr" data-type="indexterm" id="id-Knuafas2fwtMh4"> </a> This type was called <code>scoped_ptr</code> and was used extensively throughout Google’s codebase to ensure that object lifetimes were appropriately managed. It wasn’t perfect, but given the limitations of the then-current C++ standard (C++98) when the type was first introduced, it made for safer programs.</p>
|
||||
|
||||
<p>In C++11, the language<a contenteditable="false" data-primary="std::unique_ptr in C++" data-type="indexterm" id="id-YDu2Hzf9f8tZhm"> </a> introduced a new type: <code>std::unique_ptr</code>. It fulfilled the same function as <code>scoped_ptr</code>, but also prevented other classes of bugs that the language now could detect. <code>std::unique_ptr</code> was strictly better than <code>scoped_ptr</code>, yet Google’s codebase had more than 500,000 references to <code>scoped_ptr</code> scattered among millions of source files. Moving to the more modern type required the largest LSC attempted to that point within Google.</p>
|
||||
|
||||
<p>Over the course of several months, several engineers attacked the problem in parallel. Using Google’s large-scale migration infrastructure, we were able to change references to <code>scoped_ptr</code> into references to <code>std::unique_ptr</code> as well as slowly adapt <code>scoped_ptr</code> to behave more closely to <code>std::unique_ptr</code>. At the height of the migration process, we were consistently generating, testing and committing more than 700 independent changes, touching more than 15,000 files <em>per day</em>. Today, we sometimes manage 10 times that throughput, having refined our practices and improved our tooling.</p>
|
||||
|
||||
<p>Like almost all LSCs, this one had a very long tail of tracking down various nuanced behavior dependencies (another manifestation of Hyrum’s Law), fighting race conditions with other engineers, and uses in generated code that weren’t detectable by our automated tooling. We continued to work on these manually as they were discovered by the testing infrastructure.</p>
|
||||
|
||||
<p><code>scoped_ptr</code> was also used as a parameter type in some widely used APIs, which made small independent changes difficult. We contemplated writing a call-graph analysis system that could change an API and its callers, transitively, in one commit, but were concerned that the resulting changes would themselves be too large to commit <span class="keep-together">atomically.</span></p>
|
||||
|
||||
<p>In the end, we were able to finally remove <code>scoped_ptr</code> by first making it a type alias of <code>std::unique_ptr</code> and then performing the textual substitution between the old alias and the new, before eventually just removing the old <code>scoped_ptr</code> alias. Today, Google’s codebase benefits from using the same standard type as the rest of the C++ ecosystem, which was possible only because of our technology and tooling for LSCs.<a contenteditable="false" data-primary="testing" data-secondary="of large-scale changes" data-startref="ix_tstLSC" data-type="indexterm" id="id-43u4CwSmfjtgh7"> </a><a contenteditable="false" data-primary="large-scale changes" data-secondary="testing" data-startref="ix_LSCtst" data-type="indexterm" id="id-Z3ugc0SRfWtDh6"> </a></p>
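<p>As a rough illustration of that final textual step, the sketch below renames the alias across a tree of files. It is a toy, not Google’s migration tooling: the file patterns and the bare-token regex are assumptions, and the real process also sharded, tested, and reviewed every edit.</p>

<pre data-type="programlisting" data-code-language="python">
"""Toy codemod: once scoped_ptr is only an alias for std::unique_ptr,
a plain textual rename finishes the migration. This is a sketch, not
the real tool."""
import pathlib
import re

# Matching the bare token is enough because, at this point in the
# migration, the alias and the real type are interchangeable.
OLD = re.compile(r"\bscoped_ptr\b")
NEW = "std::unique_ptr"

def rewrite_tree(root):
    changed = []
    for pattern in ("*.h", "*.cc"):
        for path in pathlib.Path(root).rglob(pattern):
            original = path.read_text()
            updated = OLD.sub(NEW, original)
            if updated != original:
                path.write_text(updated)
                changed.append(str(path))
    return changed

if __name__ == "__main__":
    for path in rewrite_tree("."):
        print("rewrote", path)
</pre>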
|
||||
</aside>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="lsc_infrastructure">
|
||||
<h1>LSC Infrastructure</h1>
|
||||
|
||||
<p>Google has invested in a <a contenteditable="false" data-primary="large-scale changes" data-secondary="infrastructure" data-type="indexterm" id="ix_LSCinfr"> </a>significant amount of infrastructure to make LSCs possible. This infrastructure includes tooling for change creation, change management, change review, and testing. However, perhaps the most important support for LSCs has been the evolution of cultural norms around large-scale changes and the oversight given to them. Although the sets of technical and social tools might differ for your organization, the general principles should be the same.</p>
|
||||
|
||||
<section data-type="sect2" id="policies_and_culture">
|
||||
<h2>Policies and Culture</h2>
|
||||
|
||||
<p>As we’ve described in <a data-type="xref" href="ch16.html#version_control_and_branch_management">Version Control and Branch Management</a>, Google stores the bulk<a contenteditable="false" data-primary="large-scale changes" data-secondary="infrastructure" data-tertiary="policies and culture" data-type="indexterm" id="id-Gku8sEsZfMUq"> </a> of its source code in a single monolithic repository (monorepo), and every engineer has visibility into almost all of this code. This high degree of openness means that any engineer can edit any file and send those edits for review to those who can approve them. However, each of those edits has costs, both to generate as well as review.<a contenteditable="false" data-primary="policies for large-scale changes" data-type="indexterm" id="id-Qbu7fEs3fwU7"> </a><sup><a data-type="noteref" id="ch01fn229-marker" href="ch22.html#ch01fn229">10</a></sup></p>
|
||||
|
||||
<p>Historically, these costs have been somewhat symmetric, which limited the scope of changes a single engineer or team could generate. As Google’s LSC tooling improved, it became easier to generate a large number of changes very cheaply, and it became equally easy for a single engineer to impose a burden on a large number of reviewers across the company. Even though we want to encourage widespread improvements to our codebase, we want to make sure there is some oversight and thoughtfulness behind them, rather than indiscriminate tweaking.<sup><a data-type="noteref" id="ch01fn230-marker" href="ch22.html#ch01fn230">11</a></sup></p>
|
||||
|
||||
<p>The end result is a lightweight approval process for teams and individuals seeking to make LSCs across Google. This process is overseen by a group of experienced engineers who are familiar with the nuances of various languages, as well as invited domain experts for the particular change in question. The goal of this process is not to prohibit LSCs, but to help change authors produce the best possible changes, which make the best use of Google’s technical and human capital. Occasionally, this group might suggest that a cleanup just isn’t worth it: for example, cleaning up a common typo without any way of preventing recurrence.</p>
|
||||
|
||||
<p>Related to these policies was a shift in cultural norms surrounding LSCs.<a contenteditable="false" data-primary="culture" data-secondary="changes in norms surrounding LSCs" data-type="indexterm" id="id-AAuZHjcbfzUn"> </a> Although it is important for code owners to have a sense of responsibility for their software, they also needed to learn that LSCs were an important part of Google’s effort to scale our software engineering practices. Just as product teams are the most familiar with their own software, library infrastructure teams know the nuances of the infrastructure, and getting product teams to trust that domain expertise is an important step toward social acceptance of LSCs. As a result of this culture shift, local product teams have grown to trust LSC authors to make changes relevant to those authors’ domains.</p>
|
||||
|
||||
<p>Occasionally, local owners question the purpose of a specific commit being made as part of a broader LSC, and change authors respond to these comments just as they would other review comments. Socially, it’s important that code owners understand the changes happening to their software, but they also have come to realize that they don’t hold a veto over the broader LSC. Over time, we’ve found that a good FAQ and a solid historic track record of improvements have generated widespread endorsement of LSCs throughout Google.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="codebase_insight">
|
||||
<h2>Codebase Insight</h2>
|
||||
|
||||
<p>To do LSCs, we’ve found it invaluable<a contenteditable="false" data-primary="large-scale changes" data-secondary="infrastructure" data-tertiary="codebase insight" data-type="indexterm" id="id-Gku5HEs4CMUq"> </a> to be able to do <a contenteditable="false" data-primary="codebase" data-secondary="analysis of, large-scale changes and" data-type="indexterm" id="id-QbubsEsMCwU7"> </a>large-scale analysis of our codebase, both on a textual level using traditional tools, as well as on a semantic level. <a contenteditable="false" data-primary="Kythe" data-type="indexterm" id="id-AAumfKsECzUn"> </a>For example, Google’s use of the semantic indexing tool <a href="https://kythe.io">Kythe</a> provides a complete map of the links between parts of our codebase, allowing us to ask questions such as “Where are the callers of this function?” or “Which classes derive from this one?” Kythe and similar tools also provide programmatic access to their data so that they can be incorporated into refactoring tools. (For further examples, see Chapters <a data-type="xref" data-xrefstyle="select:labelnumber" href="ch17.html#code_search">Code Search</a> and <a data-type="xref" data-xrefstyle="select:labelnumber" href="ch20.html#static_analysis-id00082">Static Analysis</a>.)</p>
|
||||
|
||||
<p>We also use compiler-based indices to run abstract syntax tree-based analysis and transformations over our codebase. Tools such as <a href="https://oreil.ly/c6xvO">ClangMR</a>, JavacFlume, or <a href="https://oreil.ly/Er03J">Refaster</a>, which can perform transformations in a highly parallelizable way, depend on these insights as part of their function. For smaller changes, authors can use specialized, custom tools, <code>perl</code> or <code>sed</code>, regular expression matching, or even a simple shell script.</p>
|
||||
|
||||
<p>Whatever tool your organization uses for change creation, it’s important that the human effort it requires scale sublinearly with the size of the codebase; in other words, it should take roughly the same amount of human time to generate the collection of all required changes, no matter the size of the repository. The change creation tooling should also be comprehensive across the codebase, so that an author can be assured that their change covers all of the cases they’re trying to fix.</p>
|
||||
|
||||
<p>As with other areas in this book, an early investment in tooling usually pays off in the short to medium term. As a rule of thumb, we’ve long held that if a change requires more than 500 edits, it’s usually more efficient for an engineer to learn and execute our change-generation tools rather than manually execute that edit. For experienced “code janitors,” that number is often much smaller.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="change_management">
|
||||
<h2>Change Management</h2>
|
||||
|
||||
<p>Arguably the most <a contenteditable="false" data-primary="large-scale changes" data-secondary="infrastructure" data-tertiary="change management" data-type="indexterm" id="id-Qbu4HEswcwU7"> </a>important piece of large-scale change<a contenteditable="false" data-primary="change management for large-scale changes" data-type="indexterm" id="id-AAu4sKsGczUn"> </a> infrastructure is the set of tooling that shards a master change into smaller pieces and manages the process of testing, mailing, reviewing, and committing them independently. <a contenteditable="false" data-primary="Rosie tool" data-type="indexterm" id="id-zguofoskcKUy"> </a>At Google, this tool is called Rosie, and we discuss its use more completely in a few moments when we examine our LSC process. In many respects, Rosie is not just a tool, but an entire platform for making LSCs at Google scale. It provides the ability to split the large sets of comprehensive changes produced by tooling into smaller shards, which can be tested, reviewed, and submitted independently.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="testing">
|
||||
<h2>Testing</h2>
|
||||
|
||||
<p>Testing is another important piece<a contenteditable="false" data-primary="large-scale changes" data-secondary="infrastructure" data-tertiary="testing" data-type="indexterm" id="id-AAuZHKs3IzUn"> </a> of large-scale-change–enabling infrastructure.<a contenteditable="false" data-primary="testing" data-secondary="in large-scale change infrastructure" data-type="indexterm" id="id-zgugsos7IKUy"> </a> As discussed in <a data-type="xref" href="ch11.html#testing_overview">Testing Overview</a>, tests are one of the important ways that we validate our software will behave as expected. This is particularly important when applying changes that are not authored by humans. A robust testing culture and infrastructure means that other tooling can be confident that these changes don’t have unintended effects.</p>
|
||||
|
||||
<p>Google’s testing strategy for LSCs differs slightly from that of normal changes while still using the same underlying CI infrastructure. Testing LSCs means not just ensuring the large master change doesn’t cause failures, but that each shard can be submitted safely and independently. Because each shard can contain arbitrary files, we don’t use the standard project-based presubmit tests. Instead, we run each shard over the transitive closure of every test it might affect, which we discussed earlier.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="language_support">
|
||||
<h2>Language Support</h2>
|
||||
|
||||
<p>LSCs at Google are typically done on a per-language basis, and some languages support them much more easily than others.<a contenteditable="false" data-primary="programming languages" data-secondary="support for large-scale changes" data-type="indexterm" id="id-zguVHos9SKUy"> </a><a contenteditable="false" data-primary="large-scale changes" data-secondary="infrastructure" data-tertiary="language support" data-type="indexterm" id="id-0Ou2sMs5S4UO"> </a> We’ve found that language features such as type aliasing and forwarding functions are invaluable for allowing existing users to continue to function while we introduce new systems and migrate users to them non-atomically. For languages that lack these features, it is often difficult to migrate systems incrementally.<sup><a data-type="noteref" id="ch01fn231-marker" href="ch22.html#ch01fn231">12</a></sup></p>
|
||||
|
||||
<p>We’ve also found that large automated changes are much easier to perform in statically typed languages than in dynamically typed ones. Compiler-based tools along with strong static analysis provide a significant amount of information that we can use to build tools to effect LSCs and reject invalid transformations before they even get to the testing phase. The unfortunate result of this is that dynamically typed languages such as Python, Ruby, and JavaScript are especially difficult for maintainers. Language choice is, in many respects, intimately tied to the question of code lifespan: languages that tend to be viewed as more focused on developer productivity tend to be more difficult to maintain. Although this isn’t an intrinsic design requirement, it is where the current state of the art happens to be.</p>
|
||||
|
||||
<p>Finally, it’s worth pointing out that automatic language formatters are a crucial part of the LSC infrastructure. Because we work toward optimizing our code for readability, we want to make sure that any changes produced by automated tooling are intelligible to both immediate reviewers and future readers of the code. All of the LSC-generation tools run the automated formatter appropriate to the language being changed as a separate pass so that the change-specific tooling does not need to <span class="keep-together">concern</span> itself with formatting specifics. Applying automated formatting, such as <a href="https://github.com/google/google-java-format">google-java-format</a> or <a href="https://clang.llvm.org/docs/ClangFormat.html">clang-format</a>, to our codebase means that automatically produced changes will “fit in” with code written by a human, reducing future development friction. Without automated formatting, large-scale automated changes would never have become the accepted status quo at Google.</p>
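<p>A simple way to picture that separate formatting pass is sketched below. The extension-to-formatter mapping is an assumption, and you should confirm the in-place flags of <code>clang-format</code> and <code>google-java-format</code> against your own installation.</p>

<pre data-type="programlisting" data-code-language="python">
"""Sketch of a post-generation formatting pass for machine-made edits.

The command lines below are illustrative; check your formatter's docs
for the exact in-place flags before relying on them.
"""
import pathlib
import subprocess

FORMATTERS = {
    ".cc": ["clang-format", "-i"],
    ".h": ["clang-format", "-i"],
    ".java": ["google-java-format", "--replace"],
}

def format_changed_files(paths):
    # Run the appropriate formatter over each edited file as a final step,
    # so the change-generation tooling never worries about formatting.
    for path in map(pathlib.Path, paths):
        command = FORMATTERS.get(path.suffix)
        if command:
            subprocess.run(command + [str(path)], check=True)

if __name__ == "__main__":
    import sys
    format_changed_files(sys.argv[1:])
</pre>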
|
||||
</section>
|
||||
|
||||
<aside data-type="sidebar" id="operation_rosehub">
|
||||
<h5>Case Study: Operation RoseHub</h5>
|
||||
|
||||
<p>LSCs have become a large part of Google’s internal<a contenteditable="false" data-primary="large-scale changes" data-secondary="infrastructure" data-tertiary="Operation RoseHub" data-type="indexterm" id="id-0OuJHMsdt4UO"> </a> culture, but they are starting to have implications in the broader world. <a contenteditable="false" data-primary="Operation RoseHub" data-type="indexterm" id="id-9auAsRsZt7UQ"> </a>Perhaps the best known case so far was “<a href="https://oreil.ly/txtDj">Operation RoseHub</a>.”</p>
|
||||
|
||||
<p>In early 2017, a vulnerability in the Apache Commons library allowed any Java application with a vulnerable version of the library in its transitive classpath to become susceptible to remote execution. This bug became known as the Mad Gadget. Among other things, it allowed an avaricious hacker to encrypt the San Francisco Municipal Transportation Agency’s systems and shut down its operations. Because the only requirement for the vulnerability was having the wrong library somewhere in its classpath, anything that depended on even one of many open source projects on GitHub was vulnerable.</p>
|
||||
|
||||
<p>To solve this problem, some enterprising Googlers launched their own version of the LSC process. By using tools such as <a href="https://cloud.google.com/bigquery">BigQuery</a>, volunteers identified affected projects and sent more than 2,600 patches to upgrade their versions of the Commons library to one that addressed Mad Gadget. Instead of automated tools managing the process, more than 50 humans made this LSC work.<a contenteditable="false" data-primary="large-scale changes" data-secondary="infrastructure" data-startref="ix_LSCinfr" data-type="indexterm" id="id-KnuvsmCDtKUj"> </a></p>
|
||||
</aside>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="the_lsc_process">
|
||||
<h1>The LSC Process</h1>
|
||||
|
||||
<p>With these pieces of infrastructure in place, we can now talk about the process for actually making an LSC. <a contenteditable="false" data-primary="large-scale changes" data-secondary="process" data-type="indexterm" id="ix_LSCproc"> </a>This roughly breaks down into four phases (with very nebulous boundaries between them):</p>
|
||||
|
||||
<ol>
|
||||
<li>
|
||||
<p>Authorization</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Change creation</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Shard management</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Cleanup</p>
|
||||
</li>
|
||||
</ol>
|
||||
|
||||
<p>Typically, these steps happen after a new system, class, or function has been written, but it’s important to keep them in mind during the design of the new system. At Google, we aim to design successor systems with a migration path from older systems in mind, so that system maintainers can move their users to the new system <span class="keep-together">automatically.</span></p>
|
||||
|
||||
<section data-type="sect2" id="authorization">
|
||||
<h2>Authorization</h2>
|
||||
|
||||
<p>We ask potential authors to fill out a brief document explaining the reason for a proposed change, its estimated impact across<a contenteditable="false" data-primary="large-scale changes" data-secondary="process" data-tertiary="authorization" data-type="indexterm" id="id-AAuZHKsGcNun"> </a> the codebase (i.e., how many smaller shards the large change would generate), and answers to any questions potential reviewers might have.<a contenteditable="false" data-primary="authorization for large-scale changes" data-type="indexterm" id="id-zgugsoskcNuy"> </a> This process also forces authors to think about how they will describe the change to an engineer unfamiliar with it in the form of an FAQ and proposed change description. Authors also get “domain review” from the owners of the API being refactored.</p>
|
||||
|
||||
<p>This proposal is then forwarded to an email list with about a dozen people who have oversight over the entire process. After discussion, the committee gives feedback on how to move forward. For example, one of the most common changes made by the committee is to direct all of the code reviews for an LSC to go to a single "global approver." Many first-time LSC authors tend to assume that local project owners should review everything, but for most mechanical LSCs, it’s cheaper to have a single expert understand the nature of the change and build automation around reviewing it properly.</p>
|
||||
|
||||
<p>After the change is approved, the author can move forward in getting their change submitted. Historically, the committee has been very liberal with their approval,<sup><a data-type="noteref" id="ch01fn232-marker" href="ch22.html#ch01fn232">13</a></sup> and often gives approval not just for a specific change, but also for a broad set of related changes. Committee members can, at their discretion, fast-track obvious changes without the need for full deliberation.</p>
|
||||
|
||||
<p>The intent of this process is to provide oversight and an escalation path, without being too onerous for the LSC authors. The committee is also empowered as the escalation body for concerns or conflicts about an LSC: local owners who disagree with the change can appeal to this group who can then arbitrate any conflicts. In practice, this has rarely been needed.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="change_creation">
|
||||
<h2>Change Creation</h2>
|
||||
|
||||
<p>After getting the required approval, an LSC author will begin to produce the actual code edits. <a contenteditable="false" data-primary="changes to code" data-secondary="change creation in LSC process" data-type="indexterm" id="id-zguVHos7INuy"> </a><a contenteditable="false" data-primary="large-scale changes" data-secondary="process" data-tertiary="change creation" data-type="indexterm" id="id-0Ou2sMsGIEuO"> </a>Sometimes, these can be generated comprehensively into a single large global change that will be subsequently sharded into many smaller independent pieces. Usually, the size of the change is too large to fit in a single global change, due to technical limitations of the underlying version control system.</p>
|
||||
|
||||
<p>The change generation process should be as automated as possible so that the parent change can be updated as users backslide into old uses<sup><a data-type="noteref" id="ch01fn233-marker" href="ch22.html#ch01fn233">14</a></sup> or textual merge conflicts occur in the changed code. Occasionally, for the rare case in which technical tools aren’t able to generate the global change, we have sharded change generation across humans (see <a data-type="xref" href="ch22.html#operation_rosehub">Case Study: Operation RoseHub</a>). Although much more labor intensive than automatically generating changes, this allows global changes to happen much more quickly for time-sensitive applications.</p>
|
||||
|
||||
<p>Keep in mind that we optimize for human readability of our codebase, so whatever tool generates changes, we want the resulting changes to look as much like human-generated changes as possible. This requirement leads to the necessity of style guides and automatic formatting tools (see <a data-type="xref" href="ch08.html#style_guides_and_rules">Style Guides and Rules</a>).<sup><a data-type="noteref" id="ch01fn234-marker" href="ch22.html#ch01fn234">15</a></sup></p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="sharding_and_submitting">
|
||||
<h2>Sharding and Submitting</h2>
|
||||
|
||||
<p>After a global change has <a contenteditable="false" data-primary="sharding and submitting in LSC process" data-type="indexterm" id="ix_shrd"> </a>been generated, the author then starts <a contenteditable="false" data-primary="large-scale changes" data-secondary="process" data-tertiary="sharding and submitting" data-type="indexterm" id="ix_LSCprocsh"> </a>running <a contenteditable="false" data-primary="Rosie tool" data-secondary="sharding and submitting in LSC process" data-type="indexterm" id="ix_Rosie"> </a>Rosie. Rosie takes a large change and shards it based upon project boundaries and ownership rules into changes that <em>can</em> be submitted atomically. It then puts each individually sharded change through an independent test-mail-submit pipeline. Rosie can be a heavy user of other pieces of Google’s developer infrastructure, so it caps the number of outstanding shards for any given LSC, runs at lower priority, and communicates with the rest of the infrastructure about how much load it is acceptable to generate on our shared testing infrastructure.</p>
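<p>The heart of the sharding step can be sketched in a few lines. Grouping edited files by their top-level directory is a stand-in for Rosie’s real project-boundary and ownership rules, and the shard-size cap is an invented number.</p>

<pre data-type="programlisting" data-code-language="python">
"""Toy sharding of a global change into independently submittable pieces.

Grouping edited files by their top-level project directory is a stand-in
for real ownership metadata such as OWNERS files.
"""
from collections import defaultdict

def shard_by_project(edited_files, max_files_per_shard=200):
    by_project = defaultdict(list)
    for path in sorted(edited_files):
        by_project[path.split("/")[0]].append(path)

    shards = []
    for project, files in sorted(by_project.items()):
        # Large projects are split further so each shard stays reviewable.
        for start in range(0, len(files), max_files_per_shard):
            shards.append((project, files[start:start + max_files_per_shard]))
    return shards

if __name__ == "__main__":
    edits = ["net/url.cc", "net/url_test.cc", "app/frontend.cc", "base/strings.cc"]
    for project, files in shard_by_project(edits, max_files_per_shard=2):
        print(project, files)
</pre>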
|
||||
|
||||
<p>We talk more about the specific test-mail-submit process for each shard below.</p>
|
||||
|
||||
<aside data-type="sidebar" id="cattle_versus_pets">
|
||||
<h5>Cattle Versus Pets</h5>
|
||||
|
||||
<p>We often use the “cattle and pets” analogy when referring to individual machines in a distributed computing environment, but the same principles can apply to changes within a codebase.<a contenteditable="false" data-primary="cattle versus pets analogy" data-secondary="applying to changes in a codebase" data-type="indexterm" id="id-KnuWHaspCbSGu4"> </a></p>
|
||||
|
||||
<p>At Google, as at most organizations, typical changes to the codebase are handcrafted by individual engineers working on specific features or bug fixes. Engineers might spend days or weeks working through the creation, testing, and review of a single change. They come to know the change intimately, and are proud when it is finally committed to the main repository. The creation of such a change is akin to owning and raising a favorite pet.</p>
|
||||
|
||||
<p>In contrast, effective handling of LSCs requires a high degree of automation and produces an enormous number of individual changes. In this environment, we’ve found it useful to treat specific changes as cattle: nameless and faceless commits that might be rolled back or otherwise rejected at any given time with little cost unless the entire herd is affected. Often this happens because of an unforeseen problem not caught by tests, or even something as simple as a merge conflict.</p>
|
||||
|
||||
<p>With a “pet” commit, it can be difficult to not take rejection personally, but when working with many changes as part of a large-scale change, it’s just the nature of the job. Having automation means that tooling can be updated and new changes generated at very low cost, so losing a few cattle now and then isn’t a problem.</p>
|
||||
</aside>
|
||||
|
||||
<section data-type="sect3" id="testing-id00098">
|
||||
<h3>Testing</h3>
|
||||
|
||||
<p>Each independent shard is tested by running it through TAP, Google’s CI framework. <a contenteditable="false" data-primary="Test Automation Platform (TAP)" data-secondary="testing LSC shards" data-type="indexterm" id="id-kgu2HzsqcVS8uz"> </a>We run every test that depends on the files in a given change transitively, which often creates high load on our CI system.</p>
|
||||
|
||||
<p>This might sound computationally expensive, but in practice, the vast majority of shards affect fewer than one thousand tests, out of the millions across our codebase. For those that affect more, we can group them together: first running the union of all affected tests for all shards, and then for each individual shard running just the intersection of its affected tests with those that failed the first run. Most of these unions cause almost every test in the codebase to be run, so adding additional changes to that batch of shards is nearly free.</p>
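<p>The set arithmetic behind that two-pass strategy looks roughly like the following sketch, in which <code>affected_tests</code> and <code>run_tests</code> are hypothetical stand-ins for the real CI services.</p>

<pre data-type="programlisting" data-code-language="python">
"""Sketch of the two-pass test strategy for a batch of shards."""

def test_batch(shards, affected_tests, run_tests):
    # Pass 1: run the union of everything any shard in the batch affects.
    union = set()
    for shard in shards:
        union.update(affected_tests(shard))
    failed = run_tests(union)

    # Pass 2: for each shard, re-run only its own affected tests that failed
    # in pass 1, which attributes the failures to specific shards.
    return {
        shard: run_tests(affected_tests(shard).intersection(failed))
        for shard in shards
    }

if __name__ == "__main__":
    fake_affected = {"shard1": {"t1", "t2"}, "shard2": {"t2", "t3"}}
    results = test_batch(
        shards=["shard1", "shard2"],
        affected_tests=lambda shard: fake_affected[shard],
        run_tests=lambda tests: {t for t in tests if t == "t2"},  # Pretend t2 fails.
    )
    print(results)
</pre>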
|
||||
|
||||
<p>One of the drawbacks of running such a large number of tests is that independent low-probability events are almost certainties at large enough scale. Flaky and brittle tests, such as those discussed in <a data-type="xref" href="ch11.html#testing_overview">Testing Overview</a>, which often don’t harm the teams that write and maintain them, are particularly difficult for LSC authors. Although fairly low impact for individual teams, flaky tests can seriously affect the throughput of an LSC system. Automatic flake detection and elimination systems help with this issue, but it can be a constant effort to ensure that teams that write flaky tests are the ones that bear their costs.</p>
|
||||
|
||||
<p>In our experience with LSCs as semantic-preserving, machine-generated changes, we are now much more confident in the correctness of a single change than a test with any recent history of flakiness—so much so that recently flaky tests are now ignored when submitting via our automated tooling. In theory, this means that a single shard can cause a regression that is detected only by a flaky test going from flaky to failing. In practice, we see this so rarely that it’s easier to deal with it via human communication rather than automation.</p>
|
||||
|
||||
<p>For any LSC process, individual shards should be committable independently. This means that they don’t have any interdependence or that the sharding mechanism can group dependent changes (such as to a header file and its implementation) together. Just like any other change, large-scale change shards must also pass project-specific checks before being reviewed and committed.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="mailing_reviewers">
|
||||
<h3>Mailing reviewers</h3>
|
||||
|
||||
<p>After Rosie has validated that a change is safe through testing, it mails the change to an appropriate reviewer. In a company as large as Google, with thousands of engineers, reviewer discovery itself is a challenging problem. Recall from <a data-type="xref" href="ch09.html#code_review-id00002">Code Review</a> that code in the repository is organized with OWNERS files, which list users with approval privileges for a specific subtree in the repository. Rosie uses an owners detection service that understands these OWNERS files and weights each owner based upon their expected ability to review the specific shard in question. If a particular owner proves to be unresponsive, Rosie adds additional reviewers automatically in an effort to get a change reviewed in a timely manner.</p>
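<p>A toy version of that reviewer discovery might look like the sketch below. The directory-to-owners data and the closest-owner scoring heuristic are invented for illustration; the production service weighs many more signals and adds reviewers automatically when the first choice is unresponsive.</p>

<pre data-type="programlisting" data-code-language="python">
"""Toy reviewer discovery from OWNERS-style data (all data here is invented)."""
import posixpath

OWNERS = {
    "net": ["alice", "bob"],
    "net/http": ["carol"],
    "app": ["dana"],
}

def candidate_reviewers(shard_files):
    scores = {}
    for path in shard_files:
        directory = posixpath.dirname(path)
        # Walk up toward the repository root, preferring the closest owners.
        depth = directory.count("/") + 1
        while directory:
            for owner in OWNERS.get(directory, []):
                scores[owner] = scores.get(owner, 0) + depth
            directory = posixpath.dirname(directory)
            depth -= 1
    # Highest score first; the mailer falls back to later names if the
    # first reviewer does not respond.
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    print(candidate_reviewers(["net/http/client.cc", "net/url.cc"]))
</pre>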
|
||||
|
||||
<p>As part of the mailing process, Rosie also runs the per-project precommit tools, which might perform additional checks. For LSCs, we selectively disable certain checks such as those for nonstandard change description formatting. Although useful for individual changes on specific projects, such checks are a source of heterogeneity across the codebase and can add significant friction to the LSC process. This heterogeneity is a barrier to scaling our processes and systems, and LSC tools and authors can’t be expected to understand special policies for each team.</p>
|
||||
|
||||
<p>We also aggressively ignore presubmit check failures that preexist the change in question. When working on an individual project, it’s easy for an engineer to fix those and continue with their original work, but that technique doesn’t scale when making LSCs across Google’s codebase. Local code owners are responsible for having no preexisting failures in their codebase as part of the social contract between them and infrastructure teams.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="reviewing">
|
||||
<h3>Reviewing</h3>
|
||||
|
||||
<p>As with other changes, changes generated by Rosie are expected to go through the standard code review process.<a contenteditable="false" data-primary="code reviews" data-secondary="for large-scale changes" data-type="indexterm" id="id-ygu1H6s6SDSAun"> </a> In practice, we’ve found that local owners don’t often treat LSCs with the same rigor as regular changes—they trust the engineers generating LSCs too much. Ideally these changes would be reviewed as any other, but in practice, local project owners have come to trust infrastructure teams to the point where these changes are often given only cursory review. We’ve come to only send changes to local owners for which their review is required for context, not just approval permissions. All other changes can go to a “global approver”: someone who has ownership rights to approve <em>any</em> change throughout the repository.</p>
|
||||
|
||||
<p>When using a global approver, all of the individual shards are assigned to that person, rather than to individual owners of different projects. Global approvers generally have specific knowledge of the language and/or libraries they are reviewing and work with the large-scale change author to know what kinds of changes to expect. They know what the details of the change are and what potential failure modes for it might exist and can customize their workflow accordingly.</p>
|
||||
|
||||
<p>Instead of reviewing each change individually, global reviewers use a separate set of pattern-based tooling to review each of the changes and automatically approve ones that meet their expectations. Thus, they need to manually examine only a small subset that are anomalous because of merge conflicts or tooling malfunctions, which allows the process to scale very well.</p>
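<p>One way to imagine that pattern-based tooling is sketched below, using the <code>scoped_ptr</code> migration from earlier in this chapter as the expected rewrite. The patterns and the diff representation are assumptions; the real tooling is considerably richer.</p>

<pre data-type="programlisting" data-code-language="python">
"""Sketch of a global approver's pattern-based review helper.

A shard whose every changed line matches the expected rewrite pattern is
auto-approved; anything else (merge conflicts, tool glitches) is queued
for a human look. The patterns here are illustrative only.
"""
import re

EXPECTED_REMOVED = re.compile(r"\bscoped_ptr\b")
EXPECTED_ADDED = re.compile(r"\bstd::unique_ptr\b")

def review_shard(removed_lines, added_lines):
    """Returns 'approve' or 'needs-human' for one shard's diff."""
    removals_ok = all(EXPECTED_REMOVED.search(line) for line in removed_lines)
    additions_ok = all(EXPECTED_ADDED.search(line) for line in added_lines)
    if removals_ok and additions_ok:
        return "approve"
    return "needs-human"

if __name__ == "__main__":
    print(review_shard(
        removed_lines=["  scoped_ptr&lt;Widget&gt; w(new Widget);"],
        added_lines=["  std::unique_ptr&lt;Widget&gt; w(new Widget);"],
    ))
</pre>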
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="submitting">
|
||||
<h3>Submitting</h3>
|
||||
|
||||
<p>Finally, individual changes are committed. As with the mailing step, we ensure that the change passes the various project precommit checks before it is finally committed to the repository.</p>
|
||||
|
||||
<p>With Rosie, we are able to effectively create, test, review, and submit thousands of changes per day across all of Google’s codebase and have given teams the ability to effectively migrate their users. Technical decisions that used to be final, such as the name of a widely used symbol or the location of a popular class within a codebase, no longer need to be final.<a contenteditable="false" data-primary="Rosie tool" data-secondary="sharding and submitting in LSC process" data-startref="ix_Rosie" data-type="indexterm" id="id-nguaHwfxt8SvuY"> </a><a contenteditable="false" data-primary="sharding and submitting in LSC process" data-startref="ix_shrd" data-type="indexterm" id="id-43uWsQfktKSMu7"> </a><a contenteditable="false" data-primary="large-scale changes" data-secondary="process" data-startref="ix_LSCprocsh" data-tertiary="sharding and submitting" data-type="indexterm" id="id-Z3uAf2f6tRSqu6"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="cleanup">
|
||||
<h2>Cleanup</h2>
|
||||
|
||||
<p>Different LSCs have different definitions of “done,” which can<a contenteditable="false" data-primary="large-scale changes" data-secondary="process" data-tertiary="cleanup" data-type="indexterm" id="id-9aupHRsZtDuQ"> </a> vary<a contenteditable="false" data-primary="cleanup in LSC process" data-type="indexterm" id="id-YDujsdspt2uP"> </a> from completely removing an old system to migrating only high-value references and leaving old ones to organically disappear.<sup><a data-type="noteref" id="ch01fn235-marker" href="ch22.html#ch01fn235">16</a></sup> In almost all cases, it’s important to have a system that prevents additional introductions of the symbol or system that the large-scale change worked hard to remove. At Google, we use the Tricorder framework mentioned in Chapters <a data-type="xref" data-xrefstyle="select:labelnumber" href="ch19.html#critique_googleapostrophes_code_review">Critique: Google’s Code Review Tool</a> and <a data-type="xref" data-xrefstyle="select:labelnumber" href="ch20.html#static_analysis-id00082">Static Analysis</a> to flag at review time when an engineer introduces a new use of a deprecated object, and this <a contenteditable="false" data-primary="deprecation" data-secondary="preventing new uses of deprecated object" data-type="indexterm" id="id-ygu6I6sWtauj"> </a>has proven an effective method to prevent backsliding. We talk more about<a contenteditable="false" data-primary="large-scale changes" data-secondary="process" data-startref="ix_LSCproc" data-type="indexterm" id="id-6WuoS1sbtNuO"> </a> the entire deprecation process in <a data-type="xref" href="ch15.html#deprecation">Deprecation</a>.</p>
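<p>The backsliding check itself can be pictured as a small analyzer over the added lines of a change, as in the sketch below. The deprecated-symbol list and diff format are assumptions; at Google this role is played by Tricorder checks rather than a standalone script.</p>

<pre data-type="programlisting" data-code-language="python">
"""Sketch of a review-time check that flags new uses of deprecated symbols."""
import re

# Hypothetical deprecation list; a real check reads this from metadata.
DEPRECATED = {
    "scoped_ptr": "use std::unique_ptr instead",
    "OldRpcClient": "use NewRpcClient instead",
}

def findings_for_added_lines(added_lines):
    findings = []
    for line_number, line in enumerate(added_lines, start=1):
        for symbol, advice in DEPRECATED.items():
            if re.search(rf"\b{re.escape(symbol)}\b", line):
                findings.append(f"line {line_number}: {symbol} is deprecated; {advice}")
    return findings

if __name__ == "__main__":
    for finding in findings_for_added_lines(["  OldRpcClient client;"]):
        print(finding)
</pre>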
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="conclusion-id00026">
|
||||
<h1>Conclusion</h1>
|
||||
|
||||
<p>LSCs form an important part of Google’s software engineering ecosystem. At design time, they open up more possibilities, knowing that some design decisions don’t need to be as fixed as they once were. The LSC process also allows maintainers of core infrastructure the ability to migrate large swaths of Google’s codebase from old systems, language versions, and library idioms to new ones, keeping the codebase consistent, spatially and temporally. And all of this happens with only a few dozen engineers supporting tens of thousands of others.</p>
|
||||
|
||||
<p>No matter the size of your organization, it’s reasonable to think about how you would make these kinds of sweeping changes across your collection of source code. Whether by choice or by necessity, having this ability will allow greater flexibility as your organization scales while keeping your source code malleable over time.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="tlsemicolondrs-id00128">
|
||||
<h1>TL;DRs</h1>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>An LSC process makes it possible to rethink the immutability of certain technical decisions.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Traditional models of refactoring break at large scales.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Making LSCs means making a<a contenteditable="false" data-primary="large-scale changes" data-startref="ix_LSC" data-type="indexterm" id="id-AAuZH0Hbf6sNFL"> </a> habit of making LSCs.</p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
<div data-type="footnotes"><p data-type="footnote" id="ch01fn220"><sup><a href="ch22.html#ch01fn220-marker">1</a></sup>For some ideas about why, see <a data-type="xref" href="ch16.html#version_control_and_branch_management">Version Control and Branch Management</a>.</p><p data-type="footnote" id="ch01fn221"><sup><a href="ch22.html#ch01fn221-marker">2</a></sup>It’s possible in this federated world to say “we’ll just commit to each repo as fast as possible to keep the duration of the build break small!" But that approach really doesn’t scale as the number of federated repositories grows.</p><p data-type="footnote" id="ch01fn222"><sup><a href="ch22.html#ch01fn222-marker">3</a></sup>For a further discussion about this practice, see <a data-type="xref" href="ch15.html#deprecation">Deprecation</a>.</p><p data-type="footnote" id="ch01fn223"><sup><a href="ch22.html#ch01fn223-marker">4</a></sup>By “unfunded mandate,” we mean “additional requirements imposed by an external entity without balancing compensation.” Sort of like when the CEO says that everybody must wear an evening gown for “formal Fridays” but doesn’t give you a corresponding raise to pay for your formal wear.</p><p data-type="footnote" id="ch01fn224"><sup><a href="ch22.html#ch01fn224-marker">5</a></sup>See <a href="https://ieeexplore.ieee.org/abstract/document/8443579"><em class="hyperlink">https://ieeexplore.ieee.org/abstract/document/8443579</em></a>.</p><p data-type="footnote" id="ch01fn225"><sup><a href="ch22.html#ch01fn225-marker">6</a></sup>This probably sounds like overkill, and it likely is. We’re doing active research on the best way to determine the “right” set of tests for a given change, balancing the cost of compute time to run the tests, and the human cost of making the wrong choice.</p><p data-type="footnote" id="ch01fn226"><sup><a href="ch22.html#ch01fn226-marker">7</a></sup>The largest series of LSCs ever executed removed more than one billion lines of code from the repository over the course of three days. 
This was largely to remove an obsolete part of the repository that had been migrated to a new home; but still, how confident do you have to be to delete one billion lines of code?</p><p data-type="footnote" id="ch01fn227"><sup><a href="ch22.html#ch01fn227-marker">8</a></sup>LSCs are usually supported by tools that make finding, making, and reviewing changes relatively straight <span class="keep-together">forward.</span></p><p data-type="footnote" id="ch01fn228"><sup><a href="ch22.html#ch01fn228-marker">9</a></sup>It is possible to ask TAP for single change “isolated” run, but these are very expensive and are performed only during off-peak hours.</p><p data-type="footnote" id="ch01fn229"><sup><a href="ch22.html#ch01fn229-marker">10</a></sup>There are obvious technical costs here in terms of compute and storage, but the human costs in time to review a change far outweigh the technical ones.</p><p data-type="footnote" id="ch01fn230"><sup><a href="ch22.html#ch01fn230-marker">11</a></sup>For example, we do not want the resulting tools to be used as a mechanism to fight over the proper spelling of “gray” or “grey” in comments.</p><p data-type="footnote" id="ch01fn231"><sup><a href="ch22.html#ch01fn231-marker">12</a></sup>In fact, Go recently introduced these kinds of language features specifically to support large-scale refactorings (see <a href="https://talks.golang.org/2016/refactor.article"><em class="hyperlink">https://talks.golang.org/2016/refactor.article</em></a>).</p><p data-type="footnote" id="ch01fn232"><sup><a href="ch22.html#ch01fn232-marker">13</a></sup>The only kinds of changes that the committee has outright rejected have been those that are deemed dangerous, such as converting all <code>NULL</code> instances to <code>nullptr</code>, or extremely low-value, such as changing spelling from British English to American English, or vice versa. As our experience with such changes has increased and the cost of LSCs has dropped, the threshold for approval has as well.</p><p data-type="footnote" id="ch01fn233"><sup><a href="ch22.html#ch01fn233-marker">14</a></sup>This happens for many reasons: copy-and-paste from existing examples, committing changes that have been in development for some time, or simply reliance on old habits.</p><p data-type="footnote" id="ch01fn234"><sup><a href="ch22.html#ch01fn234-marker">15</a></sup>In actuality, this is the reasoning behind the original work on clang-format for C++.</p><p data-type="footnote" id="ch01fn235"><sup><a href="ch22.html#ch01fn235-marker">16</a></sup>Sadly, the systems we most want to organically decompose are those that are the most resilient to doing so. They are the plastic six-pack rings of the code ecosystem.</p></div></section>
|
||||
|
||||
</body>
|
||||
</html>
|
567
clones/abseil.io/resources/swe-book/html/ch23.html
Normal file
|
@@ -0,0 +1,567 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="continuous_integration">
|
||||
<h1>Continuous Integration</h1>
|
||||
|
||||
<p class="byline">Written by Rachel Tannenbaum</p>
|
||||
|
||||
<p class="byline">Edited by Lisa Carey</p>
|
||||
|
||||
<p><em>Continuous Integration</em>, or CI, is generally <a contenteditable="false" data-primary="continuous integration (CI)" data-type="indexterm" id="ix_CI"> </a>defined as “a software development practice where members of a team integrate their work frequently [...] Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible.”<sup><a data-type="noteref" id="ch01fn236-marker" href="ch23.html#ch01fn236">1</a></sup> Simply put, the fundamental goal of CI is to automatically catch problematic changes as early as possible.</p>
|
||||
|
||||
<p>In practice, what does “integrating work frequently” mean for the modern, distributed application? Today’s systems have many moving pieces beyond just the latest versioned code in the repository. In fact, with the recent trend toward microservices, the changes that break an application are less likely to live inside the project’s immediate codebase and more likely to be in loosely coupled microservices on the other side of a network call. Whereas a traditional continuous build tests changes in your binary, an extension of this might test changes to upstream microservices. The dependency is just shifted from your function call stack to an HTTP request or Remote Procedure Calls (RPC).</p>
|
||||
|
||||
<p>Even further from code dependencies, an application might periodically ingest data or update machine learning models. It might execute on evolving operating systems, runtimes, cloud hosting services, and devices. It might be a feature that sits on top of a growing platform or be the platform that must accommodate a growing feature base. All of these things should be considered dependencies, and we should aim to “continuously integrate” their changes, too. Further complicating things, these changing components are often owned by developers outside our team, organization, or company and deployed on their own schedules.</p>
|
||||
|
||||
<p>So, perhaps a better definition for CI in today’s world, particularly when developing at scale, is the following:</p>
|
||||
|
||||
<blockquote>
|
||||
<p><em>Continuous Integration (2)</em>: the continuous assembling and testing of our entire complex and rapidly evolving ecosystem.</p>
|
||||
</blockquote>
|
||||
|
||||
<p>It is natural to conceptualize CI in terms of testing because the two are tightly coupled, and we’ll do so throughout this chapter. In previous chapters, we’ve discussed a comprehensive range of testing, from unit to integration, to larger-scoped systems.</p>
|
||||
|
||||
<p>From a testing perspective, CI is a paradigm<a contenteditable="false" data-primary="testing" data-secondary="continuous integration and" data-type="indexterm" id="id-58SMH4sO"> </a> to inform the following:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p><em>Which</em> tests to run <em>when</em> in the development/release workflow, as code (and other) changes are continuously integrated into it</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><em>How</em> to compose the system under test (SUT) at each point, balancing concerns like fidelity and setup cost</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>For example, which tests do we run on presubmit, which do we save for post-submit, and which do we save even later until our staging deploy? Accordingly, how do we represent our SUT at each of these points? As you might imagine, requirements for a presubmit SUT can differ significantly from those of a staging environment under test. For example, it can be dangerous for an application built from code pending review on presubmit to talk to real production backends (think security and quota vulnerabilities), whereas this is often acceptable for a staging environment.</p>
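<p>As a sketch of how those answers might be written down, the configuration below maps each stage to a SUT composition and test scope. The stage names follow the discussion above, but the format and the specific choices are assumptions for illustration.</p>

<pre data-type="programlisting" data-code-language="python">
"""Sketch of choosing how to compose the SUT at each CI stage.

The stage-to-backend choices mirror the discussion in the text; the
config format itself is an invention for illustration.
"""

SUT_CONFIG = {
    "presubmit": {"backends": "hermetic fakes", "tests": "unit plus small integration"},
    "post-submit": {"backends": "hermetic fakes", "tests": "full affected set"},
    "staging": {"backends": "staging services", "tests": "end-to-end smoke"},
}

def describe(stage):
    config = SUT_CONFIG[stage]
    return f"{stage}: run {config['tests']} against {config['backends']}"

if __name__ == "__main__":
    for stage in SUT_CONFIG:
        print(describe(stage))
</pre>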
|
||||
|
||||
<p>And <em>why</em> should we try to optimize this often-delicate balance of testing “the right things” at “the right times” with CI? Plenty of prior work has already established the benefits of CI to the engineering organization and the overall business alike.<sup><a data-type="noteref" id="ch01fn237-marker" href="ch23.html#ch01fn237">2</a></sup> These outcomes are driven by a powerful guarantee: verifiable—and timely—proof that the application is good to progress to the next stage. We don’t need to just hope that all contributors are very careful, responsible, and thorough; we can instead guarantee the working state of our application at various points from build throughout release, thereby improving confidence and quality in our products and productivity of our teams.</p>
|
||||
|
||||
<p>In the rest of this chapter, we’ll introduce some key CI concepts, best practices, and challenges, before looking at how we manage CI at Google with an introduction to our continuous build tool, TAP, and an in-depth study of one application’s CI <span class="keep-together">transformation.</span></p>
|
||||
|
||||
<section data-type="sect1" id="ci_concepts">
|
||||
<h1>CI Concepts</h1>
|
||||
|
||||
<p>First, let’s begin by looking at<a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="core concepts" data-type="indexterm" id="ix_CIcpt"> </a> some core concepts of CI.</p>
|
||||
|
||||
<section data-type="sect2" id="fast_feedback_loops">
|
||||
<h2>Fast Feedback Loops</h2>
|
||||
|
||||
<p>As discussed in <a data-type="xref" href="ch11.html#testing_overview">Testing Overview</a>, the cost of a bug <a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="core concepts" data-tertiary="fast feedback loops" data-type="indexterm" id="ix_CIcptfdbk"> </a>grows almost <a contenteditable="false" data-primary="feedback" data-secondary="fast feedback loops in CI" data-type="indexterm" id="ix_fdbkCI"> </a>exponentially the later it is caught. <a data-type="xref" href="ch23.html#life_of_a_code_change">Figure 23-1</a> shows all the places a problematic code change might be caught in its lifetime.</p>
|
||||
|
||||
<figure id="life_of_a_code_change"><img alt="Life of a code change" src="images/seag_2301.png">
|
||||
<figcaption><span class="label">Figure 23-1. </span>Life of a code change</figcaption>
|
||||
</figure>
|
||||
|
||||
<p>In general, as issues progress to the "right" in our diagram, they become costlier for the following reasons:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>They must be triaged by an engineer who is likely unfamiliar with the problematic code change.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>They require more work for the code change author to recollect and investigate the change.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>They negatively affect others, whether engineers in their work or ultimately the end user.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>To minimize the cost of bugs, CI encourages us to use <em>fast feedback loops.</em><sup><a data-type="noteref" id="ch01fn238-marker" href="ch23.html#ch01fn238">3</a></sup> Each time we integrate a code (or other) change into a testing scenario and observe the results, we get a new <em>feedback loop</em>. Feedback can take many forms; following are some common ones (in order of fastest to slowest):</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>The edit-compile-debug loop of local development</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Automated test results to a code change author on presubmit</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>An integration error between changes to two projects, detected after both are submitted and tested together (i.e., on post-submit)</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>An incompatibility between our project and an upstream microservice dependency, detected by a QA tester in our staging environment, when the upstream service deploys its latest changes</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Bug reports by internal users who are opted in to a feature before external users</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Bug or outage reports by external users or the press</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p><em>Canarying</em>—or deploying to a small percentage<a contenteditable="false" data-primary="canarying" data-type="indexterm" id="id-w0SlhwuptaH0"> </a> of production first—can help minimize issues that do make it to production, with a subset-of-production initial feedback loop preceding all-of-production. However, canarying can cause problems, too, particularly around compatibility between deployments when multiple versions are deployed at once. This is sometimes known as <em>version skew</em>, a state of a distributed system in which it contains multiple incompatible versions of code, data, and/or configuration. Like many issues we look at in this book, version skew is another example of a challenging problem that can arise when trying to develop and manage software over time.</p>
|
||||
|
||||
<p><em>Experiments</em> and <em>feature flags</em> are extremely powerful feedback loops.<a contenteditable="false" data-primary="experiments and feature flags" data-type="indexterm" id="id-WqSrtRU0tRHd"> </a> They reduce deployment risk by isolating changes within modular components that can be dynamically toggled in production.<a contenteditable="false" data-primary="feature flags" data-type="indexterm" id="id-JoSNcqUet9HD"> </a> Relying heavily on feature-flag-guarding is a common paradigm for Continuous Delivery, which we explore further in <a data-type="xref" href="ch24.html#continuous_delivery-id00035">Continuous Delivery</a>.</p>
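<p>To make flag-guarding concrete, here is a minimal sketch; the flag name and flag store are hypothetical, not a real flag framework. The important property is that enabling, disabling, or rolling back the new behavior is a configuration change rather than a new binary:</p>

<pre data-type="programlisting" data-code-language="python"># Minimal sketch of feature-flag guarding (hypothetical flag store, not a
# real Google API). The new code path ships dark and is toggled at runtime.
FLAGS = {"use_new_checkout_flow": False}   # Served by a dynamic config system.

def new_checkout_flow(cart):
    return sum(item["price"] for item in cart) * 0.9   # e.g., new discount logic

def legacy_checkout_flow(cart):
    return sum(item["price"] for item in cart)

def checkout(cart):
    if FLAGS["use_new_checkout_flow"]:
        return new_checkout_flow(cart)    # New, flag-guarded behavior.
    return legacy_checkout_flow(cart)     # Known-good fallback.

# Rolling the new behavior out (or back) is a config change, not a new binary:
FLAGS["use_new_checkout_flow"] = True     # enable for an experiment population
print(checkout([{"price": 10.0}, {"price": 5.0}]))</pre>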
|
||||
|
||||
<section data-type="sect3" id="accessible_and_actionable_feedback">
|
||||
<h3>Accessible and actionable feedback</h3>
|
||||
|
||||
<p>It’s also important that feedback from CI be widely accessible. In addition to our open culture around code visibility, we feel similarly about our test reporting. We have a unified test reporting system in which anyone can easily look up a build or test run, including all logs (excluding user Personally Identifiable Information [PII]), whether for an individual engineer’s local run or on an automated development or staging build.</p>
|
||||
|
||||
<p>Along with logs, our test reporting system provides a detailed history of when build or test targets began to fail, including audits of where the build was cut at each run, where it was run, and by whom. We also have a system for flake classification, which uses statistics to classify flakes at a Google-wide level, so engineers don’t need to figure this out for themselves to determine whether their change broke another project’s test (if the test is flaky: probably not).</p>
|
||||
|
||||
<p>Visibility into test history empowers engineers to share and collaborate on feedback, an essential requirement for disparate teams to diagnose and learn from integration failures between their systems. Similarly, bugs (e.g., tickets or issues) at Google are open with full comment history for all to see and learn from (with the exception, again, of customer PII).</p>
|
||||
|
||||
<p>Finally, any feedback from CI tests should not just be accessible but actionable—easy to use to find and fix problems. We’ll look at an example of improving user-unfriendly feedback in our case study later in this chapter. By improving test output readability, you automate the understanding of feedback.<a contenteditable="false" data-primary="feedback" data-secondary="fast feedback loops in CI" data-startref="ix_fdbkCI" data-type="indexterm" id="id-xbSkHWfas7tYH4"> </a><a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="core concepts" data-startref="ix_CIcptfdbk" data-tertiary="fast feedback loops" data-type="indexterm" id="id-2ESWhkf0sAtqHw"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="automation">
|
||||
<h2>Automation</h2>
|
||||
|
||||
<p>It’s well known that <a href="https://oreil.ly/UafCh">automating development-related tasks saves engineering resources</a> in the long run.<a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="core concepts" data-tertiary="automation" data-type="indexterm" id="ix_CIcptauto"> </a><a contenteditable="false" data-primary="automation" data-secondary="in continous integration" data-type="indexterm" id="ix_autoCI"> </a> Intuitively, because we automate processes by defining them as code, peer review when changes are checked in will reduce the probability of error. Of course, automated processes, like any other software, will have bugs; but when implemented effectively, they are still faster, easier, and more reliable than if they were attempted manually by engineers.</p>
|
||||
|
||||
<p>CI, specifically, automates the <em>build</em> and <em>release</em> processes, with a Continuous Build and Continuous Delivery. Continuous testing is applied throughout, which we’ll look at in the next section.</p>
|
||||
|
||||
<section data-type="sect3" id="continuous_build">
|
||||
<h3>Continuous Build</h3>
|
||||
|
||||
<p>The <em>Continuous Build</em> (CB) integrates the latest code changes at head<sup><a data-type="noteref" id="ch01fn239-marker" href="ch23.html#ch01fn239">4</a></sup> and runs an automated build and test. <a contenteditable="false" data-primary="continuous build (CB)" data-type="indexterm" id="id-BvSytBhDcAc7Hz"> </a>Because the CB runs tests as well as building code, “breaking the build” or “failing the build” includes breaking tests as well as breaking <span class="keep-together">compilation.</span></p>
|
||||
|
||||
<p>After a change is submitted, the CB should run all relevant tests. If a change passes all tests, the CB marks it passing or “green,” as it is often displayed in user interfaces (UIs). This process effectively introduces two different versions of head in the repository: <em>true head</em>, or the latest change that was committed, and <em>green head,</em> or the latest change the CB has verified. Engineers are able to sync to either version in their local development. It’s common to sync against green head to work with a stable environment, verified by the CB, while coding a change but have a process that requires changes to be synced to true head before submission.</p>
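<p>A toy model of this bookkeeping—with invented names, not the real CB implementation—might look like the following; the CB only advances its green-head pointer when a newly submitted change passes its tests:</p>

<pre data-type="programlisting" data-code-language="python"># Toy model of a continuous build advancing "green head" behind "true head".
# run_tests is a stand-in for the real build-and-test step.
class ContinuousBuild:
    def __init__(self, run_tests):
        self.run_tests = run_tests
        self.true_head = None     # Latest submitted change.
        self.green_head = None    # Latest change the CB has verified.

    def on_submit(self, change_id):
        self.true_head = change_id
        if self.run_tests(change_id):
            self.green_head = change_id   # Safe to sync to.
        return self.green_head

cb = ContinuousBuild(run_tests=lambda change: change != "cl/102")
for cl in ("cl/101", "cl/102", "cl/103"):
    cb.on_submit(cl)
print(cb.true_head, cb.green_head)   # cl/103 cl/103 (cl/102 itself never went green)</pre>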
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="continuous_delivery">
|
||||
<h3>Continuous Delivery</h3>
|
||||
|
||||
<p>The first step in Continuous Delivery (CD; discussed more fully in <a data-type="xref" href="ch24.html#continuous_delivery-id00035">Continuous Delivery</a>) is <em>release automation</em>, which continuously assembles the latest code and configuration from head into release candidates. <a contenteditable="false" data-primary="continuous delivery (CD)" data-type="indexterm" id="id-w0SrtAh5fqc2H3"> </a>At Google, most teams cut these at green, as opposed to true, head.</p>
|
||||
|
||||
<blockquote>
|
||||
<p><em>Release candidate</em> (RC): A cohesive, deployable unit created by an automated process,<sup><a data-type="noteref" id="ch01fn240-marker" href="ch23.html#ch01fn240">5</a></sup> assembled of code, configuration, and other dependencies that have passed the continuous build.</p>
|
||||
</blockquote>
|
||||
|
||||
<p>Note that we include configuration in release candidates—this is extremely important, even though it can slightly vary between environments as the candidate is promoted. We’re not necessarily advocating you compile configuration into your binaries—actually, we would recommend dynamic configuration, such as experiments or feature flags, for many scenarios.<sup><a data-type="noteref" id="ch01fn241-marker" href="ch23.html#ch01fn241">6</a></sup></p>
|
||||
|
||||
<p>Rather, we are saying that any static configuration you <em>do</em> have should be promoted as part of the release candidate so that it can undergo testing along with its corresponding code. Remember, a large percentage of production bugs are caused by “silly” configuration problems, so it’s just as important to test your configuration as it is your code (and to test it along <em>with</em> the same code that will use it). Version skew is often caught in this release-candidate-promotion process. This assumes, of course, that your static configuration is in version control—at Google, static configuration is in version control along with the code, and hence goes through the same code review process.</p>
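<p>One way to picture “code plus configuration” is as a pinned manifest that travels with the RC; the fields below are illustrative assumptions, not Google’s release format:</p>

<pre data-type="programlisting" data-code-language="python"># Illustrative release-candidate manifest: the code *and* its static
# configuration are pinned together so they are tested and promoted as a unit.
release_candidate = {
    "name": "myservice-rc-001",              # hypothetical naming scheme
    "source_revision": "green-head@12345",   # cut at green head
    "artifacts": ["bin/myservice", "bin/myservice_frontend"],
    "static_config": {
        "serving_threads": 64,
        "feature_flag_defaults": {"use_new_checkout_flow": False},
    },
    # Promotion history is appended as the same RC moves through environments.
    "promotions": ["dev", "staging"],
}</pre>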
|
||||
|
||||
<p>We then define CD as follows:</p>
|
||||
|
||||
<blockquote>
|
||||
<p><em>Continuous Delivery</em> (CD): a continuous assembling of release candidates, followed by the promotion and testing of those candidates throughout a series of environments—sometimes reaching production and sometimes not.</p>
|
||||
</blockquote>
|
||||
|
||||
<p>The promotion and deployment process often depends on the team. We’ll show how our case study navigated this process.</p>
|
||||
|
||||
<p>For teams at Google that want continuous feedback from new changes in production (e.g., Continuous Deployment), it’s usually infeasible to continuously push entire binaries, which are often quite large, on green. For that reason, doing a <em>selective</em> Continuous Deployment, through experiments or feature flags, is a common strategy.<sup><a data-type="noteref" id="ch01fn242-marker" href="ch23.html#ch01fn242">7</a></sup></p>
|
||||
|
||||
<p>As an RC progresses through environments, its artifacts (e.g., binaries, containers) ideally should not be recompiled or rebuilt. Using containers such as Docker helps enforce consistency of an RC between environments, from local development onward. Similarly, using orchestration tools like Kubernetes (or in our case, usually <a href="https://oreil.ly/89yPv">Borg</a>), helps enforce consistency between deployments. By enforcing consistency of our release and deployment between environments, we achieve higher-fidelity earlier testing and fewer surprises in production.<a contenteditable="false" data-primary="automation" data-secondary="in continous integration" data-startref="ix_autoCI" data-type="indexterm" id="id-eaS1hGsbfocAHp"> </a><a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="core concepts" data-startref="ix_CIcptauto" data-tertiary="automation" data-type="indexterm" id="id-0KSkt1snfpczH2"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="continuous_testing">
|
||||
<h2>Continuous Testing</h2>
|
||||
|
||||
<p>Let’s look at<a contenteditable="false" data-primary="testing" data-secondary="continuous testing in CI" data-type="indexterm" id="ix_tstCI"> </a> how CB and CD fit in as we apply <a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="core concepts" data-tertiary="continuous testing" data-type="indexterm" id="ix_CIcptCT"> </a>Continuous Testing (CT) to a code change throughout its lifetime, as shown in <a data-type="xref" href="ch23.html#life_of_a_code_change_with_cb_and_cd">Figure 23-2</a>.</p>
|
||||
|
||||
<figure id="life_of_a_code_change_with_cb_and_cd"><img alt="Life of a code change with CB and CD" src="images/seag_2302.png">
|
||||
<figcaption><span class="label">Figure 23-2. </span>Life of a code change with CB and CD</figcaption>
|
||||
</figure>
|
||||
|
||||
<p>The rightward arrow shows the progression of a single code change from local development to production. Again, one of our key objectives in CI is determining <em>what</em> to test <em>when</em> in this progression. Later in this chapter, we’ll introduce the different testing phases and provide some considerations for what to test in presubmit versus post-submit, and in the RC and beyond. We’ll show that, as we shift to the right, the code change is subjected to progressively larger-scoped automated tests.</p>
|
||||
|
||||
<section data-type="sect3" id="why_presubmit_isnapostrophet_enough">
|
||||
<h3>Why presubmit isn’t enough</h3>
|
||||
|
||||
<p>With the objective to catch problematic changes as soon as possible and the ability to run automated tests on presubmit, you might <a contenteditable="false" data-primary="presubmits" data-secondary="continuous testing and" data-type="indexterm" id="id-BvSBHBhJf3f7Hz"> </a>be wondering: why not just run all tests on presubmit?</p>
|
||||
|
||||
<p>The main reason is that it’s too expensive. Engineer productivity is extremely valuable, and waiting a long time to run every test during code submission can be severely disruptive. Further, by removing the constraint for presubmits to be exhaustive, a lot of efficiency gains can be made if tests pass far more frequently than they fail. For example, the tests that are run can be restricted to certain scopes, or selected based on a model that predicts their likelihood of detecting a failure.</p>
|
||||
|
||||
<p>Similarly, it’s expensive for engineers to be blocked on presubmit by failures arising from instability or flakiness that has nothing to do with their code change.</p>
|
||||
|
||||
<p>Another reason is that during the time we run presubmit tests to confirm that a change is safe, the underlying repository might have changed in a manner that is incompatible with the changes being tested. That is, it is possible for two changes that touch completely different files to cause a test to fail. We call this a mid-air collision, and though generally rare, it happens most days at our scale. CI systems for smaller repositories or projects can avoid this problem by serializing submits so that there is no difference between what is about to enter and what just did.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="presubmit_versus_postsubmit">
|
||||
<h3>Presubmit versus post-submit</h3>
|
||||
|
||||
<p>So, which tests <em>should</em> be run on presubmit? <a contenteditable="false" data-primary="presubmits" data-secondary="versus postsubmit" data-type="indexterm" id="id-47S5hXhGCafQH9"> </a>Our general rule of thumb is: only fast, reliable ones. You can accept some loss of coverage on presubmit, but that means you need to catch any issues that slip by on post-submit, and accept some number of rollbacks. On post-submit, you can accept longer times and some instability, as long as you have proper mechanisms to deal with it.</p>
|
||||
|
||||
<div data-type="note" id="id-wBsrtyC5faH0"><h6>Note</h6>
|
||||
<p>We’ll show how TAP and our case study handle failure management in <a data-type="xref" href="ch23.html#ci_at_google">CI at Google</a>.</p>
|
||||
</div>
|
||||
|
||||
<p>We don’t want to waste valuable engineer productivity by waiting too long for slow tests or for too many tests—we typically limit presubmit tests to just those for the project where the change is happening. We also run tests concurrently, so there is a resource decision to consider as well. Finally, we don’t want to run unreliable tests on presubmit, because the cost of having many engineers affected by them, debugging the same problem that is not related to their code change, is too high.</p>
|
||||
|
||||
<p>Most teams at Google run their small tests (like unit tests) on presubmit<sup><a data-type="noteref" id="ch01fn243-marker" href="ch23.html#ch01fn243">8</a></sup>—these are the obvious ones to run as they tend to be the fastest and most reliable. Whether and how to run larger-scoped tests on presubmit is the more interesting question, and this varies by team. For teams that do want to run them, hermetic testing is a proven approach to reducing their inherent instability. Another option is to allow large-scoped tests to be unreliable on presubmit but disable them aggressively when they start failing.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="release_candidate_testing">
|
||||
<h3>Release candidate testing</h3>
|
||||
|
||||
<p>After a code change has passed the CB (this might take <a contenteditable="false" data-primary="release candidate testing" data-type="indexterm" id="id-47SvHXh1IafQH9"> </a>multiple cycles if there were failures), it will soon encounter CD and be included in a pending release candidate.</p>
|
||||
|
||||
<p>As CD builds RCs, it will run larger tests against the entire candidate. We test a release candidate by promoting it through a series of test environments and testing it at each deployment. This can include a combination of sandboxed, <span class="keep-together">temporary</span> environments and shared test environments, like dev or staging. It’s common to include some manual QA testing of the RC in shared environments, too.</p>
|
||||
|
||||
<p>There are several reasons why it’s important to run a comprehensive, automated test suite against an RC, even if it is the same suite that CB just ran against the code on post-submit (assuming the CD cuts at green):</p>
|
||||
|
||||
<dl>
|
||||
<dt>As a sanity check</dt>
|
||||
<dd>We double-check that nothing strange happened when the code was cut and recompiled in the RC.</dd>
|
||||
<dt>For auditability</dt>
|
||||
<dd>If an engineer wants to check an RC’s test results, they are readily available and associated with the RC, so they don’t need to dig through CB logs to find them.</dd>
|
||||
<dt>To allow for cherry picks</dt>
|
||||
<dd>If you apply a cherry-pick fix to an RC, your source code has now diverged from the latest cut tested by the CB.</dd>
|
||||
<dt>For emergency pushes</dt>
|
||||
<dd>In that case, CD can cut from true head and run the minimal set of tests necessary to feel confident about an emergency push, without waiting for the full CB to pass.</dd>
|
||||
</dl>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="production_testing">
|
||||
<h3>Production testing</h3>
|
||||
|
||||
<p>Our continuous, automated testing process goes all the way to the final deployed environment: production.<a contenteditable="false" data-primary="production" data-secondary="testing in" data-type="indexterm" id="id-WqSKHoh4u9fMHw"> </a> We should run the same suite of tests against production (sometimes called <em>probers</em>) that we did against the release candidate earlier on to verify: 1) the working state of production, according to our tests, and 2) the relevance of our tests, according to production.</p>
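<p>A prober can be little more than the same assertion pointed at the live service on a schedule. The following sketch uses only the Python standard library, and the URL and health check are invented:</p>

<pre data-type="programlisting" data-code-language="python"># Minimal prober sketch: hit a live endpoint and assert on the response,
# just as the release-candidate test did earlier. URL and check are invented.
import urllib.request

def probe_homepage(url="https://example.com/healthz", timeout_s=5):
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and "ok" in body.lower()
    except OSError:
        return False   # Network or server failure counts as a failed probe.

if __name__ == "__main__":
    print("production healthy:", probe_homepage())</pre>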
|
||||
|
||||
<p>Continuous testing at each step of the application’s progression, each with its own trade-offs, serves as a reminder of the value in a “defense in depth” approach to catching bugs—it isn’t just one bit of technology or policy that we rely upon for quality and stability, it’s many testing approaches combined.<a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="core concepts" data-startref="ix_CIcptCT" data-tertiary="continuous testing" data-type="indexterm" id="id-JoSEHQtpubfdHP"> </a><a contenteditable="false" data-primary="testing" data-secondary="continuous testing in CI" data-startref="ix_tstCI" data-type="indexterm" id="id-mAS5hGtnudf3HN"> </a><a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="core concepts" data-startref="ix_CIcpt" data-type="indexterm" id="id-xbSBtqtyuafYH4"> </a></p>
|
||||
|
||||
<aside data-type="sidebar" id="ci_is_alerting">
|
||||
<h5>CI Is Alerting</h5>
|
||||
|
||||
<p class="byline">Titus Winters</p>
|
||||
|
||||
<p>As with responsibly running production<a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="alerting" data-type="indexterm" id="ix_CIalrt"> </a> systems, sustainably maintaining software systems also requires continual automated monitoring. Just as we use a monitoring and alerting system to understand how production systems respond to change, CI reveals how our software is responding to changes in its environment. Whereas production monitoring relies on passive alerts and active probers of running systems, CI uses unit and integration tests to detect changes to the software before it is deployed. Drawing comparisons between these two domains lets us apply knowledge from one to the other.</p>
|
||||
|
||||
<p>Both CI and alerting serve the same overall purpose in the developer workflow—to identify problems as quickly as reasonably possible. CI emphasizes the early side of the developer workflow, and catches problems by surfacing test failures. Alerting focuses on the late end of the same workflow and catches problems by monitoring metrics and reporting when they exceed some threshold. Both are forms of “identify problems automatically, as soon as possible.”</p>
|
||||
|
||||
<p>A well-managed alerting system helps to ensure that your Service-Level Objectives (SLOs) are being met. A good CI system helps to ensure that your build is in good shape—the code compiles, tests pass, and you could deploy a new release if you needed to. Best-practice policies in both spaces focus a lot on ideas of fidelity and actionable alerting: tests should fail only when the important underlying invariant is violated, rather than because the test is brittle or flaky. A flaky test that fails every few CI runs is just as much of a problem as a spurious alert going off every few minutes and generating a page for the on-call. If it isn’t actionable, it shouldn’t be alerting. If it isn’t actually violating the invariants of the SUT, it shouldn’t be a test failure.</p>
|
||||
|
||||
<p>CI and alerting share an underlying conceptual framework. For instance, there’s a similar relationship between localized signals (unit tests, monitoring of isolated statistics/cause-based alerting) and cross-dependency signals (integration and release tests, black-box probing). The highest fidelity indicators of whether an aggregate system is working are the end-to-end signals, but we pay for that fidelity in flakiness, increasing resource costs, and difficulty in debugging root causes.</p>
|
||||
|
||||
<p>Similarly, we see an underlying connection in the failure modes for both domains. Brittle cause-based alerts fire based on crossing an arbitrary threshold (say, retries in the past hour), without there necessarily being a fundamental connection between that threshold and system health as seen by an end user. Brittle tests fail when an arbitrary test requirement or invariant is violated, without there necessarily being a fundamental connection between that invariant and the correctness of the software being tested. In most cases these are easy to write, and potentially helpful in debugging a larger issue. In both cases they are rough proxies for overall health/correctness, failing to capture the holistic behavior. If you don’t have an easy end-to-end probe, but you do make it easy to collect some aggregate statistics, teams will write threshold alerts based on arbitrary statistics. If you don’t have a high-level way to say, “Fail the test if the decoded image isn’t roughly the same as this decoded image,” teams will instead build tests that assert that the byte streams are identical.</p>
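<p>The contrast between the brittle and the behavior-level assertion might look like this sketch—a toy tolerance check with invented pixel data, not a real image-diffing library:</p>

<pre data-type="programlisting" data-code-language="python"># Toy contrast between a brittle byte-identical assertion and a tolerant,
# behavior-level one. Pixel data and tolerance are invented for illustration.
def assert_bytes_identical(expected, actual):
    # Brittle: any recompression or encoder change fails, even if the image is fine.
    assert expected == actual, "byte streams differ"

def assert_images_roughly_equal(expected_pixels, actual_pixels, max_avg_delta=2.0):
    # Robust: capture the invariant we actually care about (the images look alike).
    deltas = [abs(e - a) for e, a in zip(expected_pixels, actual_pixels)]
    avg_delta = sum(deltas) / len(deltas)
    too_different = avg_delta > max_avg_delta
    assert not too_different, f"images differ too much: avg delta {avg_delta:.2f}"

golden = [10, 10, 200, 200]
rerendered = [11, 9, 201, 199]                    # same image, different encoder
assert_images_roughly_equal(golden, rerendered)   # passes
# assert_bytes_identical(bytes(golden), bytes(rerendered))  # would fail spuriously</pre>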
|
||||
|
||||
<p>Cause-based alerts and brittle tests can still have value; they just aren’t the ideal way to identify potential problems in an alerting scenario. In the event of an actual failure, having more debug detail available can be useful. When SREs are debugging an outage, it can be useful to have information of the form, “An hour ago, users started experiencing more failed requests. Around the same time, the number of retries started ticking up. Let’s start investigating there.” Similarly, brittle tests can still provide extra debugging information: “The image rendering pipeline started spitting out garbage. One of the unit tests suggests that we’re getting different bytes back from the JPEG compressor. Let’s start investigating there.”</p>
|
||||
|
||||
<p>Although monitoring and alerting are considered a part of the SRE/production management domain, where the insight of “Error Budgets” is well understood,<sup><a data-type="noteref" id="ch01fn244-marker" href="ch23.html#ch01fn244">9</a></sup> CI comes from a perspective that still tends to be focused on absolutes. Framing CI as the “left shift” of alerting starts to suggest ways to reason about those policies and propose better best practices:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Having a 100% green rate on CI, just like having 100% uptime for a production service, is awfully expensive. If that is <em>actually</em> your goal, one of the biggest problems is going to be a race condition between testing and submission.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Treating every alert as an equal cause for alarm is not generally the correct approach. If an alert fires in production but the service isn’t actually impacted, silencing the alert is the correct choice. The same is true for test failures: until our CI systems learn how to say, “This test is known to be failing for irrelevant reasons,” we should probably be more liberal in accepting changes that disable a failed test. Not all test failures are indicative of upcoming production issues.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Policies that say, “Nobody can commit if our latest CI results aren’t green” are probably misguided. If CI reports an issue, such failures should definitely be <em>investigated</em> before letting people commit or compound the issue. But if the root cause is well understood and clearly would not affect production, blocking commits is unreasonable.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>This “CI is alerting” insight is new, and we’re still figuring out how to fully draw parallels. Given the higher stakes involved, it’s unsurprising that SRE has put a lot of thought into best practices surrounding monitoring and alerting, whereas CI has been viewed as more of a luxury feature.<sup><a data-type="noteref" id="ch01fn245-marker" href="ch23.html#ch01fn245">10</a></sup> For the next few years, the task in software engineering will be to see where existing SRE practice can be reconceptualized in a CI context to help reformulate the testing and CI landscape—and perhaps where best practices in testing can help clarify goals and policies on monitoring and alerting.</p>
|
||||
</aside>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="ci_challenges">
|
||||
<h2>CI Challenges</h2>
|
||||
|
||||
<p>We’ve discussed some of the established best <a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="alerting" data-tertiary="CI challenges" data-type="indexterm" id="id-dzSxHlhNCAHv"> </a>practices in CI and have introduced some of the challenges involved, such as the potential disruption to engineer productivity of unstable, slow, conflicting, or simply too many tests at presubmit. Some common additional challenges when implementing CI include the following:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p><em>Presubmit optimization</em>, including <em>which</em> tests to run at presubmit time given the potential issues we’ve already described, and <em>how</em> to run them.<a contenteditable="false" data-primary="presubmits" data-secondary="optimization of" data-type="indexterm" id="id-w0SxcNHWH3tyC4HV"> </a></p>
|
||||
</li>
|
||||
<li>
|
||||
<p><em>Culprit finding</em> and <em>failure isolation</em>: Which <a contenteditable="false" data-primary="culprit finding and failure isolation" data-type="indexterm" id="id-w0SrtNH9h3tyC4HV"> </a>code or<a contenteditable="false" data-primary="failures" data-secondary="culprit finding and failure isolation" data-type="indexterm" id="id-47SocdHrhRt1CPHd"> </a> other change caused the problem, and which system did it happen in? “Integrating upstream microservices" is one approach to failure isolation in a distributed architecture, when you want to figure out whether a problem originated in your own servers or a backend. In this approach, you stage combinations of your stable servers along with upstream microservices’ new servers. (Thus, you are integrating the microservices’ latest changes into your testing.) This approach can be particularly challenging due to version skew: not only are these environments often incompatible, but you’re also likely to encounter false positives—problems that occur in a particular staged combination that wouldn’t actually be spotted in production.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><em>Resource constraints</em>: Tests need resources to run, and large tests can be very expensive.<a contenteditable="false" data-primary="resource constraints, CI and" data-type="indexterm" id="id-w0SlhNHpt3tyC4HV"> </a> In addition, the cost for the infrastructure for inserting automated testing throughout the process can be considerable.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>There’s also the challenge of <em>failure management—</em>what to do when tests fail. Although smaller problems can usually be fixed quickly, many of our teams find that it’s extremely difficult to have a consistently green test suite when large end-to-end tests are involved. They inherently become broken or flaky and are difficult to debug; there needs to be a mechanism to temporarily disable and keep track of them so that the release can go on. A common technique at Google is to use bug “hotlists” filed by an on-call or release engineer and triaged to the appropriate team. Even better is when these bugs can be automatically generated and filed—some of our larger products, like Google Web Server (GWS) and Google Assistant, do this. These hotlists should be curated to make sure any release-blocking bugs are fixed immediately. Nonrelease blockers should be fixed, too; they are less urgent, but should also be prioritized so the test suite remains useful and is not simply a growing pile of disabled, old tests. Often, the problems caught by end-to-end test failures are actually with tests rather than code.</p>
|
||||
|
||||
<p>Flaky tests pose another problem to this process.<a contenteditable="false" data-primary="flaky tests" data-type="indexterm" id="id-BvSBHQfzCoHl"> </a> They erode confidence similar to a broken test, but finding a change to roll back is often more difficult because the failure won’t happen all the time. Some teams rely on a tool to remove such flaky tests from presubmit temporarily while the flakiness is investigated and fixed. This keeps confidence high while allowing for more time to fix the problem.</p>
|
||||
|
||||
<p><em>Test instability</em> is another significant challenge that we’ve already looked at in the context of presubmits.<a contenteditable="false" data-primary="test instability" data-type="indexterm" id="id-47S5h6CGCJHv"> </a> One tactic for dealing with this is to allow multiple attempts of the test to run. This is a common test configuration setting that teams use. Also, within test code, retries can be introduced at various points of specificity.</p>
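<p>Such a retry often takes the shape of a small wrapper like the sketch below; the attempt count is an arbitrary example, and real frameworks typically also record that a retry happened so the flakiness stays visible:</p>

<pre data-type="programlisting" data-code-language="python"># Sketch of a test-level retry: rerun a known-unstable check a few times and
# treat any pass as success, while still surfacing the last failure if all fail.
def run_with_retries(test_fn, attempts=3):
    last_error = None
    for _ in range(attempts):
        try:
            test_fn()
            return True         # One passing attempt is enough.
        except AssertionError as e:
            last_error = e      # Remember why it failed, then retry.
    raise last_error            # Consistently failing: report as a real failure.

attempts_seen = []

def sometimes_flaky_test():
    attempts_seen.append(1)
    if len(attempts_seen) == 1:
        raise AssertionError("transient backend hiccup")

print(run_with_retries(sometimes_flaky_test))   # True, on the second attempt</pre>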
|
||||
|
||||
<p>Another approach that helps with test instability (and other CI challenges) is hermetic testing, which we’ll look at in the next section.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="hermetic_testing">
|
||||
<h2>Hermetic Testing</h2>
|
||||
|
||||
<p>Because talking to a live backend is unreliable, we<a contenteditable="false" data-primary="hermetic testing" data-type="indexterm" id="id-nBSYHVhxIBHX"> </a> often use <a href="https://oreil.ly/-PbRM">hermetic backends</a> for larger-scoped tests.<a contenteditable="false" data-primary="testing" data-secondary="hermetic" data-type="indexterm" id="id-BvSytBhpIoHl"> </a><a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="alerting" data-tertiary="hermetic testing" data-type="indexterm" id="id-w0SxcAhkIaH0"> </a> This is particularly useful when we want to run these tests on presubmit, when stability is of utmost importance. In <a data-type="xref" href="ch11.html#testing_overview">Testing Overview</a>, we introduced the concept of hermetic tests:</p>
|
||||
|
||||
<blockquote>
|
||||
<p><em>Hermetic tests</em>: tests run against a test environment (i.e., application servers and resources) that is entirely self-contained (i.e., no external dependencies like production <span class="keep-together">backends</span>).</p>
|
||||
</blockquote>
|
||||
|
||||
<p>Hermetic tests have two important properties: greater determinism (i.e., stability) and isolation. Hermetic servers are still prone to some sources of nondeterminism, like system time, random number generation, and race conditions. But, what goes into the test doesn’t change based on outside dependencies, so when you run a test twice with the same application and test code, you should get the same results. If a hermetic test fails, you know that it’s due to a change in your application code or tests (with a minor caveat: they can also fail due to a restructuring of your hermetic test environment, but this should not change very often). For this reason, when CI systems rerun tests hours or days later to provide additional signals, hermeticity makes test failures easier to narrow down.</p>
|
||||
|
||||
<p>The other important property, isolation, means that problems in production should not affect these tests. We generally run these tests all on the same machine as well, so we don’t have to worry about network connectivity issues. The reverse also holds: problems caused by running hermetic tests should not affect production.</p>
|
||||
|
||||
<p>Hermetic test success should not depend on the user running the test. This allows people to reproduce tests run by the CI system and allows people (e.g., library developers) to run tests owned by other teams.</p>
|
||||
|
||||
<p>One type of hermetic backend is a fake.<a contenteditable="false" data-primary="faking" data-secondary="fake hermetic backend" data-type="indexterm" id="id-WqSKHOIzIRHd"> </a> As discussed in <a data-type="xref" href="ch13.html#test_doubles">Test Doubles</a>, these can be cheaper than running a real backend, but they take work to maintain and have limited fidelity.</p>
|
||||
|
||||
<p>The cleanest option to achieve a presubmit-worthy integration test is with a fully hermetic setup—that is, starting up the entire stack sandboxed<sup><a data-type="noteref" id="ch01fn246-marker" href="ch23.html#ch01fn246">11</a></sup>—and Google provides out-of-the-box sandbox configurations for popular components, like databases, to make it easier.<a contenteditable="false" data-primary="sandboxing" data-secondary="hermetic testing and" data-type="indexterm" id="id-mAS5hPuOIRHy"> </a> This is more feasible for smaller applications with fewer components, but there are exceptions at Google, even one (by DisplayAds) that starts about four hundred servers from scratch on every presubmit as well as continuously on post-submit. Since the time that system was created, though, record/replay has emerged as a more popular paradigm for larger systems and tends to be cheaper than starting up a large sandboxed stack.</p>
|
||||
|
||||
<p>Record/replay (see <a data-type="xref" href="ch14.html#larger_testing">Larger Testing</a>) systems <a contenteditable="false" data-primary="record/replay systems" data-type="indexterm" id="id-xbSXhQUYIpHX"> </a>record live backend responses, cache them, and replay them in a hermetic test environment. Record/replay is a powerful tool for reducing test instability, but one<a contenteditable="false" data-primary="brittle tests" data-secondary="record/replay systems causing" data-type="indexterm" id="id-2ESAtaU9IvHm"> </a> downside is that it leads to brittle tests: it’s difficult to strike a balance between the following:</p>
|
||||
|
||||
<dl>
|
||||
<dt>False positives</dt>
|
||||
<dd>The test passes when it probably shouldn’t have because we are hitting the cache too much and missing problems that would surface when capturing a new response.</dd>
|
||||
<dt>False negatives</dt>
|
||||
<dd>The test fails when it probably shouldn’t have because we are hitting the cache too little. This requires responses to be updated, which can take a long time and lead to test failures that must be fixed, many of which might not be actual problems. This process is often submit-blocking, which is not ideal.</dd>
|
||||
</dl>
|
||||
|
||||
<p>Ideally, a record/replay system should detect only problematic changes and cache-miss only when a request has changed in a meaningful way. In the event that that change causes a problem, the code change author would rerun the test with an updated response, see that the test is still failing, and thereby be alerted to the problem. In practice, knowing when a request has changed in a meaningful way can be incredibly difficult in a large and ever-changing system.</p>
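<p>One way to express “changed in a meaningful way” is to normalize requests before using them as cache keys, as in this simplified sketch; the field names and ignore list are assumptions for illustration:</p>

<pre data-type="programlisting" data-code-language="python"># Simplified record/replay lookup: requests are normalized so that meaningless
# differences (e.g., a per-request trace ID) still hit the cache, while
# meaningful changes miss and force a fresh recording. Field names are invented.
import json

IGNORED_FIELDS = {"trace_id", "timestamp"}   # assumed irrelevant to behavior

def cache_key(request):
    meaningful = {k: v for k, v in request.items() if k not in IGNORED_FIELDS}
    return json.dumps(meaningful, sort_keys=True)

recorded = {cache_key({"method": "GetAlbum", "album_id": "42", "trace_id": "abc"}):
            {"title": "Vacation"}}

def replay_or_record(request, call_real_backend):
    key = cache_key(request)
    if key in recorded:
        return recorded[key]                  # replay: hermetic and fast
    response = call_real_backend(request)     # miss: the request changed meaningfully
    recorded[key] = response
    return response</pre>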
|
||||
|
||||
<aside data-type="sidebar" id="the_hermetic_google_assistant">
|
||||
<h5>The Hermetic Google Assistant</h5>
|
||||
|
||||
<p>Google Assistant<a contenteditable="false" data-primary="Google Assistant" data-type="indexterm" id="id-0KSQHvhBT7IzH2"> </a> provides a framework for engineers to run end-to-end tests, including a <a contenteditable="false" data-primary="hermetic testing" data-secondary="Google Assistant" data-type="indexterm" id="id-b6SBhmhGTrIeHe"> </a>test fixture with functionality for setting up queries, specifying whether to simulate on a phone or a smart home device, and validating responses throughout an exchange with Google Assistant.</p>
|
||||
|
||||
<p>One of its greatest success stories was making its test suite fully hermetic on presubmit. When the team previously ran nonhermetic tests on presubmit, the tests would routinely fail. On some days, the team would see more than 50 code changes bypass and ignore the test results. In moving presubmit to hermetic, the team cut the runtime by a factor of 14, with virtually no flakiness. It still sees failures, but those failures tend to be fairly easy to find and roll back.</p>
|
||||
|
||||
<p>Now that nonhermetic tests have been pushed to post-submit, failures accumulate there instead. Debugging failing end-to-end tests is still difficult, and some teams don’t have time to even try, so they just disable them. That’s better than letting those failures stop all development for everyone, but it can result in production failures.</p>
|
||||
|
||||
<p>One of the team’s current challenges is to continue fine-tuning its caching mechanisms so that presubmit can catch more types of issues that have been discovered only post-submit in the past, without introducing too much brittleness.</p>
|
||||
|
||||
<p>Another is how to do presubmit testing for the decentralized Assistant given that components are shifting into their own microservices. Because the Assistant has a large and complex stack, the cost of running a hermetic stack on presubmit, in terms of engineering work, coordination, and resources, would be very high.</p>
|
||||
|
||||
<p>Finally, the team is taking advantage of this decentralization in a clever new post-submit failure-isolation strategy. For each of the <em>N</em> microservices within the Assistant, the team will run a post-submit environment containing the microservice built at head, along with production (or close to it) versions of the other <em>N</em> – 1 services, to isolate problems to the newly built server. This setup would normally be <em>O</em>(<em>N</em><sup>2</sup>) cost to facilitate, but the team leverages a cool feature called <em>hotswapping</em> to cut this cost to <em>O</em>(<em>N</em>). Essentially, hotswapping allows a request to instruct a server to “swap” in the address of a backend to call instead of the usual one. So only <em>N</em> servers need to be run, one for each of the microservices cut at head—and they can reuse the same set of prod backends swapped in to each of these <em>N</em> “environments.”</p>
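<p>In spirit, hotswapping is per-request backend overriding. The following toy sketch, with invented service names and addresses, shows how one shared set of production backends can serve all <em>N</em> “environments”:</p>

<pre data-type="programlisting" data-code-language="python"># Toy illustration of hotswapping: a request can carry an override telling the
# server which backend address to call, so one set of prod backends can be
# reused across N "environments". Names and addresses are invented.
PROD_BACKENDS = {"photos": "photos.prod:443", "drive": "drive.prod:443"}

def resolve_backend(service, request_overrides):
    # If the request asks to swap in a backend built at head, use it;
    # otherwise fall back to the shared production backend.
    return request_overrides.get(service, PROD_BACKENDS[service])

request_overrides = {"photos": "photos.head.test:443"}   # testing photos@head
print(resolve_backend("photos", request_overrides))   # photos.head.test:443
print(resolve_backend("drive", request_overrides))    # drive.prod:443</pre>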
|
||||
</aside>
|
||||
|
||||
<p>As we’ve seen in this section, hermetic testing can both reduce instability in larger-scoped tests and help isolate failures—addressing two of the significant CI challenges we identified in the previous section. However, hermetic backends can also be more expensive because they use more resources and are slower to set up. Many teams use combinations of hermetic and live backends in their test environments.<a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="alerting" data-startref="ix_CIalrt" data-type="indexterm" id="id-0KSQHXFYIQHV"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="ci_at_google">
|
||||
<h1>CI at Google</h1>
|
||||
|
||||
<p>Now let’s look in more detail at how CI is implemented at Google.<a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="implementation at Google" data-type="indexterm" id="ix_CIGoo"> </a> First, we’ll look at our global continuous build, TAP, used by the vast majority of teams at Google, and how it enables some of the practices and addresses some of the challenges that we looked at in the previous section. We’ll also look at one application, Google Takeout, and how a CI transformation helped it scale both as a platform and as a service.</p>
|
||||
|
||||
<aside data-type="sidebar" id="tap_googleapostrophes_global_continuous">
|
||||
<h5>TAP: Google’s Global Continuous Build</h5>
|
||||
|
||||
<p class="byline">Adam Bender</p>
|
||||
|
||||
<p>We run a massive continuous build, called the Test Automation Platform (TAP), of our entire codebase.<a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="implementation at Google" data-tertiary="TAP, global continuous build" data-type="indexterm" id="ix_CIGooTAP"> </a><a contenteditable="false" data-primary="Test Automation Platform (TAP)" data-type="indexterm" id="ix_TAP"> </a> It is responsible for running the majority of our automated tests. As a direct consequence of our use of a monorepo, TAP is the gateway for almost all changes at Google. Every day it is responsible for handling more than 50,000 unique changes <em>and</em> running more than four billion individual test cases.</p>
|
||||
|
||||
<p>TAP is the beating heart of Google’s development infrastructure. Conceptually, the process is very simple. When an engineer attempts to submit code, TAP runs the associated tests and reports success or failure. If the tests pass, the change is allowed into the codebase.</p>
|
||||
|
||||
<h3>Presubmit optimization</h3>
|
||||
|
||||
<p>To catch issues quickly<a contenteditable="false" data-primary="Test Automation Platform (TAP)" data-secondary="presubmit optimization" data-type="indexterm" id="id-A1SEHrCXtYhQ"> </a> and consistently, it is important to ensure that tests are run against every change. <a contenteditable="false" data-primary="presubmits" data-secondary="optimization of" data-type="indexterm" id="id-BvSYhnCYtrhl"> </a>Without a CB, running tests is usually left to individual engineer discretion, and that often leads to a few motivated engineers trying to run all tests and keep up with the failures.</p>
|
||||
|
||||
<p>As discussed earlier, waiting a long time to run every test on presubmit can be severely disruptive, in some cases taking hours. To minimize the time spent waiting, Google’s CB approach allows potentially breaking changes to land in the repository (remember that they become immediately visible to the rest of the company!). All we ask is for each team to create a fast subset of tests, often a project’s unit tests, that can be run before a change is submitted (usually before it is sent for code review)—the presubmit. Empirically, a change that passes the presubmit has a very high likelihood (95%+) of passing the rest of the tests, and we optimistically allow it to be integrated so that other engineers can then begin to use it.</p>
|
||||
|
||||
<p>After a change has been submitted, we use TAP to asynchronously run all potentially affected tests, including larger and slower tests.</p>
|
||||
|
||||
<p>When a change causes a test to fail in TAP, it is imperative that the change be fixed quickly to prevent blocking other engineers. We have established a cultural norm that strongly discourages committing any new work on top of known failing tests, though flaky tests make this difficult. Thus, when a change is committed that breaks a team’s build in TAP, that change may prevent the team from making forward progress or building a new release. As a result, dealing with breakages quickly is imperative.</p>
|
||||
|
||||
<p>To deal with such breakages, each team has a “Build Cop.” The Build Cop’s responsibility is keeping all the tests passing in their particular project, regardless of who breaks them. When a Build Cop is notified of a failing test in their project, they drop whatever they are doing and fix the build. This is usually done by identifying the offending change and determining whether it needs to be rolled back (the preferred solution) or can be fixed going forward (a riskier proposition).</p>
|
||||
|
||||
<p>In practice, the trade-off of allowing changes to be committed before verifying all tests has really paid off; the average wait time to submit a change is around 11 minutes, often run in the background. Coupled with the discipline of the Build Cop, we are able to efficiently detect and address breakages detected by longer running tests with a minimal amount of disruption.</p>
|
||||
|
||||
<h3>Culprit finding</h3>
|
||||
|
||||
<p>One of the problems we face with large test suites at Google is finding the specific change that broke a test. <a contenteditable="false" data-primary="culprit finding and failure isolation" data-secondary="using TAP" data-type="indexterm" id="id-xbSkHVFQtwhX"> </a><a contenteditable="false" data-primary="Test Automation Platform (TAP)" data-secondary="culprit finding" data-type="indexterm" id="id-2ESWhBFJt1hm"> </a>Conceptually, this should be really easy: grab a change, run the tests, if any tests fail, mark the change as bad. Unfortunately, due to a prevalence of flakes and the occasional issues with the testing infrastructure itself, having confidence that a failure is real isn’t easy. To make matters more complicated, TAP must evaluate so many changes a day (more than one a second) that it can no longer run every test on every change. Instead, it falls back to batching related changes together, which reduces the total number of unique tests to be run. Although this approach can make it faster to run tests, it can obscure which change in the batch caused a test to break.</p>
|
||||
|
||||
<p>To speed up failure identification, we use two different approaches. First, TAP automatically splits a failing batch up into individual changes and reruns the tests against each change in isolation. This process can sometimes take a while to converge on a failure, so in addition, we have created culprit finding tools that an individual developer can use to binary search through a batch of changes and identify which one is the likely culprit.</p>
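<p>Conceptually, the culprit finder is a binary search over an ordered batch of changes, as in this sketch; the real tools must also contend with flakes and infrastructure failures, which this toy version ignores:</p>

<pre data-type="programlisting" data-code-language="python"># Sketch of culprit finding: binary-search a batch of ordered changes for the
# first one at which the test starts failing. Assumes a deterministic test.
def find_culprit(changes, test_passes_at):
    """changes is ordered oldest to newest; test_passes_at(i) runs the test
    with changes[0..i] applied and returns True on success."""
    lo, hi = 0, len(changes) - 1      # invariant: the culprit index is in [lo, hi]
    while lo != hi:
        mid = (lo + hi) // 2
        if test_passes_at(mid):
            lo = mid + 1              # failure introduced after mid
        else:
            hi = mid                  # failure introduced at or before mid
    return changes[lo]

batch = ["cl/1", "cl/2", "cl/3", "cl/4"]
print(find_culprit(batch, test_passes_at=lambda i: i in (0, 1)))   # cl/3</pre>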
|
||||
|
||||
<h3>Failure management</h3>
|
||||
|
||||
<p>After a breaking change has been isolated, it is important to fix it as quickly as possible.<a contenteditable="false" data-primary="Test Automation Platform (TAP)" data-secondary="failure management" data-type="indexterm" id="id-0KSQHphRtohV"> </a><a contenteditable="false" data-primary="failures" data-secondary="failure management with TAP" data-type="indexterm" id="id-b6SBhBhVtXhb"> </a> The presence of failing tests can quickly begin to erode confidence in the test suite. As mentioned previously, fixing a broken build is the responsibility of the Build Cop. The most effective tool the Build Cop has is the <em>rollback</em>.</p>
|
||||
|
||||
<p>Rolling a change back is often the fastest and safest route to fix a build because it quickly restores the system to a known good state.<sup><a data-type="noteref" id="ch01fn248-marker" href="ch23.html#ch01fn248">12</a></sup> In fact, TAP has recently been upgraded to automatically roll back changes when it has high confidence that they are the culprit.</p>
|
||||
|
||||
<p>Fast rollbacks work hand in hand with a test suite to ensure continued productivity. Tests give us confidence to change, rollbacks give us confidence to undo. Without tests, rollbacks can’t be done safely. Without rollbacks, broken tests can’t be fixed quickly, thereby reducing confidence in the system.</p>
|
||||
|
||||
<h3>Resource constraints</h3>
|
||||
|
||||
<p>Although engineers can run tests locally, most test executions<a contenteditable="false" data-primary="Test Automation Platform (TAP)" data-secondary="resource constraints and" data-type="indexterm" id="id-Q0S8HECAtGhW"> </a> happen in a distributed build-and-test system called <em>Forge</em>. <a contenteditable="false" data-primary="Forge" data-type="indexterm" id="id-qWSptNCAt3h8"> </a>Forge allows engineers to run their builds and tests in our datacenters, which maximizes parallelism. At our scale, the resources required to run all tests executed on-demand by engineers and all tests being run as part of the CB process are enormous. Even given the amount of compute resources we have, systems like Forge and TAP are resource constrained. To work around these constraints, engineers working on TAP have come up with some clever ways to determine which tests should be run at which times to ensure that the minimal amount of resources are spent to validate a given change.</p>
|
||||
|
||||
<p>The primary mechanism for determining which tests need to be run is an analysis of the downstream dependency graph for every change. Google’s distributed build tools, Forge and Blaze, maintain<a contenteditable="false" data-primary="Blaze" data-secondary="global dependency graph" data-type="indexterm" id="id-KWSVHBI1tmhl"> </a> a near-real-time version of the global dependency graph and make it available to TAP. As a result, TAP can quickly determine which tests are downstream from any change and run the minimal set to be sure the change is safe.</p>
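<p>The core idea can be sketched as a reverse-dependency traversal. The graph below is a toy example, not Blaze’s actual data model:</p>

<pre data-type="programlisting" data-code-language="python"># Toy version of dependency-based test selection: given which targets a change
# touched, walk the reverse-dependency graph and run only downstream tests.
from collections import deque

# target -> targets that directly depend on it (invented example graph)
REVERSE_DEPS = {
    "//base:strings": ["//server:lib", "//base:strings_test"],
    "//server:lib": ["//server:server_test", "//client:lib"],
    "//client:lib": ["//client:client_test"],
}

def affected_tests(changed_targets):
    seen, queue = set(changed_targets), deque(changed_targets)
    while queue:
        for dependent in REVERSE_DEPS.get(queue.popleft(), []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(t for t in seen if t.endswith("_test"))

print(affected_tests(["//base:strings"]))
# ['//base:strings_test', '//client:client_test', '//server:server_test']</pre>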
|
||||
|
||||
<p>Another factor influencing the use of TAP is the speed of tests being run. TAP is often able to run changes with fewer tests sooner than those with more tests. This bias encourages engineers to write small, focused changes. The difference in waiting time between a change that triggers 100 tests and one that triggers 1,000 can be tens of minutes on a busy day. Engineers who want to spend less time waiting end up making smaller, targeted changes, which is a win for everyone.<a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="implementation at Google" data-startref="ix_CIGooTAP" data-tertiary="TAP, global continuous build" data-type="indexterm" id="id-qWSOH5uAt3h8"> </a><a contenteditable="false" data-primary="Test Automation Platform (TAP)" data-startref="ix_TAP" data-type="indexterm" id="id-EdSnh8u6tVh3"> </a></p>
|
||||
</aside>
|
||||
|
||||
<section data-type="sect2" id="ci_case_study_google_takeout">
|
||||
<h2>CI Case Study: Google Takeout</h2>
|
||||
|
||||
<p>Google Takeout started out as a data backup and download product in 2011. Its founders<a contenteditable="false" data-primary="Google Takeout case study" data-type="indexterm" id="ix_GooTkcs"> </a> pioneered the <a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="implementation at Google" data-tertiary="case study, Google Takeout" data-type="indexterm" id="ix_CIGoocs"> </a>idea of “data liberation”—that users should be able to easily take their data with them, in a usable format, wherever they go. They began by integrating Takeout with a handful of Google products themselves, producing archives of users’ photos, contact lists, and so on for download at their request. However, Takeout didn’t stay small for long, growing as both a platform and a service for a wide variety of Google products. As we’ll see, effective CI is central to keeping any large project healthy, but is especially critical when applications rapidly grow.</p>
|
||||
|
||||
<section data-type="sect3" id="scenario_hashone_continuously_broken_de">
|
||||
<h3>Scenario #1: Continuously broken dev deploys</h3>
|
||||
|
||||
<p><strong>Problem:</strong> As Takeout gained a reputation as a powerful Google-wide data fetching, archiving, and download tool, other teams at the company began to turn to it, requesting APIs so that their own applications could provide backup and download functionality, too, including Google Drive (folder downloads are served by Takeout) and Gmail (for ZIP file previews). All in all, Takeout grew from being the backend for just the original Google Takeout product, to providing APIs for at least 10 other Google products, offering a wide range of functionality.</p>
|
||||
|
||||
<p>The team decided to deploy each of the new APIs as a customized instance, using the same original Takeout binaries but configuring them to work a little differently. For example, the environment for Drive bulk downloads has the largest fleet, the most quota reserved for fetching files from the Drive API, and some custom authentication logic to allow non-signed-in users to download public folders.</p>
|
||||
|
||||
<p>Before long, Takeout faced “flag issues.” Flags added for one of the instances would break the others, and their deployments would break when servers could not start up due to configuration incompatibilities. Beyond feature configuration, there was security and ACL configuration, too. For example, the consumer Drive download service should not have access to keys that encrypt enterprise Gmail exports. Configuration quickly became complicated and led to nearly nightly breakages.</p>
|
||||
|
||||
<p>Some efforts were made to detangle and modularize configuration, but the bigger problem this exposed was that when a Takeout engineer wanted to make a code change, it was not practical to manually test that each server started up under each configuration. They didn’t find out about configuration failures until the next day’s deploy. There were unit tests that ran on presubmit and post-submit (by TAP), but those weren’t sufficient to catch these kinds of issues.</p>
|
||||
|
||||
<section data-type="sect4" id="what_the_team_did">
|
||||
<h4>What the team did</h4>
|
||||
|
||||
<p>The team created temporary, sandboxed mini-environments for each of these instances that ran on presubmit and tested that all servers were healthy on startup. Running the temporary environments on presubmit prevented 95% of broken servers from bad configuration and reduced nightly deployment failures by 50%.</p>
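<p>In outline, such a presubmit check just loops over every instance configuration and asserts a clean startup. The configurations and the start function below are stand-ins invented for the sketch:</p>

<pre data-type="programlisting" data-code-language="python"># Sketch of a presubmit "does every instance still start?" check. The configs
# and start_server_with are stand-ins for real sandboxed server startup.
INSTANCE_CONFIGS = {
    "takeout":        {"auth": "signed_in", "fetch_quota": 100},
    "drive_download": {"auth": "public_folders_ok", "fetch_quota": 10_000},
    "gmail_preview":  {"auth": "signed_in", "fetch_quota": 50},
}

def start_server_with(config):
    # Stand-in: real code would launch the binary in a sandbox and wait for it
    # to report healthy. Here, we only reject an obviously bad configuration.
    if "auth" not in config:
        raise RuntimeError("missing auth configuration")

def test_all_instances_start():
    failures = []
    for name, config in INSTANCE_CONFIGS.items():
        try:
            start_server_with(config)
        except RuntimeError as e:
            failures.append(f"{name}: {e}")
    assert not failures, "broken instance configs: " + ", ".join(failures)

test_all_instances_start()</pre>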
|
||||
|
||||
<p>Although these new sandboxed presubmit tests dramatically reduced deployment failures, they didn’t remove them entirely. In particular, Takeout’s end-to-end tests would still frequently break the deploy, and these tests were difficult to run on presubmit (because they use test accounts, which still behave like real accounts in some respects and are subject to the same security and privacy safeguards). Redesigning them to be presubmit friendly would have been too big an undertaking.</p>
|
||||
|
||||
<p>If the team couldn’t run end-to-end tests in presubmit, when could it run them? It wanted to get end-to-end test results more quickly than the next day’s dev deploy and decided every two hours was a good starting point. But the team didn’t want to do a full dev deploy this often—this would incur overhead and disrupt long-running processes that engineers were testing in dev. Making a new shared test environment for these tests also seemed like too much overhead to provision resources for, plus culprit finding (i.e., finding the deployment that led to a failure) could involve some undesirable manual work.</p>
|
||||
|
||||
<p>So, the team reused the sandboxed environments from presubmit, easily extending them to a new post-submit environment. Unlike presubmit, post-submit complied with the security safeguards for using test accounts (for one, because the code had been approved), so the end-to-end tests could be run there. The post-submit CI runs every two hours, grabs the latest code and configuration from green head, creates an RC, and runs the same end-to-end test suite against it that is already run in dev.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect4" id="lesson_learne">
|
||||
<h4>Lesson learned</h4>
|
||||
|
||||
<p>Faster feedback loops prevent problems in dev deploys:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Moving tests for different Takeout products from “after nightly deploy” to presubmit prevented 95% of broken servers from bad configuration and reduced nightly deployment failures by 50%.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Though end-to-end tests couldn’t be moved all the way to presubmit, they were still moved from “after nightly deploy” to “post-submit within two hours.” This effectively cut the “culprit set” to one-twelfth of its previous size.</p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="scenario_hashtwo_indecipherable_test_lo">
|
||||
<h3>Scenario #2: Indecipherable test logs</h3>
|
||||
|
||||
<p><strong>Problem:</strong> As Takeout incorporated more Google products, it grew into a mature platform that allowed product teams to insert plug-ins, with product-specific data-fetching code, directly into Takeout's binary. For example, the Google Photos plug-in knows how to fetch photos, album metadata, and the like. Takeout expanded from its original “handful” of products to now integrate with more than <em>90</em>.</p>
|
||||
|
||||
<p>Takeout’s end-to-end tests dumped their failures to a log, and this approach didn’t scale to 90 product plug-ins. As more products integrated, more failures were introduced. Even though the team was running the tests earlier and more often with the addition of the post-submit CI, multiple failures would still pile up inside the logs and were easy to miss. Going through these logs became a frustrating time sink, and the tests were almost always failing.</p>
|
||||
|
||||
<section data-type="sect4" id="what_the_team_di">
|
||||
<h4>What the team did</h4>
|
||||
|
||||
<p>The team refactored the tests into a dynamic, configuration-based suite (using a <a href="https://oreil.ly/UxkHk">parameterized test runner</a>) that reported results in a friendlier UI, clearly showing individual test results as green or red: no more digging through logs. They also made failures much easier to debug, most notably, by displaying failure information, with links to logs, directly in the error message. For example, if Takeout failed to fetch a file from Gmail, the test would dynamically construct a link that searched for that file’s ID in the Takeout logs and include it in the test failure message. This automated much of the debugging process for product plug-in engineers and required less of the Takeout team’s assistance in sending them logs, as demonstrated in <a data-type="xref" href="ch23.html#the_teamapostrophes_involvement_in_debu">Figure 23-3</a>.</p>
|
||||
|
||||
<figure id="the_teamapostrophes_involvement_in_debu"><img alt="The team’s involvement in debugging client features" src="images/seag_2303.png">
|
||||
<figcaption><span class="label">Figure 23-3. </span>The team’s involvement in debugging client failures</figcaption>
|
||||
</figure>
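<p>As a rough sketch of the failure-message technique (the log-search URL, helper names, and plug-in list are invented for illustration), each parameterized test can embed a ready-made log link in its assertion message:</p>
<pre data-type="programlisting" data-code-language="python">
import urllib.parse

import pytest

from takeout_testing import fetch_exported_file, PRODUCT_PLUGINS   # hypothetical

def _log_search_link(file_id):
    """Build a link that searches the Takeout logs for a given file ID."""
    query = urllib.parse.urlencode({"q": file_id})
    return f"https://logs.example.com/takeout/search?{query}"   # placeholder URL

@pytest.mark.parametrize("plugin", PRODUCT_PLUGINS, ids=lambda p: p.name)
def test_plugin_export(plugin):
    result = fetch_exported_file(plugin)
    assert result.ok, (
        f"{plugin.name}: failed to fetch {result.file_id}; "
        f"see logs: {_log_search_link(result.file_id)}")
</pre>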
|
||||
</section>
|
||||
|
||||
<section data-type="sect4" id="lesson_learned">
|
||||
<h4>Lesson learned</h4>
|
||||
|
||||
<p>Accessible, actionable feedback from CI reduces test failures and improves productivity. These initiatives reduced the Takeout team’s involvement in debugging client (product plug-in) test failures by 35%.</p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="scenario_hashthree_debugging_quotation">
|
||||
<h3>Scenario #3: Debugging “all of Google”</h3>
|
||||
|
||||
<p><strong>Problem:</strong> An interesting side effect of the Takeout CI that the team did not anticipate was that, because it verified the output of 90-some odd end-user–facing products, in the form of an archive, they were basically testing “all of Google” and catching issues that had nothing to do with Takeout. This was a good thing—Takeout was able to help contribute to the quality of Google’s products overall. However, this introduced a problem for their CI processes: they needed better failure isolation so that they could determine which problems were in their build (which were the minority) and which lay in loosely coupled microservices behind the product APIs they called.</p>
|
||||
|
||||
<section data-type="sect4" id="what_the_team_did-id00141">
|
||||
<h4>What the team did</h4>
|
||||
|
||||
<p>The team’s solution was to run the exact same test suite continuously against production as it already did in its post-submit CI. This was cheap to implement and allowed the team to isolate which failures were new in its build and which were in production; for instance, the result of a microservice release somewhere else “in Google.”</p>
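<p>One minimal way to express “same tests, different target” is to make the backend under test a parameter of the suite. A sketch, with a hypothetical environment variable and helper function:</p>
<pre data-type="programlisting" data-code-language="python">
import os

from takeout_testing import run_archive_export   # hypothetical

# The same suite runs in two jobs; only this environment variable differs.
TARGET = os.environ.get("TAKEOUT_TARGET", "post_submit_rc")   # or "production"

def test_archive_export_succeeds():
    """Identical assertion against either the freshly built RC or live prod."""
    result = run_archive_export(target=TARGET, product="drive")
    assert result.archive_is_complete, (
        f"export failed against {TARGET}; if production fails too, the culprit "
        "is likely an upstream service rather than this build")
</pre>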
|
||||
</section>
|
||||
|
||||
<section data-type="sect4" id="lesson_learned-id00056">
|
||||
<h4>Lesson learned</h4>
|
||||
|
||||
<p>Running the same test suite against prod and a post-submit CI (with newly built binaries, but the same live backends) is a cheap way to isolate failures.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect4" id="remaining_challenge">
|
||||
<h4>Remaining challenge</h4>
|
||||
|
||||
<p>Going forward, the burden of testing “all of Google” (obviously, this is an exaggeration, as most product problems are caught by their respective teams) grows as Takeout integrates with more products and as those products become more complex. Manual comparisons between this CI and prod are an expensive use of the Build Cop’s time.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect4" id="future_improvement-id00045">
|
||||
<h4>Future improvement</h4>
|
||||
|
||||
<p>This presents an interesting opportunity to try hermetic testing with record/replay in Takeout’s post-submit CI. In theory, this would eliminate failures from backend product APIs surfacing in Takeout’s CI, which would make the suite more stable and effective at catching failures in the last two hours of Takeout changes—which is its intended purpose.</p>
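<p>Record/replay isn’t described in detail here; one common shape for it, sketched with a hypothetical fetch function, is to serve canned backend responses during the test and refresh the recordings on a separate schedule:</p>
<pre data-type="programlisting" data-code-language="python">
import json
import pathlib

RECORDINGS = pathlib.Path("recordings")   # captured backend responses, refreshed separately

def fetch_album_metadata(album_id, live_fetch=None, record=False):
    """Replay a stored backend response; optionally record a fresh one."""
    recording = RECORDINGS / f"album_{album_id}.json"
    if record and live_fetch is not None:
        response = live_fetch(album_id)           # hits the real backend API
        recording.write_text(json.dumps(response))
        return response
    return json.loads(recording.read_text())      # hermetic: no live backend needed
</pre>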
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect3" id="scenario_hashfour_keeping_it_green">
|
||||
<h3>Scenario #4: Keeping it green</h3>
|
||||
|
||||
<p><strong>Problem:</strong> As the platform supported more product plug-ins, each of which included end-to-end tests, those tests would fail, and the end-to-end test suites were nearly always broken. The failures could not all be immediately fixed. Many were due to bugs in product plug-in binaries, which the Takeout team had no control over. And some failures mattered more than others—low-priority bugs and bugs in the test code did not need to block a release, whereas higher-priority bugs did. The team could easily disable tests by commenting them out, but that would make the failures too easy to forget about.</p>
|
||||
|
||||
<p>One common source of failures: tests would break when product plug-ins were rolling out a feature. For example, a playlist-fetching feature for the YouTube plug-in might be enabled for testing in dev for a few months before being enabled in prod. The Takeout tests knew how to check for only one result, so the test often had to be disabled in particular environments and manually curated as the feature rolled out.</p>
|
||||
|
||||
<section data-type="sect4" id="what_the_team_did-id00142">
|
||||
<h4>What the team did</h4>
|
||||
|
||||
<p>The team came up with a strategic way to disable failing tests by tagging them with an associated bug and filing that off to the responsible team (usually a product plug-in team). When a failing test was tagged with a bug, the team’s testing framework would suppress its failure. This allowed the test suite to stay green and still provide confidence that everything else, besides the known issues, was passing, as illustrated in <a data-type="xref" href="ch23.html#achieving_greenness_through_left_parent">Figure 23-4</a>.</p>
|
||||
|
||||
<figure id="achieving_greenness_through_left_parent"><img alt="Achieving greenness through (responsible) test disablement" src="images/seag_2304.png">
|
||||
<figcaption><span class="label">Figure 23-4. </span>Achieving greenness through (responsible) test disablement</figcaption>
|
||||
</figure>
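<p>The chapter doesn’t show the tagging mechanism itself. One way to sketch the idea in a Python test framework (the decorator, helper import, and bug ID are hypothetical) is a marker that downgrades failures of tagged tests to a “known issue” outcome rather than a red result:</p>
<pre data-type="programlisting" data-code-language="python">
import functools

import pytest

from takeout_testing import fetch_playlists   # hypothetical plug-in call

def known_issue(bug_id):
    """Suppress failures of a test that is tracked by an open bug."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            try:
                test_fn(*args, **kwargs)
            except AssertionError:
                pytest.xfail(f"known issue, tracked in {bug_id}")
        return wrapper
    return decorator

@known_issue(bug_id="b/123456")   # hypothetical bug ID
def test_youtube_playlist_export():
    assert fetch_playlists().ok   # fails today; the suite stays green, the bug stays visible
</pre>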
|
||||
|
||||
<p>For the rollout problem, the team added the capability for plug-in engineers to specify the name of a feature flag, or the ID of a code change, that enabled a particular feature, along with the output to expect both with and without the feature. The tests were equipped to query the test environment to determine whether the given feature was enabled there and verified the expected output accordingly.</p>
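<p>A sketch of that idea, with hypothetical helpers that ask the test environment whether a feature flag is enabled there (the flag name and expected outputs are invented):</p>
<pre data-type="programlisting" data-code-language="python">
from takeout_testing import (      # hypothetical helpers
    current_test_environment,      # e.g., "dev", "post_submit_sandbox", "prod"
    feature_enabled,
    fetch_youtube_export,
)

# Provided by the plug-in engineers: the flag name plus both expected outputs.
FEATURE_FLAG = "youtube_playlist_fetching"
EXPECTED_WITH_FEATURE = {"videos", "playlists"}
EXPECTED_WITHOUT_FEATURE = {"videos"}

def test_youtube_export_matches_rollout_state():
    env = current_test_environment()
    expected = (EXPECTED_WITH_FEATURE if feature_enabled(env, FEATURE_FLAG)
                else EXPECTED_WITHOUT_FEATURE)
    assert set(fetch_youtube_export(env).sections) == expected
</pre>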
|
||||
|
||||
<p>When bug tags from disabled tests began to accumulate and were not updated, the team automated their cleanup. The tests would now check whether a bug was closed by querying our bug system’s API. If a tagged-failing test actually passed and kept passing for longer than a configured time limit, the system would prompt to clean up the tag (and mark the bug fixed, if it wasn’t already). There was one exception to this strategy: flaky tests. For these, the team would allow a test to be tagged as flaky, and the system wouldn’t prompt a tagged “flaky” failure for cleanup if it passed.</p>
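<p>A sketch of the cleanup logic, assuming hypothetical <code>bug_tracker</code> and <code>test_history</code> clients and an illustrative grace period:</p>
<pre data-type="programlisting" data-code-language="python">
import datetime

from ci_maintenance import bug_tracker, test_history, prompt_cleanup   # hypothetical

GRACE_PERIOD = datetime.timedelta(days=14)   # illustrative time limit

def maybe_clean_up(test):
    """Prompt removal of a bug tag once the bug is closed or the test is reliably passing."""
    if test.tagged_flaky:
        return   # flaky tests are exempt: a passing run proves little
    bug = bug_tracker.lookup(test.bug_id)
    passing_for = test_history.continuously_passing_duration(test.name)
    if bug.is_closed or passing_for >= GRACE_PERIOD:
        prompt_cleanup(test, mark_bug_fixed=not bug.is_closed)
</pre>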
|
||||
|
||||
<p>These changes made a mostly self-maintaining test suite, as illustrated in <a data-type="xref" href="ch23.html#mean_time_to_close_bugcomma_after_fix_s">Figure 23-5</a>.</p>
|
||||
|
||||
<figure id="mean_time_to_close_bugcomma_after_fix_s"><img alt="Mean time to close bug, after fix submitted" src="images/seag_2305.png">
|
||||
<figcaption><span class="label">Figure 23-5. </span>Mean time to close bug, after fix submitted</figcaption>
|
||||
</figure>
|
||||
</section>
|
||||
|
||||
<section data-type="sect4" id="lessons_learned">
|
||||
<h4>Lessons learned</h4>
|
||||
|
||||
<p>Disabling failing tests that can’t be immediately fixed is a practical approach to keeping your suite green, which gives confidence that you’re aware of all test failures. Also, automating the test suite’s maintenance, including rollout management and updating tracking bugs for fixed tests, keeps the suite clean and prevents technical debt. In DevOps parlance, we could call the metric in <a data-type="xref" href="ch23.html#mean_time_to_close_bugcomma_after_fix_s">Figure 23-5</a> MTTCU: mean time to clean up.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect4" id="future_improvement">
|
||||
<h4>Future improvement</h4>
|
||||
|
||||
<p>Automating the filing and tagging of bugs would be a helpful next step. This is still a manual and burdensome process. As mentioned earlier, some of our larger teams already do this.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect4" id="further_challenges">
|
||||
<h4>Further challenges</h4>
|
||||
|
||||
<p>The scenarios we’ve described are far from the only CI challenges faced by Takeout, and there are still more problems to solve. For example, we mentioned the difficulty of isolating failures from upstream services in <a data-type="xref" href="ch23.html#ci_challenges">CI Challenges</a>. This is a problem that Takeout still faces with rare breakages originating with upstream services, such as when a security update in the streaming infrastructure used by Takeout’s “Drive folder downloads” API broke archive decryption when it deployed to production. The upstream services are staged and tested themselves, but there is no simple way to automatically check with CI if they are compatible with Takeout after they're launched into production. An initial solution involved creating an “upstream staging” CI environment to test production Takeout binaries against the staged versions of their upstream dependencies. However, this proved difficult to maintain, with additional compatibility issues between staging and production <span class="keep-together">versions.</span><a contenteditable="false" data-primary="Google Takeout case study" data-startref="ix_GooTkcs" data-type="indexterm" id="id-0KSktvhYIXCbc5hO"> </a><a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="implementation at Google" data-startref="ix_CIGoocs" data-tertiary="case study, Google Takeout" data-type="indexterm" id="id-b6S3cmh1I2CQcBh2"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="but_i_canapostrophet_afford_ci">
|
||||
<h2>But I Can’t Afford CI</h2>
|
||||
|
||||
<p>You might be thinking that’s all well and good, but you have neither the time nor money to build any of this. We certainly acknowledge that Google might have more resources to implement CI than the typical startup does. Yet many of our products have grown so quickly that they didn’t have time to develop a CI system either (at least not an adequate one).</p>
|
||||
|
||||
<p>In your own products and organizations, try to think of the cost you are already paying for problems discovered and dealt with in production. These negatively affect the end user or client, of course, but they also affect the team. Frequent production fire-fighting is stressful and demoralizing. Although building out CI systems is expensive, it’s not necessarily a new cost as much as a cost shifted left to an earlier—and preferable—stage, reducing the incidence, and thus the cost, of problems occurring too far to the right. CI leads to a more stable product and a happier developer culture in which engineers feel more confident that “the system” will catch problems, and they can focus more on features and less on fixing.<a contenteditable="false" data-primary="continuous integration (CI)" data-secondary="implementation at Google" data-startref="ix_CIGoo" data-type="indexterm" id="id-nBSYHYtbfEhX"> </a> </p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="conclusion-id00027">
|
||||
<h1>Conclusion</h1>
|
||||
|
||||
<p>Even though we’ve described our CI processes and some of how we’ve automated them, none of this is to say that we have developed perfect CI systems. After all, a CI system is itself just software; it is never complete and should be adjusted to meet the evolving demands of the application and the engineers it is meant to serve. We’ve tried to illustrate this with the evolution of Takeout’s CI and the future areas of improvement we point out.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="tlsemicolondrs-id00129">
|
||||
<h1>TL;DRs</h1>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>A CI system decides what tests to use, and when.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>CI systems become progressively more necessary as your codebase ages and grows in scale.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>CI should optimize quicker, more reliable tests on presubmit and slower, less deterministic tests on post-submit.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Accessible, actionable feedback allows a CI system to become <a contenteditable="false" data-primary="continuous integration (CI)" data-startref="ix_CI" data-type="indexterm" id="id-nBSYHJHncNh1cQ"> </a>more efficient.</p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
<div data-type="footnotes"><p data-type="footnote" id="ch01fn236"><sup><a href="ch23.html#ch01fn236-marker">1</a></sup><a href="https://www.martinfowler.com/articles/continuousIntegration.html"><em class="hyperlink">https://www.martinfowler.com/articles/continuousIntegration.html</em></a></p><p data-type="footnote" id="ch01fn237"><sup><a href="ch23.html#ch01fn237-marker">2</a></sup>Forsgren, Nicole, et al. (2018). Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations. IT Revolution.</p><p data-type="footnote" id="ch01fn238"><sup><a href="ch23.html#ch01fn238-marker">3</a></sup>This is also sometimes called “shifting left on testing.”</p><p data-type="footnote" id="ch01fn239"><sup><a href="ch23.html#ch01fn239-marker">4</a></sup><em>Head</em> is the latest versioned code in our monorepo. In other workflows, this is also referred to as <em>master</em>, <em>mainline</em>, or <em>trunk</em>. Correspondingly, integrating at head is also known as <em>trunk-based development</em>.</p><p data-type="footnote" id="ch01fn240"><sup><a href="ch23.html#ch01fn240-marker">5</a></sup>At Google, release automation is managed by a separate system from TAP. We won’t focus on <em>how</em> release automation assembles RCs, but if you’re interested, we do refer you to <a href="https://landing.google.com/sre/books"><em>Site Reliability Engineering</em></a> (O'Reilly) in which our release automation technology (a system called Rapid) is discussed in detail.</p><p data-type="footnote" id="ch01fn241"><sup><a href="ch23.html#ch01fn241-marker">6</a></sup>CD with experiments and feature flags is discussed further in <a data-type="xref" href="ch24.html#continuous_delivery-id00035">Continuous Delivery</a>.</p><p data-type="footnote" id="ch01fn242"><sup><a href="ch23.html#ch01fn242-marker">7</a></sup>We call these “mid-air collisions” because the probability of it occurring is extremely low; however, when this does happen, the results can be quite surprising.</p><p data-type="footnote" id="ch01fn243"><sup><a href="ch23.html#ch01fn243-marker">8</a></sup>Each team at Google configures a subset of its project’s tests to run on presubmit (versus post-submit). In reality, our continuous build actually optimizes some presubmit tests to be saved for post-submit, behind the scenes. We'll further discuss this later on in this chapter.</p><p data-type="footnote" id="ch01fn244"><sup><a href="ch23.html#ch01fn244-marker">9</a></sup>Aiming for 100% uptime is the wrong target. Pick something like 99.9% or 99.999% as a business or product trade-off, define and monitor your actual uptime, and use that “budget” as an input to how aggressively you’re willing to push risky releases.</p><p data-type="footnote" id="ch01fn245"><sup><a href="ch23.html#ch01fn245-marker">10</a></sup>We believe CI is actually critical to the software engineering ecosystem: a must-have, not a luxury. But that is not universally understood yet.</p><p data-type="footnote" id="ch01fn246"><sup><a href="ch23.html#ch01fn246-marker">11</a></sup>In practice, it’s often difficult to make a <em>completely</em> sandboxed test environment, but the desired stability can be achieved by minimizing outside dependencies.</p><p data-type="footnote" id="ch01fn248"><sup><a href="ch23.html#ch01fn248-marker">12</a></sup>Any change to Google’s codebase can be rolled back with two clicks!</p></div></section>
|
||||
|
||||
</body>
|
||||
</html>
|
199
clones/abseil.io/resources/swe-book/html/ch24.html
Normal file
|
@ -0,0 +1,199 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="chapter" id="continuous_delivery-id00035">
|
||||
<h1>Continuous Delivery</h1>
|
||||
|
||||
<p class="byline">Written by Radha Narayan, Bobbi Jones, <span class="keep-together">Sheri Shipe, and David Owens</span></p>
|
||||
|
||||
<p class="byline">Edited by Lisa Carey</p>
|
||||
|
||||
<p>Given how quickly and<a contenteditable="false" data-primary="continuous delivery (CD)" data-type="indexterm" id="ix_CD"> </a> unpredictably the technology landscape shifts, the competitive advantage for any product lies in its ability to quickly go to market. An organization’s velocity is a critical factor in its ability to compete with other players, maintain product and service quality, or adapt to new regulation. This velocity is bottlenecked by the time to deployment. Deployment doesn’t just happen once at initial launch. There is a saying among educators that no lesson plan survives its first contact with the student body. In much the same way, no software is perfect at first launch, and the only guarantee is that you’ll have to update it. Quickly.<a contenteditable="false" data-primary="CD" data-see="continuous delivery" data-type="indexterm" id="id-RmcOCbfO"> </a></p>
|
||||
|
||||
<p>The long-term life cycle of a software product involves rapid exploration of new ideas, rapid responses to landscape shifts or user issues, and enabling developer velocity at scale. From Eric Raymond’s <em>The Cathedral and the Bazaar</em> to Eric Ries’ <em>The Lean Startup</em>, the key to any organization’s long-term success has always been in its ability to get ideas executed and into users’ hands as quickly as possible and to react quickly to their feedback. Martin Fowler, in his book <a href="https://oreil.ly/B3WFD"><em>Continuous Delivery</em></a> (aka CD), points out that “The biggest risk to any software effort is that you end up building something that isn’t useful. The earlier and more frequently you get working software in front of real users, the quicker you get feedback to find out how valuable it really is.”</p>
|
||||
|
||||
<p>Work that stays in progress for a long time before delivering user value is high risk and high cost, and can even be a drain on morale. At Google, we strive to release early and often, or “launch and iterate,” to enable teams to see the impact of their work quickly and to adapt faster to a shifting market. The value of code is not realized at the time of submission but when features are available to your users. Reducing the time between “code complete” and user feedback minimizes the cost of work that is in progress.</p>
|
||||
|
||||
<blockquote>
|
||||
<p>You get extraordinary outcomes by realizing that the launch <em>never lands</em> but that it begins a learning cycle where you then fix the next most important thing, measure how it went, fix the next thing, etc.—and it is <em>never complete</em>.</p>
|
||||
|
||||
<p>—David Weekly, Former Google product manager</p>
|
||||
</blockquote>
|
||||
|
||||
<p>At Google, the practices we describe in this book allow hundreds (or in some cases thousands) of engineers to quickly troubleshoot problems, to independently work on new features without worrying about the release, and to understand the effectiveness of new features through A/B experimentation. This chapter focuses on the key levers of rapid innovation, including managing risk, enabling developer velocity at scale, and understanding the cost and value trade-off of each feature you launch.</p>
|
||||
|
||||
<section data-type="sect1" id="idioms_of_continuous_delivery_at_google">
|
||||
<h1>Idioms of Continuous Delivery at Google</h1>
|
||||
|
||||
<p>A core tenet of Continuous Delivery (CD) as well as<a contenteditable="false" data-primary="continuous delivery (CD)" data-secondary="idioms of CD at Google" data-type="indexterm" id="id-AmcVHYCVsq"> </a> of Agile methodology is that over time, smaller batches of changes result in higher quality; in other words, <em>faster is safer</em>. This can seem deeply controversial to teams at first glance, especially if the prerequisites for setting up CD—for example, Continuous Integration (CI) and testing—are not yet in place. Because it might take a while for all teams to realize the ideal of CD, we focus on developing various aspects that deliver value independently en route to the end goal. Here are some of these:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Agility</dt>
|
||||
<dd>Release frequently and in small batches</dd>
|
||||
<dt>Automation</dt>
|
||||
<dd>Reduce or remove repetitive overhead of frequent releases</dd>
|
||||
<dt>Isolation</dt>
|
||||
<dd>Strive for modular architecture to isolate changes and make troubleshooting easier</dd>
|
||||
<dt>Reliability</dt>
|
||||
<dd>Measure key health indicators like crashes or latency and keep improving them</dd>
|
||||
<dt>Data-driven decision making</dt>
|
||||
<dd>Use A/B testing on health metrics to ensure quality</dd>
|
||||
<dt>Phased rollout</dt>
|
||||
<dd>Roll out changes to a few users before shipping to everyone</dd>
|
||||
</dl>
|
||||
|
||||
<p>At first, releasing new versions of software frequently might seem risky. As your userbase grows, you might fear the backlash from angry users if there are any bugs that you didn’t catch in testing, and you might quite simply have too much new code in your product to test exhaustively. But this is precisely where CD can help. Ideally, there are so few changes between one release and the next that troubleshooting issues is trivial. In the limit, with CD, every change goes through the QA pipeline and is automatically deployed into production. This is often not a practical reality for many teams, and so there is often work of culture change toward CD as an intermediate step, during which teams can build their readiness to deploy at any time without actually doing so, building up their confidence to release more frequently in the future.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="velocity_is_a_team_sport_how_to_break_u">
|
||||
<h1>Velocity Is a Team Sport: How to Break Up a Deployment into Manageable Pieces</h1>
|
||||
|
||||
<p>When a team is small, changes <a contenteditable="false" data-primary="velocity is a team sport" data-type="indexterm" id="id-0rc6HmCltQ"> </a>come into a codebase at a certain rate.<a contenteditable="false" data-primary="continuous delivery (CD)" data-secondary="breaking up deployment into manageable pieces" data-type="indexterm" id="id-xncmCnCDt3"> </a><a contenteditable="false" data-primary="deployment" data-secondary="breaking up into manageable pieces" data-type="indexterm" id="id-axcmUKCwtJ"> </a> We’ve seen an antipattern emerge as a team grows over time or splits into subteams: a subteam branches off its code to avoid stepping on anyone’s feet, but then struggles, later, with integration and culprit finding. At Google, we prefer that teams continue to develop at head in the shared codebase and set up CI testing, automatic rollbacks, and culprit finding to identify issues quickly. This is discussed at length in <a data-type="xref" href="ch23.html#continuous_integration">Continuous Integration</a>.</p>
|
||||
|
||||
<p>One of our codebases, YouTube, is a large, monolithic Python application. The release process is laborious, with Build Cops, release managers, and other volunteers. Almost every release has multiple cherry-picked changes and respins. There is also a 50-hour manual regression testing cycle run by a remote QA team on every release. When the operational cost of a release is this high, a cycle begins to develop in which you wait to push out your release until you’re able to test it a bit more. Meanwhile, someone wants to add just one more feature that’s almost ready, and pretty soon you have yourself a release process that’s laborious, error prone, and slow. Worst of all, the experts who did the release last time are burned out and have left the team, and now nobody even knows how to troubleshoot those strange crashes that happen when you try to release an update, leaving you panicky at the very thought of pushing that <span class="keep-together">button.</span></p>
|
||||
|
||||
<p>If your releases are costly and sometimes risky, the <em>instinct</em> is to slow down your release cadence and increase your stability period. However, this only provides short-term stability gains, and over time it slows velocity and frustrates teams and users. The <em>answer</em> is to reduce cost, increase discipline, and make the risks more incremental, but it is critical to resist the obvious operational fixes and invest in long-term architectural changes. The obvious operational fixes to this problem lead to a few traditional approaches: reverting to a traditional planning model that leaves little room for learning or iteration, adding more governance and oversight to the development process, and implementing risk reviews or rewarding low-risk (and often low-value) features.</p>
|
||||
|
||||
<p>The investment with the best return, though, is migrating to a microservice architecture, which can empower a large product team with the ability to remain scrappy and innovative while simultaneously reducing risk. In some cases, at Google, the answer has been to rewrite an application from scratch rather than simply migrating it, establishing the desired modularity into the new architecture. Although either of these options can take months and is likely painful in the short term, the value gained in terms of operational cost and cognitive simplicity will pay off over an application’s lifespan of years.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="evaluating_changes_in_isolation_flag-gu">
|
||||
<h1>Evaluating Changes in Isolation: Flag-Guarding Features</h1>
|
||||
|
||||
<p>A key to reliable continuous <a contenteditable="false" data-primary="continuous delivery (CD)" data-secondary="evaluating changes in isolation, flag-guarding features" data-type="indexterm" id="id-xncVHnC7c3"> </a>releases is to make sure engineers "flag guard" <em>all changes</em>. As <a contenteditable="false" data-primary="flag-guarding features" data-type="indexterm" id="id-kLcXUlC8cK"> </a>a product grows, there will be multiple features under various stages of development coexisting in a binary. Flag guarding can be used to control the inclusion or expression of feature code in the product on a feature-by-feature basis and can be expressed differently for release and development builds. A feature flag disabled for a build should allow build tools to strip the feature from the build if the language permits it. For instance, a stable feature that has already shipped to customers might be enabled for both development and release builds. A feature under development might be enabled only for development, protecting users from an unfinished feature. New feature code lives in the binary alongside the old codepath—both can run, but the new code is guarded by a flag. If the new code works, you can remove the old codepath and launch the feature fully in a subsequent release. If there’s a problem, the flag value can be updated independently from the binary release via a dynamic config update.</p>
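<p>As a minimal sketch (the flag client and names are invented, not a particular Google API), a flag-guarded feature keeps the old and new codepaths side by side until the flag is flipped through dynamic configuration:</p>
<pre data-type="programlisting" data-code-language="python">
from flags_framework import get_flag_value   # hypothetical dynamic-config client

def render_search_results(query):
    # Both codepaths ship in the binary; the flag chooses at runtime and can be
    # changed by a config push, independently of a binary release.
    if get_flag_value("use_new_results_renderer", default=False):
        return _render_results_new(query)      # feature still rolling out
    return _render_results_legacy(query)       # known-good path, deleted after full launch

def _render_results_new(query):
    ...                                        # new implementation

def _render_results_legacy(query):
    ...                                        # old implementation
</pre>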
|
||||
|
||||
<p>In the old world of binary releases, we had to time press releases closely with our binary rollouts. We had to have a successful rollout before a press release about new functionality or a new feature could be issued. This meant that the feature would be out in the wild before it was announced, and the risk of it being discovered ahead of time was very real.</p>
|
||||
|
||||
<p>This is where the beauty of the flag guard comes to play. If the new code has a flag, the flag can be updated to turn your feature on immediately before the press release, thus minimizing the risk of leaking a feature. Note that flag-guarded code is not a <em>perfect</em> safety net for truly sensitive features. Code can still be scraped and analyzed if it’s not well obfuscated, and not all features can be hidden behind flags without adding a lot of complexity. Moreover, even flag configuration changes must be rolled out with care. Turning on a flag for 100% of your users all at once is not a great idea, so a configuration service that manages safe configuration rollouts is a good investment. Nevertheless, the level of control and the ability to decouple the destiny of a particular feature from the overall product release are powerful levers for long-term sustainability of the application.</p>
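<p>The chapter doesn’t prescribe a format for such configuration rollouts; as an illustrative sketch only, a percentage-staged plan that a configuration service could execute might look like this:</p>
<pre data-type="programlisting" data-code-language="python">
# A configuration service would apply these stages in order, watching guardrail
# metrics between steps; all field names and values are illustrative.
ROLLOUT_PLAN = {
    "flag": "use_new_results_renderer",
    "stages": [
        {"percent_of_users": 1,   "soak_hours": 24},
        {"percent_of_users": 10,  "soak_hours": 24},
        {"percent_of_users": 50,  "soak_hours": 12},
        {"percent_of_users": 100, "soak_hours": 0},
    ],
}
</pre>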
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="striving_for_agility_setting_up_a_relea">
|
||||
<h1>Striving for Agility: Setting Up a Release Train</h1>
|
||||
|
||||
<p>Google’s Search binary is its first and oldest. Large and complicated, its codebase can be<a contenteditable="false" data-primary="releases" data-secondary="striving for agility, setting up a release train" data-type="indexterm" id="ix_release"> </a> tied back to <a contenteditable="false" data-primary="continuous delivery (CD)" data-secondary="striving for agility, setting up a release train" data-type="indexterm" id="ix_CDagile"> </a>Google’s origin—a search through our codebase can still find code written at least as far back as 2003, often earlier. When smartphones began to take off, feature after mobile feature was shoehorned into a hairball of code written primarily for server deployment. Even though the Search experience was becoming more vibrant and interactive, deploying a viable build became more and more difficult. At one point, we were releasing the Search binary into production only once per week, and even hitting that target was rare and often based on luck.</p>
|
||||
|
||||
<p>When one of our contributing authors, Sheri Shipe, took on the project of increasing our release velocity in Search, each release cycle was taking a group of engineers days to complete. They built the binary, integrated data, and then began testing. Each bug had to be manually triaged to make sure it wouldn’t impact Search quality, the user experience (UX), and/or revenue. This process was grueling and time consuming and did not scale to the volume or rate of change. As a result, a developer could never know when their feature was going to be released into production. This made timing press releases and public launches challenging.</p>
|
||||
|
||||
<p>Releases don’t happen in a vacuum, and having reliable releases makes the dependent factors easier to synchronize. Over the course of several years, a dedicated group of engineers implemented a continuous release process, which streamlined everything about sending a Search binary into the world. We automated what we could, set deadlines for submitting features, and simplified the integration of plug-ins and data into the binary. We could now consistently release a new Search binary into production every other day.</p>
|
||||
|
||||
<p>What were the trade-offs we made to get predictability in our release cycle? They come down to two main ideas that we baked into the system.</p>
|
||||
|
||||
<section data-type="sect2" id="no_binary_is_perfect">
|
||||
<h2>No Binary Is Perfect</h2>
|
||||
|
||||
<p>The first is that <em>no binary is perfect</em>, especially for builds<a contenteditable="false" data-primary="no binary is perfect" data-type="indexterm" id="id-1GceCwCKulTD"> </a> that are incorporating the work of tens or hundreds of developers independently developing dozens of major features.<a contenteditable="false" data-primary="releases" data-secondary="striving for agility, setting up a release train" data-tertiary="no binary is perfect" data-type="indexterm" id="id-JmcnUwCeuxTP"> </a> Even though it’s impossible to fix every bug, we constantly need to weigh questions such as: If a line has been moved two pixels to the left, will it affect an ad display and potential revenue? What if the shade of a box has been altered slightly? Will it make it difficult for visually impaired users to read the text? The rest of this book is arguably about minimizing the set of unintended outcomes for a release, but in the end we must admit that software is fundamentally complex. There is no perfect binary—decisions and trade-offs have to be made every time a new change is released into production. Key performance indicator metrics with clear thresholds allow <span class="keep-together">features</span> to launch even if they aren’t perfect<sup><a data-type="noteref" id="ch01fn250-marker" href="ch24.html#ch01fn250">1</a></sup> and can also create clarity in otherwise contentious launch decisions.</p>
|
||||
|
||||
<p>One bug involved a rare dialect spoken on only one island in the Philippines. If a user asked a search question in this dialect, instead of an answer to their question, they would get a blank web page. We had to determine whether the cost of fixing this bug was worth delaying the release of a major new feature.</p>
|
||||
|
||||
<p>We ran from office to office trying to determine how many people actually spoke this language, if it happened every time a user searched in this language, and whether these folks even used Google on a regular basis. Every quality engineer we spoke with deferred us to a more senior person. Finally, data in hand, we put the question to Search’s senior vice president. Should we delay a critical release to fix a bug that affected only a very small Philippine island? It turns out that no matter how small your island, you should get reliable and accurate search results: we delayed the release and fixed the bug.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect2" id="meet_your_release_deadline">
|
||||
<h2>Meet Your Release Deadline</h2>
|
||||
|
||||
<p>The second idea is that <em>if you’re late for the release train, it will leave without you</em>. There’s something to <a contenteditable="false" data-primary="releases" data-secondary="striving for agility, setting up a release train" data-tertiary="meeting your release deadline" data-type="indexterm" id="id-JmcoCwCPSxTP"> </a>be said for the adage, "deadlines are certain, life is not." At some point in the release timeline, you must put a stake in the ground and turn away developers and their new features. Generally speaking, no amount of pleading or begging will get a feature into today’s release after the deadline has passed.</p>
|
||||
|
||||
<p>There is the <em>rare</em> exception. The situation usually goes like this. It’s late Friday evening and six software engineers come storming into the release manager’s cube in a panic. They have a contract with the NBA and finished the feature moments ago. But it must go live before the big game tomorrow. The release must stop and we must cherry-pick the feature into the binary or we’ll be in breach of contract! A bleary-eyed release engineer shakes their head and says it will take four hours to cut and test a new binary. It’s their kid’s birthday and they still need to pick up the balloons.</p>
|
||||
|
||||
<p>A world of regular releases means that if a developer misses the release train, they’ll be able to catch the next train in a matter of hours rather than days. This limits developer panic and greatly improves <a contenteditable="false" data-primary="releases" data-secondary="striving for agility, setting up a release train" data-startref="x_release" data-type="indexterm" id="id-OmcbHLfDSOTl"> </a>work–life balance for release engineers.<a contenteditable="false" data-primary="continuous delivery (CD)" data-secondary="striving for agility, setting up a release train" data-startref="ix_CDagile" data-type="indexterm" id="id-Lmc0CdfmSrTz"> </a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="quality_and_user-focus_ship_only_what_g">
|
||||
<h1>Quality and User-Focus: Ship Only What Gets Used</h1>
|
||||
|
||||
<p>Bloat is an unfortunate side effect of most software development life cycles, and the more successful <a contenteditable="false" data-primary="continuous delivery (CD)" data-secondary="quality and user-focus, shipping only what gets used" data-type="indexterm" id="id-kLcRHlCDFK"> </a>a product becomes, the<a contenteditable="false" data-primary="shipping only what gets used" data-type="indexterm" id="id-y9ckCZCGFk"> </a> more bloated its code base typically becomes.<a contenteditable="false" data-primary="user focus in CD, shipping only what gets used" data-type="indexterm" id="id-bacGUqCdFw"> </a><a contenteditable="false" data-primary="quality and user-focus in CD" data-type="indexterm" id="id-EmcdfBC1Fx"> </a> One downside of a speedy, efficient release train is that this bloat is often magnified and can manifest in challenges to the product team and even to the users. Especially if the software is delivered to the client, as in the case of mobile apps, this can mean the user’s device pays the cost in terms of space, download, and data costs, even for features they never use, whereas developers pay the cost of slower builds, complex deployments, and rare bugs. In this section, we’ll talk about how dynamic deployments allow you to ship only what is used, forcing necessary trade-offs between user value and feature cost. At Google, this often means staffing dedicated teams to improve the efficiency of the product on an ongoing basis.</p>
|
||||
|
||||
<p>Whereas some products are web-based and run on the cloud, many are client applications that use shared resources on a user’s device—a phone or tablet. This choice in itself involves a trade-off: native apps can be more performant and resilient to spotty connectivity, but they are also more difficult to update and more susceptible to platform-level issues. A common argument against frequent, continuous deployment for native apps is that users dislike frequent updates and must pay for the data cost and the disruption. There might be other limiting factors such as access to a network or a limit to the reboots required to percolate an update.</p>
|
||||
|
||||
<p>Even though there is a trade-off to be made in terms of how frequently to update a product, the goal is to <em>have these choices be intentional</em>. With a smooth, well-running CD process, how often a viable release is <em>created</em> can be separated from how often a user <em>receives</em> it. You might achieve the goal of being able to deploy weekly, daily, or hourly, without actually doing so, and you should intentionally choose release processes in the context of your users’ specific needs and the larger organizational goals, and determine the staffing and tooling model that will best support the long-term sustainability of your product.</p>
|
||||
|
||||
<p>Earlier in the chapter, we talked about keeping your code modular. This allows for dynamic, configurable deployments that allow better utilization of constrained resources, such as the space on a user’s device. In the absence of this practice, every user must receive code they will never use to support translations they don’t need or architectures that were meant for other kinds of devices. Dynamic deployments allow apps to maintain small sizes while only shipping code to a device that brings its users value, and A/B experiments allow for intentional trade-offs between a feature’s cost and its value to users and your business.</p>
|
||||
|
||||
<p>There is an upfront cost to setting up these processes, and identifying and removing frictions that keep the frequency of releases lower than is desirable is a painstaking process. But the long-term wins in terms of risk management, developer velocity, and enabling rapid innovation are so high that these initial costs become worthwhile.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="shifting_left_making_data-driven_decisi">
|
||||
<h1>Shifting Left: Making Data-Driven Decisions Earlier</h1>
|
||||
|
||||
<p>If you’re building for all<a contenteditable="false" data-primary="continuous delivery (CD)" data-secondary="shifting left and making data-driven decisions earlier" data-type="indexterm" id="id-y9coHZC9ik"> </a> users, you might have <a contenteditable="false" data-primary="data-driven decisions, making earlier" data-type="indexterm" id="id-bacOCqCZiw"> </a>clients on <a contenteditable="false" data-primary="shifting left" data-secondary="making data-driven decisions earlier" data-type="indexterm" id="id-Emc3UBCqix"> </a>smart screens, speakers, or Android and iOS phones and tablets, and your software may be flexible enough to allow users to customize their experience. Even if you’re building for only Android devices, the sheer diversity of the more than two billion Android devices can make the prospect of qualifying a release overwhelming. And with the pace of innovation, by the time someone reads this chapter, whole new categories of devices might have bloomed.</p>
|
||||
|
||||
<p>One of our release managers shared a piece of wisdom that turned the situation around when he said that the diversity of our client market was not a <em>problem</em>, but a <em>fact</em>. After we accepted that, we could switch our release qualification model in the following ways:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>If <em>comprehensive</em> testing is practically infeasible, aim for <em>representative</em> testing instead.<a contenteditable="false" data-primary="representative testing" data-type="indexterm" id="id-1Gc1UxHdHXf9iV"> </a></p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Staged rollouts to slowly<a contenteditable="false" data-primary="staged rollouts" data-type="indexterm" id="id-Qmc1HdHmCqfNid"> </a> increasing percentages of the userbase allow for fast fixes.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Automated A/B releases allow for<a contenteditable="false" data-primary="automation" data-secondary="automated A/B releases" data-type="indexterm" id="id-1Gc6HxHBUXf9iV"> </a> statistically significant results proving a release’s quality, without tired humans needing to look at dashboards and make <span class="keep-together">decisions.</span></p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>When it comes to developing for Android clients, Google apps use specialized testing tracks and staged rollouts to an increasing percentage of user traffic, carefully monitoring for issues in these channels. Because the Play Store offers unlimited testing tracks, we can also set up a QA team in each country in which we plan to launch, allowing for a global overnight turnaround in testing key features.</p>
|
||||
|
||||
<p>One issue we noticed when doing deployments to Android was that we could expect a statistically significant change in user metrics <em>simply from pushing an update</em>. This meant that even if we made no changes to our product, pushing an update could affect device and user behavior in ways that were difficult to predict. As a result, although canarying the update to a small percentage of user traffic could give us good information about crashes or stability problems, it told us very little about whether the newer version of our app was in fact better than the older one.</p>
|
||||
|
||||
<p>Dan Siroker and Pete Koomen have already discussed the value of A/B testing<sup><a data-type="noteref" id="ch01fn251-marker" href="ch24.html#ch01fn251">2</a></sup> your features, but at Google, some of our larger apps also A/B test their <em>deployments</em>. This means sending out two versions of the product: one that is the desired update, with the baseline being a placebo (your old version just gets shipped again). As the two versions roll out simultaneously to a large enough base of similar users, you can compare one release against the other to see whether the latest version of your software is in fact an improvement over the previous one. With a large enough userbase, you should be able to get statistically significant results within days, or even hours. An automated metrics pipeline can enable the fastest possible release by pushing forward a release to more traffic as soon as there is enough data to know that the guardrail metrics will not be affected.</p>
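<p>A sketch of the decision such an automated metrics pipeline might make, with a hypothetical statistics helper standing in for the real analysis:</p>
<pre data-type="programlisting" data-code-language="python">
from experiment_analysis import compare_guardrail_metric   # hypothetical stats helper

GUARDRAIL_METRICS = ["crash_rate", "app_start_latency_ms"]

def should_advance_rollout(new_arm, placebo_arm):
    """Advance the A/B deployment only if no guardrail metric has regressed."""
    results = [compare_guardrail_metric(m, new_arm, placebo_arm)
               for m in GUARDRAIL_METRICS]
    if any(r.significant and r.new_is_worse for r in results):
        return False   # hold the rollout and alert a human
    # Inconclusive results simply mean not enough data yet; re-check next cycle.
    return all(r.has_enough_data for r in results)
</pre>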
|
||||
|
||||
<p>Obviously, this method does not apply to every app and can be a lot of overhead when you don’t have a large enough userbase. In these cases, the recommended best practice is to aim for change-neutral releases. All new features are flag guarded so that the only change being tested during a rollout is the stability of the deployment itself.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="changing_team_culture_building_discipli">
|
||||
<h1>Changing Team Culture: Building Discipline into Deployment</h1>
|
||||
|
||||
<p>Although “Always Be Deploying” helps address several issues affecting developer velocity, there are also certain practices that address issues of scale.<a contenteditable="false" data-primary="continuous delivery (CD)" data-secondary="changing team culture to build discipline into deployment" data-type="indexterm" id="id-bacjHqCvHw"> </a><a contenteditable="false" data-primary="culture" data-secondary="building discipline into deployment" data-type="indexterm" id="id-Emc7CBCXHx"> </a><a contenteditable="false" data-primary="deployment" data-secondary="building discipline into" data-type="indexterm" id="id-QmcrUPC8Hy"> </a> The initial team launching a product can be fewer than 10 people, each taking turns at deployment and production-monitoring responsibilities. Over time, your team might grow to hundreds of people, with subteams responsible for specific features. As this happens and the organization scales up, the number of changes in each deployment and the amount of risk in each release attempt increase superlinearly. Each release contains months of sweat and tears. Making the release successful becomes a high-touch and labor-intensive effort. Developers can often be caught trying to decide which is worse: abandoning a release that contains a quarter’s worth of new features and bug fixes, or pushing out a release without confidence in its quality.</p>
|
||||
|
||||
<p>At scale, increased complexity usually manifests as increased release latency. Even if you release every day, a release can take a week or longer to fully roll out safely, leaving you a week behind when trying to debug any issues. This is where “Always Be Deploying” can return a development project to effective form. Frequent release trains allow for minimal divergence from a known good position, with the recency of changes aiding in resolving issues. But how can a team ensure that the complexity inherent with a large and quickly expanding codebase doesn’t weigh down progress?</p>
|
||||
|
||||
<p>On Google Maps, we take the perspective that features are very important, but only very seldom is any feature so important that a release should be held for it. If releases are frequent, the pain a feature feels for missing a release is small in comparison to the pain all the new features in a release feel for a delay, and especially the pain users can feel if a not-quite-ready feature is rushed to be included.</p>
|
||||
|
||||
<p>One release responsibility is to protect the product from the developers.</p>
|
||||
|
||||
<p>When making trade-offs, the passion and urgency a developer feels about launching a new feature can never trump the user experience with an existing product. This means that new features must be isolated from other components via interfaces with strong contracts, separation of concerns, rigorous testing, communication early and often, and conventions for new feature acceptance.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="conclusion-id00029">
|
||||
<h1>Conclusion</h1>
|
||||
|
||||
<p>Over the years and across all of our software products, we’ve found that, counterintuitively, faster is safer. The health of your product and the speed of development are not actually in opposition to each other, and products that release more frequently and in small batches have better quality outcomes. They adapt faster to bugs encountered in the wild and to unexpected market shifts. Not only that, faster is <em>cheaper</em>, because having a predictable, frequent release train forces you to drive down the cost of each release and makes the cost of any abandoned release very low.</p>
|
||||
|
||||
<p>Simply having the structures in place that <em>enable</em> continuous deployment generates the majority of the value, <em>even if you don’t actually push those releases out to users</em>. What do we mean? We don’t actually release a wildly different version of Search, Maps, or YouTube every day, but to be able to do so requires a robust, well-documented continuous deployment process, accurate and real-time metrics on user satisfaction and product health, and a coordinated team with clear policies on what makes it in or out and why. In practice, getting this right often also requires binaries that can be configured in production, configuration managed like code (in version control), and a toolchain that allows safety measures like dry-run verification, rollback/rollforward mechanisms, and reliable patching.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="tlsemicolondrs-id00130">
|
||||
<h1>TL;DRs</h1>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p><em>Velocity is a team sport</em>: The optimal workflow for a large team that develops code collaboratively requires modularity of architecture and near-continuous <span class="keep-together">integration.</span></p>
|
||||
</li>
|
||||
<li>
|
||||
<p><em>Evaluate changes in isolation</em>: Flag guard any features to be able to isolate problems early.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><em>Make reality your benchmark</em>: Use a staged rollout to address device diversity and the breadth of the userbase. Release qualification in a synthetic environment that isn’t similar to the production environment can lead to late surprises.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><em>Ship only what gets used</em>: Monitor the cost and value of any feature in the wild to know whether it’s still relevant and delivering sufficient user value.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><em>Shift left</em>: Enable faster, more data-driven decision making earlier on all changes through CI and continuous deployment.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><em>Faster is safer</em>: Ship early and often and in small batches to reduce the risk of each release and to minimize time to market.<a contenteditable="false" data-primary="continuous delivery (CD)" data-startref="ix_CD" data-type="indexterm" id="id-KmcECRHauaCyUV"> </a></p>
|
||||
</li>
|
||||
</ul>
|
||||
</section>
|
||||
<div data-type="footnotes"><p data-type="footnote" id="ch01fn250"><sup><a href="ch24.html#ch01fn250-marker">1</a></sup>Remember the SRE “error-budget” formulation: perfection is rarely the best goal. Understand how much room for error is acceptable and how much of that budget has been spent recently and use that to adjust the trade-off between velocity and stability.</p><p data-type="footnote" id="ch01fn251"><sup><a href="ch24.html#ch01fn251-marker">2</a></sup>Dan Siroker and Pete Koomen, <em>A/B Testing: The Most Powerful Way to Turn Clicks Into Customers</em> (Hoboken: Wiley, 2013).</p></div></section>
|
||||
|
||||
</body>
|
||||
</html>
|
499
clones/abseil.io/resources/swe-book/html/ch25.html
Normal file
33
clones/abseil.io/resources/swe-book/html/ch26.html
Normal file
|
@ -0,0 +1,33 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="afterword" id="conclusion-id00032">
|
||||
|
||||
<h1>Afterword</h1>
|
||||
|
||||
<p>Software engineering at Google has been an extraordinary experiment in how to develop and maintain a large and evolving codebase. <a contenteditable="false" data-primary="software engineering" data-secondary="concluding thoughts" data-type="indexterm" id="ix_sftengrcl"> </a>I’ve seen engineering teams break ground on this front during my time here, moving Google forward both as a company that touches billions of users and as a leader in the tech industry. This wouldn’t have been possible without the principles outlined in this book, so I’m very excited to see these pages come to life.</p>
|
||||
|
||||
<p>If the past 50 years (or the preceding pages here) have proven anything, it’s that software engineering is far from stagnant. In an environment in which technology is steadily changing, the software engineering function holds a particularly important role within a given organization. Today, software engineering principles aren’t simply about how to effectively run an organization; they’re about how to be a more responsible company for users and the world at large.</p>
|
||||
|
||||
<p>Solutions to common software engineering problems are not always hidden in plain sight—most require a certain level of resolute agility to identify solutions that will work for current-day problems and also withstand inevitable changes to technical systems. This agility is a common quality of the software engineering teams I’ve had the privilege to work with and learn from since joining Google back in 2008.</p>
|
||||
|
||||
<p>The idea of sustainability is also central to software engineering. Over a codebase’s expected lifespan, we must be able to react and adapt to changes, be that in product direction, technology platforms, underlying libraries, operating systems, and more. Today, we rely on the principles outlined in this book to achieve crucial flexibility in changing pieces of our software ecosystem.</p>
|
||||
|
||||
<p>We certainly can’t prove that the ways we’ve found to attain sustainability will work for every organization, but I think it’s important to share these key learnings. Software engineering is a new discipline, so very few organizations have had the chance to achieve both sustainability and scale. By providing this overview of what we’ve seen, as well as the bumps along the way, our hope is to demonstrate the value and feasibility of long-term planning for code health. The passage of time and the importance of change cannot be ignored.</p>
|
||||
|
||||
<p>This book outlines some of our key guiding principles as they relate to software engineering. At a high level, it also illuminates the influence of technology on society. As software engineers, it’s our responsibility to ensure that our code is designed with inclusion, equity, and accessibility for everyone. Building for the sole purpose of innovation is no longer acceptable; technology that helps only a set of users isn’t innovative at all.</p>
|
||||
|
||||
<p>Our responsibility at Google has always been to provide developers, internally and externally, with a well-lit path. With the rise of new technologies like artificial intelligence, quantum computing, and ambient computing, there’s still plenty for us to learn as a company. I’m particularly excited to see where the industry takes software engineering in the coming years, and I’m confident that this book will help shape that path.<a contenteditable="false" data-primary="software engineering" data-secondary="concluding thoughts" data-startref="ix_sftengrcl" data-type="indexterm" id="id-kZsmH0hn"> </a></p>
|
||||
|
||||
<p class="right">—<em>Asim Husain<br>Vice President of Engineering, Google</em></p>
|
||||
|
||||
</section>
|
||||
|
||||
</body>
|
||||
</html>
|
23
clones/abseil.io/resources/swe-book/html/foreword.html
Normal file
|
@ -0,0 +1,23 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="foreword" id="foreword">
|
||||
<h1>Foreword</h1>
|
||||
|
||||
<p>I have always been endlessly fascinated with the details of how Google does things. I have grilled my Googler friends for information about the way things really work inside of the company. How do they manage such a massive, monolithic code repository without falling over? How do tens of thousands of engineers successfully collaborate on thousands of projects? How do they maintain the quality of their systems?</p>
|
||||
<p>Working with former Googlers has only increased my curiosity. If you’ve ever worked with a former Google engineer (or "Xoogler," as they’re sometimes called), you’ve no doubt heard the phrase "at Google we…" Coming out of Google into other companies seems to be a shocking experience, at least from the engineering side of things. As far as this outsider can tell, the systems and processes for writing code at Google must be among the best in the world, given both the scale of the company and how often people sing their praises.</p>
|
||||
<p>In <em>Software Engineering at Google</em>, a set of Googlers (and some Xooglers) gives us a lengthy blueprint for many of the practices, tools, and even cultural elements that underlie software engineering at Google. It’s easy to overfocus on the amazing tools that Google has built to support writing code, and this book provides a lot of details about those tools. But it also goes beyond simply describing the tooling to give us the philosophy and processes that the teams at Google follow. These can be adapted to fit a variety of circumstances, whether or not you have the scale and tooling. To my delight, there are several chapters that go deep on various aspects of automated testing, a topic that continues to meet with too much resistance in our industry.</p>
|
||||
<p>The great thing about tech is that there is never only one way to do something. Instead, there is a series of trade-offs we all must make depending on the circumstances of our team and situation. What can we cheaply take from open source? What can our team build? What makes sense to support for our scale? When I was grilling my Googler friends, I wanted to hear about the world at the extreme end of scale: resource rich, in both talent and money, with high demands on the software being built. This anecdotal information gave me ideas on some options that I might not otherwise have considered.</p>
|
||||
<p>With this book, we've written down those options for everyone to read. Of course, Google is a unique company, and it would be foolish to assume that the right way to run your software engineering organization is to precisely copy their formula. Applied practically, this book will give you ideas on how things could be done, and a lot of information that you can use to bolster your arguments for adopting best practices like testing, knowledge sharing, and building collaborative teams.</p>
|
||||
<p>You may never need to build Google yourself, and you may not even want to reach for the same techniques they apply in your organization. But if you aren’t familiar with the practices Google has developed, you’re missing a perspective on software engineering that comes from tens of thousands of engineers working collaboratively on software over the course of more than two decades. That knowledge is far too valuable to ignore.</p>
|
||||
<p class="byline">Camille Fournier<br>Author, <span class="plain">The Manager's Path</span></p>
|
||||
</section>
|
||||
|
||||
</body>
|
||||
</html>
|
BIN
clones/abseil.io/resources/swe-book/html/images/seag_0101.png
Normal file
After Width: | Height: | Size: 24 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_0102.png
Normal file
After Width: | Height: | Size: 27 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_0601.png
Normal file
After Width: | Height: | Size: 31 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_0602.png
Normal file
After Width: | Height: | Size: 62 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1101.png
Normal file
After Width: | Height: | Size: 740 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1103.png
Normal file
After Width: | Height: | Size: 20 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1104.png
Normal file
After Width: | Height: | Size: 21 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1105.png
Normal file
After Width: | Height: | Size: 88 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1401.png
Normal file
After Width: | Height: | Size: 12 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1402.png
Normal file
After Width: | Height: | Size: 57 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1403.png
Normal file
After Width: | Height: | Size: 62 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1404.png
Normal file
After Width: | Height: | Size: 20 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1405.png
Normal file
After Width: | Height: | Size: 41 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1406.png
Normal file
After Width: | Height: | Size: 46 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1601.png
Normal file
After Width: | Height: | Size: 37 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1701.png
Normal file
After Width: | Height: | Size: 282 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1702.png
Normal file
After Width: | Height: | Size: 602 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1703.png
Normal file
After Width: | Height: | Size: 38 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1704.png
Normal file
After Width: | Height: | Size: 28 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1801.png
Normal file
After Width: | Height: | Size: 5.8 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1802.png
Normal file
After Width: | Height: | Size: 15 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1803.png
Normal file
After Width: | Height: | Size: 25 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1804.png
Normal file
After Width: | Height: | Size: 29 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1805.png
Normal file
After Width: | Height: | Size: 11 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1901.png
Normal file
After Width: | Height: | Size: 16 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1902.png
Normal file
After Width: | Height: | Size: 46 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1903.png
Normal file
After Width: | Height: | Size: 256 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1904.png
Normal file
After Width: | Height: | Size: 211 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1905.png
Normal file
After Width: | Height: | Size: 93 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1906.png
Normal file
After Width: | Height: | Size: 176 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1907.png
Normal file
After Width: | Height: | Size: 91 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_1908.png
Normal file
After Width: | Height: | Size: 267 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_2001.png
Normal file
After Width: | Height: | Size: 147 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_2002.png
Normal file
After Width: | Height: | Size: 26 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_2101.png
Normal file
After Width: | Height: | Size: 13 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_2301.png
Normal file
After Width: | Height: | Size: 27 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_2302.png
Normal file
After Width: | Height: | Size: 47 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_2303.png
Normal file
After Width: | Height: | Size: 37 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_2304.png
Normal file
After Width: | Height: | Size: 33 KiB |
BIN
clones/abseil.io/resources/swe-book/html/images/seag_2305.png
Normal file
After Width: | Height: | Size: 20 KiB |
8644
clones/abseil.io/resources/swe-book/html/ix.html
Normal file
16
clones/abseil.io/resources/swe-book/html/part1.html
Normal file
|
@ -0,0 +1,16 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<div xmlns="http://www.w3.org/1999/xhtml" data-type="part" id="thesis" class="pagenumrestart">
|
||||
<h1><span class="label">Part I. </span>Thesis</h1>
|
||||
|
||||
</div>
|
||||
|
||||
</body>
|
||||
</html>
|
21
clones/abseil.io/resources/swe-book/html/part2.html
Normal file
|
@ -0,0 +1,21 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<div xmlns="http://www.w3.org/1999/xhtml" data-type="part" id="culture">
|
||||
<h1><span class="label">Part II. </span>Culture</h1>
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
</div>
|
||||
|
||||
</body>
|
||||
</html>
|
23
clones/abseil.io/resources/swe-book/html/part3.html
Normal file
|
@ -0,0 +1,23 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<div xmlns="http://www.w3.org/1999/xhtml" data-type="part" id="processes">
|
||||
<h1><span class="label">Part III. </span>Processes</h1>
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
</div>
|
||||
|
||||
</body>
|
||||
</html>
|
25
clones/abseil.io/resources/swe-book/html/part4.html
Normal file
|
@ -0,0 +1,25 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<div xmlns="http://www.w3.org/1999/xhtml" data-type="part" id="tools">
|
||||
<h1><span class="label">Part IV. </span>Tools</h1>
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
</div>
|
||||
|
||||
</body>
|
||||
</html>
|
16
clones/abseil.io/resources/swe-book/html/part5.html
Normal file
|
@ -0,0 +1,16 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<div xmlns="http://www.w3.org/1999/xhtml" data-type="part" id="conclusion-id00031">
|
||||
<h1><span class="label">Part V. </span>Conclusion</h1>
|
||||
|
||||
</div>
|
||||
|
||||
</body>
|
||||
</html>
|
271
clones/abseil.io/resources/swe-book/html/pr01.html
Normal file
|
@ -0,0 +1,271 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<title>Software Engineering at Google</title>
|
||||
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
|
||||
<link rel="stylesheet" type="text/css" href="theme/html/html.css">
|
||||
</head>
|
||||
<body data-type="book">
|
||||
<section xmlns="http://www.w3.org/1999/xhtml" data-type="preface" id="preface">
|
||||
<h1>Preface</h1>
|
||||
|
||||
<p>This book is titled <em>Software Engineering at Google</em>. What precisely do we mean by software engineering? What distinguishes “software engineering” from “programming” or “computer science”? And why would Google have a unique perspective to add to the corpus of previous software engineering literature written over the past 50 years?</p>
|
||||
|
||||
<p>The terms “programming” and “software engineering” have been used interchangeably for quite some time in our industry, although each term has a different emphasis and different implications. University students tend to study computer science and get jobs writing code as “programmers.”</p>
|
||||
|
||||
<p>“Software engineering,” however, sounds more serious, as if it implies the application of some theoretical knowledge to build something real and precise. Mechanical engineers, civil engineers, aeronautical engineers, and those in other engineering disciplines all practice engineering. They all work in the real world and apply their theoretical knowledge to create something real. Software engineers also create “something real,” though it is less tangible than the things other engineers create.</p>
|
||||
|
||||
<p>Unlike those more established engineering professions, current software engineering theory or practice is not nearly as rigorous. Aeronautical engineers must follow rigid guidelines and practices, because errors in their calculations can cause real damage; programming, on the whole, has traditionally not followed such rigorous practices. But, as software becomes more integrated into our lives, we must adopt and rely on more rigorous engineering methods. We hope this book helps others see a path toward more reliable software practices.</p>
|
||||
|
||||
<section data-type="sect1" id="programming_over_time">
|
||||
<h1>Programming Over Time</h1>
|
||||
|
||||
<p>We propose that “software engineering” encompasses not just the act of writing code, but all of the tools and processes an organization uses to build and maintain that code over time. What practices can a software organization introduce that will best keep its code valuable over the long term? How can engineers make a codebase more sustainable and the software engineering discipline itself more rigorous? We don’t have fundamental answers to these questions, but we hope that Google’s collective experience over the past two decades illuminates possible paths toward finding those answers.</p>
|
||||
|
||||
<p>One key insight we share in this book is that software engineering can be thought of as “programming integrated over time.” What practices can we introduce to our code to make it <em>sustainable</em>—able to react to necessary change—over its life cycle, from conception to introduction to maintenance to deprecation?</p>
|
||||
|
||||
<p>The book emphasizes three fundamental principles that we feel software organizations should keep in mind when designing, architecting, and writing their code:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Time and Change</dt>
|
||||
<dd><p>How code will need to adapt over the length of its life</p></dd>
|
||||
<dt>Scale and Growth</dt>
|
||||
<dd><p>How an organization will need to adapt as it evolves</p></dd>
|
||||
<dt>Trade-offs and Costs</dt>
|
||||
<dd><p>How an organization makes decisions, based on the lessons of Time and Change and Scale and Growth</p></dd>
|
||||
</dl>
|
||||
|
||||
<p>Throughout the chapters, we have tried to tie back to these themes and point out ways in which such principles affect engineering practices and allow them to be sustainable. (See <a data-type="xref" href="ch01.html#what_is_software_engineeringquestion_ma">What Is Software Engineering?</a> for a full discussion.)</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="googleapostrophes_perspective">
|
||||
<h1>Google’s Perspective</h1>
|
||||
|
||||
<p>Google has a unique perspective on the growth and evolution of a sustainable software ecosystem, stemming from our scale and longevity. We hope that the lessons we have learned will be useful as your organization evolves and embraces more sustainable practices.</p>
|
||||
|
||||
<p>We’ve divided the topics in this book into three main aspects of Google’s software engineering landscape:</p>
|
||||
|
||||
<ul>
|
||||
<li>
|
||||
<p>Culture</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Processes</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Tools</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Google’s culture is unique, but the lessons we have learned in developing our engineering culture are widely applicable. Our chapters on Culture (<a data-type="xref" href="part2.html#culture">Culture</a>) emphasize the collective nature of a software development enterprise, that the development of software is a team effort, and that proper cultural principles are essential for an organization to grow and remain healthy.</p>
|
||||
|
||||
<p class="pagebreak-before">The techniques outlined in our Processes chapters (<a data-type="xref" href="part3.html#processes">Processes</a>) are familiar to most software engineers, but Google’s large size and long-lived codebase provides a more complete stress test for developing best practices. Within those chapters, we have tried to emphasize what we have found to work over time and at scale as well as identify areas where we don’t yet have satisfying answers.</p>
|
||||
|
||||
<p>Finally, our Tools chapters (<a data-type="xref" href="part4.html#tools">Tools</a>) illustrate how we leverage our investments in tooling infrastructure to provide benefits to our codebase as it both grows and ages. In some cases, these tools are specific to Google, though we point out open source or third-party alternatives where applicable. We expect that these basic insights apply to most engineering organizations.</p>
|
||||
|
||||
<p>The culture, processes, and tools outlined in this book describe the lessons that a typical software engineer hopefully learns on the job. Google certainly doesn’t have a monopoly on good advice, and our experiences presented here are not intended to dictate what your organization should do. This book is our perspective, but we hope you will find it useful, either by adopting these lessons directly or by using them as a starting point when considering your own practices, specialized for your own problem domain.</p>
|
||||
|
||||
<p>Neither is this book intended to be a sermon. Google itself still imperfectly applies many of the concepts within these pages. The lessons that we have learned, we learned through our failures: we still make mistakes, implement imperfect solutions, and need to iterate toward improvement. Yet the sheer size of Google’s engineering organization ensures that there is a diversity of solutions for every problem. We hope that this book contains the best of that group.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="what_this_book_isnapostrophet">
|
||||
<h1>What This Book Isn’t</h1>
|
||||
|
||||
<p>This book is not meant to cover software design, a discipline that requires its own book (and for which much content already exists). Although there is some code in this book for illustrative purposes, the principles are language neutral, and there is little actual “programming” advice within these chapters. As a result, this text doesn’t cover many important issues in software development: project management, API design, security hardening, internationalization, user interface frameworks, or other language-specific concerns. Their omission in this book does not imply their lack of importance. Instead, we choose not to cover them here knowing that we could not provide the treatment they deserve. We have tried to make the discussions in this book more about engineering and less about programming.</p>
|
||||
</section>
|
||||
|
||||
<section class="pagebreak-before" data-type="sect1" id="parting_remarks">
|
||||
<h1 class="less_space">Parting Remarks</h1>
|
||||
|
||||
<p>This text has been a labor of love on behalf of all who have contributed, and we hope that you receive it as it is given: as a window into how a large software engineering organization builds its products. We also hope that it is one of many voices that helps move our industry to adopt more forward-thinking and sustainable practices. Most important, we further hope that you enjoy reading it and can adapt some of its lessons to your own concerns.</p>
|
||||
|
||||
<p class="byline">Tom Manshreck</p>
|
||||
|
||||
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="_conventions_used_in_this_book">
|
||||
<h1>Conventions Used in This Book</h1>
|
||||
|
||||
<p>The following typographical conventions are used in this book:</p>
|
||||
|
||||
<dl>
|
||||
<dt><em>Italic</em></dt>
|
||||
<dd>
|
||||
<p>Indicates new terms, URLs, email addresses, filenames, and file extensions.</p>
|
||||
</dd>
|
||||
<dt><code>Constant width</code></dt>
|
||||
<dd>
|
||||
<p>Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.</p>
|
||||
</dd>
|
||||
<dt><strong><code>Constant width bold</code></strong></dt>
|
||||
<dd>
|
||||
<p>Shows commands or other text that should be typed literally by the user.</p>
|
||||
</dd>
|
||||
<dt><em><code>Constant width italic</code></em></dt>
|
||||
<dd>
|
||||
<p>Shows text that should be replaced with user-supplied values or by values determined by context.</p>
|
||||
</dd>
|
||||
</dl>
|
||||
|
||||
<div data-type="note" id="id-BoIafEIb"><h6>Note</h6>
|
||||
<p>This element signifies a general note.</p>
|
||||
</div>
|
||||
|
||||
</section>
|
||||
|
||||
<!--<section data-type="sect1" id="_using_code_examples">
|
||||
<h1>Using Code Examples</h1>
|
||||
PROD: Please reach out to author to find out if they will be uploading code examples to oreilly.com or their own site (e.g., GitHub). If there is no code download, delete this whole section. If there is, when you email digidist with the link, let them know what you filled in for title_title (should be as close to book title as possible, i.e., learning_python_2e). This info will determine where digidist loads the files.
|
||||
|
||||
<p>Supplemental material (code examples, exercises, etc.) is available for download at <a href="https://github.com/oreillymedia/title_title"><em class="hyperlink">https://github.com/oreillymedia/title_title</em></a>.</p>
|
||||
|
||||
<p>If you have a technical question or a problem using the code examples, please send email to <a class="email" href="mailto:bookquestions@oreilly.com"><em>bookquestions@oreilly.com</em></a>.</p>
|
||||
|
||||
<p>This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.</p>
|
||||
|
||||
<p>We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “<em>Software Engineering at Google</em> by Titus Winters, Tom Manshreck, and Hyrum Wright (O’Reilly). Copyright 2020 Google, LLC, 978-1-492-08279-8.”</p>
|
||||
|
||||
<p>If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at <a class="email" href="mailto:permissions@oreilly.com"><em>permissions@oreilly.com</em></a>.</p>
|
||||
</section>-->
|
||||
|
||||
<section class="pagebreak-before" data-type="sect1" id="_safari_books_online">
|
||||
<h1 class="less_space">O'Reilly Online Learning</h1>
|
||||
|
||||
<div class="ormenabled" data-type="note" id="id-N3IVFGSx"><h6>Note</h6>
|
||||
<p>For more than 40 years, <a class="orm:hideurl" href="https://oreilly.com"><em class="hyperlink">O'Reilly Media</em></a> has provided technology and business training, knowledge, and insight to help companies succeed.</p>
|
||||
</div>
|
||||
|
||||
<p>Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O'Reilly and 200+ other publishers. For more information, please visit <a class="orm:hideurl" href="https://www.oreilly.com"><em>http://oreilly.com</em></a>.</p>
|
||||
</section>
|
||||
|
||||
<section data-type="sect1" id="_how_to_contact_us">
|
||||
<h1>How to Contact Us</h1>
|
||||
|
||||
<p>Please address comments and questions concerning this book to the publisher:</p>
|
||||
|
||||
<ul class="simplelist">
|
||||
<li>O’Reilly Media, Inc.</li>
|
||||
<li>1005 Gravenstein Highway North</li>
|
||||
<li>Sebastopol, CA 95472</li>
|
||||
<li>800-998-9938 (in the United States or Canada)</li>
|
||||
<li>707-829-0515 (international or local)</li>
|
||||
<li>707-829-0104 (fax)</li>
|
||||
</ul>
|
||||
|
||||
<p>We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at <a href="https://oreil.ly/software-engineering-at-google"><em>https://oreil.ly/software-engineering-at-google</em></a>.</p>
|
||||
<!--Don't forget to update the link above.-->
|
||||
|
||||
<p>Email <a class="email" href="https://abseil.io/cdn-cgi/l/email-protection#ec8e8383879d99899f988583829fac839e8985808095c28f8381"><em><span class="__cf_email__" data-cfemail="bedcd1d1d5cfcbdbcdcad7d1d0cdfed1ccdbd7d2d2c790ddd1d3">[email protected]</span></em></a> to comment or ask technical questions about this book.</p>
|
||||
|
||||
<p>For news and more information about our books and courses, see our website at <a href="https://www.oreilly.com"><em class="hyperlink">http://www.oreilly.com</em></a>.</p>
|
||||
|
||||
<p>Find us on Facebook: <a href="https://facebook.com/oreilly"><em class="hyperlink">http://facebook.com/oreilly</em></a></p>
|
||||
|
||||
<p>Follow us on Twitter: <a href="https://twitter.com/oreillymedia"><em class="hyperlink">http://twitter.com/oreillymedia</em></a></p>
|
||||
|
||||
<p>Watch us on YouTube: <a href="https://www.youtube.com/oreillymedia"><em class="hyperlink">http://www.youtube.com/oreillymedia</em></a></p>
|
||||
</section>
|
||||
|
||||
<section class="pagebreak-before" data-type="sect1" id="_acknowledgments">
|
||||
<h1 class="less_space">Acknowledgments</h1>
|
||||
|
||||
<p>A book like this would not be possible without the work of countless others. All of the knowledge within this book has come to all of us through the experience of so many others at Google throughout our careers. We are the messengers; others came before us, at Google and elsewhere, and taught us what we now present to you. We cannot list all of you here, but we do wish to acknowledge you.</p>
|
||||
|
||||
<p>We’d also like to thank Melody Meckfessel for supporting this project in its infancy as well as Daniel Jasper and Danny Berlin for supporting it through its completion.</p>
|
||||
|
||||
<p>This book would not have been possible without the massive collaborative effort of our curators, authors, and editors. Although the authors and editors are specifically acknowledged in each chapter or callout, we’d like to take time to recognize those who contributed to each chapter by providing thoughtful input, discussion, and review.</p>
|
||||
|
||||
<ul class="list_style_type_none">
|
||||
<li>
|
||||
<p><strong>What Is Software Engineering?:</strong> Sanjay Ghemawat, Andrew Hyatt</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Working Well on Teams:</strong> Sibley Bacon, Joshua Morton</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Knowledge Sharing:</strong> Dimitri Glazkov, Kyle Lemons, John Reese, David Symonds, Andrew Trenk, James Tucker, David Kohlbrenner, Rodrigo Damazio Bovendorp</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Engineering for Equity:</strong> Kamau Bobb, Bruce Lee</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>How to Lead a Team:</strong> Jon Wiley, Laurent Le Brun</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Leading at Scale:</strong> Bryan O’Sullivan, Bharat Mediratta, Daniel Jasper, Shaindel Schwartz</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Measuring Engineering Productivity:</strong> Andrea Knight, Collin Green, Caitlin Sadowski, Max-Kanat Alexander, Yilei Yang</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Style Guides and Rules:</strong> Max Kanat-Alexander, Titus Winters, Matt Austern, James Dennett</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Code Review:</strong> Max Kanat-Alexander, Brian Ledger, Mark Barolak</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Documentation:</strong> Jonas Wagner, Smit Hinsu, Geoffrey Romer</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Testing Overview:</strong> Erik Kuefler, Andrew Trenk, Dillon Bly, Joseph Graves, Neal Norwitz, Jay Corbett, Mark Striebeck, Brad Green, Miško Hevery, Antoine <span class="keep-together">Picard</span>, Sarah Storck</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Unit Testing:</strong> Andrew Trenk, Adam Bender, Dillon Bly, Joseph Graves, Titus Winters, Hyrum Wright, Augie Fackler</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Testing Doubles:</strong> Joseph Graves, Gennadiy Civil, Adam Bender, Augie Fackler, Erik Kuefler, James Youngman</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Larger Testing:</strong> Adam Bender, Andrew Trenk, Erik Kuefler, Matthew <span class="keep-together">Beaumont-Gay</span></p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Deprecation:</strong> Greg Miller, Andy Shulman</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Version Control and Branch Management:</strong> Rachel Potvin, Victoria Clarke</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Code Search:</strong> Jenny Wang</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Build Systems and Build Philosophy:</strong> Hyrum Wright, Titus Winters, Adam Bender, Jeff Cox, Jacques Pienaar</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Critique: Google’s Code Review Tool:</strong> Mikołaj Dądela, Hermann Loose, Eva May, Alice Kober-Sotzek, Edwin Kempin, Patrick Hiesel, Ole Rehmsen, Jan Macek</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Static Analysis:</strong> Jeffrey van Gogh, Ciera Jaspan, Emma Söderberg, Edward Aftandilian, Collin Winter, Eric Haugh</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Dependency Management:</strong> Russ Cox, Nicholas Dunn</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Large-Scale Changes:</strong> Matthew Fowles Kulukundis, Adam Zarek</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Continuous Integration:</strong> Jeff Listfield, John Penix, Kaushik Sridharan, Sanjeev Dhanda</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Continuous Delivery:</strong> Dave Owens, Sheri Shipe, Bobbi Jones, Matt Duftler, Brian Szuter</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><strong>Compute Services:</strong> Tim Hockin, Collin Winter, Jarek Kuśmierek</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>Additionally, we’d like to thank Betsy Beyer for sharing her insight and experience in having published the original <em>Site Reliability Engineering</em> book, which made our experience much smoother. Christopher Guzikowski and Alicia Young at O’Reilly did an awesome job launching and guiding this project to publication.</p>
|
||||
|
||||
<p>The curators would also like to personally thank the following people:</p>
|
||||
|
||||
<p><strong>Tom Manshreck:</strong> To my mom and dad for making me believe in myself—and working with me at the kitchen table to do my homework.</p>
|
||||
|
||||
<p><strong>Titus Winters:</strong> To Dad, for my path. To Mom, for my voice. To Victoria, for my heart. To Raf, for having my back. Also, to Mr. Snyder, Ranwa, Z, Mike, Zach, Tom (and all the Paynes), mec, Toby, cgd, and Melody for lessons, mentorship, and trust.</p>
|
||||
|
||||
<p><strong>Hyrum Wright:</strong> To Mom and Dad for their encouragement. To Bryan and the denizens of Bakerland, for my first foray into software. To Dewayne, for continuing that journey. To Hannah, Jonathan, Charlotte, Spencer, and Ben for their love and interest. To Heather for being there through it all.</p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<script data-cfasync="false" src="../../../cdn-cgi/scripts/5c5dd728/cloudflare-static/email-decode.min.js"></script></body>
|
||||
</html>
|
1529
clones/abseil.io/resources/swe-book/html/theme/html/html.css
Normal file
1995
clones/abseil.io/resources/swe-book/html/toc.html
Normal file
6745
clones/asdf.common-lisp.dev/asdf.html
Normal file
294
clones/colinallen.dnsalias.org/lp/lp.html
Normal file
|
@ -0,0 +1,294 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE>Lisp Primer</TITLE>
|
||||
<meta name="description" value="Lisp Primer">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<A HREF="node1.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<IMG ALIGN=BOTTOM ALT="up" SRC="up_motif_gr.gif">
|
||||
<IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif_gr.gif">
|
||||
<BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node1.html"> Preface</A>
|
||||
<HR>
|
||||
<H1>Lisp Primer</H1>
|
||||
<STRONG><a href="https://colinallen.dnsalias.org/~colallen/">Colin Allen</a> -
|
||||
<a href="mailto:mdhagat@yahoo.com">Maneesh Dhagat</a>
|
||||
<P>copyright 1996-2020</STRONG><P>
|
||||
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<UL>
|
||||
<LI>
|
||||
<A HREF="node1.html"> Preface and Terms of Use</A>
|
||||
<LI>
|
||||
<A HREF="node2.html"> LISt Processing</A>
|
||||
<UL>
|
||||
<LI>
|
||||
<A HREF="node3.html"> Background and Getting Started</A>
|
||||
<LI>
|
||||
<A HREF="node4.html"> Basic Data Types</A>
|
||||
<UL>
|
||||
<LI>
|
||||
<A HREF="node5.html"> Atoms</A>
|
||||
<LI>
|
||||
<A HREF="node6.html"> Lists</A>
|
||||
</UL>
|
||||
<LI>
|
||||
<A HREF="node7.html"> Some Primitive Functions</A>
|
||||
<UL>
|
||||
<LI>
|
||||
<A HREF="node8.html"> Constructors: Cons, List, and Append</A>
|
||||
<LI>
|
||||
<A HREF="node9.html"> Quote</A>
|
||||
<LI>
|
||||
<A HREF="node10.html"> Selectors: First and Rest</A>
|
||||
</UL>
|
||||
<LI>
|
||||
<A HREF="node11.html"> Changing Variable Values</A>
|
||||
<LI>
|
||||
<A HREF="node12.html"> More Functions and Predicates</A>
|
||||
<LI>
|
||||
<A HREF="node13.html"> Setf</A>
|
||||
<LI>
|
||||
<A HREF="node14.html"> Exercises</A>
|
||||
</UL>
|
||||
<LI>
|
||||
<A HREF="node15.html"> Defining Lisp functions</A>
|
||||
<UL>
|
||||
<LI>
|
||||
<A HREF="node16.html"> Defining Functions: Defun</A>
|
||||
<LI>
|
||||
<A HREF="node17.html"> Local and Global Variables</A>
|
||||
<LI>
|
||||
<A HREF="node18.html"> Using an Editor</A>
|
||||
<LI>
|
||||
<A HREF="node19.html"> Using Your Own Definitions in New Functions</A>
|
||||
<LI>
|
||||
<A HREF="node20.html"> Functions with Extended Bodies</A>
|
||||
<LI>
|
||||
<A HREF="node21.html"> Conditional Control</A>
|
||||
<UL>
|
||||
<LI>
|
||||
<A HREF="node22.html"> If</A>
|
||||
<LI>
|
||||
<A HREF="node23.html"> Cond</A>
|
||||
</UL>
|
||||
<LI>
|
||||
<A HREF="node24.html"> More Predicates and Functions</A>
|
||||
<UL>
|
||||
<LI>
|
||||
<A HREF="node25.html"> Equality Predicates</A>
|
||||
<LI>
|
||||
<A HREF="node26.html"> Checking for NIL</A>
|
||||
<LI>
|
||||
<A HREF="node27.html"> Logical Operators: And and Or</A>
|
||||
</UL>
|
||||
<LI>
|
||||
<A HREF="node28.html"> Exercises</A>
|
||||
</UL>
|
||||
<LI>
|
||||
<A HREF="node29.html"> Recursion and Iteration</A>
|
||||
<UL>
|
||||
<LI>
|
||||
<A HREF="node30.html"> Recursive Definitions</A>
|
||||
<UL>
|
||||
<LI>
|
||||
<A HREF="node31.html"> A Simple Example</A>
|
||||
<LI>
|
||||
<A HREF="node32.html"> Using Trace To Watch Recursion</A>
|
||||
<LI>
|
||||
<A HREF="node33.html"> Another Example</A>
|
||||
</UL>
|
||||
<LI>
|
||||
<A HREF="node34.html"> Iteration Using Dotimes</A>
|
||||
<LI>
|
||||
<A HREF="node35.html"> Local Variables Using Let</A>
|
||||
<LI>
|
||||
<A HREF="node36.html"> Iteration Using Dolist</A>
|
||||
<LI>
|
||||
<A HREF="node37.html"> When To Use Recursion/When To Use Iteration</A>
|
||||
<LI>
|
||||
<A HREF="node38.html"> Tail Recursion</A>
|
||||
<LI>
|
||||
<A HREF="node39.html"> Timing Function Calls</A>
|
||||
<LI>
|
||||
<A HREF="node40.html"> Exercises</A>
|
||||
</UL>
|
||||
<LI>
|
||||
<A HREF="node41.html"> Programming Techniques</A>
|
||||
<UL>
|
||||
<LI>
|
||||
<A HREF="node42.html"> A Word about Lisp</A>
|
||||
<LI>
|
||||
<A HREF="node43.html"> Recursion on Simple Lists</A>
|
||||
<LI>
|
||||
<A HREF="node44.html"> Recursion on Nested Lists and Expressions</A>
|
||||
<LI>
|
||||
<A HREF="node45.html"> Recursion on Numbers</A>
|
||||
<LI>
|
||||
<A HREF="node46.html"> Ensuring Proper Termination</A>
|
||||
<LI>
|
||||
<A HREF="node47.html"> Abstraction</A>
|
||||
<LI>
|
||||
<A HREF="node48.html"> Summary of Rules</A>
|
||||
<LI>
|
||||
<A HREF="node49.html"> Exercises</A>
|
||||
</UL>
|
||||
<LI>
|
||||
<A HREF="node50.html"> Simple Data Structures in Lisp</A>
|
||||
<UL>
|
||||
<LI>
|
||||
<A HREF="node51.html"> Association Lists</A>
|
||||
<LI>
|
||||
<A HREF="node52.html"> Property Lists</A>
|
||||
<LI>
|
||||
<A HREF="node53.html"> Arrays, Vectors, and Strings</A>
|
||||
<UL>
|
||||
<LI>
|
||||
<A HREF="node54.html"> Arrays and Vectors</A>
|
||||
<LI>
|
||||
<A HREF="node55.html"> Strings</A>
|
||||
</UL>
|
||||
<LI>
|
||||
<A HREF="node56.html"> Defstruct</A>
|
||||
<LI>
|
||||
<A HREF="node57.html"> Exercises.</A>
|
||||
</UL>
|
||||
<LI>
|
||||
<A HREF="node58.html"> Input and Output</A>
|
||||
<UL>
|
||||
<LI>
|
||||
<A HREF="node59.html"> Basic Printing</A>
|
||||
<LI>
|
||||
<A HREF="node60.html"> Nicer Output Using Format</A>
|
||||
<LI>
|
||||
<A HREF="node61.html"> Reading</A>
|
||||
<LI>
|
||||
<A HREF="node62.html"> Input and Output to Files</A>
|
||||
<LI>
|
||||
<A HREF="node63.html"> Converting Strings to Lists</A>
|
||||
</UL>
|
||||
<LI>
|
||||
<A HREF="node64.html"> Functions, Lambda Expressions, and Macros</A>
|
||||
<UL>
|
||||
<LI>
|
||||
<A HREF="node65.html"> Eval</A>
|
||||
<LI>
|
||||
<A HREF="node66.html"> Lambda Expressions</A>
|
||||
<LI>
|
||||
<A HREF="node67.html"> Funcall</A>
|
||||
<LI>
|
||||
<A HREF="node68.html"> Apply</A>
|
||||
<LI>
|
||||
<A HREF="node69.html"> Mapcar</A>
|
||||
<LI>
|
||||
<A HREF="node70.html"> Backquote and Commas</A>
|
||||
<LI>
|
||||
<A HREF="node71.html"> Defmacro</A>
|
||||
</UL>
|
||||
<LI>
|
||||
<A HREF="node72.html"> Appendix: Selected Lisp primitives</A>
|
||||
<UL>
|
||||
<LI>
|
||||
<A HREF="node73.html"> * (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node74.html"> + (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node75.html"> - (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node76.html"> 1+, 1- (FUNCTIONS)</A>
|
||||
<LI>
|
||||
<A HREF="node77.html"> = (PREDICATE)</A>
|
||||
<LI>
|
||||
<A HREF="node78.html"> <, >, <=, >= (PREDICATES)</A>
|
||||
<LI>
|
||||
<A HREF="node79.html"> and (MACRO)</A>
|
||||
<LI>
|
||||
<A HREF="node80.html"> append (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node81.html"> apply (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node82.html"> atom (PREDICATE)</A>
|
||||
<LI>
|
||||
<A HREF="node83.html"> butlast (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node84.html"> car (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node85.html"> caar, cadr, cdar, cddr, etc. (FUNCTIONS)</A>
|
||||
<LI>
|
||||
<A HREF="node86.html"> cdr (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node87.html"> cond (MACRO)</A>
|
||||
<LI>
|
||||
<A HREF="node88.html"> cons (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node89.html"> defun (MACRO)</A>
|
||||
<LI>
|
||||
<A HREF="node90.html"> do (SPECIAL FORM)</A>
|
||||
<LI>
|
||||
<A HREF="node91.html"> documentation (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node92.html"> eql (PREDICATE)</A>
|
||||
<LI>
|
||||
<A HREF="node93.html"> eval (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node94.html"> evenp, oddp (PREDICATES)</A>
|
||||
<LI>
|
||||
<A HREF="node95.html"> first (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node96.html"> if (SPECIAL FORM)</A>
|
||||
<LI>
|
||||
<A HREF="node97.html"> length (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node98.html"> let (SPECIAL FORM)</A>
|
||||
<LI>
|
||||
<A HREF="node99.html"> list (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node100.html"> listp (PREDICATE)</A>
|
||||
<LI>
|
||||
<A HREF="node101.html"> mapcar (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node102.html"> max, min (FUNCTIONS)</A>
|
||||
<LI>
|
||||
<A HREF="node103.html"> member (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node104.html"> not (PREDICATE)</A>
|
||||
<LI>
|
||||
<A HREF="node105.html"> nth (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node106.html"> nthcdr (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node107.html"> null (PREDICATE)</A>
|
||||
<LI>
|
||||
<A HREF="node108.html"> numberp (PREDICATE)</A>
|
||||
<LI>
|
||||
<A HREF="node109.html"> or (MACRO)</A>
|
||||
<LI>
|
||||
<A HREF="node110.html"> read (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node111.html"> reverse (FUNCTION)</A>
|
||||
<LI>
|
||||
<A HREF="node112.html"> second, third, etc. (FUNCTIONS)</A>
|
||||
<LI>
|
||||
<A HREF="node113.html"> setf (MACRO)</A>
|
||||
<LI>
|
||||
<A HREF="node114.html"> symbolp (PREDICATE)</A>
|
||||
<LI>
|
||||
<A HREF="node115.html"> y-or-n-p, yes-or-no-p (PREDICATES)</A>
|
||||
</UL>
|
||||
<LI>
|
||||
<A HREF="node116.html"> About this document ... </A>
|
||||
</UL>
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
Tue Feb 6, 2001</I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
BIN
clones/colinallen.dnsalias.org/lp/lp.pdf
Normal file
BIN
clones/colinallen.dnsalias.org/lp/next_motif.gif
Normal file
After Width: | Height: | Size: 172 B |
BIN
clones/colinallen.dnsalias.org/lp/next_motif_gr.gif
Normal file
After Width: | Height: | Size: 172 B |
62
clones/colinallen.dnsalias.org/lp/node-rest.html
Normal file
|
@ -0,0 +1,62 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> rest (FUNCTION)</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" rest (FUNCTION)">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node111.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node72.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node110.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node111.html"> reverse (FUNCTION)</A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node72.html"> Appendix: Selected Lisp </A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node110.html"> read (FUNCTION)</A>
|
||||
<BR> <HR> <P>
|
||||
<H1> rest (FUNCTION)</H1>
|
||||
<P>
|
||||
<b> Format:</b>
|
||||
<tt> (rest <exp> )</tt>
|
||||
<P>
|
||||
<b> Required arguments:</b>
|
||||
1
|
||||
<P>
|
||||
<tt> <exp> </tt>: any Lisp expression which returns a list
|
||||
<P>
|
||||
The argument expression must evaluate to a list; rest returns all but the first element of this list. If the list is empty, i.e. is nil, rest returns nil.
|
||||
<P>
|
||||
<b> Examples:</b>
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (rest '(1 2 3))
|
||||
(2 3)
|
||||
|
||||
> (rest '((a (b (c)) d) e (f)))
|
||||
(E (F))
|
||||
|
||||
> (rest '(z))
|
||||
NIL
|
||||
|
||||
> (rest ())
|
||||
NIL
|
||||
|
||||
> (rest 'a)
|
||||
Error: A is not of type LIST
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
88
clones/colinallen.dnsalias.org/lp/node1.html
Normal file
|
@ -0,0 +1,88 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> Preface</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" Preface">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node2.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="lp.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="lp.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<B> Next:</B>
|
||||
<A HREF="node2.html"> Lisp Primer</A>
|
||||
<B>Up:</B>
|
||||
<A HREF="lp.html">Lisp Primer</A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="lp.html">Lisp Primer</A>
|
||||
<BR>
|
||||
<HR> <P>
|
||||
<H1>Lisp Primer</H1>
|
||||
<P><STRONG><a href="http://mypage.iu.edu/~colallen/">Colin Allen</a>
|
||||
-
|
||||
<a href="mailto:mdhagat@yahoo.com">Maneesh Dhagat</a>
|
||||
|
||||
<P>© 1996-1999</STRONG><P>
|
||||
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<H1> Preface</H1>
|
||||
<P>
|
||||
This text has been written to provide a quick introduction to the
|
||||
basic elements of Common Lisp for both experienced and novice
|
||||
programmers. It is not intended to be a comprehensive account of the
|
||||
language for, in our experience, it takes only a little introduction
|
||||
before most Lisp programmers are able to turn to Guy L. Steele, Jr.'s,
|
||||
<a href="http://www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/html/cltl/cltl2.html">
|
||||
<em>Common Lisp: The Language</em></a> (2nd Edition, Digital Press, 1990) or to the <a href="http://www.lisp.org/HyperSpec/FrontMatter">ANSI Common Lisp specifications</a>
|
||||
for all their reference needs. Readers who have a basic understanding of Lisp and are looking for some A.I. applications may find useful the examples at <a href="http://www.cs.utexas.edu/users/ml/ml-progs.html">http://www.cs.utexas.edu/users/ml/ml-progs.html</a>.
|
||||
<P>
|
||||
This text has been used successfully with honors undergraduate
|
||||
students at <a href="http://www.tamu.edu/">Texas A&M University</a>.
|
||||
We now make it available on the World Wide Web in the hope that it
|
||||
will prove useful to the Lisp community at large.
|
||||
<P>
|
||||
Some parts of this text are incomplete. We have in mind to complete
|
||||
those parts if there is enough interest. If you find the text useful,
|
||||
please drop us a line. <!--Our idea is to make this text available on the
|
||||
same model as shareware software---if you like it please also let us
|
||||
know what you think would be a reasonable contribution for its use.-->
|
||||
|
||||
<P>
|
||||
Academic sites are free to mirror the html code provided that these
|
||||
conditions are respected:
|
||||
<ol>
|
||||
<li>This preface and copyright statement are left unchanged.
|
||||
<li>The authors are notified by email of the URL of the site where
|
||||
<i>Lisp Primer</i> is mirrored.
|
||||
<li>Any emendations of the html code are shared with the authors with
|
||||
the understanding that they may be incorporated into the master copy.
|
||||
</ol>
|
||||
|
||||
If you agree to these conditions, the source code for <i>Lisp
|
||||
Primer</i> can be retrieved at
|
||||
<a href=http://mypage.iu.edu/~colallen/lp/lphtml.tar.gz>lphtml.tar.gz</a>
|
||||
(approx 70K).
|
||||
|
||||
(<a href="relinks.html">List of existing mirror sites</a>)
|
||||
|
||||
<p>
|
||||
A <a href="lp.pdf">PDF</a> version of the original version of the Primer is provided for reading convenience, but it has not been maintained and so contains some omissions and errors that have been corrected in the online version.
|
||||
|
||||
<p>
|
||||
Colin Allen <<a href="mailto:colallen%40indiana%2eedu">colallen<abbr title=" at ">@</abbr>indiana<abbr title=" dot " class="plain">.</abbr>edu</a>> and Maneesh Dhagat
|
||||
<mdhagat@yahoo.com>
|
||||
|
||||
<P>
|
||||
<a href="lp.html">Contents</a>
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007</I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
84
clones/colinallen.dnsalias.org/lp/node10.html
Normal file
|
@ -0,0 +1,84 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> Selectors: First and Rest</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" Selectors: First and Rest">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node11.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node7.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node9.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node11.html"> Changing Variable Values</A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node7.html"> Some Primitive Functions</A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node9.html"> Quote</A>
|
||||
<BR> <HR> <P>
|
||||
<H2> Selectors: First and Rest</H2>
|
||||
<P>
|
||||
There are two primitive list selectors. Historically, these were
|
||||
known as car and cdr, but these names were hard to explain since they
|
||||
referred to the contents of various hardware registers in computers
|
||||
running Lisp. In Common Lisp the functions have been given
|
||||
alternative names, first and rest, respectively. (You can still use
|
||||
the old names in Common Lisp. One of us learned Lisp in the old days, so
|
||||
occasionally we'll use car or cdr instead of first or rest.)
|
||||
<P>
|
||||
First takes a list as an argument and returns the first element of
|
||||
that list. It works like this:
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (first '(a s d f))
|
||||
a
|
||||
> (first '((a s) d f))
|
||||
(a s)
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
Rest takes a list as an argument and returns the list, minus its
|
||||
first element.
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (rest '(a s d f))
|
||||
(s d f)
|
||||
> (rest '((a s) d f))
|
||||
(d f)
|
||||
> (rest '((a s) (d f)))
|
||||
((d f))
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
You can use setq to save yourself some typing. Do the following:
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (setq a '(a s d f))
|
||||
(a s d f)
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
You can now use a instead of repeating the list (a s d f) every time.
|
||||
So:
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (first a)
|
||||
a
|
||||
> (rest a)
|
||||
(s d f)
|
||||
> (first (rest a))
|
||||
s
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
You can figure out the rest, like how to get at the third and fourth
|
||||
elements of the list using first and rest.
|
||||
<P>
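For example, one way to reach the third and fourth elements of the
list stored in a is to keep stripping elements with rest and then
take first of what remains:
<P>
<BLOCKQUOTE>
<PRE>> (first (rest (rest a)))
d
> (first (rest (rest (rest a))))
f
</PRE>
</BLOCKQUOTE>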
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
56
clones/colinallen.dnsalias.org/lp/node100.html
Normal file
|
@ -0,0 +1,56 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> listp (PREDICATE)</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" listp (PREDICATE)">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node101.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node72.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node99.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node101.html"> mapcar (FUNCTION)</A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node72.html"> Appendix: Selected Lisp </A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node99.html"> list (FUNCTION)</A>
|
||||
<BR> <HR> <P>
|
||||
<H1> listp (PREDICATE)</H1>
|
||||
<P>
|
||||
<b> Format:</b>
|
||||
<tt> (listp <exp> )</tt>
|
||||
<P>
|
||||
<b> Required arguments:</b>
|
||||
1
|
||||
<P>
|
||||
<tt> <exp> </tt>: any Lisp expression.
|
||||
<P>
|
||||
Returns T if <tt> <exp> </tt> is of the data type list; NIL otherwise.
|
||||
<P>
|
||||
<b> Examples:</b>
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (listp '(a s d f))
|
||||
T
|
||||
|
||||
> (listp 3)
|
||||
NIL
|
||||
|
||||
> (listp (cons '1 '(2 3 4)))
|
||||
T
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
<P>
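One further case worth noting is the empty list, which Lisp treats
both as nil and as a list, so listp returns T for it:
<P>
<BLOCKQUOTE>
<PRE>> (listp ())
T

> (listp nil)
T
</PRE>
</BLOCKQUOTE>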
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
65
clones/colinallen.dnsalias.org/lp/node101.html
Normal file
|
@ -0,0 +1,65 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> mapcar (FUNCTION)</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" mapcar (FUNCTION)">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node102.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node72.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node100.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node102.html"> maxmin (FUNCTIONS)</A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node72.html"> Appendix: Selected Lisp </A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node100.html"> listp (PREDICATE)</A>
|
||||
<BR> <HR> <P>
|
||||
<H1> mapcar (FUNCTION)</H1>
|
||||
<P>
|
||||
<b> Format:</b>
|
||||
<tt> (mapcar <func> <lis1> . . . <lisN> )</tt>
|
||||
<P>
|
||||
<b> Required arguments:</b>
|
||||
2
|
||||
<P>
|
||||
First argument names a function (usually quoted). Subsequent arguments must evaluate to lists.
|
||||
<P>
|
||||
Mapcar applies the named function successively to the first, second, third, etc. elements of the subsequent arguments and returns a list of the results, up to the length of the shortest list provided.
|
||||
<P>
|
||||
<b> Examples:</b>
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (mapcar '+ '(1 2 3))
|
||||
(1 2 3)
|
||||
|
||||
> (mapcar '+ '(1 2 3) '(4 5 6))
|
||||
(5 7 9)
|
||||
|
||||
> (mapcar '+ '(1 2 3) '(4 5 6) '(7 8 9))
|
||||
(12 15 18)
|
||||
|
||||
> (mapcar '+ '(1 2) '(3 4 5))
|
||||
(4 6)
|
||||
|
||||
> (mapcar '< '(1 2 3) '(4 5 0))
|
||||
(T T NIL)
|
||||
|
||||
> (mapcar '< '(1 2 3) '(4 5))
|
||||
(T T)
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
60
clones/colinallen.dnsalias.org/lp/node102.html
Normal file
|
@ -0,0 +1,60 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> max, min (FUNCTIONS)</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" max, min (FUNCTIONS)">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node103.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node72.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node101.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node103.html"> member (FUNCTION)</A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node72.html"> Appendix: Selected Lisp </A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node101.html"> mapcar (FUNCTION)</A>
|
||||
<BR> <HR> <P>
|
||||
<H1> max, min (FUNCTIONS)</H1>
|
||||
<P>
|
||||
<b> Format:</b>
|
||||
<tt> (max <num1> ... <numN> )</tt>
|
||||
<tt> (min <num1> ... <numN> )</tt>
|
||||
<P>
|
||||
<b> Required arguments:</b>
|
||||
1
|
||||
<P>
|
||||
<tt> <num1> ... <numN> </tt> must all evaluate to numbers.
|
||||
<P>
|
||||
Returns the numerical maximum (minimum) of the arguments given.
|
||||
<P>
|
||||
<b> Examples:</b>
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (max 1 4 3 15 (* 9 2))
|
||||
18
|
||||
|
||||
> (min 3 4 (- 7 19) 5 6.0)
|
||||
-12
|
||||
|
||||
> (max 3)
|
||||
3
|
||||
|
||||
> (min 4)
|
||||
4
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
99
clones/colinallen.dnsalias.org/lp/node103.html
Normal file
|
@ -0,0 +1,99 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> member (FUNCTION)</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" member (FUNCTION)">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node104.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node72.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node102.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node104.html"> not (PREDICATE)</A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node72.html"> Appendix: Selected Lisp </A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node102.html"> maxmin (FUNCTIONS)</A>
|
||||
<BR> <HR> <P>
|
||||
<H1> member (FUNCTION)</H1>
|
||||
<P>
|
||||
<b> Format:</b>
|
||||
<tt> (member <item> <list> </tt>
|
||||
<tt> :test <test> </tt>
|
||||
<tt> :test-not <test-not> </tt>
|
||||
<tt> :key <key> )</tt>
|
||||
<P>
|
||||
<b> Required arguments:</b>
|
||||
2
|
||||
<P>
|
||||
<tt> <item> </tt>: Any Lisp expression
|
||||
<tt> <list> </tt>: An expression which returns a list
|
||||
<P>
|
||||
<b> Keyword arguments:</b>
|
||||
3
|
||||
<P>
|
||||
<tt> <test> /<test-not> </tt>: A function or lambda expression that
|
||||
can be applied to compare <tt> <item> </tt> with elements of <tt> <list> </tt>.
|
||||
<tt> <key> </tt>: A function or lambda expression that can be applied
|
||||
to elements of <tt> <list> </tt>.
|
||||
<P>
|
||||
The elements of <tt> <list> </tt> are compared with the <tt> <item> </tt>. If
|
||||
<tt> <test> </tt> is not specified, eql is used; otherwise <tt> <test> </tt> is used.
|
||||
If <tt> <item> </tt> is found to match an element of <tt> <list> </tt>, a list containing
|
||||
all the elements from <tt> <item> </tt> to the end of <tt> <list> </tt> is returned.
|
||||
Otherwise NIL is returned. If <tt> <test-not> </tt> is specified, member
|
||||
returns a list beginning with the first UNmatched element of <tt> <list> </tt>.
|
||||
Specifying a <tt> <key> </tt> causes member to compare <tt> <item> </tt> with the
|
||||
result of applying <tt> <key> </tt> to each element of <tt> <list> </tt>, rather than
|
||||
to the element itself.
|
||||
<P>
|
||||
<b> Examples:</b>
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (member 'riker '(picard riker worf crusher))
|
||||
(RIKER WORF CRUSHER)
|
||||
|
||||
> (member '(lieutenant worf)
|
||||
'((captain picard)
|
||||
(commander riker)
|
||||
(lieutenant worf)
|
||||
(ensign crusher)))
|
||||
NIL
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (member '(lieutenant worf)
|
||||
'((captain picard)
|
||||
(commander riker)
|
||||
(lieutenant worf)
|
||||
(ensign crusher))
|
||||
:test #'equal)
|
||||
((LIEUTENANT WORF) (ENSIGN CRUSHER))
|
||||
|
||||
> (member 'picard '(picard riker worf crusher) :test-not #'eq)
|
||||
(RIKER WORF CRUSHER)
|
||||
|
||||
> (member 'worf
|
||||
'((captain picard)
|
||||
(commander riker)
|
||||
(lieutenant worf)
|
||||
(ensign crusher))
|
||||
:key #'second)
|
||||
((LIEUTENANT WORF) (ENSIGN CRUSHER))
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
42
clones/colinallen.dnsalias.org/lp/node104.html
Normal file
|
@ -0,0 +1,42 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> not (PREDICATE)</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" not (PREDICATE)">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node105.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node72.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node103.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node105.html"> nth (FUNCTION)</A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node72.html"> Appendix: Selected Lisp </A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node103.html"> member (FUNCTION)</A>
|
||||
<BR> <HR> <P>
|
||||
<H1> not (PREDICATE)</H1>
|
||||
<P>
|
||||
<b> Format:</b>
|
||||
<tt> (not <exp> )</tt>
|
||||
<P>
|
||||
<b> Required arguments:</b>
|
||||
1
|
||||
<P>
|
||||
<tt> <exp> </tt>: any Lisp expression.
|
||||
<P>
|
||||
See the entry for null. Not is identical to null; its use is preferred when <tt> <exp> </tt> is not being treated as a list.
|
||||
<P>
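<b> Examples:</b> (a few illustrative cases; see also the entry for null)
<P>
<BLOCKQUOTE>
<PRE>> (not nil)
T

> (not 'riker)
NIL

> (not (rest '(picard)))
T
</PRE>
</BLOCKQUOTE>
<P>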
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
60
clones/colinallen.dnsalias.org/lp/node105.html
Normal file
|
@ -0,0 +1,60 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> nth (FUNCTION)</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" nth (FUNCTION)">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node106.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node72.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node104.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node106.html"> nthcdr (FUNCTION)</A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node72.html"> Appendix: Selected Lisp </A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node104.html"> not (PREDICATE)</A>
|
||||
<BR> <HR> <P>
|
||||
<H1> nth (FUNCTION)</H1>
|
||||
<P>
|
||||
<b> Format:</b>
|
||||
<tt> (nth <index> <list> )</tt>
|
||||
<P>
|
||||
<b> Required arguments:</b>
|
||||
2
|
||||
<P>
|
||||
<tt> <index> </tt>: any expression which returns a non-negative integer (fixnum).
|
||||
<tt> <list> </tt>: any expression which returns a list.
|
||||
<P>
|
||||
The function nth returns the indexed element of <tt> <list> </tt>. <tt> <index> </tt> must
|
||||
be a non-negative integer. 0 indicates the first element of <tt> <list> </tt>,
|
||||
1 the second, etc. An index past the end of the list will cause nth to
|
||||
return nil.
|
||||
<P>
|
||||
<b> Examples:</b>
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (nth 0 '(picard riker worf crusher))
|
||||
PICARD
|
||||
|
||||
> (nth 2 '((captain picard)
|
||||
(commander riker)
|
||||
(lieutenant worf)
|
||||
(ensign crusher)))
|
||||
(LIEUTENANT WORF)
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
63
clones/colinallen.dnsalias.org/lp/node106.html
Normal file
|
@ -0,0 +1,63 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> nthcdr (FUNCTION)</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" nthcdr (FUNCTION)">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node107.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node72.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node105.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node107.html"> null (PREDICATE)</A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node72.html"> Appendix: Selected Lisp </A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node105.html"> nth (FUNCTION)</A>
|
||||
<BR> <HR> <P>
|
||||
<H1> nthcdr (FUNCTION)</H1>
|
||||
<P>
|
||||
<b> Format:</b>
|
||||
<tt> (nthcdr <index> <list> )</tt>
|
||||
<P>
|
||||
<b> Required arguments:</b>
|
||||
2
|
||||
<P>
|
||||
<tt> <index> </tt>: any expression which returns a non-negative integer (fixnum).
|
||||
<tt> <list> </tt>: any expression which returns a list.
|
||||
<P>
|
||||
The function nthcdr returns <tt> <list> </tt> with the first <tt> <index> </tt> elements removed. <tt> <index> </tt> must be a non-negative integer. An index past the end of the list will cause nthcdr to return nil.
|
||||
<P>
|
||||
<b> Examples:</b>
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (setf ds9 '(Sisko Kira Dax Odo Bashir OBrien))
|
||||
(SISKO KIRA DAX ODO BASHIR OBRIEN)
|
||||
|
||||
> (nthcdr 0 ds9)
|
||||
(SISKO KIRA DAX ODO BASHIR OBRIEN)
|
||||
|
||||
> (nthcdr 1 ds9)
|
||||
(KIRA DAX ODO BASHIR OBRIEN)
|
||||
|
||||
> (nthcdr 3 ds9)
|
||||
(ODO BASHIR OBRIEN)
|
||||
|
||||
> (nthcdr 2345 ds9)
|
||||
NIL
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
55
clones/colinallen.dnsalias.org/lp/node107.html
Normal file
|
@ -0,0 +1,55 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> null (PREDICATE)</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" null (PREDICATE)">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node108.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node72.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node106.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node108.html"> numberp (PREDICATE)</A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node72.html"> Appendix: Selected Lisp </A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node106.html"> nthcdr (FUNCTION)</A>
|
||||
<BR> <HR> <P>
|
||||
<H1> null (PREDICATE)</H1>
|
||||
<P>
|
||||
<b> Format:</b>
|
||||
<tt> (null <exp> )</tt>
|
||||
<P>
|
||||
<b> Required arguments:</b>
|
||||
1
|
||||
<P>
|
||||
<tt> <exp> </tt>: any Lisp expression
|
||||
<P>
|
||||
The predicate <tt>null</tt> returns T if <tt> <exp> </tt> evaluates to the empty list; NIL
|
||||
otherwise. <tt>null</tt> is just the same as <tt>not</tt>, but is the preferred form to use when
|
||||
the purpose is to test whether a list is empty.
|
||||
<P>
|
||||
<b> Examples:</b>
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (null '(picard riker))
|
||||
NIL
|
||||
|
||||
> (null (rest '(picard)))
|
||||
T
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
63
clones/colinallen.dnsalias.org/lp/node108.html
Normal file
|
@ -0,0 +1,63 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> numberp (PREDICATE)</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" numberp (PREDICATE)">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node109.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node72.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node107.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node109.html"> or (MACRO)</A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node72.html"> Appendix: Selected Lisp </A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node107.html"> null (PREDICATE)</A>
|
||||
<BR> <HR> <P>
|
||||
<H1> numberp (PREDICATE)</H1>
|
||||
<P>
|
||||
<b> Format:</b>
|
||||
<tt> (numberp <exp> )</tt>
|
||||
<P>
|
||||
<b> Required arguments:</b>
|
||||
1
|
||||
<P>
|
||||
<tt> <exp> </tt>: any Lisp expression
|
||||
<P>
|
||||
The predicate numberp returns T if <tt> <exp> </tt> evaluates to a number (i.e. an object of type integer, ratio, float, or complex); numberp returns
|
||||
NIL otherwise.
|
||||
<P>
|
||||
<b> Examples:</b>
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (numberp 1/2)
|
||||
T
|
||||
|
||||
> (numberp 1235439)
|
||||
T
|
||||
|
||||
> (numberp (/ 5 1.23))
|
||||
T
|
||||
|
||||
> (numberp #C(1.2 -0.9))
|
||||
T
|
||||
|
||||
> (numberp '(+ 1 2 3))
|
||||
NIL
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
59
clones/colinallen.dnsalias.org/lp/node109.html
Normal file
|
@ -0,0 +1,59 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> or (MACRO)</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" or (MACRO)">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node110.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node72.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node108.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node110.html"> read (FUNCTION)</A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node72.html"> Appendix: Selected Lisp </A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node108.html"> numberp (PREDICATE)</A>
|
||||
<BR> <HR> <P>
|
||||
<H1> or (MACRO)</H1>
|
||||
<P>
|
||||
<b> Format:</b>
|
||||
<tt> (or <exp1> <exp2> ... <expn> )</tt>
|
||||
<P>
|
||||
<b> Required arguments:</b>
|
||||
None
|
||||
<P>
|
||||
This macro evaluates its arguments in order until it reaches a non-nil value, in which case it returns that value; if every argument evaluates to nil, it returns nil. Evaluation of intermediate expressions may produce side-effects. In the special case where or is given no arguments, it always returns nil.
|
||||
<P>
|
||||
<b> Examples:</b>
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (or 3 (+ 4 5))
|
||||
3
|
||||
|
||||
> (or nil (print 'hello))
|
||||
|
||||
HELLO
|
||||
HELLO
|
||||
|
||||
> (or nil '(print hello) 3)
|
||||
(PRINT HELLO)
|
||||
|
||||
> (or)
|
||||
NIL
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
60
clones/colinallen.dnsalias.org/lp/node11.html
Normal file
|
@ -0,0 +1,60 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> Changing Variable Values</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" Changing Variable Values">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node12.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node2.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node10.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node12.html"> More Functions and </A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node2.html"> LISt Processing</A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node10.html"> Selectors: First and </A>
|
||||
<BR> <HR> <P>
|
||||
<H1> Changing Variable Values</H1>
|
||||
<P>
|
||||
What happens to the value of a after saying (cons 'a a)? Nothing.
|
||||
That is, it looks like this:
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (cons 'a a)
|
||||
(a a s d f)
|
||||
> a
|
||||
(a s d f)
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
Obviously, it would be useful to make these changes stick sometimes.
|
||||
To do that you can use setq as follows:
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (setq a (cons 'a a))
|
||||
(a a s d f)
|
||||
> a
|
||||
(a a s d f)
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
and henceforth, that is the new value of a.
|
||||
<P>
|
||||
We'll let you play with the possibilities here, but using setq with
|
||||
just the three functions first, rest, and cons you can do <em> anything</em>
|
||||
you want to with lists. These primitives are sufficient. Append and
|
||||
list are strictly superfluous -- although they are very convenient.
|
||||
For practice, try to achieve the same effects using just first, rest,
|
||||
and cons as in the examples that used append and list, above.
|
||||
<P>
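For instance, here is one such rewriting (a brief sketch; the lists here are chosen only for illustration), showing that cons alone can rebuild what list and append produce:
<P>
<BLOCKQUOTE>
<PRE>> (list 'a 'b)
(a b)
> (cons 'a (cons 'b nil))
(a b)
> (append '(a b) '(c d))
(a b c d)
> (cons 'a (cons 'b '(c d)))
(a b c d)
</PRE>
</BLOCKQUOTE>
<P>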
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
64
clones/colinallen.dnsalias.org/lp/node110.html
Normal file
|
@ -0,0 +1,64 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> read (FUNCTION)</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" read (FUNCTION)">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node-rest.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node72.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node109.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node-rest.html"> rest (FUNCTION)</A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node72.html"> Appendix: Selected Lisp </A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node109.html"> or (FUNCTION)</A>
|
||||
<BR> <HR> <P>
|
||||
<H1> read (FUNCTION)</H1>
|
||||
<P>
|
||||
<b> Format:</b>
|
||||
<tt> (read <instream> </tt>
|
||||
<tt> <eof-error> </tt>
|
||||
<tt> <eof-value> </tt>
|
||||
<tt> <recursive> )</tt>
|
||||
<P>
|
||||
<b> Required arguments:</b>
|
||||
none
|
||||
<P>
|
||||
<b> Optional arguments:</b>
|
||||
4
|
||||
<P>
|
||||
<tt> <instream> </tt>: an expression which returns an input stream
|
||||
<tt> <eof-error> </tt>: any Lisp expression
|
||||
<tt> <eof-value> </tt>: any Lisp expression
|
||||
<tt> <recursive> </tt>: any Lisp expression
|
||||
<P>
|
||||
Called with no arguments, read waits for input from the standard input
|
||||
(usually the keyboard) and returns a Lisp object. If <tt> <instream> </tt> is
|
||||
specified, input is taken from the stream rather than standard input. If
|
||||
<tt> <eof-error> </tt> is specified it controls what happens if an end of file
|
||||
is encountered in the middle of a "read."
|
||||
no error results, and the result of <tt> <eof-value> </tt> is returned by "read."
|
||||
If <tt> <eof-error> </tt> is not NIL, then encountering the end of a file during a
|
||||
read will cause an error to occur. <tt> <recursive> </tt> controls the kind
|
||||
of error that is signalled when an end of file is encountered. If
|
||||
<tt> <recursive> </tt> is specified and is not NIL, then the end of file is
|
||||
reported to have occurred in the middle of reading in an object. If it is
|
||||
NIL, the end of file is reported as occurring between objects.
|
||||
<P>
|
||||
<b> Examples:</b> See <a href="node58.html">chapter 5</a>.
|
||||
<P>
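As a minimal illustration (the line following each call is typed at the keyboard, and the prompt shown varies between Lisp implementations):
<P>
<BLOCKQUOTE>
<PRE>> (read)
(hello there)
(HELLO THERE)

> (+ 5 (read))
3
8
</PRE>
</BLOCKQUOTE>
<P>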
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
57
clones/colinallen.dnsalias.org/lp/node111.html
Normal file
|
@ -0,0 +1,57 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> reverse (FUNCTION)</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" reverse (FUNCTION)">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node112.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node72.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node-rest.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node112.html"> secondthird, etc. </A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node72.html"> Appendix: Selected Lisp </A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node-rest.html"> rest (FUNCTION)</A>
|
||||
<BR> <HR> <P>
|
||||
<H1> reverse (FUNCTION)</H1>
|
||||
<P>
|
||||
<b> Format:</b>
|
||||
<tt> (reverse <list> )</tt>
|
||||
<P>
|
||||
<b> Required arguments:</b>
|
||||
1
|
||||
<P>
|
||||
<tt> <list> </tt>: An expression which returns a list.
|
||||
<P>
|
||||
Reverse returns a list that contains all the elements of <tt> <list> </tt> in reversed order.
|
||||
<P>
|
||||
<b> Examples:</b>
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (reverse '(picard riker worf crusher))
|
||||
(CRUSHER WORF RIKER PICARD)
|
||||
|
||||
> (reverse (reverse '(picard riker worf crusher)))
|
||||
(PICARD RIKER WORF CRUSHER)
|
||||
|
||||
> (reverse '((this list) (of words)))
|
||||
((OF WORDS) (THIS LIST))
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
58
clones/colinallen.dnsalias.org/lp/node112.html
Normal file
|
@ -0,0 +1,58 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> second, third, etc. (FUNCTIONS)</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" second, third, etc. (FUNCTIONS)">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node113.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node72.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node111.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node113.html"> setf (MACRO)</A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node72.html"> Appendix: Selected Lisp </A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node111.html"> reverse (FUNCTION)</A>
|
||||
<BR> <HR> <P>
|
||||
<H1> second, third, etc. (FUNCTIONS)</H1>
|
||||
<P>
|
||||
<b> Format:</b>
|
||||
<tt> (second <list> )</tt>
|
||||
<tt> (third <list> )</tt>
|
||||
etc.
|
||||
<P>
|
||||
<b> Required arguments:</b>
|
||||
1
|
||||
<P>
|
||||
The argument must evaluate to a list
|
||||
<P>
|
||||
These functions return the obvious element from the given list, or nil if the list is shorter than the selected element would require.
|
||||
<P>
|
||||
<b> Examples:</b>
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (second '(1 2 3 4))
|
||||
2
|
||||
|
||||
> (fourth '(1 2 3 4))
|
||||
4
|
||||
|
||||
> (ninth '(1 2 3 4))
|
||||
NIL
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|
97
clones/colinallen.dnsalias.org/lp/node113.html
Normal file
|
@ -0,0 +1,97 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
|
||||
<!Originally converted to HTML using LaTeX2HTML 95 (Thu Jan 19 1995) by Nikos Drakos (nikos@cbl.leeds.ac.uk), CBLU, University of Leeds >
|
||||
<HEAD>
|
||||
<TITLE> setf (MACRO)</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<meta name="description" value=" setf (MACRO)">
|
||||
<meta name="keywords" value="lp">
|
||||
<meta name="resource-type" value="document">
|
||||
<meta name="distribution" value="global">
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<A HREF="node114.html"><IMG ALIGN=BOTTOM ALT="next" SRC="next_motif.gif"></A>
|
||||
<A HREF="node72.html"><IMG ALIGN=BOTTOM ALT="up" SRC="up_motif.gif"></A>
|
||||
<A HREF="node112.html"><IMG ALIGN=BOTTOM ALT="previous" SRC="previous_motif.gif"></A> <BR>
|
||||
<A HREF="lp.html"><B>Contents</B></A>
|
||||
<B> Next:</B>
|
||||
<A HREF="node114.html"> symbolp (PREDICATE)</A>
|
||||
<B>Up:</B>
|
||||
<A HREF="node72.html"> Appendix: Selected Lisp </A>
|
||||
<B> Previous:</B>
|
||||
<A HREF="node112.html"> secondthird, etc. </A>
|
||||
<BR> <HR> <P>
|
||||
<H1> setf (MACRO)</H1>
|
||||
<P>
|
||||
<b> Format:</b>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>(setf <place1> <val1>
|
||||
<place2> <val2>
|
||||
.
|
||||
.
|
||||
<placeN> <valN> )
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
<P>
|
||||
<b> Required arguments:</b> none
|
||||
<P>
|
||||
<b> Optional arguments:</b> any even number of arguments
|
||||
<P>
|
||||
<tt> <place> </tt>: either (i) the name of a variable, or (ii) an expression referring to part of a larger structure (e.g. a list, property list, structure, or array).
|
||||
<tt> <val> </tt>: any Lisp expression.
|
||||
<P>
|
||||
setf assigns the result of evaluating <tt> <val> </tt> to the location
|
||||
specified in the immediately preceding <tt> <place> </tt>. It returns the
|
||||
result of evaluating the last <tt> <val> </tt>. If no <tt> <place> -<val> </tt>
|
||||
pairs are specified, setf returns nil. setf is used, among other
|
||||
things, to assign values to variables, change parts of list structures,
|
||||
and to manage property lists and structures. Examples of all these uses
|
||||
are given in the chapters of this book. Other uses, too numerous to document
|
||||
here, can be found in Steele.
|
||||
<P>
|
||||
<b> Examples:</b> (see all chapters for further examples)
|
||||
<P>
|
||||
<BLOCKQUOTE>
|
||||
<PRE>> (setf crew '(picard riker worf crusher))
|
||||
(PICARD RIKER WORF CRUSHER)
|
||||
|
||||
> (setf (first crew) (list 'captain (first crew))
|
||||
(second crew) (list 'commander (second crew))
|
||||
(third crew) (list 'lieutenant (third crew))
|
||||
(fourth crew) (list 'ensign (fourth crew)))
|
||||
(ENSIGN CRUSHER)
|
||||
|
||||
> crew
|
||||
((CAPTAIN PICARD) (COMMANDER RIKER) (LIEUTENANT WORF)
|
||||
(ENSIGN CRUSHER))
|
||||
|
||||
> (setf (get 'picard 'rank) 'captain)
|
||||
CAPTAIN
|
||||
|
||||
> (get 'picard 'rank)
|
||||
CAPTAIN
|
||||
|
||||
> (defstruct starship crew captain)
|
||||
STARSHIP
|
||||
|
||||
> (setf enterprise (make-starship))
|
||||
#S(STARSHIP CREW NIL CAPTAIN NIL)
|
||||
|
||||
> (setf (starship-crew enterprise) (rest crew)
|
||||
(starship-captain enterprise) (second (first crew)))
|
||||
PICARD
|
||||
|
||||
> enterprise
|
||||
#S(STARSHIP CREW
|
||||
((COMMANDER RIKER) (LIEUTENANT WORF) (ENSIGN CRUSHER))
|
||||
CAPTAIN PICARD)
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
<P>
|
||||
<BR> <HR>
|
||||
<P>
|
||||
<ADDRESS>
|
||||
<I>© Colin Allen & Maneesh Dhagat <BR>
|
||||
March 2007 </I>
|
||||
</ADDRESS>
|
||||
</BODY>
|