<!DOCTYPE html>
<html lang='en'><head><meta charset='utf-8' /><meta name='pinterest' content='nopin' /><link href='../../../../static/css/style.css' rel='stylesheet' type='text/css' /><link href='../../../../static/css/print.css' rel='stylesheet' type='text/css' media='print' /><title>Going Paper-Free for $220 / Steve Losh</title></head><body><header><a id='logo' href='https://stevelosh.com/'>Steve Losh</a><nav><a href='../../../index.html'>Blog</a> - <a href='https://stevelosh.com/projects/'>Projects</a> - <a href='https://stevelosh.com/photography/'>Photography</a> - <a href='https://stevelosh.com/links/'>Links</a> - <a href='https://stevelosh.com/rss.xml'>Feed</a></nav></header><hr class='main-separator' /><main id='page-blog-entry'><article><h1><a href='index.html'>Going Paper-Free for $220</a></h1><p class='date'>Posted on May 26th, 2011.</p><p>It's 2011. Personal computers have been around and popular for well over a decade
now, and yet we still have to deal with a huge amount of physical paper.</p>
<p>I've been wanting to go paper-free for a long time now. The advantages are obvious:</p>
<ul>
<li>Paper takes up physical space in our homes that digital files don't.</li>
<li>Digital files, if properly encrypted, are far more secure than sheets of paper that
could be stolen.</li>
<li>Digital files can be searched in an instant, while papers have to be laboriously
sorted through.</li>
<li>Digital files can be backed up perfectly and easily.</li>
</ul>
<p>After reading <a href="http://ryanwaggoner.com/2010/11/how-i-filled-two-dumpsters-and-went-paperless-with-the-fujitsu-scansnap-s1500/">this article</a> I was psyched to scan and shred all the boxes of paper
sitting in my apartment, but the $420+ price tag was hard to swallow. I started
looking around for other options.</p>
<p>Here are the requirements I have for any paper-free system:</p>
<ul>
<li>The scanned files need to be OCR'ed so I can search them easily. I'm too lazy to
categorize and tag files manually.</li>
<li>I need to be able to scan files anywhere. If I'm out at dinner I want to be able
to snap a picture of my receipt and tear it up right there.</li>
<li>No &quot;cloud&quot; services allowed for unencrypted important documents. I simply don't
trust Google/Dropbox/etc enough to put my bank statements and such there.</li>
<li>Files need to be backed up securely in case my apartment burns down.</li>
<li>The entire process needs to be automated as much as possible, otherwise I'll get
lazy and not scan things.</li>
</ul>
<p>It's taken me a while, but I've finally got a system I'm happy with. This post will
describe each part and how they fit together. The total cost is about $220, $160 of
which is for a physical scanner.</p>
<p>Note: I use OS X and an iPhone, so this post will focus on those platforms. However,
the important pieces of software will run on Windows and I'm sure there are
Windows/Android equivalents to the other pieces.</p>
<ol class="table-of-contents"><li><a href="index.html#s1-scanning-at-home">Scanning at Home</a></li><li><a href="index.html#s2-scanning-on-the-go">Scanning on the Go</a></li><li><a href="index.html#s3-ocr-ing-scanned-documents">OCR'ing Scanned Documents</a></li><li><a href="index.html#s4-gluing-everything-together">Gluing Everything Together</a></li><li><a href="index.html#s5-backing-up">Backing Up</a></li><li><a href="index.html#s6-destroying-the-originals">Destroying the Originals</a></li><li><a href="index.html#s7-summary">Summary</a></li></ol>
<h2 id="s1-scanning-at-home"><a href="index.html#s1-scanning-at-home">Scanning at Home</a></h2>
<p>The first step to becoming paper-free is obviously scanning your documents. There
are a lot of scanners out there, some more expensive than others. I eventually
settled on a <a href="http://www.getdoxie.com/">Doxie</a> for $160.</p>
<p>I chose the Doxie because:</p>
<ul>
<li>It's compact.</li>
<li>It runs with a single USB cable.</li>
<li>It's cross-platform.</li>
<li>Its software has a great, polished UI.</li>
<li>It has a &quot;multiple-function button&quot; that lets you control it without the
mouse/keyboard.</li>
</ul>
<p>The first, second, and last points mean that (with a USB extension cable) I can scan
documents while sitting on the couch and watching Netflix, which is critical for lazy
people like me.</p>
<p>I set Doxie to save scans on my Desktop. The scanning process is pretty simple so
I won't describe it here. Check out Doxie's documentation for more information.</p>
<p><strong>Note:</strong> When I first received my Doxie and tried to calibrate it, it simply made
a grinding noise and wouldn't feed the paper. I emailed their tech support and
within half an hour I got a response back saying they were shipping me a replacement
immediately.</p>
<p>When I got the replacement it worked like a charm. Their customer service was so
great that I'd still recommend the Doxie even though my first one was a dud.</p>
<h2 id="s2-scanning-on-the-go"><a href="index.html#s2-scanning-on-the-go">Scanning on the Go</a></h2>
<p>As I mentioned before, I want to be able to scan things while out and about with my
iPhone. There are a bunch of iPhone document-scanning apps out there. I settled on
<a href="http://itunes.apple.com/us/app/jotnot-scanner-pro/id307868751?mt=8">JotNot</a> for $7 because it has a decent UI and supports multiple-page PDFs.</p>
<p>JotNot's UI is pretty easy to get the hang of so I won't go over it here.</p>
<p>Once I finish scanning something I send the PDF to a <a href="http://www.dropbox.com/">Dropbox</a> folder called
&quot;JotNot&quot;.</p>
<p>I know I said in my requirements that &quot;cloud&quot; services weren't allowed, but I make an
exception for non-critical things that I'd be scanning with my phone. I don't care if
Dropbox knows how much I spent on dinner.</p>
<h2 id="s3-ocr-ing-scanned-documents"><a href="index.html#s3-ocr-ing-scanned-documents">OCR'ing Scanned Documents</a></h2>
<p>The next step is to run the scanned PDFs through an OCR program so they can be
searched with Spotlight.</p>
<p>I looked at a lot of OCR software and finally settled on <a href="http://solutions.weblite.ca/pdfocrx/">PDF OCR X</a> for $30. It
has a simple interface, does a pretty good job at OCR'ing, has a free version so
I could try it out, and is cross-platform.</p>
<p>Using it is simple: you drag a PDF onto the app and select your desired settings
(make sure to choose &quot;searchable PDF&quot; as the output format). The app will think for
a while and then create a new PDF next to the old one with the searchable text
embedded.</p>
<p>After your first run, go into the preferences and switch to non-interactive mode so
that it won't prompt you for the settings every time you use it.</p>
<h2 id="s4-gluing-everything-together"><a href="index.html#s4-gluing-everything-together">Gluing Everything Together</a></h2>
<p>So far we've got two folders with scanned PDFs and a method for OCR'ing them. The
next step is to automate the process.</p>
<p>I use an app called <a href="http://www.noodlesoft.com/hazel.php">Hazel</a> to do this. It's $21 for a license and well worth it.
We'll set up four rules to make our lives easier.</p>
<p>Before we start we need to create two folders somewhere (you can name them whatever
you like):</p>
<ul>
<li>Pending OCR: A folder to hold documents that are waiting to be OCR'ed.</li>
<li>Dead Trees: A folder to hold the final, OCR'ed versions of our documents.</li>
</ul>
<p>The first rule watches the Desktop for scans from Doxie. Any files placed on the
Desktop whose name starts with &quot;Doxie Doc&quot; will be renamed to include the current
date and time, and then moved to the &quot;Pending OCR&quot; folder.</p>
<p><img src="../../../../static/images/blog/2011/05/rules-1-doxie.png" alt="Rule 1 Screenshot" title="Rule 1"></p>
<p><strong>Note:</strong> you'll need to click the <code>date created</code> bubble and then &quot;Edit Date&quot; to get
the time as well as the date into the filename.</p>
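<p>For readers who don't use Hazel, the logic of this first rule can be sketched in
Python. This is only an illustration, not what Hazel runs internally: the function
name <code>file_doxie_scan</code> and the timestamp format are made up for the example, and
the file's modification time stands in for Hazel's <code>date created</code>.</p>
<pre><code>import os
import shutil
import time

def file_doxie_scan(path, pending_dir):
    # Hypothetical helper: rename a Doxie scan to include a timestamp,
    # then move it into the pending folder.
    name = os.path.basename(path)
    # Only touch files whose names start with 'Doxie Doc', as the rule does.
    if not name.startswith('Doxie Doc'):
        return None
    stem, ext = os.path.splitext(name)
    # Assumption: modification time is used in place of the creation date.
    when = time.localtime(os.path.getmtime(path))
    stamp = time.strftime('%Y-%m-%d %H-%M-%S', when)
    target = os.path.join(pending_dir, '%s %s%s' % (stem, stamp, ext))
    shutil.move(path, target)
    return target
</code></pre>
<p>Including the time as well as the date is what keeps two scans made on the same day
from ending up with the same name.</p>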
<p>The second rule watches the &quot;JotNot&quot; folder for scans from the iPhone app. Any PDFs
that appear in here (i.e. that are synced down from Dropbox) will be moved to the
&quot;Pending OCR&quot; folder. We don't need to rename them like we did with the Doxie scans
because JotNot already includes the date and time of scans in the filenames by
default.</p>
<p><img src="../../../../static/images/blog/2011/05/rules-2-jotnot.png" alt="Rule 2 Screenshot" title="Rule 2"></p>
<p>Now that we've got all of our scans going into the same folder (with unique names) we
can set up a rule to OCR them. The third rule watches the &quot;Pending OCR&quot; folder for
PDFs. When a PDF lands in the folder it will be moved to its final destination
folder (&quot;Dead Trees&quot; in my case) and then opened in PDF OCR X. Because I've put PDF
OCR X in non-interactive mode the files will automatically be OCR'ed without any
intervention from me.</p>
<p><img src="../../../../static/images/blog/2011/05/rules-3-ocr.png" alt="Rule 3 Screenshot" title="Rule 3"></p>
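<p>The third rule's two actions (move, then open in the OCR app) could look roughly
like this in Python. Again, this is a sketch rather than Hazel's implementation: the
function name <code>move_for_ocr</code> is hypothetical, and it assumes the app is installed
under the name &quot;PDF OCR X&quot; so that the macOS <code>open -a</code> command can find it.</p>
<pre><code>import os
import shutil

def move_for_ocr(pdf_path, final_dir, app_name='PDF OCR X'):
    # Move the scan to its final folder first, as the rule does.
    target = os.path.join(final_dir, os.path.basename(pdf_path))
    shutil.move(pdf_path, target)
    # Build (but do not run) the macOS command that opens the moved
    # file in the OCR app.
    return ['open', '-a', app_name, target]
</code></pre>
<p>Passing the returned list to <code>subprocess.call</code> would actually launch the app.
Returning the command instead of running it keeps the sketch easy to try on a machine
without PDF OCR X.</p>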
<p>The fourth and final rule watches for the OCR'ed copies of our scans and runs
a script to move the originals to the trash once the searchable versions are ready.
It doesn't delete the files completely because I want a safety net in case something
goes wrong.</p>
<p><img src="../../../../static/images/blog/2011/05/rules-4-clean.png" alt="Rule 4 Screenshot" title="Rule 4"></p>
<p><strong>Note:</strong> make sure you change the Shell to <code>/usr/bin/python</code>. Here's the text of
the script so you can copy and paste it:</p>
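<pre><code>import sys, os

# Ask Finder (via AppleScript) to move a file to the Trash.
RM_CMD = r&quot;&quot;&quot;osascript -e 'tell app &quot;Finder&quot; to move the POSIX file &quot;%s&quot; to trash'&quot;&quot;&quot;
# Drop the last two dot-separated pieces of the OCR'ed file's path to get the original's path.
old_file = sys.argv[1].rsplit('.', 2)[0]

if os.path.exists(old_file):
    os.system(RM_CMD % os.path.abspath(old_file))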
</code></pre>
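<p>To see what the <code>rsplit('.', 2)[0]</code> step in the script above does, here it is on its
own. The filename below is hypothetical; the real suffix depends on how PDF OCR X
names its output, and <code>original_for</code> is just a name chosen for this example.</p>
<pre><code>def original_for(ocr_path):
    # Drop the last two dot-separated pieces, exactly as the script does.
    return ocr_path.rsplit('.', 2)[0]

# Hypothetical OCR output name, used only for illustration.
print(original_for('/tmp/Dead Trees/receipt.pdf.searchable.pdf'))
# prints /tmp/Dead Trees/receipt.pdf
</code></pre>
<p>Note that the split counts every dot in the path, so this only recovers the
original's name when the OCR'ed copy adds exactly two dot-separated pieces to it.</p>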
<p>Once these four rules are in place we can simply scan a document with Doxie or JotNot
and it will automatically be OCR'ed and placed in our &quot;Dead Trees&quot; folder, with no
intervention from us!</p>
<h2 id="s5-backing-up"><a href="index.html#s5-backing-up">Backing Up</a></h2>
<p>A while ago I was using Mozy for full backups. Recently they changed their pricing so
it was no longer unlimited. When that happened I switched to <a href="http://www.backblaze.com/">Backblaze</a> and
couldn't be happier.</p>
<p>Backblaze's UI is leaps and bounds above Mozy's, and they offer an option to generate
a secure encryption key for encrypting your backups. I highly recommend this, but be
sure to have a few copies of your key because you'll need it to restore your backups.</p>
<p>Backblaze is also only $5 per month (less if you pay for a year in advance) for
unlimited backups, which is definitely a bargain. As a bonus, they just released
a <a href="http://blog.backblaze.com/2011/05/23/lost-your-computer-get-it-back-backblaze-launches-locate-my-computer/">&quot;find my computer&quot; feature</a> that's kind of like a lightweight
version of <a href="http://www.orbicule.com/undercover/">Undercover</a>, so it's an even better deal.</p>
<h2 id="s6-destroying-the-originals"><a href="index.html#s6-destroying-the-originals">Destroying the Originals</a></h2>
<p>Once the documents are scanned and backed up it's time to destroy the physical paper.
If you live in a rural area you could burn them for free.</p>
<p>Those of us who can't start random fires need a paper shredder. I use one
I picked up a long time ago; any crosscut shredder will do the job.</p>
<h2 id="s7-summary"><a href="index.html#s7-summary">Summary</a></h2>
<p>After all of this I've now got a mostly-automated system that lets me go paper-free.
The costs are:</p>
<ul>
<li>Doxie Scanner: $160</li>
<li>JotNot: $7</li>
<li>PDF OCR X: $30</li>
<li>Hazel: $21</li>
<li>Backblaze: $5 per month</li>
</ul>
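<p>As a quick check on the arithmetic, the prices in the list can be added up in a few
lines of Python. The first-year figure is my own rough estimate: it assumes twelve
monthly Backblaze payments, which ignores the discount for paying a year in advance.</p>
<pre><code>one_time = {'Doxie Scanner': 160, 'JotNot': 7, 'PDF OCR X': 30, 'Hazel': 21}
backblaze_monthly = 5

initial = sum(one_time.values())
# Assumption: twelve monthly payments, with no yearly discount applied.
first_year = initial + 12 * backblaze_monthly

print(initial)     # prints 218
print(first_year)  # prints 278
</code></pre>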
<p>For me the $218 initial cost is worth it. Now I can search all of my paper in
an instant and my apartment is much less cluttered. If you have the money to spare I'd
definitely consider trying it.</p>
</article></main><hr class='main-separator' /><footer><nav><a href='https://github.com/sjl/'>GitHub</a><a href='https://twitter.com/stevelosh/'>Twitter</a><a href='https://instagram.com/thirtytwobirds/'>Instagram</a><a href='https://hg.stevelosh.com/.plan/'>.plan</a></nav></footer></body></html>