<!DOCTYPE html>
<html lang="en"
      xmlns:og="http://ogp.me/ns#"
      xmlns:fb="https://www.facebook.com/2008/fbml">
<head>
<title>EOF is not a character - Ruslan's Blog</title>
<!-- Using the latest rendering mode for IE -->
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">

<link rel="canonical" href="index.html">

<meta name="author" content="Ruslan Spivak" />
<meta name="description" content="I was reading Computer Systems: A Programmer’s Perspective the other day and in the chapter on Unix I/O the authors mention that there is no explicit “EOF character” at the end of a file." />

<meta property="og:site_name" content="Ruslan's Blog" />
<meta property="og:type" content="article"/>
<meta property="og:title" content="EOF is not a character"/>
<meta property="og:url" content="https://ruslanspivak.com/eofnotchar/"/>
<meta property="og:description" content="I was reading Computer Systems: A Programmer’s Perspective the other day and in the chapter on Unix I/O the authors mention that there is no explicit “EOF character” at the end of a file."/>
<meta property="article:published_time" content="2020-03-01" />
<meta property="article:section" content="blog" />
<meta property="article:author" content="Ruslan Spivak" />
<meta property="og:image"
      content="https://ruslanspivak.com/eofnotchar/eofnotchar_notachar.png"/>

<meta name="twitter:card" content="summary">
<meta name="twitter:domain" content="https://ruslanspivak.com">
<meta property="twitter:image"
      content="https://ruslanspivak.com/eofnotchar/eofnotchar_notachar.png"/>

<!-- Bootstrap -->
<link rel="stylesheet" href="../theme/css/bootstrap.min.css" type="text/css"/>
<link href="../theme/css/font-awesome.min.css" rel="stylesheet">

<link href="../theme/css/pygments/tango.css" rel="stylesheet">
<link href="../theme/css/typogrify.css" rel="stylesheet">
<link rel="stylesheet" href="../theme/css/style.css" type="text/css"/>
<link href="../static/custom.css" rel="stylesheet">

<link href="../feeds/all.atom.xml" type="application/atom+xml" rel="alternate"
      title="Ruslan's Blog ATOM Feed"/>

</head>
<body>

<div class="navbar navbar-default navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-ex1-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="../index.html" class="navbar-brand">
Ruslan's Blog </a>
</div>
<div class="collapse navbar-collapse navbar-ex1-collapse">
<ul class="nav navbar-nav">
</ul>
<ul class="nav navbar-nav navbar-right">
<li><a href="../pages/about.html"><i class="fa fa-question"></i><span class="icon-label">About</span></a></li>
<li><a href="../archives.html"><i class="fa fa-th-list"></i><span class="icon-label">Archives</span></a></li>
</ul>
</div>
<!-- /.navbar-collapse -->
</div>
</div> <!-- /.navbar -->
<!-- Banner -->
<!-- End Banner -->
<div class="container">
<div class="row">
<div class="col-sm-9">

<section id="content">
<article>
<header class="page-header">
<h1>
<a href="index.html"
   rel="bookmark"
   title="Permalink to EOF is not a character">
<span class="caps">EOF</span> is not a character
</a>
</h1>
</header>
<div class="entry-content">
<div class="panel">
<div class="panel-body">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2020-03-01T11:53:00-05:00"> Sun, March 01, 2020</time>
</span>
</footer><!-- /.post-info --> </div>
</div>
<p><strong>Update Mar 14, 2020</strong>: I’m working on an update to the article based on all the feedback I’ve received so far. Stay tuned!
<br/>
<br/></p>

<p>I was reading <em>Computer Systems: A Programmer’s Perspective</em> the other day and in the chapter on Unix I/O the authors mention that <strong><em>there is no explicit “<span class="caps">EOF</span> character” at the end of a file</em></strong>.</p>

<p><img alt="" src="eofnotchar_notachar.png" width="640"></p>

<p>If you’ve spent some time reading and/or playing with Unix I/O and have written some C programs that read text files and run on Unix/Linux, that statement is probably obvious. But let’s take a closer look at the following two points related to the statement in the book:</p>

<ol>
<li><span class="caps">EOF</span> is not a character</li>
<li><span class="caps">EOF</span> is not a character you find at the end of a file</li>
</ol>

<p><br/>
1. Why would anyone say or think that <span class="caps">EOF</span> is a character? I think it may be because in some C programs you can find code that explicitly checks for <span class="caps">EOF</span> using <em>getchar()</em> and <em>getc()</em> routines:</p>

<div class="highlight"><pre><span></span>    <span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp"></span>
    <span class="p">...</span>
    <span class="k">while</span> <span class="p">((</span><span class="n">c</span> <span class="o">=</span> <span class="n">getchar</span><span class="p">())</span> <span class="o">!=</span> <span class="n">EOF</span><span class="p">)</span>
        <span class="n">putchar</span><span class="p">(</span><span class="n">c</span><span class="p">);</span>

    <span class="cm">/* or */</span>

    <span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">c</span><span class="p">;</span>
    <span class="p">...</span>
    <span class="k">while</span> <span class="p">((</span><span class="n">c</span> <span class="o">=</span> <span class="n">getc</span><span class="p">(</span><span class="n">fp</span><span class="p">))</span> <span class="o">!=</span> <span class="n">EOF</span><span class="p">)</span>
        <span class="n">putc</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">stdout</span><span class="p">);</span>
</pre></div>

<p>And if you check the <em>man</em> page for <em>getchar()</em> or <em>getc()</em>, you’ll read that both routines get the next character from the input stream. That could be what leads to confusion about the nature of <span class="caps">EOF</span>, but that’s just me speculating. Let’s get back to the point that <span class="caps">EOF</span> is not a character.</p>

<p>What is a character anyway? A <em>character</em> is the smallest component of a text. ‘A’, ‘a’, ‘B’, ‘b’ are all different characters. A character has a numeric value that is called a <a href="https://docs.python.org/3/howto/unicode.html"><em>code point</em></a> in the Unicode standard. For example, the Latin capital letter ‘A’ has a numeric value of 65 in decimal. You can check this quickly in a Python shell:</p>

<div class="highlight"><pre><span></span>$ python
&gt;&gt;&gt; ord('A')
65
&gt;&gt;&gt; chr(65)
'A'
</pre></div>

<p><br/>
Or you could look it up in the <span class="caps">ASCII</span> table on your Unix/Linux box:</p>

<div class="highlight"><pre><span></span>$ man ascii
</pre></div>

<p><img alt="" src="eofnotchar_asciitable.png" width="640"></p>

<p><br/></p>

<p>Let’s check the value of <span class="caps">EOF</span> by writing a little C program. In <span class="caps">ANSI</span> C, <span class="caps">EOF</span> is a macro defined in <em>&lt;stdio.h&gt;</em> as part of the standard library. The standard only requires it to be a negative integer constant; in practice its value is usually -1. Save the following code in file <em>printeof.c</em>, compile it, and run it:</p>

<div class="highlight"><pre><span></span><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp"></span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
    <span class="n">printf</span><span class="p">(</span><span class="s">"EOF value on my system: %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">EOF</span><span class="p">);</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>

<p><br/></p>

<div class="highlight"><pre><span></span>$ gcc -o printeof printeof.c

$ ./printeof
EOF value on my system: -1
</pre></div>

<p>Okay, so on my system the value is -1 (I tested it both on macOS and Ubuntu Linux). Is there a character with a numerical value of -1? Again, you could check the available numeric values in the <span class="caps">ASCII</span> table or check the official Unicode page to find the legitimate range of numeric values for representing characters. But let’s fire up a Python shell and use the built-in <em>chr()</em> function to return a character for -1:</p>

<div class="highlight"><pre><span></span>$ python
&gt;&gt;&gt; chr<span class="o">(</span>-1<span class="o">)</span>
Traceback <span class="o">(</span>most recent call last<span class="o">)</span>:
  File <span class="s2">"&lt;stdin&gt;"</span>, line <span class="m">1</span>, in &lt;module&gt;
ValueError: chr<span class="o">()</span> arg not in range<span class="o">(</span>0x110000<span class="o">)</span>
</pre></div>

<p>As expected, there is no character with a numeric value of -1. Okay, so <span class="caps">EOF</span> (as seen in C programs) is not a character.</p>
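<p>The same point explains why <em>getc()</em> returns an <em>int</em> rather than a <em>char</em>. The sketch below is only an illustration: the <em>getc</em> helper and the <em>EOF = -1</em> constant are hypothetical stand-ins written for this example, not part of any library. Every possible byte value, 0 through 255, comes back as itself, and the out-of-band value -1 appears only when there is nothing left to read.</p>

```python
import io

EOF = -1  # assumption: mirrors the usual value of the C macro


def getc(stream):
    """Return the next byte as an int in 0..255, or EOF when the stream is exhausted."""
    chunk = stream.read(1)
    return chunk[0] if chunk else EOF


# A stream whose first byte is 0xFF, the largest possible byte value.
stream = io.BytesIO(b"\xffA")
values = [getc(stream) for _ in range(4)]
print(values)  # [255, 65, -1, -1]
```

<p>Even the byte 0xFF comes back as 255, never as -1, so as long as the result is kept in an integer wider than a byte, no value stored in a file can be mistaken for <span class="caps">EOF</span>.</p>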

<p>On to the second point.</p>

<p><br/>
2. Is <span class="caps">EOF</span> a character that you can find at the end of a file? I think at this point you already know the answer, but let’s double check our assumption.</p>

<p>Let’s take a simple text file <a href="https://github.com/rspivak/2x25/blob/master/eofnotchar/helloworld.txt">helloworld.txt</a> and get a hexdump of the contents of the file. We can use <em>xxd</em> for that:</p>

<div class="highlight"><pre><span></span>$ cat helloworld.txt
Hello world!

$ xxd helloworld.txt
<span class="m">00000000</span>: <span class="m">4865</span> 6c6c 6f20 776f 726c <span class="m">6421</span> 0a Hello world!.
</pre></div>

<p>As you can see, the last character at the end of the file is the hex <em>0a</em>. You can find in the <span class="caps">ASCII</span> table that <em>0a</em> represents <em>nl,</em> the newline character. Or you can check it in a Python shell:</p>

<div class="highlight"><pre><span></span>$ python
&gt;&gt;&gt; chr<span class="o">(</span>0x0a<span class="o">)</span>
<span class="s1">'\n'</span>
</pre></div>
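<p>Another way to convince yourself is to compare sizes. The sketch below writes the same 13 bytes to a temporary file (a stand-in for <em>helloworld.txt</em>, created only for this illustration) and asks the file system how many bytes it stores. If an extra “<span class="caps">EOF</span> character” were appended, the size on disk would be larger than the data written.</p>

```python
import os
import tempfile

text = b"Hello world!\n"  # same content as helloworld.txt

# Write the bytes to a temporary file and remember its path.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(text)
    path = tmp.name

size = os.path.getsize(path)  # number of bytes the file system stores
with open(path, "rb") as f:
    stored = f.read()
os.remove(path)

print(size, len(text), hex(stored[-1]))  # 13 13 0xa
```

<p>The size on disk matches the number of bytes written, and the last stored byte is the newline.</p>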

<p><br/>
Okay. If <span class="caps">EOF</span> is not a character and it’s not a character that you find at the end of a file, what is it then?</p>

<p><strong><em><span class="caps">EOF</span> (end-of-file)</em></strong> is a condition provided by the kernel that can be detected by an application.</p>

<p>Let’s see how we can detect the <span class="caps">EOF</span> condition in various programming languages when reading a text file using high-level I/O routines provided by the languages. For this purpose, we’ll write a very simple <a href="https://en.wikipedia.org/wiki/Cat_(Unix)"><em>cat</em></a> version called <em>mcat</em> that reads an <span class="caps">ASCII</span>-encoded text file byte by byte (character by character) and explicitly checks for <span class="caps">EOF</span>. Let’s write our <em>cat</em> version in the following programming languages:</p>

<ul>
<li><span class="caps">ANSI</span> C</li>
<li>Python</li>
<li>Go</li>
<li>JavaScript (node.js)</li>
</ul>

<p>You can find source code for all of the examples in this article on <a href="https://github.com/rspivak/2x25/tree/master/eofnotchar">GitHub</a>. Okay, let’s get started with the venerable C programming language.</p>

<ol>
<li>
<p><span class="caps">ANSI</span> C (a modified <em>cat</em> version from <em>The C Programming Language</em> book)</p>

<div class="highlight"><pre><span></span><span class="cm">/* mcat.c */</span>
<span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp"></span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
    <span class="kt">FILE</span> <span class="o">*</span><span class="n">fp</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">c</span><span class="p">;</span>

    <span class="k">if</span> <span class="p">(</span><span class="n">argc</span> <span class="o">&lt;</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"usage: mcat file</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
        <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">if</span> <span class="p">((</span><span class="n">fp</span> <span class="o">=</span> <span class="n">fopen</span><span class="p">(</span><span class="o">*++</span><span class="n">argv</span><span class="p">,</span> <span class="s">"r"</span><span class="p">))</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">printf</span><span class="p">(</span><span class="s">"mcat: can't open %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="o">*</span><span class="n">argv</span><span class="p">);</span>
        <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">while</span> <span class="p">((</span><span class="n">c</span> <span class="o">=</span> <span class="n">getc</span><span class="p">(</span><span class="n">fp</span><span class="p">))</span> <span class="o">!=</span> <span class="n">EOF</span><span class="p">)</span>
        <span class="n">putc</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">stdout</span><span class="p">);</span>

    <span class="n">fclose</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span>

    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>

<p>Compile</p>

<div class="highlight"><pre><span></span>$ gcc -o mcat mcat.c
</pre></div>

<p>Run</p>

<div class="highlight"><pre><span></span>$ ./mcat helloworld.txt
Hello world!
</pre></div>

<p><br/>
Quick explanation of the code above:</p>

<ul>
<li>The program opens a file passed as a command line argument</li>
<li>The <em>while</em> loop copies data from the file to the standard output one byte at a time until it reaches the end of the file.</li>
<li>On reaching <span class="caps">EOF</span>, the program closes the file and terminates</li>
</ul>

</li>
<li>
<p>Python 3</p>

<p>Python doesn’t have a mechanism to explicitly check for <span class="caps">EOF</span> like in <span class="caps">ANSI</span> C, but if you read a text file one character at a time, you can determine the <em>end-of-file</em> condition by checking whether <em>read()</em> returned an empty string:</p>

<div class="highlight"><pre><span></span><span class="c1"># mcat.py</span>
<span class="kn">import</span> <span class="nn">sys</span>

<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="k">as</span> <span class="n">fin</span><span class="p">:</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="n">c</span> <span class="o">=</span> <span class="n">fin</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># read max 1 char</span>
        <span class="k">if</span> <span class="n">c</span> <span class="o">==</span> <span class="s1">''</span><span class="p">:</span>     <span class="c1"># EOF</span>
            <span class="k">break</span>
        <span class="k">print</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">''</span><span class="p">)</span>
</pre></div>

<p><br/></p>

<div class="highlight"><pre><span></span>$ python mcat.py helloworld.txt
Hello world!
</pre></div>

<p>Python 3.8+ (a shorter version of the above using <a href="https://docs.python.org/3/whatsnew/3.8.html#assignment-expressions">the walrus operator</a>):</p>

<div class="highlight"><pre><span></span><span class="c1"># mcat38.py</span>
<span class="kn">import</span> <span class="nn">sys</span>

<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="k">as</span> <span class="n">fin</span><span class="p">:</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">c</span> <span class="p">:</span><span class="o">=</span> <span class="n">fin</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="mi">1</span><span class="p">))</span> <span class="o">!=</span> <span class="s1">''</span><span class="p">:</span>  <span class="c1"># read max 1 char at a time until EOF</span>
        <span class="k">print</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">end</span><span class="o">=</span><span class="s1">''</span><span class="p">)</span>
</pre></div>

<p><br/></p>

<div class="highlight"><pre><span></span>$ python3.8 mcat38.py helloworld.txt
Hello world!
</pre></div>

</li>
<li>
<p>Go</p>

<p>In Go we can explicitly check if the error returned by <a href="https://tour.golang.org/methods/21">Read()</a> is <em>io.EOF</em>.</p>

<div class="highlight"><pre><span></span><span class="o">//</span> <span class="n">mcat</span><span class="o">.</span><span class="n">go</span>
<span class="n">package</span> <span class="n">main</span>

<span class="kn">import</span> <span class="p">(</span>
    <span class="s2">"fmt"</span>
    <span class="s2">"os"</span>
    <span class="s2">"io"</span>
<span class="p">)</span>

<span class="n">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="nb">file</span><span class="p">,</span> <span class="n">err</span> <span class="p">:</span><span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">Open</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
    <span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="n">nil</span> <span class="p">{</span>
        <span class="n">fmt</span><span class="o">.</span><span class="n">Fprintf</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">Stderr</span><span class="p">,</span> <span class="s2">"mcat: %v</span><span class="se">\n</span><span class="s2">"</span><span class="p">,</span> <span class="n">err</span><span class="p">)</span>
        <span class="n">os</span><span class="o">.</span><span class="n">Exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="p">}</span>

    <span class="nb">buffer</span> <span class="p">:</span><span class="o">=</span> <span class="n">make</span><span class="p">([]</span><span class="n">byte</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>  <span class="o">//</span> <span class="mi">1</span><span class="o">-</span><span class="n">byte</span> <span class="nb">buffer</span>
    <span class="k">for</span> <span class="p">{</span>
        <span class="n">bytesread</span><span class="p">,</span> <span class="n">err</span> <span class="p">:</span><span class="o">=</span> <span class="nb">file</span><span class="o">.</span><span class="n">Read</span><span class="p">(</span><span class="nb">buffer</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">err</span> <span class="o">==</span> <span class="n">io</span><span class="o">.</span><span class="n">EOF</span> <span class="p">{</span>
            <span class="k">break</span>
        <span class="p">}</span>
        <span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="n">nil</span> <span class="p">{</span>
            <span class="n">fmt</span><span class="o">.</span><span class="n">Fprintf</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">Stderr</span><span class="p">,</span> <span class="s2">"mcat: %v</span><span class="se">\n</span><span class="s2">"</span><span class="p">,</span> <span class="n">err</span><span class="p">)</span>
            <span class="n">os</span><span class="o">.</span><span class="n">Exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
        <span class="p">}</span>
        <span class="n">fmt</span><span class="o">.</span><span class="n">Print</span><span class="p">(</span><span class="n">string</span><span class="p">(</span><span class="nb">buffer</span><span class="p">[:</span><span class="n">bytesread</span><span class="p">]))</span>
    <span class="p">}</span>
    <span class="nb">file</span><span class="o">.</span><span class="n">Close</span><span class="p">()</span>
<span class="p">}</span>
</pre></div>

<p><br/></p>

<div class="highlight"><pre><span></span>$ go run mcat.go helloworld.txt
Hello world!
</pre></div>

</li>
<li>
<p>JavaScript (node.js)</p>

<p>There is no explicit check for <span class="caps">EOF</span>, but the <a href="https://nodejs.org/api/stream.html#stream_event_end"><em>end</em> event</a> on a stream is fired when the end of a file is reached and a <em>read</em> operation tries to read more data.</p>

<div class="highlight"><pre><span></span><span class="cm">/* mcat.js */</span>
<span class="kr">const</span> <span class="nx">fs</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'fs'</span><span class="p">);</span>
<span class="kr">const</span> <span class="nx">process</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="s1">'process'</span><span class="p">);</span>

<span class="kr">const</span> <span class="nx">fileName</span> <span class="o">=</span> <span class="nx">process</span><span class="p">.</span><span class="nx">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>

<span class="kd">var</span> <span class="nx">readable</span> <span class="o">=</span> <span class="nx">fs</span><span class="p">.</span><span class="nx">createReadStream</span><span class="p">(</span><span class="nx">fileName</span><span class="p">,</span> <span class="p">{</span>
  <span class="nx">encoding</span><span class="o">:</span> <span class="s1">'utf8'</span><span class="p">,</span>
  <span class="nx">fd</span><span class="o">:</span> <span class="kc">null</span><span class="p">,</span>
<span class="p">});</span>

<span class="nx">readable</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'readable'</span><span class="p">,</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
  <span class="kd">var</span> <span class="nx">chunk</span><span class="p">;</span>
  <span class="k">while</span> <span class="p">((</span><span class="nx">chunk</span> <span class="o">=</span> <span class="nx">readable</span><span class="p">.</span><span class="nx">read</span><span class="p">(</span><span class="mi">1</span><span class="p">))</span> <span class="o">!==</span> <span class="kc">null</span><span class="p">)</span> <span class="p">{</span>
    <span class="nx">process</span><span class="p">.</span><span class="nx">stdout</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="nx">chunk</span><span class="p">);</span> <span class="cm">/* chunk is one byte */</span>
  <span class="p">}</span>
<span class="p">});</span>

<span class="nx">readable</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'end'</span><span class="p">,</span> <span class="p">()</span> <span class="p">=&gt;</span> <span class="p">{</span>
  <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'\nEOF: There will be no more data.'</span><span class="p">);</span>
<span class="p">});</span>
</pre></div>

<p><br/></p>

<div class="highlight"><pre><span></span>$ node mcat.js helloworld.txt
Hello world!

EOF: There will be no more data.
</pre></div>

</li>
</ol>
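<p>All four programs share one pattern: ask for the next unit of data and stop when the read reports that nothing is left. As a rough sketch of that pattern in Python, the two-argument form of the built-in <em>iter()</em> keeps calling a function until it returns a sentinel value, here the empty string. The <em>read_chars</em> helper is a hypothetical name used only for this illustration, and <em>io.StringIO</em> stands in for an open text file.</p>

```python
import io


def read_chars(stream):
    """Yield one character at a time until read(1) returns the empty string."""
    # iter(callable, sentinel) keeps calling stream.read(1) and stops
    # as soon as the call returns the sentinel '' (nothing left to read).
    yield from iter(lambda: stream.read(1), '')


chars = list(read_chars(io.StringIO("Hello world!\n")))
print(len(chars), repr(chars[-1]))  # 13 '\n'
```

<p>The sentinel is never part of the data: the loop yields exactly the 13 characters of the text, and the empty string only signals that the read came back with nothing.</p>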

<p><br/>
How do the high-level I/O routines in the examples above determine the <em>end-of-file</em> condition? On Linux systems the routines either directly or indirectly use the <a href="https://en.wikipedia.org/wiki/Read_(system_call)">read()</a> system call provided by the kernel. The <em>getc()</em> function (or macro) in C, for example, relies on the <em>read()</em> system call to fill its buffer and returns <span class="caps">EOF</span> if <em>read()</em> indicated the <em>end-of-file</em> condition. The <a href="https://en.wikipedia.org/wiki/Read_(system_call)">read()</a> system call returns 0 to indicate the <span class="caps">EOF</span> condition. Note that <em>getc()</em> also returns <span class="caps">EOF</span> when a read error occurs; <em>feof()</em> and <em>ferror()</em> tell the two cases apart.</p>

<p><img alt="" src="eofnotchar_stdsysio.png" width="400"></p>
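<p>A pipe makes the “condition, not character” idea easy to observe. The sketch below uses Python’s <em>os.read()</em>, a thin wrapper over <em>read()</em> that returns an empty bytes object where the C call would return 0. Nothing special is ever written into the pipe; the read comes back empty only once the writing end has been closed and the data has been consumed.</p>

```python
import os

read_end, write_end = os.pipe()

os.write(write_end, b"hi")
first = os.read(read_end, 10)   # returns the 2 bytes currently in the pipe

os.close(write_end)             # after this, no more data can ever arrive
second = os.read(read_end, 10)  # nothing left and no writer: empty result

os.close(read_end)
print(first, second)  # b'hi' b''
```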

<p>Let’s write a <em>cat</em> version called <em>syscat</em> using Unix system calls only, both for fun and potentially some profit. Let’s do that in C first:</p>

<div class="highlight"><pre><span></span><span class="cm">/* syscat.c */</span>
<span class="cp">#include</span> <span class="cpf">&lt;sys/types.h&gt;</span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf">&lt;sys/stat.h&gt;</span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf">&lt;fcntl.h&gt;</span><span class="cp"></span>
<span class="cp">#include</span> <span class="cpf">&lt;unistd.h&gt;</span><span class="cp"></span>

<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">fd</span><span class="p">;</span>
    <span class="kt">char</span> <span class="n">c</span><span class="p">;</span>

    <span class="n">fd</span> <span class="o">=</span> <span class="n">open</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">O_RDONLY</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">fd</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
        <span class="k">return</span> <span class="mi">1</span><span class="p">;</span>

    <span class="cm">/* read() returns 0 at end-of-file and -1 on error; stop on either */</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">c</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span>
        <span class="n">write</span><span class="p">(</span><span class="n">STDOUT_FILENO</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">c</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>

    <span class="n">close</span><span class="p">(</span><span class="n">fd</span><span class="p">);</span>

    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>

<p><br/></p>

<div class="highlight"><pre><span></span>$ gcc -o syscat syscat.c

$ ./syscat helloworld.txt
Hello world!
</pre></div>

<p>In the code above, you can see that we use the fact that the <em>read()</em> function returns 0 to indicate <span class="caps">EOF</span>. The loop tests for <em>&gt; 0</em> rather than <em>!= 0</em> because <em>read()</em> returns -1 on error, and a <em>!= 0</em> test would keep looping in that case.</p>

<p>And the same in Python 3:</p>

<div class="highlight"><pre><span></span><span class="c1"># syscat.py</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">os</span>

<span class="n">fd</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">os</span><span class="o">.</span><span class="n">O_RDONLY</span><span class="p">)</span>

<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
    <span class="n">c</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">c</span><span class="p">:</span>  <span class="c1"># EOF</span>
        <span class="k">break</span>
    <span class="n">os</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">fileno</span><span class="p">(),</span> <span class="n">c</span><span class="p">)</span>
</pre></div>

<p><br/></p>

<div class="highlight"><pre><span></span>$ python syscat.py helloworld.txt
Hello world!
</pre></div>

<p>And in Python3.8+ using <a href="https://docs.python.org/3/whatsnew/3.8.html#assignment-expressions">the walrus operator</a>:</p>
<div class="highlight"><pre><span></span><span class="c1"># syscat38.py</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">os</span>

<span class="n">fd</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">os</span><span class="o">.</span><span class="n">O_RDONLY</span><span class="p">)</span>

<span class="c1"># os.read() returns b'' at EOF; the empty bytes object is falsy and ends the loop</span>
<span class="k">while</span> <span class="n">c</span> <span class="p">:</span><span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="mi">1</span><span class="p">):</span>
    <span class="n">os</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">fileno</span><span class="p">(),</span> <span class="n">c</span><span class="p">)</span>
</pre></div>
<p><br/></p>
<div class="highlight"><pre><span></span>$ python3.8 syscat38.py helloworld.txt
Hello world!
</pre></div>
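<p>On Python versions older than 3.8, the same loop can be sketched with the two-argument form of <em>iter()</em>, which keeps calling a function until it returns a sentinel value. The helper name <em>cat_fd</em> and the pipe-based demo below are assumptions made for illustration; they are not part of the original scripts.</p>

```python
import os


def cat_fd(fd, out_fd):
    """Copy bytes from fd to out_fd until os.read() returns b'' (EOF)."""
    # iter(callable, sentinel) calls os.read(fd, 1) repeatedly and stops
    # as soon as the result equals the sentinel b''.
    for c in iter(lambda: os.read(fd, 1), b""):
        os.write(out_fd, c)


# Demo on a pipe instead of a file: once the write end is closed and the
# buffered bytes are consumed, read() reports the EOF condition.
src_r, src_w = os.pipe()
os.write(src_w, b"Hello world!\n")
os.close(src_w)

dst_r, dst_w = os.pipe()
cat_fd(src_r, dst_w)
os.close(src_r)
os.close(dst_w)

copied = os.read(dst_r, 64)
os.close(dst_r)
print(copied)
```

<p>The loop ends for the same reason as in the walrus version: the empty bytes object returned at end-of-file is the stop signal, not any byte stored in the data.</p>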
<p><br/>
Let’s recap the main points about <span class="caps">EOF</span> again:</p>
<ul>
<li><span class="caps">EOF</span> is not a character</li>
<li><span class="caps">EOF</span> is not a character that you find at the end of a file</li>
<li><span class="caps">EOF</span> is a condition provided by the kernel that can be detected by an application <s>when a <em>read</em> operation reaches the end of a file</s></li>
</ul>
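<p>One way to check the points above is to compare what was written to a file with what is stored on disk. The sketch below assumes a throwaway file in a temporary directory (the path is hypothetical); it shows that the file holds exactly the bytes that were written, with nothing extra appended at the end.</p>

```python
import os
import tempfile

content = b"Hello world!\n"

# Hypothetical demo file created in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "helloworld.txt")
    with open(path, "wb") as f:
        f.write(content)

    size = os.stat(path).st_size      # size reported by the file system
    with open(path, "rb") as f:
        data = f.read()               # every byte stored in the file
        after_end = f.read(1)         # another read once everything is consumed

print(size, len(content))  # the two sizes match: no extra byte was stored
print(data[-1:])           # the last byte is the newline we wrote
print(after_end)           # an empty bytes object signals end-of-file
```

<p>The last byte in the file is the newline, and the read after it returns an empty bytes object rather than a special character.</p>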
<p><strong>Update Mar 3, 2020</strong>: Let’s recap the main points about <span class="caps">EOF</span> with added details for more clarity:</p>
<ul>
<li><span class="caps">EOF</span> in <span class="caps">ANSI</span> C is not a character. It’s a macro defined in <em>&lt;stdio.h&gt;</em> that expands to a negative integer constant, usually -1</li>
<li><span class="caps">EOF</span> is not a character in the <span class="caps">ASCII</span> or Unicode character set</li>
<li><span class="caps">EOF</span> is not a character that you find at the end of a file on Unix/Linux systems</li>
<li>There is no explicit “<span class="caps">EOF</span> character” at the end of a file on Unix/Linux systems</li>
<li><span class="caps">EOF</span> (end-of-file) is a condition provided by the kernel that can be detected by an application <s>when a <em>read</em> operation reaches the end of a file</s> (if <em>k</em> is the current file position and <em>m</em> is the size of a file, performing a <em>read()</em> when <em>k &gt;= m</em> triggers the condition)</li>
</ul>
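<p>The <em>k &gt;= m</em> condition from the last point can be observed directly with <em>os.lseek()</em> and <em>os.fstat()</em>. This is a minimal sketch, assuming a three-byte throwaway file whose name and contents are made up for illustration:</p>

```python
import os
import tempfile

# Hypothetical demo file; the name and contents are made up for illustration
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "wb") as f:
        f.write(b"abc")

    fd = os.open(path, os.O_RDONLY)
    try:
        m = os.fstat(fd).st_size            # m: size of the file
        first = os.read(fd, 2)              # position k moves from 0 to 2
        second = os.read(fd, 2)             # only one byte left: a short read
        k = os.lseek(fd, 0, os.SEEK_CUR)    # k: current file position
        at_end = os.read(fd, 2)             # k >= m, so nothing is returned
        os.lseek(fd, m + 10, os.SEEK_SET)   # move the position past the end
        past_end = os.read(fd, 2)           # still k >= m, still nothing
    finally:
        os.close(fd)

print(first, second, k, m)
print(at_end, past_end)
```

<p>Both reads performed at or beyond the end of the file return an empty bytes object, which is how the condition surfaces in Python.</p>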
<p><strong>Update Mar 14, 2020</strong>: I’m working on an update to the article based on all the feedback I’ve received so far. Stay tuned!</p>
<p><br/>
Happy learning and have a great day!</p>
<p><br/>
<em>Resources used in preparation for this article (some links are affiliate links):</em></p>
<ol>
<li><a target="_blank" href="https://www.amazon.com/gp/product/013409266X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=013409266X&linkCode=as2&tag=russblo0b-20&linkId=ec2bfa5062cddb0c6f86266ba481c625">Computer Systems: A Programmer’s Perspective (3rd Edition)</a></li>
<li><a target="_blank" href="https://www.amazon.com/gp/product/0131103628/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0131103628&linkCode=as2&tag=russblo0b-20&linkId=97a792c45446683f7235710c2f8c899d">C Programming Language, 2nd Edition</a></li>
<li><a target="_blank" href="https://www.amazon.com/gp/product/013937681X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=013937681X&linkCode=as2&tag=russblo0b-20&linkId=b8b462e767809ac396966bbb3e79af76">The Unix Programming Environment (Prentice-Hall Software Series)</a></li>
<li><a target="_blank" href="https://www.amazon.com/gp/product/0321637739/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0321637739&linkCode=as2&tag=russblo0b-20&linkId=f9fc233797afcaf2c103f7aac24d717d">Advanced Programming in the <span class="caps">UNIX</span> Environment, 3rd Edition</a></li>
<li><a target="_blank" href="https://www.amazon.com/gp/product/0134190440/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0134190440&linkCode=as2&tag=russblo0b-20&linkId=3e0104678e6eb68f11fb29e4cda46bd1">The Go Programming Language (Addison-Wesley Professional Computing Series)</a></li>
<li><a href="https://docs.python.org/3/howto/unicode.html">Unicode <span class="caps">HOWTO</span></a></li>
<li><a href="https://nodejs.org/api/stream.html">Node.js Stream module</a></li>
<li><a href="https://golang.org/pkg/io/">Go io package</a></li>
<li><a href="https://en.wikipedia.org/wiki/Cat_(Unix)">cat (Unix)</a></li>
<li><a href="https://en.wikipedia.org/wiki/End-of-file">End-of-file</a></li>
<li><a href="https://en.wikipedia.org/wiki/End-of-Transmission_character">End-of-Transmission character</a></li>
</ol>
</div>
<!-- /.entry-content -->
</article>
</section>
</div>
</div>
</div>
<footer>
<div class="container">
<hr>
<div class="row">
<div class="col-xs-10">© 2020 Ruslan Spivak
<!-- · Powered by <a href="https://github.com/DandyDev/pelican-bootstrap3" target="_blank">pelican-bootstrap3</a>, -->
<!-- <a href="http://docs.getpelican.com/" target="_blank">Pelican</a>, -->
<!-- <a href="http://getbootstrap.com" target="_blank">Bootstrap</a> -->
</div>
<div class="col-xs-2"><p class="pull-right"><i class="fa fa-arrow-up"></i> <a href="index.html#">Back to top</a></p></div>
</div>
</div>
</footer>
<script src="../theme/js/jquery.min.js"></script>

<!-- Include all compiled plugins (below), or include individual files as needed -->
<script src="../theme/js/bootstrap.min.js"></script>

<!-- Enable responsive features in IE8 with Respond.js (https://github.com/scottjehl/Respond) -->
<script src="../theme/js/respond.min.js"></script>
</body>
</html>