<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>4.5. TCP Socket Programming: HTTP &mdash; Computer Systems Fundamentals</title>
<link rel="stylesheet" href="_static/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous" />
<link rel="stylesheet" href="_static/css/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/css/normalize.css" type="text/css" />
<link rel="stylesheet" href="../../../JSAV/css/JSAV.css" type="text/css" />
<link rel="stylesheet" href="../../../lib/odsaMOD-min.css" type="text/css" />
<link rel="stylesheet" href="_static/css/jquery-1.11.4-smoothness-ui.css" type="text/css" />
<link rel="stylesheet" href="../../../lib/odsaStyle-min.css" type="text/css" />
<link rel="stylesheet" href="_static/css/csf.css" type="text/css" />
<style>
.underline { text-decoration: underline; }
</style>
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: './',
VERSION: '0.4.1',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
inlineMath: [['$','$'], ['\\(','\\)']],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
processEscapes: true
},
"HTML-CSS": {
scale: "80"
}
});
</script>
<link rel="shortcut icon" href="_static/favicon.ico"/>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="index" title="Computer Systems Fundamentals" href="index.html" />
<link rel="next" title="4.6. UDP Socket Programming: DNS" href="UDPSockets.html" />
<link rel="prev" title="4.4. The Socket Interface" href="Sockets.html" />
</head><body>
<nav class="navbar navbar-expand-md navbar-dark navbar-custom fixed-top">
<a class="navbar-brand py-0" href="index.html"><img src="_static/CSF-Logo-Square-Text.png" alt="OpenCSF Logo" height="40em" class="py-1 px-2 mb-0 align-center rounded-lg bg-white" /></a>
<!-- Show a navbar toggler on mobile -->
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#defaultNavbars" aria-controls="defaultNavbars" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="defaultNavbars">
<ul class="navbar-nav mr-auto">
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle jmu-gold rounded" href="TCPSockets.html#" id="navbarDropdownChapters" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">Contents</a>
<div class="dropdown-menu scrollable-menu" role="menu" aria-labelledby="navbarDropdownChapters">
<a class="dropdown-item" tabindex="-1" href="TCPSockets.html#"><b>Chapter 1</b></a>
<a class="dropdown-item" href="IntroConcSysOverview.html">&nbsp;&nbsp;&nbsp;1.1. Introduction to Concurrent Systems</a>
<a class="dropdown-item" href="SysAndModels.html">&nbsp;&nbsp;&nbsp;1.2. Systems and Models</a>
<a class="dropdown-item" href="Themes.html">&nbsp;&nbsp;&nbsp;1.3. Themes and Guiding Principles</a>
<a class="dropdown-item" href="Architectures.html">&nbsp;&nbsp;&nbsp;1.4. System Architectures</a>
<a class="dropdown-item" href="StateModels.html">&nbsp;&nbsp;&nbsp;1.5. State Models in UML</a>
<a class="dropdown-item" href="SequenceModels.html">&nbsp;&nbsp;&nbsp;1.6. Sequence Models in UML</a>
<a class="dropdown-item" href="StateModelImplementation.html">&nbsp;&nbsp;&nbsp;1.7. Extended Example: State Model Implementation</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item disabled"><b>Chapter 2</b></a>
<a class="dropdown-item" href="ProcessesOverview.html">&nbsp;&nbsp;&nbsp;2.1. Processes and OS Basics</a>
<a class="dropdown-item" href="Multiprogramming.html">&nbsp;&nbsp;&nbsp;2.2. Processes and Multiprogramming</a>
<a class="dropdown-item" href="KernelMechanics.html">&nbsp;&nbsp;&nbsp;2.3. Kernel Mechanics</a>
<a class="dropdown-item" href="Syscall.html">&nbsp;&nbsp;&nbsp;2.4. System Call Interface</a>
<a class="dropdown-item" href="ProcessCycle.html">&nbsp;&nbsp;&nbsp;2.5. Process Life Cycle</a>
<a class="dropdown-item" href="UnixFile.html">&nbsp;&nbsp;&nbsp;2.6. The UNIX File Abstraction</a>
<a class="dropdown-item" href="EventsSignals.html">&nbsp;&nbsp;&nbsp;2.7. Events and Signals</a>
<a class="dropdown-item" href="Extended2Processes.html">&nbsp;&nbsp;&nbsp;2.8. Extended Example: Listing Files with Processes</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item disabled"><b>Chapter 3</b></a>
<a class="dropdown-item" href="IPCOverview.html">&nbsp;&nbsp;&nbsp;3.1. Concurrency with IPC</a>
<a class="dropdown-item" href="IPCModels.html">&nbsp;&nbsp;&nbsp;3.2. IPC Models</a>
<a class="dropdown-item" href="Pipes.html">&nbsp;&nbsp;&nbsp;3.3. Pipes and FIFOs</a>
<a class="dropdown-item" href="MMap.html">&nbsp;&nbsp;&nbsp;3.4. Shared Memory With Memory-mapped Files</a>
<a class="dropdown-item" href="POSIXvSysV.html">&nbsp;&nbsp;&nbsp;3.5. POSIX vs. System V IPC</a>
<a class="dropdown-item" href="MQueues.html">&nbsp;&nbsp;&nbsp;3.6. Message Passing With Message Queues</a>
<a class="dropdown-item" href="ShMem.html">&nbsp;&nbsp;&nbsp;3.7. Shared Memory</a>
<a class="dropdown-item" href="IPCSems.html">&nbsp;&nbsp;&nbsp;3.8. Semaphores</a>
<a class="dropdown-item" href="Extended3Bash.html">&nbsp;&nbsp;&nbsp;3.9. Extended Example: Bash-lite: A Simple Command-line Shell</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item disabled"><b>Chapter 4</b></a>
<a class="dropdown-item" href="SocketsOverview.html">&nbsp;&nbsp;&nbsp;4.1. Networked Concurrency</a>
<a class="dropdown-item" href="FiveLayer.html">&nbsp;&nbsp;&nbsp;4.2. The TCP/IP Internet Model</a>
<a class="dropdown-item" href="NetApps.html">&nbsp;&nbsp;&nbsp;4.3. Network Applications and Protocols</a>
<a class="dropdown-item" href="Sockets.html">&nbsp;&nbsp;&nbsp;4.4. The Socket Interface</a>
<a class="dropdown-item" href="TCPSockets.html">&nbsp;&nbsp;&nbsp;4.5. TCP Socket Programming: HTTP</a>
<a class="dropdown-item" href="UDPSockets.html">&nbsp;&nbsp;&nbsp;4.6. UDP Socket Programming: DNS</a>
<a class="dropdown-item" href="AppBroadcast.html">&nbsp;&nbsp;&nbsp;4.7. Application-Layer Broadcasting: DHCP</a>
<a class="dropdown-item" href="Extended4CGI.html">&nbsp;&nbsp;&nbsp;4.8. Extended Example: CGI Web Server</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item disabled"><b>Chapter 5</b></a>
<a class="dropdown-item" href="InternetOverview.html">&nbsp;&nbsp;&nbsp;5.1. The Internet and Connectivity</a>
<a class="dropdown-item" href="AppLayer.html">&nbsp;&nbsp;&nbsp;5.2. Application Layer: Overlay Networks</a>
<a class="dropdown-item" href="TransLayer.html">&nbsp;&nbsp;&nbsp;5.3. Transport Layer</a>
<a class="dropdown-item" href="NetSec.html">&nbsp;&nbsp;&nbsp;5.4. Network Security Fundamentals</a>
<a class="dropdown-item" href="NetLayer.html">&nbsp;&nbsp;&nbsp;5.5. Network Layer: IP</a>
<a class="dropdown-item" href="LinkLayer.html">&nbsp;&nbsp;&nbsp;5.6. Link Layer</a>
<a class="dropdown-item" href="Wireless.html">&nbsp;&nbsp;&nbsp;5.7. Wireless Connectivity: Wi-Fi, Bluetooth, and Zigbee</a>
<a class="dropdown-item" href="Extended5DNS.html">&nbsp;&nbsp;&nbsp;5.8. Extended Example: DNS client</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item disabled"><b>Chapter 6</b></a>
<a class="dropdown-item" href="ThreadsOverview.html">&nbsp;&nbsp;&nbsp;6.1. Concurrency with Multithreading</a>
<a class="dropdown-item" href="ProcVThreads.html">&nbsp;&nbsp;&nbsp;6.2. Processes vs. Threads</a>
<a class="dropdown-item" href="RaceConditions.html">&nbsp;&nbsp;&nbsp;6.3. Race Conditions and Critical Sections</a>
<a class="dropdown-item" href="POSIXThreads.html">&nbsp;&nbsp;&nbsp;6.4. POSIX Thread Library</a>
<a class="dropdown-item" href="ThreadArgs.html">&nbsp;&nbsp;&nbsp;6.5. Thread Arguments and Return Values</a>
<a class="dropdown-item" href="ImplicitThreads.html">&nbsp;&nbsp;&nbsp;6.6. Implicit Threading and Language-based Threads</a>
<a class="dropdown-item" href="Extended6Input.html">&nbsp;&nbsp;&nbsp;6.7. Extended Example: Keyboard Input Listener</a>
<a class="dropdown-item" href="Extended6Primes.html">&nbsp;&nbsp;&nbsp;6.8. Extended Example: Concurrent Prime Number Search</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item disabled"><b>Chapter 7</b></a>
<a class="dropdown-item" href="SynchOverview.html">&nbsp;&nbsp;&nbsp;7.1. Synchronization Primitives</a>
<a class="dropdown-item" href="CritSect.html">&nbsp;&nbsp;&nbsp;7.2. Critical Sections and Peterson's Solution</a>
<a class="dropdown-item" href="Locks.html">&nbsp;&nbsp;&nbsp;7.3. Locks</a>
<a class="dropdown-item" href="Semaphores.html">&nbsp;&nbsp;&nbsp;7.4. Semaphores</a>
<a class="dropdown-item" href="Barriers.html">&nbsp;&nbsp;&nbsp;7.5. Barriers</a>
<a class="dropdown-item" href="Condvars.html">&nbsp;&nbsp;&nbsp;7.6. Condition Variables</a>
<a class="dropdown-item" href="Deadlock.html">&nbsp;&nbsp;&nbsp;7.7. Deadlock</a>
<a class="dropdown-item" href="Extended7Events.html">&nbsp;&nbsp;&nbsp;7.8. Extended Example: Event Log File</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item disabled"><b>Chapter 8</b></a>
<a class="dropdown-item" href="SynchProblemsOverview.html">&nbsp;&nbsp;&nbsp;8.1. Synchronization Patterns and Problems</a>
<a class="dropdown-item" href="SynchDesign.html">&nbsp;&nbsp;&nbsp;8.2. Basic Synchronization Design Patterns</a>
<a class="dropdown-item" href="ProdCons.html">&nbsp;&nbsp;&nbsp;8.3. Producer-Consumer Problem</a>
<a class="dropdown-item" href="ReadWrite.html">&nbsp;&nbsp;&nbsp;8.4. Readers-Writers Problem</a>
<a class="dropdown-item" href="DiningPhil.html">&nbsp;&nbsp;&nbsp;8.5. Dining Philosophers Problem and Deadlock</a>
<a class="dropdown-item" href="CigSmokers.html">&nbsp;&nbsp;&nbsp;8.6. Cigarette Smokers Problem and the Limits of Semaphores and Locks</a>
<a class="dropdown-item" href="Extended8ModExp.html">&nbsp;&nbsp;&nbsp;8.7. Extended Example: Parallel Modular Exponentiation</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item disabled"><b>Chapter 9</b></a>
<a class="dropdown-item" href="ParallelDistributedOverview.html">&nbsp;&nbsp;&nbsp;9.1. Parallel and Distributed Systems</a>
<a class="dropdown-item" href="ParVConc.html">&nbsp;&nbsp;&nbsp;9.2. Parallelism vs. Concurrency</a>
<a class="dropdown-item" href="ParallelDesign.html">&nbsp;&nbsp;&nbsp;9.3. Parallel Design Patterns</a>
<a class="dropdown-item" href="Scaling.html">&nbsp;&nbsp;&nbsp;9.4. Limits of Parallelism and Scaling</a>
<a class="dropdown-item" href="DistTiming.html">&nbsp;&nbsp;&nbsp;9.5. Timing in Distributed Environments</a>
<a class="dropdown-item" href="DistDataStorage.html">&nbsp;&nbsp;&nbsp;9.6. Reliable Data Storage and Location</a>
<a class="dropdown-item" href="DistConsensus.html">&nbsp;&nbsp;&nbsp;9.7. Consensus in Distributed Systems</a>
<a class="dropdown-item" href="Extended9Blockchain.html">&nbsp;&nbsp;&nbsp;9.8. Extended Example: Blockchain Proof-of-Work</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item disabled"><b>Appendix A</b></a>
<a class="dropdown-item" href="CLangOverview.html">&nbsp;&nbsp;&nbsp;A.1. C Language Reintroduction</a>
<a class="dropdown-item" href="Debugging.html">&nbsp;&nbsp;&nbsp;A.2. Documentation and Debugging</a>
<a class="dropdown-item" href="BasicTypes.html">&nbsp;&nbsp;&nbsp;A.3. Basic Types and Pointers</a>
<a class="dropdown-item" href="Arrays.html">&nbsp;&nbsp;&nbsp;A.4. Arrays, Structs, Enums, and Type Definitions</a>
<a class="dropdown-item" href="Functions.html">&nbsp;&nbsp;&nbsp;A.5. Functions and Scope</a>
<a class="dropdown-item" href="Pointers.html">&nbsp;&nbsp;&nbsp;A.6. Pointers and Dynamic Allocation</a>
<a class="dropdown-item" href="Strings.html">&nbsp;&nbsp;&nbsp;A.7. Strings</a>
<a class="dropdown-item" href="FunctionPointers.html">&nbsp;&nbsp;&nbsp;A.8. Function Pointers</a>
<a class="dropdown-item" href="Files.html">&nbsp;&nbsp;&nbsp;A.9. Files</a>
</div>
</li>
</ul>
</div>
<ul class="navbar-nav flex-row ml-md-auto d-none d-md-flex">
<li class="nav-item"><a class="nav-link jmu-gold" href="https://w3.cs.jmu.edu/kirkpams/OpenCSF/Books/csf/source/TCPSockets.rst"
target="_blank" rel="nofollow">Show Source</a></li>
</ul>
</nav>
<div class="container center">
«&#160;&#160;<a id="prevmod" href="Sockets.html">4.4. The Socket Interface</a>
&#160;&#160;::&#160;&#160;
<a class="uplink" href="index.html">Contents</a>
&#160;&#160;::&#160;&#160;
<a id="nextmod" href="UDPSockets.html">4.6. UDP Socket Programming: DNS</a>&#160;&#160;»
</div>
<br />
<script type="text/javascript" src="_static/js/jquery-2.1.4.min.js"></script>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script type="text/javascript" src="_static/js/jquery-1.11.4-ui.min.js"></script>
<script type="text/javascript" src="_static/js/forge-0.7.0.min.js"></script>
<script type="text/javascript" src="../../../JSAV/lib/jquery.transit.js"></script>
<script type="text/javascript" src="../../../JSAV/lib/raphael.js"></script>
<script type="text/javascript" src="../../../JSAV/build/JSAV-min.js"></script>
<script type="text/javascript" src="_static/js/config.js"></script>
<script type="text/javascript" src="../../../lib/odsaUtils-min.js"></script>
<script type="text/javascript" src="../../../lib/odsaMOD-min.js"></script>
<script type="text/javascript" src="_static/js/d3-4.13.0.min.js"></script>
<script type="text/javascript" src="_static/js/d3-selection-multi.v1.min.js"></script>
<script type="text/javascript" src="../../../lib/dataStructures.js"></script>
<div class="container">
<script>ODSA.SETTINGS.DISP_MOD_COMP = true;ODSA.SETTINGS.MODULE_NAME = "TCPSockets";ODSA.SETTINGS.MODULE_LONG_NAME = "TCP Socket Programming: HTTP";ODSA.SETTINGS.MODULE_CHAPTER = "Networked Concurrency"; ODSA.SETTINGS.BUILD_DATE = "2021-06-14 17:15:25"; ODSA.SETTINGS.BUILD_CMAP = false;JSAV_OPTIONS['lang']='en';JSAV_EXERCISE_OPTIONS['code']='java';</script><div class="section" id="tcp-socket-programming-http">
<h1>4.5. TCP Socket Programming: HTTP<a class="headerlink" href="TCPSockets.html#tcp-socket-programming-http" title="Permalink to this headline"></a></h1>
<p>Processes running at the application layer of the protocol stack are not fundamentally different
from non-networked concurrent applications. The process has a virtual memory space, can exchange
data through IPC channels, may interact with users through <code class="docutils literal notranslate"><span class="pre">STDIN</span></code> and <code class="docutils literal notranslate"><span class="pre">STDOUT</span></code>, and so on. The
primary differences between such distributed application processes and non-networked processes are
that the data is exchanged via an IPC channel based on a predefined communication <a class="reference internal" href="Glossary.html#term-protocol"><span class="xref std std-term">protocol</span></a>,
and that channel has a significantly higher likelihood of intermittent communication failures. The
peer process on the other host may be built by the same development team, it may be a customized
open-source server, or it may be a proprietary network service. So long as both processes agree to
abide by the protocol specification, writing distributed applications is not drastically different
from other concurrent applications with IPC. In this section, we will demonstrate how to use TCP
sockets to implement the basic functionality of HTTP, the protocol that underlies web-based
technologies.</p>
<div class="section" id="hypertext-transfer-protocol-http">
<h2>4.5.1. Hypertext Transfer Protocol (HTTP)<a class="headerlink" href="TCPSockets.html#hypertext-transfer-protocol-http" title="Permalink to this headline"></a></h2>
<div class="figure mb-2 align-right" id="id13" style="width: 40%">
<span id="tcphttp"></span><a class="reference internal image-reference" href="_images/CSF-Images.4.7.png"><img class="p-3 mb-2 align-center border border-dark rounded-lg" alt="Basic request-response structure of HTTP running on top of TCP" src="_images/CSF-Images.4.7.png" style="width: 90%;" /></a>
<p class="caption align-center px-3"><span class="caption-text"> Figure 4.5.1: Basic request-response structure of HTTP running on top of TCP</span></p>
</div>
<p>HTTP is the protocol that defines communication for web browsers and servers. Readers who have built
personal or professional web pages have relied on this protocol, even if they were unaware of the
details of its operation. HTTP is a simple <a class="reference internal" href="Glossary.html#term-request-response-protocol"><span class="xref std std-term">request-response protocol</span></a>, defined in RFC 2616.
To be precise, HTTP is a <a class="reference internal" href="Glossary.html#term-stateless-protocol"><span class="xref std std-term">stateless protocol</span></a>, in the sense that neither the client nor the
server preserves any state information between requests; the server processes each request
independently from those that arrived previously. HTTP applications use TCP connections for their
transport layer, and <a href="TCPSockets.html#tcphttp">Figure 4.5.1</a> shows the basic structure of HTTP in relation to the
functions that establish the socket connection. The client—a web browser—sends an HTTP request to
the server and receives a response.</p>
<div class="line-block">
<div class="line"><br /></div>
<div class="line"><br /></div>
<div class="line"><br /></div>
<div class="line"><br /></div>
</div>
<div class="topic border border-dark rounded-lg bg-light px-2 mb-3" id="getrequestex">
<div class="figure align-left">
<a class="reference internal image-reference" href="_images/CSF-Images-Example.png"><img alt="Decorative example icon" src="_images/CSF-Images-Example.png" style="width: 100%;" /></a>
</div>
<p class="topic-title first pt-2 mb-1">Example 4.5.1 </p><hr class="mt-1" />
<p>Both HTTP requests and responses begin with a sequence of header lines, each ending in a
two-character sequence denoted as <code class="docutils literal notranslate"><span class="pre">CRLF</span></code> (carriage return-line feed, or <code class="docutils literal notranslate"><span class="pre">&quot;\r\n&quot;</span></code> in C strings).
The first line of each message must be a designated <code class="docutils literal notranslate"><span class="pre">Request</span></code> or <code class="docutils literal notranslate"><span class="pre">Response</span></code> line, which must adhere
to a given structure. After the first line, all other headers are optional, but they provide the
client and server with additional useful information. At the end of the header lines, there is a
single blank line (consisting of only <code class="docutils literal notranslate"><span class="pre">CRLF</span></code>). The figure below shows a sample HTTP
header for a GET request, which is the type of request that indicates the client is asking for a
copy of a file; in contrast, a POST request occurs when the client is sending data back to the
server. In this example, the client is requesting
<code class="docutils literal notranslate"><span class="pre">http://example.com/index.html</span></code>, based on a link from <code class="docutils literal notranslate"><span class="pre">https://link.from.com</span></code>.</p>
<div class="figure mb-2 align-center" id="id14">
<a class="reference internal image-reference" href="_images/CSF-Images.4.8.png"><img class="p-3 mb-2 align-center border border-dark rounded-lg" alt="Sample HTTP headers for a GET request" src="_images/CSF-Images.4.8.png" style="width: 70%;" /></a>
<p class="caption align-center px-3"><span class="caption-text"> Figure 4.5.3: Sample HTTP headers for a GET request</span></p>
</div>
</div>
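<p>The line structure described in Example 4.5.1 can be sketched in C as follows. This is a minimal illustration rather than code from the original text: the function <code class="docutils literal notranslate"><span class="pre">count_header_lines()</span></code> and the <code class="docutils literal notranslate"><span class="pre">sample_request</span></code> string are hypothetical names, and the header lines in the sample are assumed from the description of the example, not copied from the figure.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical request text; the header names and values are assumed
   from the prose description, not taken from the figure. */
static const char *sample_request =
  "GET /index.html HTTP/1.1\r\n"
  "Host: example.com\r\n"
  "Referer: https://link.from.com\r\n"
  "\r\n";

/* Count the CRLF-terminated lines that precede the blank line.
   Returns -1 if the blank line that ends the headers is missing. */
int
count_header_lines (const char *message)
{
  int count = 0;
  const char *line = message;
  while (1)
    {
      const char *end = strstr (line, "\r\n");
      if (end == NULL)
        return -1;      /* ran out of text before the blank line */
      if (end == line)
        return count;   /* empty line: end of the headers */
      count++;
      line = end + 2;   /* skip past the CRLF */
    }
}
```

<p>Calling <code class="docutils literal notranslate"><span class="pre">count_header_lines</span> <span class="pre">(sample_request)</span></code> counts the request line and the two headers that follow it. A message that lacks the terminating blank line is reported as incomplete, which is how a receiver can tell that more data still needs to be read from the socket.</p>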
<p>The <code class="docutils literal notranslate"><span class="pre">netcat</span></code> tool is a useful way to explore the details of HTTP without a web browser. <a class="footnote-reference" href="TCPSockets.html#f25" id="id1">[1]</a>
Using <code class="docutils literal notranslate"><span class="pre">netcat</span></code>, you can interact directly with a remote HTTP server, typing the lines of the
protocol itself. This tool is useful for text-based protocols like HTTP but cannot easily be used
for protocols that use binary-formatted data. Consider the following example of a command-line
session with <code class="docutils literal notranslate"><span class="pre">netcat</span></code>:</p>
<div class="highlight-none border border-dark rounded-lg bg-light px-2 mb-3 notranslate"><div class="highlight bg-light"><pre class="mb-0"><span></span>$ netcat -v example.com 80
Warning: Inverse name lookup failed for `93.184.216.34&#39;
example.com [93.184.216.34] 80 (http) open
GET / HTTP/1.1
Host: example.com
Connection: close

HTTP/1.1 200 OK
[...more lines here, omitted for brevity...]
</pre></div>
</div>
<p>To use <code class="docutils literal notranslate"><span class="pre">netcat</span></code>, you specify the hostname (<code class="docutils literal notranslate"><span class="pre">example.com</span></code>) and the port number (80) to access.
After the command prompt, the first two lines are printed by <code class="docutils literal notranslate"><span class="pre">netcat</span></code> (in verbose mode with the
<code class="docutils literal notranslate"><span class="pre">-v</span></code> flag) to indicate that it has connected to the server. The next four lines (the <code class="docutils literal notranslate"><span class="pre">GET</span></code>,
<code class="docutils literal notranslate"><span class="pre">Host</span></code>, <code class="docutils literal notranslate"><span class="pre">Connection</span></code>, and blank lines) were typed manually by the user to request the contents
of <code class="docutils literal notranslate"><span class="pre">http://example.com/</span></code>. The <code class="docutils literal notranslate"><span class="pre">Host</span></code> header is required for HTTP/1.1, as many web servers are operated
by third-party providers. In the case of <code class="docutils literal notranslate"><span class="pre">example.com</span></code>, the web server is operated by a cloud
service provider, <code class="docutils literal notranslate"><span class="pre">fastly.net</span></code>. That is, the server at 93.184.216.34 is not serving content
exclusively for <code class="docutils literal notranslate"><span class="pre">example.com</span></code>; there are several other domains that can be accessed from the same
IP address. The <code class="docutils literal notranslate"><span class="pre">Host</span></code> header, then, tells <code class="docutils literal notranslate"><span class="pre">fastly.net</span></code> which specific domain name you are
trying to reach. The lines beginning with <code class="docutils literal notranslate"><span class="pre">HTTP/1.1</span> <span class="pre">200</span> <span class="pre">OK</span></code> are the response from the server. The
structure of an HTTP response is explained below. We omit the full response here, as it consists of
several lines of HTTP headers and HTML code that are not critical to the current discussion.</p>
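<p>To illustrate why the <code class="docutils literal notranslate"><span class="pre">Host</span></code> header matters, the following sketch shows how a server that hosts several domains might extract its value from a request. The function <code class="docutils literal notranslate"><span class="pre">get_host()</span></code> is a hypothetical helper, not part of any library, and it assumes the entire request is already stored in a null-terminated string. Header names are compared without regard to case, using the POSIX function <code class="docutils literal notranslate"><span class="pre">strncasecmp()</span></code>.</p>

```c
#include <string.h>
#include <strings.h>   /* strncasecmp() (POSIX) */

/* Copy the value of the Host header from a request into out.
   Returns 0 on success, -1 if no Host header is found before the
   blank line or if the value does not fit in out. */
int
get_host (const char *request, char *out, size_t cap)
{
  /* Skip the Request-Line; the headers start on the next line */
  const char *line = strstr (request, "\r\n");
  if (line == NULL)
    return -1;
  line += 2;

  while (1)
    {
      const char *end = strstr (line, "\r\n");
      if (end == NULL || end == line)
        return -1;   /* malformed input or blank line reached */
      if (strncasecmp (line, "Host:", 5) == 0)
        {
          /* Skip optional whitespace after the colon */
          const char *value = line + 5;
          while (value < end && (*value == ' ' || *value == '\t'))
            value++;
          size_t len = (size_t) (end - value);
          if (len >= cap)
            return -1;
          memcpy (out, value, len);
          out[len] = '\0';
          return 0;
        }
      line = end + 2;   /* advance to the next header line */
    }
}
```

<p>For the request typed in the <code class="docutils literal notranslate"><span class="pre">netcat</span></code> session, this helper would store <code class="docutils literal notranslate"><span class="pre">example.com</span></code> in <code class="docutils literal notranslate"><span class="pre">out</span></code>. A server could then use that string to choose which site's files to serve.</p>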
<p>Writing the messages for an HTTP header is straightforward, as the headers are just concatenated
text output. Code Listing 4.11 illustrates the general structure of this task. The client creates a
buffer and copies the required <code class="docutils literal notranslate"><span class="pre">Request</span></code> line into the beginning. The string concatenation
function, <code class="docutils literal notranslate"><span class="pre">strncat()</span></code>, appends the other lines to the buffer, and the buffer is written to the
socket. Note that the <code class="docutils literal notranslate"><span class="pre">length</span></code> variable is used to keep track of how much available space is
remaining in the buffer, which is always the capacity (500) minus the length of the existing string
in the buffer.</p>
<div class="highlight-c border border-dark rounded-lg bg-light px-0 mb-3 notranslate" id="cl4-11"><table class="highlighttable"><tr><td class="code"><div class="highlight bg-light"><pre class="mb-0"><span></span><span class="cm">/* Code Listing 4.11:</span>
<span class="cm"> Constructing and sending an HTTP GET request</span>
<span class="cm"> */</span>
<span class="kt">size_t</span> <span class="n">length</span> <span class="o">=</span> <span class="mi">500</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">buffer</span><span class="p">[</span><span class="n">length</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span>
<span class="n">memset</span> <span class="p">(</span><span class="n">buffer</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span> <span class="p">(</span><span class="n">buffer</span><span class="p">));</span>
<span class="cm">/* Copy first line in and shrink the remaining length available */</span>
<span class="n">strncpy</span> <span class="p">(</span><span class="n">buffer</span><span class="p">,</span> <span class="s">&quot;GET /web/index.html HTTP/1.0</span><span class="se">\r\n</span><span class="s">&quot;</span><span class="p">,</span> <span class="n">length</span><span class="p">);</span>
<span class="n">length</span> <span class="o">=</span> <span class="mi">500</span> <span class="o">-</span> <span class="n">strlen</span> <span class="p">(</span><span class="n">buffer</span><span class="p">);</span>
<span class="cm">/* Concatenate each additional header line */</span>
<span class="n">strncat</span> <span class="p">(</span><span class="n">buffer</span><span class="p">,</span> <span class="s">&quot;Accept: text/html</span><span class="se">\r\n</span><span class="s">&quot;</span><span class="p">,</span> <span class="n">length</span><span class="p">);</span>
<span class="n">length</span> <span class="o">=</span> <span class="mi">500</span> <span class="o">-</span> <span class="n">strlen</span> <span class="p">(</span><span class="n">buffer</span><span class="p">);</span>
<span class="cm">/* Other lines are similar and omitted... */</span>
<span class="n">write</span> <span class="p">(</span><span class="n">socketfd</span><span class="p">,</span> <span class="n">buffer</span><span class="p">,</span> <span class="n">strlen</span> <span class="p">(</span><span class="n">buffer</span><span class="p">));</span>
</pre></div>
</td></tr></table></div>
<div class="topic border border-dark rounded-lg alert-danger px-2 mb-3">
<div class="figure align-left">
<a class="reference internal image-reference" href="_images/CSF-Images-BugWarning.png"><img alt="Decorative bug warning" src="_images/CSF-Images-BugWarning.png" style="width: 90%;" /></a>
</div>
<p class="topic-title first pt-2 mb-1">Bug Warning</p><hr class="mt-1" />
<p>C's string functions are notorious sources of buffer overflow vulnerabilities. One common way these vulnerabilities arise is with repeated calls to <code class="docutils literal notranslate"><span class="pre">strncat()</span></code>, such as the omitted lines in <a class="reference external" href="TCPSockets.html#cl4-11">Code Listing 4.11</a>. The problem is that each call reduces the amount of space left in the buffer. As this happens, the <code class="docutils literal notranslate"><span class="pre">length</span></code> parameter passed to <code class="docutils literal notranslate"><span class="pre">strncat()</span></code> each time must shrink to match only the remaining size of the buffer, not the original size. Using the original size each time would create the possibility that strings would be concatenated beyond the end of the buffer.</p>
</div>
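<p>One way to reduce the bookkeeping described in the warning is to wrap the length calculation in a helper, so that the remaining space is recomputed on every call. The sketch below is an alternative to the approach in Code Listing 4.11, not a restatement of it: it uses the standard C99 function <code class="docutils literal notranslate"><span class="pre">vsnprintf()</span></code> instead of <code class="docutils literal notranslate"><span class="pre">strncat()</span></code>, and <code class="docutils literal notranslate"><span class="pre">append_line()</span></code> is a hypothetical name. It assumes the buffer already holds a null-terminated string shorter than its capacity.</p>

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text to the string already in buf, whose total
   capacity (including the null terminator) is cap.  Returns 0 on
   success or -1 if the text would not fit; on failure the existing
   contents of buf are left unchanged. */
int
append_line (char *buf, size_t cap, const char *fmt, ...)
{
  size_t used = strlen (buf);
  size_t remaining = cap - used;   /* includes room for the '\0' */

  va_list args;
  va_start (args, fmt);
  int needed = vsnprintf (buf + used, remaining, fmt, args);
  va_end (args);

  if (needed < 0 || (size_t) needed >= remaining)
    {
      buf[used] = '\0';   /* undo any partial write */
      return -1;
    }
  return 0;
}
```

<p>A caller would start with an empty string in the buffer and call <code class="docutils literal notranslate"><span class="pre">append_line()</span></code> once per header line, checking the return value each time. Because the helper measures the existing string itself, the caller never passes a stale length.</p>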
<div class="topic border border-dark rounded-lg bg-light px-2 mb-3" id="getresponseex">
<div class="figure align-left">
<a class="reference internal image-reference" href="_images/CSF-Images-Example.png"><img alt="Decorative example icon" src="_images/CSF-Images-Example.png" style="width: 100%;" /></a>
</div>
<p class="topic-title first pt-2 mb-1">Example 4.5.2 </p><hr class="mt-1" />
<p>The figure below shows a sample response from a web server for the request in
<a href="TCPSockets.html#getrequestex">Example 4.5.1</a>. The response begins with a required <code class="docutils literal notranslate"><span class="pre">Response</span></code> line that lets the
client know the request was successful. The optional headers indicate that the body of the message
(after the blank line) consists of 37 bytes of HTML text. (Note that the body of the message in
<a href="TCPSockets.html#getrequestex">Example 4.5.1</a> was empty, which is typical for <code class="docutils literal notranslate"><span class="pre">GET</span></code> requests.) The body of the
response is the contents of the file <code class="docutils literal notranslate"><span class="pre">index.html</span></code> stored in the web server's designated root
directory. The newline character at the end of the HTML code is not required by the HTTP protocol;
rather, it is simply a character stored in the file, as most text editors place a newline at the
end of the file. From the perspective of HTTP, the body of the message is a meaningless stream of
bytes; the content type only matters to the client (the web browser) so that the client knows how
to handle the data. For example, the second line of the message body is an HTML header, demarcated
with the <code class="docutils literal notranslate"><span class="pre">&lt;head&gt;...&lt;/head&gt;</span></code> tag structure. This header has no meaning to HTTP itself.</p>
<div class="figure mb-2 align-center" id="id15">
<a class="reference internal image-reference" href="_images/CSF-Images.4.9.png"><img class="p-3 mb-2 align-center border border-dark rounded-lg" alt="Sample HTTP response to the request from Example 4.5.1" src="_images/CSF-Images.4.9.png" style="width: 70%;" /></a>
<p class="caption align-center px-3"><span class="caption-text"> Figure 4.5.6: Sample HTTP response to the request from <a href="TCPSockets.html#getrequestex">Example 4.5.1</a></span></p>
</div>
</div>
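<p>The division between the headers and the body in Example 4.5.2 can be sketched as follows. This is an illustration under stated assumptions, not a complete parser: <code class="docutils literal notranslate"><span class="pre">find_body()</span></code> is a hypothetical helper, the whole response is assumed to be in a null-terminated string, and the <code class="docutils literal notranslate"><span class="pre">Content-Length</span></code> value is converted with <code class="docutils literal notranslate"><span class="pre">strtol()</span></code> without further validation.</p>

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strncasecmp() (POSIX) */

/* Find the body of a response held in a null-terminated string.
   On success, returns a pointer to the first byte after the blank
   line and stores the Content-Length value in *length.  Returns NULL
   if the blank line or the Content-Length header is missing. */
const char *
find_body (const char *response, long *length)
{
  const char *blank = strstr (response, "\r\n\r\n");
  if (blank == NULL)
    return NULL;

  /* Examine each header line after the first (status) line.  Every
     strstr() call below succeeds because blank lies ahead of line. */
  const char *line = strstr (response, "\r\n") + 2;
  while (line < blank)
    {
      if (strncasecmp (line, "Content-Length:", 15) == 0)
        {
          *length = strtol (line + 15, NULL, 10);
          return blank + 4;   /* skip the CRLF CRLF sequence */
        }
      line = strstr (line, "\r\n") + 2;
    }
  return NULL;
}
```

<p>The returned pointer and length identify the bytes that HTTP treats as opaque. Interpreting those bytes as HTML is left to the client, as described in the example.</p>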
</div>
<div class="section" id="bnf-protocol-specification">
<h2>4.5.2. BNF Protocol Specification<a class="headerlink" href="TCPSockets.html#bnf-protocol-specification" title="Permalink to this headline"></a></h2>
<p>The key features of the HTTP specification in RFC 2616 are structured as BNF declarations. To
understand how these declarations structure the protocol, consider the required request and response
lines. Every HTTP request must begin with a <code class="docutils literal notranslate"><span class="pre">Request-Line</span></code> and every response must begin with a <code class="docutils literal notranslate"><span class="pre">Status-Line</span></code>:</p>
<div class="highlight-none border border-dark rounded-lg bg-light px-2 mb-3 notranslate"><div class="highlight bg-light"><pre class="mb-0"><span></span>Request-Line = Method SP Request-URI SP HTTP-Version CRLF
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
</pre></div>
</div>
<p>The <code class="docutils literal notranslate"><span class="pre">SP</span></code> designates a space while <code class="docutils literal notranslate"><span class="pre">CRLF</span></code> designates the carriage return-line feed. Although
whitespace is typically insignificant in HTML, it is significant when processing HTTP headers;
spaces and <code class="docutils literal notranslate"><span class="pre">CRLF</span></code> characters are required in particular places to facilitate correct
interpretations. For requests, there are several valid <code class="docutils literal notranslate"><span class="pre">Methods</span></code> that can be used, with <code class="docutils literal notranslate"><span class="pre">GET</span></code>
and <code class="docutils literal notranslate"><span class="pre">POST</span></code> being the two most common. A <code class="docutils literal notranslate"><span class="pre">GET</span></code> request corresponds to reading a file to display
in the web browser; the body of the request would be empty in that case. A <code class="docutils literal notranslate"><span class="pre">POST</span></code> request, on the
other hand, occurs when the web browser is sending data back to the server, such as when a user
enters data in a form and submits it; the message body after the blank line contains the data to
send, such as the form contents.</p>
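<p>The <code class="docutils literal notranslate"><span class="pre">Request-Line</span></code> rule above can be turned into a small parser. The following is a minimal sketch, not a complete HTTP parser; the <code class="docutils literal notranslate"><span class="pre">struct</span> <span class="pre">request_line</span></code> type, its field sizes, and the <code class="docutils literal notranslate"><span class="pre">parse_request_line()</span></code> helper are hypothetical names introduced here for illustration.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three Request-Line fields; the
   buffer sizes are arbitrary choices for this sketch */
struct request_line
{
  char method[16];
  char uri[256];
  char version[16];
};

/* Split "Method SP Request-URI SP HTTP-Version CRLF" into its three
   fields. Returns true only if all three fields are present and the
   line ends with CRLF. Note that sscanf() treats a space in the format
   as "any amount of whitespace", so this check is more lenient than
   the BNF rule, which requires exactly one SP. */
bool
parse_request_line (const char *text, struct request_line *out)
{
  assert (text != NULL && out != NULL);
  int used = 0;
  /* Each %N[^...] reads at most N characters up to a delimiter; %n
     records how many characters were consumed so far */
  int fields = sscanf (text, "%15[^ \r\n] %255[^ \r\n] %15[^ \r\n]%n",
                       out->method, out->uri, out->version, &used);
  if (fields != 3)
    return false;
  /* The version must be followed immediately by CRLF */
  return strncmp (text + used, "\r\n", 2) == 0;
}
```

<p>Under these assumptions, passing the line <code class="docutils literal notranslate"><span class="pre">&quot;GET</span> <span class="pre">/index.html</span> <span class="pre">HTTP/1.0\r\n&quot;</span></code> would fill in the method, URI, and version fields, while a line that is missing the version or ends with a bare line feed would be rejected.</p>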
<p>Readers with experience writing HTML code may be familiar with <em>query strings</em>
and <a class="reference internal" href="Glossary.html#term-cookie"><span class="xref std std-term">cookies</span></a>. As described previously, a query is part of the standard URI structure
that begins with a <code class="docutils literal notranslate"><span class="pre">'?'</span></code> and can provide information to the server about how to process the
request. For instance, the URL <code class="docutils literal notranslate"><span class="pre">http://example.com/help.html?topic=login</span></code> indicates that the user
is looking for help logging in. The <code class="docutils literal notranslate"><span class="pre">Request-URI</span></code> in this case is <code class="docutils literal notranslate"><span class="pre">/help.html?topic=login</span></code>,
containing the query string. <a class="footnote-reference" href="TCPSockets.html#f26" id="id2">[2]</a> Cookies, on the other hand, are another technology used to
provide data to the server; for instance, a cookie may contain an authentication token or a username
to keep track of the user from one request to the next. Cookies are stored in their own HTTP header,
but they are not described in RFC 2616. Instead, there is a separate RFC 6265 that defines the
structure of cookies and how to use them. Ultimately, though, from the perspective of HTTP described
above, cookies are simply small pieces of data stored in another optional header field.</p>
</div>
<div class="section" id="http-1-1-persistent-connections">
<h2>4.5.3. HTTP/1.1 Persistent Connections<a class="headerlink" href="TCPSockets.html#http-1-1-persistent-connections" title="Permalink to this headline"></a></h2>
<p>The last part of an HTTP <code class="docutils literal notranslate"><span class="pre">Request-Line</span></code> is the version, which corresponds to the first field of
the <code class="docutils literal notranslate"><span class="pre">Status-Line</span></code> that begins the response. The examples in <a href="TCPSockets.html#getrequestex">Example 4.5.1</a> and
<a href="TCPSockets.html">Example GetResponseEx</a> used the version HTTP/1.0, which is the basic <code class="docutils literal notranslate"><span class="pre">request-response</span></code>
protocol we have discussed so far. HTTP/1.1 introduces <a class="reference internal" href="Glossary.html#term-persistent-connection"><span class="xref std std-term">persistent connections</span></a>, which are commonly used in modern web applications. With a standard HTTP/1.0 request,
the TCP socket connection is closed when the server sends the response. As such, if a client needs
to request more data, the client must establish a new connection and start over. With an HTTP/1.1
persistent connection, the TCP connection is only closed after the client sends a request that
explicitly asks to close the connection.</p>
<div class="highlight-html border border-dark rounded-lg bg-light px-0 mb-3 notranslate" id="cl4-12"><table class="highlighttable"><tr><td class="linenos px-0 mx-0"><div class="linenodiv"><pre class="mb-0"> 1
2
3
4
5
6
7
8
9
10</pre></div></td><td class="code"><div class="highlight bg-light"><pre class="mb-0"><span></span><span class="c">&lt;!-- Code Listing 4.12:</span>
<span class="c"> HTML code that causes the sequence in Figure 4.5.7 --&gt;</span>
<span class="p">&lt;</span><span class="nt">html</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">head</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">script</span> <span class="na">src</span><span class="o">=</span><span class="s">&quot;http://zoo.com/library.js&quot;</span> <span class="p">/&gt;</span>
<span class="p">&lt;</span><span class="nt">script</span> <span class="na">src</span><span class="o">=</span><span class="s">&quot;script.js&quot;</span> <span class="p">/&gt;</span>
<span class="p">&lt;/</span><span class="nt">head</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">body</span><span class="p">&gt;&lt;</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">&quot;logo.png&quot;</span> <span class="p">/&gt;&lt;/</span><span class="nt">body</span><span class="p">&gt;</span>
<span class="p">&lt;/</span><span class="nt">html</span><span class="p">&gt;</span>
</pre></div>
</td></tr></table></div>
<div class="figure mb-2 align-right" id="id16" style="width: 50%">
<span id="tcpobjects"></span><a class="reference internal image-reference" href="_images/CSF-Images.4.10.png"><img class="p-3 mb-2 align-center border border-dark rounded-lg" alt="Requesting three objects with HTTP/1.1 and a fourth object with HTTP/1.0 from a second server" src="_images/CSF-Images.4.10.png" style="width: 95%;" /></a>
<p class="caption align-center px-3"><span class="caption-text"> Figure 4.5.7: Requesting three objects with HTTP/1.1 and a fourth object with HTTP/1.0
from a second server</span></p>
</div>
<p><a href="TCPSockets.html#tcpobjects">Figure 4.5.7</a> illustrates a sample scenario that benefits from HTTP/1.1. The client
connects to <code class="docutils literal notranslate"><span class="pre">foo.com</span></code> and requests the file <code class="docutils literal notranslate"><span class="pre">index.html</span></code>, which is shown in <a class="reference external" href="TCPSockets.html#cl4-12">Code Listing 4.12</a>. This file references two more files, <code class="docutils literal notranslate"><span class="pre">script.js</span></code> and <code class="docutils literal notranslate"><span class="pre">logo.png</span></code>, that are both
stored on <code class="docutils literal notranslate"><span class="pre">foo.com</span></code>. These additional files are retrieved with separate HTTP requests. These
requests are <a class="reference internal" href="Glossary.html#term-asynchronous"><span class="xref std std-term">asynchronous</span></a>, so the client can do other work while waiting for them. While
waiting, the client connects to <code class="docutils literal notranslate"><span class="pre">zoo.com</span></code> with a request for <code class="docutils literal notranslate"><span class="pre">library.js</span></code>. Since this is the
only file needed from that server, the client uses HTTP/1.0, which closes the socket immediately.
Since the requests to foo.com used HTTP/1.1, the client must explicitly send a separate request to
close the connection once all files have been received.</p>
<p>The asynchronous requests shown in <a href="TCPSockets.html#tcpobjects">Figure 4.5.7</a> are common in modern web designs. These
applications frequently augment HTML with JavaScript to create a dynamic and interactive web page;
for instance, clicking on a button may cause a drop-down box of menu options to appear. In some
cases, the JavaScript code may issue asynchronous HTTP requests. In truth, the web browser issues
the request, as all client-side software (including JavaScript) runs as part of the browser process.
These requests are standard HTTP requests as we have observed, with the exception that the browser
uses multiple <a class="reference internal" href="Glossary.html#term-thread"><span class="xref std std-term">threads of execution</span></a> to issue that request and wait for the response
while doing other work. When the browser receives the HTTP response, it will invoke a JavaScript
<em>callback</em> routine that the script declared ahead of time to be responsible for handling the
data. The web pages URL does not appear to change, but it is possible to observe these requests as
they happen. Most popular web browsers include a menu option, typically called something like
“Developer Tools,” that can monitor and even modify these requests.</p>
<p>To be clear, the use of persistent TCP connections does not change the nature of HTTP as a stateless
protocol. Although the TCP connection will remain open from one request to the next, the server does
not maintain any local information about the specifics of the previous HTTP request or how it
responded. The persistent connection is simply a performance improvement, and each request-response
is considered a distinct, unrelated exchange.</p>
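<p>As a rough illustration of how a client might word its requests on a persistent connection, the sketch below builds an HTTP/1.1 <code class="docutils literal notranslate"><span class="pre">GET</span></code> request as a string. In RFC 2616, HTTP/1.1 requests must carry a <code class="docutils literal notranslate"><span class="pre">Host</span></code> header, and a client can signal that a request is its last by adding <code class="docutils literal notranslate"><span class="pre">Connection:</span> <span class="pre">close</span></code>. The helper name <code class="docutils literal notranslate"><span class="pre">build_get_request()</span></code> and its parameters are assumptions made for this example; sending the string over the socket is left out.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a GET request for uri on host into
   buffer. If last is true, add "Connection: close" to ask the server
   to close the TCP connection after it responds. Returns the length
   of the request, or -1 if the buffer is too small. */
int
build_get_request (char *buffer, size_t size, const char *host,
                   const char *uri, bool last)
{
  assert (buffer != NULL && host != NULL && uri != NULL);
  assert (strlen (host) > 0 && uri[0] == '/');
  int length = snprintf (buffer, size,
                         "GET %s HTTP/1.1\r\n"
                         "Host: %s\r\n"
                         "%s"       /* optional Connection header */
                         "\r\n",    /* blank line ends the header */
                         uri, host, last ? "Connection: close\r\n" : "");
  if (length < 0 || (size_t) length >= size)
    return -1;
  return length;
}
```

<p>For the scenario in the figure, a client following this sketch could call the helper with <code class="docutils literal notranslate"><span class="pre">last</span></code> set to <code class="docutils literal notranslate"><span class="pre">false</span></code> for the earlier requests to <code class="docutils literal notranslate"><span class="pre">foo.com</span></code> and set it to <code class="docutils literal notranslate"><span class="pre">true</span></code> once no more files are needed from that server.</p>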
</div>
<div class="section" id="processing-http-headers">
<h2>4.5.4. Processing HTTP Headers<a class="headerlink" href="TCPSockets.html#processing-http-headers" title="Permalink to this headline"></a></h2>
<p>After the <code class="docutils literal notranslate"><span class="pre">HTTP-Version</span></code>, the remainder of the HTTP response <code class="docutils literal notranslate"><span class="pre">Status-Line</span></code> provides information
to the client about whether or not the request was successful. The status is reported both as a
number (<code class="docutils literal notranslate"><span class="pre">Status-Code</span></code>) and a text description (<code class="docutils literal notranslate"><span class="pre">Reason-Phrase</span></code>). <a class="reference external" href="TCPSockets.html#tbl4-5">Table 4.5</a>
describes several of the most common statuses. The <code class="docutils literal notranslate"><span class="pre">Reason-Phrase</span></code>s shown here are used by
convention and they have no specific effect on processing the response; a web server could choose to
break this convention with arbitrary <code class="docutils literal notranslate"><span class="pre">Reason-Phrase</span></code>s and the response would still be processed
identically.</p>
<center>
<table class="table table-bordered">
<thead class="jmu-dark-purple-bg text-light">
<tr>
<th class="py-0 center">Status</th>
<th class="py-0 center">Reason-Phrase</th>
<th class="py-0 center">Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td class="py-0 center">200</td>
<td class="py-0">OK</td>
<td class="py-0">Request was successful</td>
</tr>
<tr>
<td class="py-0 center">301</td>
<td class="py-0">Moved Permanently</td>
<td class="py-0">File has been moved to a new location</td>
</tr>
<tr>
<td class="py-0 center">400</td>
<td class="py-0">Bad Request</td>
<td class="py-0">The HTTP request had incorrect syntax</td>
</tr>
<tr>
<td class="py-0 center">401</td>
<td class="py-0">Unauthorized</td>
<td class="py-0">The request requires user authentication</td>
</tr>
<tr>
<td class="py-0 center">403</td>
<td class="py-0">Forbidden</td>
<td class="py-0">Access to the resource is not allowed</td>
</tr>
<tr>
<td class="py-0 center">404</td>
<td class="py-0">Not Found</td>
<td class="py-0">No file was found based on Request-URI</td>
</tr>
<tr>
<td class="py-0 center">500</td>
<td class="py-0">Internal Server Error</td>
<td class="py-0">The server had an unexpected error or fault</td>
</tr>
<tr>
<td class="py-0 center">503</td>
<td class="py-0">Service Unavailable</td>
<td class="py-0">The server is unavailable or not accepting new requests</td>
</tr>
</tbody>
</table>
<p>
Table 4.5: Common HTTP response status messages and their meanings
</p>
</center><p>Some status codes are unrecoverable error messages. For instance, if a client receives a 404 or 503
message, either the requested file does not exist or the server is down. There is nothing that the
web browser can do to correct those situations, and the browser simply displays an error message.
Other codes do allow the browser to respond automatically. If the server responds with a 301 status,
the HTTP response <cite>should</cite> include a Location header that designates the new location; this may be a
different URI on the same server, or it might be a full URL because the domain name is different.
Web browsers can read this header and re-issue a new HTTP request based on this new location. As
another example, if the server responds with a 401 status, the response must include a
<code class="docutils literal notranslate"><span class="pre">WWW-Authenticate</span></code> header with a <code class="docutils literal notranslate"><span class="pre">challenge</span></code>. The browser may re-issue the request with an added
<code class="docutils literal notranslate"><span class="pre">Authorization</span></code> header field that stores credentials to respond to the <code class="docutils literal notranslate"><span class="pre">challenge</span></code>.</p>
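<p>The decisions described above depend only on the numeric <code class="docutils literal notranslate"><span class="pre">Status-Code</span></code>, not on the <code class="docutils literal notranslate"><span class="pre">Reason-Phrase</span></code>. The sketch below shows one way a client could extract the code from a <code class="docutils literal notranslate"><span class="pre">Status-Line</span></code> and pick a response. The <code class="docutils literal notranslate"><span class="pre">enum</span> <span class="pre">action</span></code> categories and both function names are hypothetical, and only the statuses discussed in this section are handled.</p>

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical categories a client might use to decide what to do */
enum action
{
  ACT_USE_BODY,           /* 200: process the message body */
  ACT_FOLLOW_LOCATION,    /* 301: re-issue using the Location header */
  ACT_SEND_CREDENTIALS,   /* 401: re-issue with an Authorization header */
  ACT_REPORT_ERROR        /* anything else: show an error message */
};

/* Extract the numeric Status-Code from a Status-Line such as
   "HTTP/1.0 200 OK\r\n". Returns -1 if the syntax is not recognized.
   The Reason-Phrase is deliberately ignored. */
int
parse_status_code (const char *status_line)
{
  assert (status_line != NULL);
  int major = 0, minor = 0, code = 0;
  if (sscanf (status_line, "HTTP/%d.%d %3d", &major, &minor, &code) != 3)
    return -1;
  return (code >= 100 && code <= 599) ? code : -1;
}

/* Map a Status-Code to one of the actions described in the text */
enum action
choose_action (int code)
{
  switch (code)
    {
    case 200: return ACT_USE_BODY;
    case 301: return ACT_FOLLOW_LOCATION;
    case 401: return ACT_SEND_CREDENTIALS;
    default:  return ACT_REPORT_ERROR;
    }
}
```

<p>Because the parser stops after the three-digit code, a server that sends an unconventional <code class="docutils literal notranslate"><span class="pre">Reason-Phrase</span></code> would be handled the same way as one that follows the convention.</p>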
<p>Although writing HTTP headers—as shown in <a class="reference external" href="TCPSockets.html#cl4-11">Code Listing 4.11</a>—is straightforward,
reading them at the other end can be a challenge if not handled properly. The difficulty arises from
the fact that header sizes vary, so the receiver does not know how many bytes to request from the
socket at a time. To address this challenge, both clients and servers typically impose a maximum
header size of 8 KB by convention. The initial read from the socket requests this much data. If a
complete header is not found in this space, then the connection is terminated as invalid by clients;
servers that receive such invalid headers return Status 413 to indicate <code class="docutils literal notranslate"><span class="pre">Request</span> <span class="pre">Entity</span> <span class="pre">Too</span> <span class="pre">Large</span></code>. A
complete header must end with a blank line, creating the four-byte sequence <code class="docutils literal notranslate"><span class="pre">&quot;\r\n\r\n&quot;</span></code>. <a class="reference external" href="TCPSockets.html#cl4-13">Code
Listing 4.13</a> demonstrates how to perform this check by looking for this string. If it is
found, the <cite>second</cite> <code class="docutils literal notranslate"><span class="pre">'\r'</span></code> is replaced with the null-byte <code class="docutils literal notranslate"><span class="pre">'\0'</span></code> to convert <code class="docutils literal notranslate"><span class="pre">buffer</span></code> to a
complete string of the header, with each header line ending in <code class="docutils literal notranslate"><span class="pre">CRLF</span></code>.</p>
<div class="highlight-c border border-dark rounded-lg bg-light px-0 mb-3 notranslate" id="cl4-13"><table class="highlighttable"><tr><td class="linenos px-0 mx-0"><div class="linenodiv"><pre class="mb-0"> 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24</pre></div></td><td class="code"><div class="highlight bg-light"><pre class="mb-0"><span></span><span class="cm">/* Code Listing 4.13:</span>
<span class="cm"> Checking for a complete HTTP response header</span>
<span class="cm"> */</span>
<span class="cp">#define HEADER_MAX 8192</span>
<span class="cm">/* Allocate a buffer to handle initial responses up to 8 KB */</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">buffer</span> <span class="o">=</span> <span class="n">calloc</span> <span class="p">(</span><span class="n">HEADER_MAX</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="k">sizeof</span> <span class="p">(</span><span class="kt">char</span><span class="p">));</span>
<span class="kt">ssize_t</span> <span class="n">bytes</span> <span class="o">=</span> <span class="n">read</span> <span class="p">(</span><span class="n">socketfd</span><span class="p">,</span> <span class="n">buffer</span><span class="p">,</span> <span class="n">HEADER_MAX</span><span class="p">);</span>
<span class="n">assert</span> <span class="p">(</span><span class="n">bytes</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">);</span>
<span class="cm">/* Look for the end-of-header (eoh) CRLF CRLF substring; if</span>
<span class="cm"> not found, then the header size is too large */</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">eoh</span> <span class="o">=</span> <span class="n">strnstr</span> <span class="p">(</span><span class="n">buffer</span><span class="p">,</span> <span class="s">&quot;</span><span class="se">\r\n\r\n</span><span class="s">&quot;</span><span class="p">,</span> <span class="n">HEADER_MAX</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">eoh</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span> <span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">&quot;Header exceeds 8 KB maximum</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span>
<span class="n">close</span> <span class="p">(</span><span class="n">socketfd</span><span class="p">);</span>
<span class="k">return</span> <span class="n">EXIT_FAILURE</span><span class="p">;</span>
<span class="p">}</span>
<span class="cm">/* Replace the blank line of CRLF CRLF with \0 to split the</span>
<span class="cm"> header and body */</span>
<span class="n">eoh</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="sc">&#39;\0&#39;</span><span class="p">;</span>
</pre></div>
</td></tr></table></div>
<p>Once the header and body have been split, processing the header involves repeatedly breaking it at
the <code class="docutils literal notranslate"><span class="pre">CRLF</span></code> locations. <a class="reference external" href="TCPSockets.html#cl4-14">Code Listing 4.14</a> extends <a class="reference external" href="TCPSockets.html#cl4-13">Code Listing 4.13</a> to
demonstrate this processing, printing all header lines but looking specifically for the
<code class="docutils literal notranslate"><span class="pre">Content-Length</span></code> header. Since <a class="reference external" href="TCPSockets.html#cl4-13">Code Listing 4.13</a> ended by replacing the blank line of
<code class="docutils literal notranslate"><span class="pre">CRLF</span> <span class="pre">CRLF</span></code> with the null byte, the final <code class="docutils literal notranslate"><span class="pre">strstr()</span></code> call in the loop will set <code class="docutils literal notranslate"><span class="pre">eol</span></code> to <code class="docutils literal notranslate"><span class="pre">NULL</span></code> after the last header line is
processed.</p>
<div class="highlight-c border border-dark rounded-lg bg-light px-0 mb-3 notranslate" id="cl4-14"><table class="highlighttable"><tr><td class="linenos px-0 mx-0"><div class="linenodiv"><pre class="mb-0"> 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26</pre></div></td><td class="code"><div class="highlight bg-light"><pre class="mb-0"><span></span><span class="cm">/* Code Listing 4.14:</span>
<span class="cm"> Extending Code Listing 4.13 to read a header line at a time</span>
<span class="cm"> */</span>
<span class="cm">/* Print each header line, with eol indicating the location</span>
<span class="cm"> of CRLF at the end-of-line */</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">line</span> <span class="o">=</span> <span class="n">buffer</span><span class="p">;</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">eol</span> <span class="o">=</span> <span class="n">strstr</span> <span class="p">(</span><span class="n">line</span><span class="p">,</span> <span class="s">&quot;</span><span class="se">\r\n</span><span class="s">&quot;</span><span class="p">);</span>
<span class="kt">size_t</span> <span class="n">body_length</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="n">eol</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="p">{</span>
<span class="cm">/* Null-terminate the line by replacing CRLF with \0 */</span>
<span class="o">*</span><span class="n">eol</span> <span class="o">=</span> <span class="sc">&#39;\0&#39;</span><span class="p">;</span>
<span class="n">printf</span> <span class="p">(</span><span class="s">&quot;HEADER LINE: %s</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span> <span class="n">line</span><span class="p">);</span>
<span class="cm">/* Get the intended body length (in bytes) */</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span> <span class="n">strncmp</span> <span class="p">(</span><span class="n">line</span><span class="p">,</span> <span class="s">&quot;Content-Length: &quot;</span><span class="p">,</span> <span class="mi">16</span><span class="p">))</span>
<span class="p">{</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">len</span> <span class="o">=</span> <span class="n">strchr</span> <span class="p">(</span><span class="n">line</span><span class="p">,</span> <span class="sc">&#39; &#39;</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">body_length</span> <span class="o">=</span> <span class="n">strtol</span> <span class="p">(</span><span class="n">len</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">10</span><span class="p">);</span>
<span class="p">}</span>
<span class="cm">/* Move the line pointer to the next line */</span>
<span class="n">line</span> <span class="o">=</span> <span class="n">eol</span> <span class="o">+</span> <span class="mi">2</span><span class="p">;</span>
<span class="n">eol</span> <span class="o">=</span> <span class="n">strstr</span> <span class="p">(</span><span class="n">line</span><span class="p">,</span> <span class="s">&quot;</span><span class="se">\r\n</span><span class="s">&quot;</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
</td></tr></table></div>
<p>After <a class="reference external" href="TCPSockets.html#cl4-14">Code Listing 4.14</a> establishes the <cite>Content-Length</cite> of the message (either request
or response), this length can be compared with the length of the body that was already read. That
is, the initial read from the socket received no more than 8 KB of data, which was the maximum size
of the header. However, the body contents (particularly for HTML data, images, and other objects
returned as a response) are likely to exceed this 8 KB size limit. Consequently, an additional
<code class="docutils literal notranslate"><span class="pre">read()</span></code> may be required to retrieve the rest of the body contents. <a class="reference external" href="TCPSockets.html#cl4-15">Code Listing 4.15</a>
completes the response processing by duplicating the body contents received so far and resizing it
to read in the additional data. Note that the <code class="docutils literal notranslate"><span class="pre">read()</span></code> call will read from the socket into the space just after
the existing body contents.</p>
<div class="highlight-c border border-dark rounded-lg bg-light px-0 mb-3 notranslate" id="cl4-15"><table class="highlighttable"><tr><td class="linenos px-0 mx-0"><div class="linenodiv"><pre class="mb-0"> 1
2
3
4
5
6
7
8
9
10
11
12
13</pre></div></td><td class="code"><div class="highlight bg-light"><pre class="mb-0"><span></span><span class="cm">/* Code Listing 4.15:</span>
<span class="cm"> Reading the rest of the message body based on Content-Length</span>
<span class="cm"> */</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">body</span> <span class="o">=</span> <span class="n">strdup</span> <span class="p">(</span><span class="n">eoh</span> <span class="o">+</span> <span class="mi">4</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">body_length</span> <span class="o">&gt;</span> <span class="n">strlen</span> <span class="p">(</span><span class="n">body</span><span class="p">))</span> <span class="c1">// if false, all data received</span>
<span class="p">{</span>
<span class="cm">/* Increase the body size and read additional data from the</span>
<span class="cm"> socket; the number of bytes to request is the Content-Length</span>
<span class="cm"> field minus the number of bytes already received */</span>
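<span class="kt">size_t</span> <span class="n">received</span> <span class="o">=</span> <span class="n">strlen</span> <span class="p">(</span><span class="n">body</span><span class="p">);</span>
<span class="cm">/* Reserve one extra byte so the body can stay null-terminated */</span>
<span class="n">body</span> <span class="o">=</span> <span class="n">realloc</span> <span class="p">(</span><span class="n">body</span><span class="p">,</span> <span class="n">body_length</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">bytes</span> <span class="o">=</span> <span class="n">read</span> <span class="p">(</span><span class="n">socketfd</span><span class="p">,</span> <span class="n">body</span> <span class="o">+</span> <span class="n">received</span><span class="p">,</span> <span class="n">body_length</span> <span class="o">-</span> <span class="n">received</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">bytes</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="n">body</span><span class="p">[</span><span class="n">received</span> <span class="o">+</span> <span class="n">bytes</span><span class="p">]</span> <span class="o">=</span> <span class="sc">&#39;\0&#39;</span><span class="p">;</span>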
<span class="p">}</span>
</pre></div>
</td></tr></table></div>
<p>In <a class="reference external" href="TCPSockets.html#cl4-13">Code Listings 4.13</a>, <a class="reference external" href="TCPSockets.html#cl4-14">4.14</a>, and <a class="reference external" href="TCPSockets.html#cl4-15">4.15</a>, we have been
implicitly assuming the data was HTML for simplicity. The response header would declare this with
the <code class="docutils literal notranslate"><span class="pre">Content-Type</span></code> header. However, the same principles would apply regardless of the type of
data requested. For instance, if the client issues a <code class="docutils literal notranslate"><span class="pre">GET</span></code> request for <code class="docutils literal notranslate"><span class="pre">/logo.png</span></code>, the response
would start with the same headers we have used, but the <code class="docutils literal notranslate"><span class="pre">Content-Type</span></code> would inform the client
that the body contains <code class="docutils literal notranslate"><span class="pre">image/png</span></code> instead of <code class="docutils literal notranslate"><span class="pre">text/html</span></code>. Given that images can contain the
null-byte, <a class="reference external" href="TCPSockets.html#cl4-13">Code Listing 4.13</a> and <a class="reference external" href="TCPSockets.html#cl4-15">Code Listing 4.15</a> would need to be
modified to avoid the use of <code class="docutils literal notranslate"><span class="pre">strlen()</span></code>. With that change, the code shown here could be adapted to support
binary data objects, such as images.</p>
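<p>One way to avoid <code class="docutils literal notranslate"><span class="pre">strlen()</span></code> is to track the number of bytes received explicitly. The sketch below assumes a hypothetical helper named <code class="docutils literal notranslate"><span class="pre">read_full()</span></code> that keeps calling <code class="docutils literal notranslate"><span class="pre">read()</span></code> until the requested number of bytes has arrived, the peer closes the connection, or an error occurs. It is an illustration under those assumptions rather than a drop-in replacement for the listings above.</p>

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read up to want bytes from fd into buf,
   looping because a single read() may return fewer bytes than
   requested. Returns the number of bytes actually stored. The count
   is tracked explicitly, so null bytes in the data are harmless. */
size_t
read_full (int fd, char *buf, size_t want)
{
  assert (buf != NULL);
  size_t received = 0;
  while (received < want)
    {
      ssize_t bytes = read (fd, buf + received, want - received);
      if (bytes < 0 && errno == EINTR)
        continue;               /* interrupted by a signal; retry */
      if (bytes <= 0)
        break;                  /* error or end of stream */
      received += (size_t) bytes;
    }
  return received;
}
```

<p>With this approach, the number of body bytes that arrived in the initial read could be computed from the pointers already available, as <code class="docutils literal notranslate"><span class="pre">bytes</span> <span class="pre">-</span> <span class="pre">(eoh</span> <span class="pre">+</span> <span class="pre">4</span> <span class="pre">-</span> <span class="pre">buffer)</span></code>, and the remaining <code class="docutils literal notranslate"><span class="pre">Content-Length</span></code> bytes could be requested through the helper, with <code class="docutils literal notranslate"><span class="pre">memcpy()</span></code> replacing <code class="docutils literal notranslate"><span class="pre">strdup()</span></code> for the initial copy.</p>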
</div>
<div class="section" id="persistent-state-with-cookies">
<h2>4.5.5. Persistent State with Cookies<a class="headerlink" href="TCPSockets.html#persistent-state-with-cookies" title="Permalink to this headline"></a></h2>
<p>Designing HTTP to be stateless worked well for its original purpose of sending and receiving
documents. As the uses of web pages and the Internet have evolved, however, the lack of state
information became a hindrance. Developers began to use HTTP as the foundation for web-based
applications that required state persistence. As an example, consider a web-based email service. The
first request might allow a user to log into the system, then a second request would retrieve the
messages in the user's inbox. Additional requests would retrieve pages for composing new messages
and sending them. Clearly, such an application design would benefit from storing persistent
information on the server about the user.</p>
<p>In modern web development, there are multiple options for data persistence. HTTP <a class="reference internal" href="Glossary.html#term-cookie"><span class="xref std std-term">cookies</span></a> are one of the oldest and most pervasive tools to accomplish this. A cookie is a short,
text-based key-value pair that gets sent as an HTTP header field (similar to the <code class="docutils literal notranslate"><span class="pre">Content-Length</span></code>
or <code class="docutils literal notranslate"><span class="pre">Connection</span></code> headers previously discussed). <a class="reference external" href="TCPSockets.html#cl4-16">Code Listing 4.16</a> demonstrates one way
to create a cookie in JavaScript.</p>
<div class="highlight-javascript border border-dark rounded-lg bg-light px-0 mb-3 notranslate" id="cl4-16"><table class="highlighttable"><tr><td class="linenos px-0 mx-0"><div class="linenodiv"><pre class="mb-0">1
2
3
4
5
6
7
8
9</pre></div></td><td class="code"><div class="highlight bg-light"><pre class="mb-0"><span></span><span class="cm">/* Code Listing 4.16:</span>
<span class="cm"> Creating a new cookie in JavaScript that expires in 15 minutes</span>
<span class="cm"> */</span>
<span class="kd">var</span> <span class="nx">date</span> <span class="o">=</span> <span class="k">new</span> <span class="nb">Date</span><span class="p">();</span>
<span class="nx">date</span><span class="p">.</span><span class="nx">setTime</span><span class="p">(</span><span class="nx">date</span><span class="p">.</span><span class="nx">getTime</span><span class="p">()</span> <span class="o">+</span> <span class="mi">15</span> <span class="o">*</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">1000</span><span class="p">);</span> <span class="c1">// add 15 minutes</span>
<span class="kd">var</span> <span class="nx">newCookie</span> <span class="o">=</span> <span class="s2">&quot;username=julian;expires=&quot;</span> <span class="o">+</span> <span class="nx">date</span><span class="p">.</span><span class="nx">toUTCString</span><span class="p">()</span> <span class="o">+</span>
<span class="s2">&quot;;path=/;samesite=strict;secure&quot;</span><span class="p">;</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">cookie</span> <span class="o">=</span> <span class="nx">newCookie</span><span class="p">;</span>
</pre></div>
</td></tr></table></div>
<p>The JavaScript code in <a class="reference external" href="TCPSockets.html#cl4-16">Code Listing 4.16</a> would run on the client side, causing the web browser to
create a new cookie as the key-value pair <code class="docutils literal notranslate"><span class="pre">username=julian</span></code>. The remaining fields of the cookie string control how
the browser will use the cookie. The <code class="docutils literal notranslate"><span class="pre">expires</span></code> field indicates that the cookie is only valid for
the next 15 minutes; it will be deleted after that time has passed. The <code class="docutils literal notranslate"><span class="pre">path=/</span></code> indicates that
this cookie will be sent along with any requests for the current domain name, regardless of the web
sites directory structure. The <code class="docutils literal notranslate"><span class="pre">samesite=strict</span></code> prevents the cookie from being sent to
third-party web sites; that is, the cookie will not be sent to other domains, such as advertising
networks. Lastly, the <code class="docutils literal notranslate"><span class="pre">secure</span></code> restricts the cookie to transfer over HTTPS and will prevent its
transfer over standard HTTP. Line 9 adds the new cookie to the browsers <code class="docutils literal notranslate"><span class="pre">document.cookie</span></code> value,
which is the concatenated list of all cookies. (This line is unintuitive, as Javascript uses the
assignment operator for this purpose, but the line is actually <cite>appending</cite> the value to an existing
string.)</p>
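<p>The assignment behavior described above can be illustrated with a small model. The following is only a
sketch: <code class="docutils literal notranslate"><span class="pre">CookieJarModel</span></code> is a hypothetical class written for this illustration, not a browser
API, and it deliberately ignores the <code class="docutils literal notranslate"><span class="pre">expires</span></code>, <code class="docutils literal notranslate"><span class="pre">path</span></code>, <code class="docutils literal notranslate"><span class="pre">samesite</span></code>, and <code class="docutils literal notranslate"><span class="pre">secure</span></code> fields. It
only mimics how assigning to <code class="docutils literal notranslate"><span class="pre">document.cookie</span></code> adds or updates one cookie while reading it returns the whole list.</p>

```javascript
// Simplified model of document.cookie (an illustration, not browser code).
// Assumption: attributes such as expires, path, samesite, and secure are ignored.
class CookieJarModel {
  constructor() {
    this.store = new Map();   // cookie name -> cookie value
  }

  // Assignment adds or updates ONE cookie; it does not replace the whole list.
  set cookie(assignment) {
    const pair = assignment.split(';')[0];   // "name=value" comes first
    const eq = pair.indexOf('=');
    if (eq === -1) return;                   // ignore malformed input
    this.store.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
  }

  // Reading returns every stored pair joined into a single string.
  get cookie() {
    return Array.from(this.store, ([name, value]) => name + '=' + value).join('; ');
  }
}

const doc = new CookieJarModel();
doc.cookie = 'username=julian;path=/;samesite=strict;secure';
doc.cookie = 'theme=dark;path=/';   // a second assignment adds a second cookie
const allCookies = doc.cookie;      // both cookies are still present
```

<p>In this model, the second assignment leaves the <code class="docutils literal notranslate"><span class="pre">username</span></code> cookie in place, and assigning a new value for
an existing name updates that one cookie without disturbing the others.</p>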
<p>Once the browser executes the code in <a class="reference external" href="TCPSockets.html#cl4-16">Code Listing 4.16</a>, future HTTPS requests to this domain that
occur in the next 15 minutes will contain the HTTP header line <code class="docutils literal notranslate"><span class="pre">Cookie:</span> <span class="pre">username=julian\r\n</span></code>. Note
that the header line does not contain information about the expiration date or the other fields, as
these are only relevant to the browser. As an alternative, the server could generate the
cookie by including a <code class="docutils literal notranslate"><span class="pre">Set-Cookie</span></code> header line in its HTTP response. The primary difference is that this header line would include all of the fields:</p>
<div class="highlight-none border border-dark rounded-lg bg-light px-2 mb-3 notranslate"><div class="highlight bg-light"><pre class="mb-0"><span></span>Set-Cookie: username=julian;expires=Sat, 15 May 2021 16:00:00 GMT;
path=/;samesite=strict;secure
</pre></div>
</div>
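<p>The contrast between the two header lines can be sketched in code. The helper names
<code class="docutils literal notranslate"><span class="pre">buildSetCookieHeader</span></code> and <code class="docutils literal notranslate"><span class="pre">buildCookieHeader</span></code> below are hypothetical and exist only for this
illustration; a real server would normally rely on its web framework to produce these lines. The sketch emits the
<code class="docutils literal notranslate"><span class="pre">Set-Cookie</span></code> line unwrapped, on a single line.</p>

```javascript
// Hypothetical helper: format the header line a server could send to create
// the cookie. All of the fields are included so the browser knows how to use it.
function buildSetCookieHeader(name, value, expires) {
  const fields = [
    name + '=' + value,
    'expires=' + expires.toUTCString(),   // e.g., "Sat, 15 May 2021 16:00:00 GMT"
    'path=/',
    'samesite=strict',
    'secure',
  ];
  return 'Set-Cookie: ' + fields.join(';') + '\r\n';
}

// Hypothetical helper: format the header line the browser sends back.
// Only the key-value pair is included; the other fields stay in the browser.
function buildCookieHeader(name, value) {
  return 'Cookie: ' + name + '=' + value + '\r\n';
}

// Fixed example date (months are zero-based, so 4 means May)
const expires = new Date(Date.UTC(2021, 4, 15, 16, 0, 0));
const setCookieLine = buildSetCookieHeader('username', 'julian', expires);
const cookieLine = buildCookieHeader('username', 'julian');
```

<p>Here <code class="docutils literal notranslate"><span class="pre">setCookieLine</span></code> carries every field from server to browser, while <code class="docutils literal notranslate"><span class="pre">cookieLine</span></code> carries only
<code class="docutils literal notranslate"><span class="pre">username=julian</span></code> in the other direction.</p>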
<p>To add persistent state to the HTTP exchange, the server would use a database that maps the cookie
<code class="docutils literal notranslate"><span class="pre">username=julian</span></code> to information about the user. As such, the server-side code would be able to
connect a new request to previous requests, creating the history needed for the application. One
common technique, even without a specific user login mechanism, is to use a <em>session cookie</em>,
such as <code class="docutils literal notranslate"><span class="pre">session=182735927341</span></code>. Session cookies persist until the web browser is closed, giving
servers an easy way to recognize requests from the same web browser as likely to be related.</p>
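<p>The server-side mapping from a session cookie to stored state might look like the following sketch. It makes several
assumptions that are not part of the text above: an in-memory <code class="docutils literal notranslate"><span class="pre">Map</span></code> stands in for the database, the stored
state is simply the list of requested paths, and the function names <code class="docutils literal notranslate"><span class="pre">getSessionId</span></code> and
<code class="docutils literal notranslate"><span class="pre">recordRequest</span></code> are hypothetical.</p>

```javascript
// Assumption: an in-memory Map stands in for the server's database.
// It maps a session ID to the list of paths that session has requested.
const sessions = new Map();

// Hypothetical helper: extract the session ID from a Cookie header value,
// such as "theme=dark; session=182735927341". Returns null if none is found.
function getSessionId(cookieHeaderValue) {
  for (const pair of cookieHeaderValue.split(';')) {
    const trimmed = pair.trim();
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue;                       // skip malformed pairs
    if (trimmed.slice(0, eq) === 'session') {
      return trimmed.slice(eq + 1);
    }
  }
  return null;
}

// Hypothetical helper: link this request to earlier ones from the same browser.
function recordRequest(cookieHeaderValue, path) {
  const id = getSessionId(cookieHeaderValue);
  if (id === null) return null;                    // no session cookie was sent
  if (!sessions.has(id)) sessions.set(id, []);     // first request in this session
  const history = sessions.get(id);
  history.push(path);
  return history;
}

// Two requests carrying the same session cookie share one history.
recordRequest('session=182735927341', '/index.html');
const history = recordRequest('theme=dark; session=182735927341', '/cart.html');
```

<p>Because both requests carry the same <code class="docutils literal notranslate"><span class="pre">session</span></code> value, the second call sees the path recorded by the
first, which is the kind of history that plain HTTP does not provide on its own.</p>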
<table class="docutils footnote" frame="void" id="f25" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="TCPSockets.html#id1">[1]</a></td><td>While <code class="docutils literal notranslate"><span class="pre">netcat</span></code> is useful for exploring protocols, it sends all data in unencrypted form
and cannot be used for any form of secure communication.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="f26" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="TCPSockets.html#id2">[2]</a></td><td>Using query strings with standard HTML files is pointless, as HTML is a static formatting
language and cannot respond to input. Other server-side technologies, such as Java servlets, PHP,
or Apache Server-Side Includes (SSI), are required to handle the query dynamically.</td></tr>
</tbody>
</table>
<div
id="AppTCPSumm"
class="embedContainer"
data-exer-name="AppTCPSumm"
data-long-name="TCP socket programming questions"
data-short-name="AppTCPSumm"
data-frame-src="../../../Exercises/Sockets/AppTCPSumm.html?selfLoggingEnabled=false&amp;localMode=true&amp;module=TCPSockets&amp;JXOP-debug=true&amp;JOP-lang=en&amp;JXOP-code=java"
data-frame-width="950"
data-frame-height="550"
data-external="false"
data-points="1.0"
data-required="True"
data-showhide="show"
data-threshold="5"
data-type="ka"
data-exer-id="">
<div class="center">
<div id="AppTCPSumm_iframe"></div>
</div>
</div>
</div>
</div>
</div>
<br />
<div class="row jmu-dark-purple-bg">
<div class="col-md-12">
<center>
<a id="contact_us" class="btn button-link-no-blue jmu-gold" rel="nofollow" href="mailto:webmaster@opencsf.org" role="button">Contact Us</a>
<a id="license" class="btn button-link-no-blue jmu-gold" rel="nofollow" href="https://w3.cs.jmu.edu/kirkpams/OpenCSF/lib/license.html" target="_blank">License</a>
</center>
</div>
</div>
<script src="_static/js/popper.js-1.14.7-min.js" integrity="sha384-UO2eT0CpHqdSJQ6hJty5KVphtPhzWj9WO1clHTMGa3JDZwrnQq4sF86dIHNDz0W1" crossorigin="anonymous"></script>
<script src="_static/js/bootstrap.min.js" integrity="sha384-JjSmVgyd0p3pXB1rRibZUAYoIIy6OrQ6VrjIEaFf/nJGzIxFDsf4x0xIM+B07jRM" crossorigin="anonymous"></script>
</body>
</html>