Much of what’s to follow is simply me trying to sum up what I’ve learned in an attempt to internalize and understand this information. There’s a lot of detail in how packets are transmitted, and just wrapping my head around it took a good deal of reading, thinking, and processing.
Anatomy of an HTTP Request
Opening up Wireshark is initially intimidating. So many protocols, so little time. First things first, a list of all the protocols I came across:
- DB-LSP-DISC (Dropbox LAN sync discovery protocol)
Too much to look at, surely. After I spent some time researching all of these, the picture started to become clear, though it required digging into the OSI Model to understand how these protocols interact.
Pulling the list of protocols by layer from Wikipedia:
Layer Number | Layer Name | TCP/IP Protocols
7 | Application | NNTP SIP SSI DNS FTP Gopher HTTP NFS NTP DHCP SMPP SMTP SNMP Telnet BGP FCIP
6 | Presentation | MIME SSL TLS XDR
5 | Session | Sockets (tcp / rtp / pptp)
4 | Transport | TCP UDP SCTP DCCP
3 | Network | IP IPsec ICMP IGMP OSPF RIP
2 | Data Link | PPP SBTV SLIP
1 | Physical |
Ignoring a lot of these for now, what’s interesting to look at is how an HTTP request makes use of a few other protocols in the process. When an HTTP request is sent from the browser, first a DNS request is sent. This happens on the application layer. Once the domain name is resolved to an IP, a TCP connection is established between the client and the server, then the HTTP request is sent as packets using TCP to ensure proper delivery, checking and sequencing the packets as necessary.
Where I got a little confused was the difference between an HTTP request and TCP connections. After some more digging, and looking at the OSI Layers, it became clear that TCP is a generic protocol used for transferring data from one computer to another. It doesn’t care what the data is. It just knows there’s some stuff, and it has rigorous means of verifying that the stuff was transferred in its entirety. But HTTP requests carry very specific information, like a user submitting their username and password as a POST request.
This is when ports all of a sudden made sense to me. The server gets a dump of packets via TCP (containing HTTP information), but how does it know what to do with that data? What application should process the data contained in the TCP packets? By directing packets to a specific port (e.g. port 80 for HTTP), the server can now route the data to whatever program is used to handle that data. In HTTP’s case, it’s often nginx or apache.
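To make the layering concrete for myself, here is a minimal sketch in Python of the same sequence: resolve the name, open a TCP connection to port 80, and write the HTTP request into it as plain bytes. The function names `build_http_request` and `fetch` are my own for illustration, and this is a simplified picture of the steps, not what a browser actually does internally.

```python
import socket


def build_http_request(host: str, path: str = "/") -> bytes:
    """Return a minimal HTTP/1.1 GET request as raw bytes.

    To TCP this is just a run of bytes; only the program listening
    on the destination port interprets it as HTTP.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, port: int = 80, path: str = "/") -> bytes:
    """Resolve the host, connect over TCP, send the request, read the reply."""
    ip = socket.gethostbyname(host)  # step 1: DNS lookup
    # step 2: TCP connection (the SYN / ACK handshake happens in here)
    with socket.create_connection((ip, port), timeout=5) as conn:
        conn.sendall(build_http_request(host, path))  # step 3: HTTP over TCP
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling something like `fetch("jcharry.com")` while Wireshark is capturing should produce the same DNS, TCP, and HTTP traffic described here, though I would expect the details to vary by network and server.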
So what we have for an HTTP request is the use of three (and often more, for secure requests) protocols. Two in the application layer (DNS and HTTP), and one in the transport layer (TCP).
Using Wireshark, I tracked an HTTP request to my website, jcharry.com.
From the looks of it, DNS sends its request using UDP, ensuring the transfer is as quick as possible. I’m guessing since DNS requests are so simple, there’s little risk of using UDP, as the data probably fits in one packet, thus doesn’t require the complexity of TCP.
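To check the guess that a DNS request fits in one packet, here is a rough sketch that builds a standard A-record query by hand, following the message layout in RFC 1035. The function name `build_dns_query` and the query ID are arbitrary choices of mine, and the sketch only builds the bytes; it doesn't send anything.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query for the A record of `hostname`."""
    # 12-byte header: ID, flags (0x0100 = standard query, recursion desired),
    # then question/answer/authority/additional counts.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.strip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = Internet.
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


query = build_dns_query("jcharry.com")
print(len(query))  # 29
```

At 29 bytes, the query is far below the 512-byte limit that classic DNS places on UDP messages, which fits the idea that a single datagram is enough.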
Wireshark shows the DNS request going from my computer to my router, which I guess takes it over from there, but I’m not sure how the packet gets routed from my router to the DNS server. But anyway, the DNS response comes back with the IP of my website – 188.8.131.52. Immediately following is a series of TCP packets between my computer and the server – as seen in the following screenshot.
TCP has a handshake procedure that sends messages called SYN and ACK, and we can see those handshakes happening and some relevant options for the transfer.
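The bookkeeping behind the handshake is simple enough to model. This is a toy sketch, not real networking code: the `Segment` class and `three_way_handshake` function are names I made up, and it only tracks how the sequence and acknowledgement numbers relate across the three messages (SYN, then SYN-ACK, then ACK).

```python
from dataclasses import dataclass
from typing import List, Optional

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments that open a TCP connection."""
    # 1. Client picks an initial sequence number (ISN) and sends SYN.
    syn = Segment("SYN", seq=client_isn)
    # 2. Server replies with its own ISN and acknowledges client ISN + 1.
    syn_ack = Segment("SYN, ACK", seq=server_isn, ack=(syn.seq + 1) % SEQ_SPACE)
    # 3. Client acknowledges server ISN + 1; the connection is now open.
    ack = Segment(
        "ACK",
        seq=(syn.seq + 1) % SEQ_SPACE,
        ack=(syn_ack.seq + 1) % SEQ_SPACE,
    )
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=0, server_isn=0):
    print(segment)
```

Starting both sides at 0 mirrors the relative sequence numbers Wireshark displays by default: Seq=0, then Seq=0 Ack=1, then Seq=1 Ack=1. Real connections start from large, unpredictable initial values.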
Wireshark keeps track of the sequence numbers, acks, and other info about the transfer. In the middle of that screenshot, we see a QUIC protocol being used at a totally separate IP. From the DNS transfer earlier, we saw a request for a Google font server, which resolved to 184.108.40.206. QUIC stands for Quick UDP Internet Connections, and, looking it up, shows that it’s a very new protocol invented in 2012 at Google as a way to add SSL-style encryption on top of UDP, in an attempt to create a secure, faster alternative to TCP. Pretty cool!
Browser acting on its own?
Without reloading any pages, or doing anything at all, I kept noticing HTTP requests coming from my machine.
Looks like it’s going to techrepublic.com, a tab I had open at the time. So either my browser established a socket connection to techrepublic’s server, or is sending out timed requests to update the page. Either way, it’s happening without me doing anything.
Some other things
I saw this line:
188 31.333464 Apple_38:08:d3 Broadcast ARP 42 Who has 192.168.1.2? Tell 192.168.1.7
And was confused. I found out that ARP is the Address Resolution Protocol used by local networks to associate MAC addresses and IPs. In a dynamically allocated network where the IPs are handed out, it makes sense that there would be some circumstances where IPs need verification, but who’s (or what’s) responsible for controlling this? My router? My ISP? Does my ISP care about locally assigned addresses? And why is the ARP protocol telling another device on the network the IP of the first device? I looked up those IPs and 192.168.1.2 was an iPhone, and 192.168.1.7 was a computer. I’m still unclear on exactly the purpose of this, but I’ll keep digging.
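One thing that helped a little was building the ARP frame by hand. As far as I can tell, "Who has 192.168.1.2? Tell 192.168.1.7" is the sender (192.168.1.7) broadcasting a request for the MAC address that belongs to 192.168.1.2, and the "Tell" part is just the return address for the reply. Below is a rough sketch of that frame; `build_arp_request` is my own name, and the first three bytes of the MAC are a made-up placeholder, since Wireshark only showed me `Apple_38:08:d3`.

```python
import socket
import struct


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP 'who has' request."""
    broadcast = b"\xff" * 6  # every device on the local network receives it
    # 14-byte Ethernet header: destination, source, EtherType 0x0806 (ARP).
    ethernet = broadcast + sender_mac + struct.pack("!H", 0x0806)
    # 28-byte ARP payload for Ethernet + IPv4.
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6, 4,                         # MAC length, IPv4 address length
        1,                            # opcode 1 = request
        sender_mac,                   # "Tell" this MAC ...
        socket.inet_aton(sender_ip),  # ... at this IP
        b"\x00" * 6,                  # target MAC is the unknown being asked for
        socket.inet_aton(target_ip),  # "Who has" this IP?
    )
    return ethernet + arp


# Hypothetical MAC: only the last three bytes (38:08:d3) come from the capture.
frame = build_arp_request(bytes.fromhex("0200003808d3"), "192.168.1.7", "192.168.1.2")
print(len(frame))  # 42
```

The 42 bytes match the length column in the capture line above (14 bytes of Ethernet header plus 28 bytes of ARP).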