HTTP, or the Hypertext Transfer Protocol, is an application-level protocol, or set of rules, used to transfer files. These files can be text, graphics, images, audio, video, or other multimedia, and are sent over the World Wide Web. The HTTP specification provides the rules for how files and their associated messages are formatted and transmitted over networks. It also defines how web browsers and web servers should behave in response to the commands they receive. The IETF and the World Wide Web Consortium are responsible for developing the international standards that govern HTTP behavior and the web browsing that consumers enjoy today.
How Does HTTP Work?
The HTTP specification includes eight methods, or commands, that specify the action to be performed on the targeted resource. These commands are sent as “requests” and include:
GET: Requests a representation of the targeted resource.
POST: Sends information requiring processing to the targeted resource.
HEAD: Requests the same response as a “GET” command, but without the response body.
DELETE: Removes or deletes the designated resource.
TRACE: Echoes the request back as it was received, so the client can determine which intermediate servers are involved in the operation and what they changed.
PUT: Uploads a representation of the specified resource.
CONNECT: Converts the request connection into a transparent TCP/IP tunnel, typically through a proxy.
OPTIONS: Retrieves the HTTP methods supported by the server for the given Uniform Resource Locator (URL).
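The differences between several of these methods can be sketched with a small in-memory model. The class below is a hypothetical illustration, not a real web server: `TinyResourceStore`, its `handle` function, and the choice of status codes for each case are assumptions made for this example. POST, TRACE, and CONNECT are left out and simply answered with 501.

```python
from typing import Dict, Tuple

# (status code, headers, body) returned for each request in this sketch.
Response = Tuple[int, Dict[str, str], bytes]


class TinyResourceStore:
    """Hypothetical in-memory 'server' that maps URL paths to bytes."""

    def __init__(self) -> None:
        self.resources: Dict[str, bytes] = {}

    def handle(self, method: str, path: str, body: bytes = b"") -> Response:
        if method == "OPTIONS":
            # Report which methods this sketch supports.
            return 204, {"Allow": "GET, HEAD, PUT, DELETE, OPTIONS"}, b""
        if method == "PUT":
            # Store the uploaded representation under the given path.
            created = path not in self.resources
            self.resources[path] = body
            return (201 if created else 204), {}, b""
        if method in ("GET", "HEAD"):
            if path not in self.resources:
                return 404, {}, b""
            data = self.resources[path]
            headers = {"Content-Length": str(len(data))}
            # HEAD returns the same headers as GET but omits the body.
            return 200, headers, (data if method == "GET" else b"")
        if method == "DELETE":
            if self.resources.pop(path, None) is None:
                return 404, {}, b""
            return 204, {}, b""
        # Any other method is not implemented by this sketch.
        return 501, {}, b""
```

Calling `handle("PUT", "/page", b"hi")` followed by `handle("HEAD", "/page")` returns the same headers a GET would, with an empty body.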
How Does the Internet Work?
To better understand the role of HTTP on the World Wide Web, it helps to know how the various components that make up the Internet work, both individually and together, to create the online world.
Website Names and Locations
After opening a web browser, you normally enter the URL of a website or search engine. To load the page, the browser must establish a network connection to the host or server where the web page or web document resides. The first step is to find the network location of the website. To determine the site’s IP (Internet Protocol) address, the browser makes use of a name server. The name server is normally located at your ISP, which provides a primary and a secondary name server that are configured in the Internet access setup for your home or work computer. If you work at a university or large corporation, the name servers may reside on campus or on the company network.
The domain name servers on all networks constantly communicate with each other. They exchange the information required to resolve website addresses or hostnames (that is, the mappings between names and IP addresses). When asked about a website that is not in the name server’s cache, the server may query several other name servers to resolve the requested hostname. Once it has located the website’s IP address, your computer can start exchanging information, or HTTP commands, with the remote host.
How Does the Domain Name System Work?
The network of databases that drives the translation of all web host names to IP addresses on the Internet is the Domain Name System, or DNS. A DNS server is frequently referred to simply as a “nameserver.”
To understand how hostnames are translated, it’s first necessary to review how an Internet hostname is put together. These names are made up of two or more components separated by dots. A domain refers to a collection of computers that share a common suffix. Under current rules, one domain can reside within another. For example, www.Tech-FAQ.com resides in the Tech-FAQ.com subdomain of the .com domain.
Every domain on the Internet has a primary, or authoritative, name server that is responsible for resolving the IP addresses of all computer hosts located in the domain. The primary server will commonly have a secondary name server to take over in the event of an outage. The secondary servers are typically refreshed with information from the primary domain name server every 2-4 hours.
The tricky part about understanding nameservers is that any given nameserver does not need to know the locations of all computer hosts on other domains. The server is only required to know of additional nameservers that can be queried to resolve names it cannot resolve itself. Eventually, if a domain is not found or recognized, the name server’s queries will reach the root server for the applicable TLD, such as .com, .org, or .net, to resolve the name. Domain name servers are designed, however, so that each server can operate efficiently with the minimum amount of information possible.
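The nesting of domains can be made concrete with a few lines of code. This is a minimal sketch; the function name `enclosing_domains` is an assumption made for this example and is not part of any DNS library.

```python
from typing import List


def enclosing_domains(hostname: str) -> List[str]:
    """List the hostname followed by each domain that contains it."""
    # Split on dots; a trailing dot (fully qualified form) is ignored.
    labels = hostname.rstrip(".").split(".")
    # Dropping labels from the left yields successively larger domains.
    return [".".join(labels[i:]) for i in range(len(labels))]
```

For the hostname used above, `enclosing_domains("www.Tech-FAQ.com")` returns `["www.Tech-FAQ.com", "Tech-FAQ.com", "com"]`, moving from the host itself out to the top-level domain.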
When a query for an IP address is made, the primary nameserver will first check whether the name and address are stored locally. If they are not, the nameserver will ask a root server for the information. If the root server has the translation, the name server will cache the response and then use that information to complete the domain name request.
The majority of the time, a computer’s nameserver does not have to do that much work. The software architecture of today’s nameservers leverages a significant amount of caching of past domain name lookups. This simple technique saves a large share of resources and provides an improved quality of service to the requesting computer hosts. In the event a host is unreachable at a cached IP address, the entry will normally be removed from the cache to ensure the end-user has the most up-to-date information possible.
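The caching behavior described above can be sketched as follows. This is an illustration under stated assumptions rather than how any particular nameserver is implemented: `CachingResolver`, the `upstream` callable, and the `evict` method are hypothetical names, and real servers also expire entries after a time limit, which is omitted here.

```python
from typing import Callable, Dict


class CachingResolver:
    """Hypothetical nameserver front end that remembers past lookups."""

    def __init__(self, upstream: Callable[[str], str]) -> None:
        self.upstream = upstream          # stands in for querying other nameservers
        self.cache: Dict[str, str] = {}   # hostname -> IP address
        self.upstream_queries = 0         # how often the cache was missed

    def resolve(self, hostname: str) -> str:
        key = hostname.lower()            # hostnames are case-insensitive
        if key not in self.cache:
            # Cache miss: ask upstream once and remember the answer.
            self.upstream_queries += 1
            self.cache[key] = self.upstream(key)
        return self.cache[key]

    def evict(self, hostname: str) -> None:
        """Drop an entry, e.g. after the cached address proved unreachable."""
        self.cache.pop(hostname.lower(), None)
```

A quick way to try it is to pass a dictionary lookup as the upstream, for example `CachingResolver({"www.example.com": "192.0.2.10"}.__getitem__)`, where both the name and the address are placeholders. Resolving the same name twice should leave `upstream_queries` at 1.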
Network Packets and Routers
When a web browser sends an HTTP command to a web server, it will first translate the command into a network packet or datagram. The packet is basically a block of bits wrapped with a source IP address (normally that of the computer host surfing the web), a destination IP address, and a service port number (normally port 80 for web surfing). The computer host will then transmit the network packet to the Internet via the ISP (Internet Service Provider) or local network until it reaches the first network router. The router holds a mapping of Internet destinations and will forward the data packet to the closest next hop in its routing table.
A data packet may travel through one or many routers en route to its ultimate destination. Today’s routers are able to assess how long neighboring routers take to acknowledge a data packet, as well as detect which links are faster than others. They can also detect that a router has dropped offline and send the network traffic to an alternate router if required. Since there is an extremely large number of network routers on the Internet compared to phone switches, traffic can be routed around failures fairly easily.
Once a data packet arrives at a destination computer or server, the packet is passed to the web server. The web server will assess if it is required to respond to the data packet based on the source IP address of the datagram. When the web server returns the document to the client, it will break the information into a number of data packets. The size of the packets will vary based on the transmission media on the network and the type of service being used on the web server.
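As a rough illustration of a document being broken into packets, the sketch below wraps each chunk of data with the addressing fields mentioned earlier. The `Packet` class, the `packetize` function, and the `max_payload` parameter are assumptions made for this example; real IP and TCP headers carry many more fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    """Simplified packet: addressing fields plus a chunk of data."""
    src_ip: str
    dst_ip: str
    port: int
    payload: bytes


def packetize(data: bytes, src_ip: str, dst_ip: str, port: int,
              max_payload: int) -> List[Packet]:
    """Split data into packets carrying at most max_payload bytes each."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [
        Packet(src_ip, dst_ip, port, data[offset:offset + max_payload])
        for offset in range(0, len(data), max_payload)
    ]
```

With a 10-byte document and `max_payload=4`, this produces three packets carrying 4, 4, and 2 bytes. Joining the payloads in order gives back the original document.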
The Role of TCP/IP
To understand how the Internet and web browsing handle multiple-packet transmissions when surfing websites, a short review of how the TCP and IP network protocols work is required. IP (Internet Protocol) is a lower-level protocol in the network stack. IP is responsible for adding the source and destination addresses to packets being sent over a network or the Internet. IP addresses work similarly to snail mail addresses in that they are interpreted by routers to decide how best to route the information.
The Transmission Control Protocol (TCP) provides reliability to Internet communications. When two computers (in the case of web browsing, a client and a server) negotiate a TCP connection, the receiving computer knows that acknowledgements for received packets must be transmitted back to the sending computer. If the sending computer does not receive an acknowledgement (ack) within a predefined timeframe, it will resend the data packet to the receiver. Additionally, the sending computer will include a sequence number in each packet to make sure the packets are reassembled in the proper order.
Each TCP/IP data packet also includes a checksum to make sure the data in the packet has not been corrupted. These checksums are calculated so that an error is indicated if either the packet or the checksum is corrupted. As a result, the combined use of TCP/IP with Domain Name Servers has proved to be a reliable method to support the World Wide Web as experienced to date.
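The acknowledge-and-resend idea can be simulated in a few lines. This is a toy model, not TCP itself: `send_reliably`, the `loss_rate` parameter, and the seeded random "network" are all assumptions for this sketch, acknowledgements are assumed never to be lost, and timers are replaced by simple rounds.

```python
import random
from typing import Dict, List, Tuple


def send_reliably(chunks: List[bytes], loss_rate: float,
                  seed: int = 0) -> Tuple[bytes, int]:
    """Resend unacknowledged chunks until all arrive; return data and send count."""
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    rng = random.Random(seed)            # seeded so runs are repeatable
    received: Dict[int, bytes] = {}      # receiver side: sequence number -> payload
    unacked = set(range(len(chunks)))    # sender side: still waiting for an ack
    transmissions = 0
    while unacked:
        for seq in sorted(unacked):
            transmissions += 1
            if rng.random() >= loss_rate:
                received[seq] = chunks[seq]   # this packet got through
        # The receiver acks what it holds; the sender will resend the rest.
        unacked.difference_update(received)
    # Sequence numbers let the receiver put the payloads back in order.
    data = b"".join(received[seq] for seq in sorted(received))
    return data, transmissions
```

With `loss_rate=0.0` every chunk is sent exactly once. With a higher loss rate the returned data is still complete, at the cost of extra transmissions.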
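One way to see how such a checksum works is the 16-bit ones’ complement sum used by the Internet protocols. The sketch below applies it to a plain byte string. The function names are chosen for this example, and the real TCP checksum additionally covers header fields that are omitted here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over a byte string."""
    if len(data) % 2:
        data += b"\x00"                   # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF                # ones' complement of the folded sum


def is_intact(data: bytes, checksum: int) -> bool:
    """Recompute the checksum on arrival and compare with the one sent."""
    return internet_checksum(data) == checksum
```

The sender computes `internet_checksum(payload)` and transmits it with the packet; the receiver calls `is_intact` and treats a mismatch as corruption. A checksum of this kind catches common errors such as a single flipped bit, though it is not designed to detect every possible change.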
The Application Protocol of the Internet – HTTP
The primary application protocol that web browsers on client computers and web servers use to communicate on top of the TCP/IP protocols is HTTP (Hypertext Transfer Protocol). One of the primary commands used by HTTP is the “GET” command, normally sent to port 80 on the web server. A server daemon listens on that port, waiting for incoming HTTP commands to pass off for execution. Since the HTTP protocol is designed to be as simple and human-readable as possible, GET and the other protocol commands can be acted upon quickly by web servers.
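Because the protocol is text based, a GET request and the first line of a response are easy to build and read by hand. The following sketch shows the general shape of an HTTP/1.1 message; the helper names `build_get_request` and `parse_status_line` are assumptions for this example, and `www.example.com` is only a placeholder host.

```python
from typing import Tuple


def build_get_request(host: str, path: str = "/") -> bytes:
    """Format a minimal HTTP/1.1 GET request as the bytes sent on the wire."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, target, version
        f"Host: {host}",          # required header in HTTP/1.1
        "Connection: close",      # ask the server to close after responding
        "",                       # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split a response status line into (version, status code, reason)."""
    version, code, *reason = line.strip().split(" ", 2)
    return version, int(code), (reason[0] if reason else "")
```

For example, `parse_status_line("HTTP/1.1 404 Not Found")` returns `("HTTP/1.1", 404, "Not Found")`.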
HTTP Status Codes
The following are the HTTP status codes that can be sent in response to HTTP requests. HTTP allows the data portions of forward, redirection, and error responses to include diagnostic information that is “human-readable.”
Success 2XX Codes
All of the Success 2xx codes indicate some form of positive or successful transaction taking place. The message can include an object in MIME format.
OK 200 Code
The desired HTTP request was completed.
Created 201 Code
This code follows a POST command and indicates success. The text portion of the response line in the message will include the URI by which the newly created document should be known.
Accepted 202 Code
Indicates that the HTTP request has been accepted for processing, but the processing is not yet complete.
Partial Information 203 Code
When this status code is included in the GET command response, it indicates that the meta information returned is not a definitive set from a server holding the desired object. Instead, it originates from a private or third-party copy. It may also include annotation information about the included object.
No Response 204 Code
This response indicates that the web server has received the request, but does not have information to send back. The client should remain on the same document view in this case to allow input for scripts without changing the document at the same time.
Error 4XX, 5XX Codes
4XX HTTP codes are used in cases where the client appears to have made an error. The 5XX error code series is reserved for cases where the server has likely made an error. The body text of both error classes may include a description, in MIME format, that gives additional information about the error.
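Since the first digit of a status code identifies its class, a client can decide how to react before looking at the specific code. The sketch below is one simple way to express that; the function name and the returned labels are assumptions for this example.

```python
def status_class(code: int) -> str:
    """Name the class of an HTTP status code from its first digit."""
    classes = {
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 isolates the leading digit of a 3-digit code.
    return classes.get(code // 100, "unknown")
```

Here `status_class(404)` returns `"client error"` and `status_class(500)` returns `"server error"`, matching the split between 4XX and 5XX described above.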
Bad Request 400 Code
The HTTP request has bad syntax or was impossible to fulfill.
Unauthorized 401 Code
The 401 status code includes a parameter that specifies which authorization schemes are acceptable. The client computer should retry the request with the appropriate Authorization header.
PaymentRequired 402 Code
The parameter to the PaymentRequired 402 code specifies an acceptable charging scheme. The client computer can retry the HTTP request with the appropriate ChargeTo header. In current HTTP standards, this code is reserved for future use.
Forbidden 403 Code
The client requested a forbidden resource. Authorization will not help the client.
Not Found 404 Code
The server has not found a valid document that matches the URI provided by the client computer.
Internal Error 500 Code
The web server has encountered a condition that was not expected and precludes the request from being fulfilled.
Not Implemented 501 Code
The web server does not support the facility being requested.
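Service Unavailable 503 Code
The web server is not able to process the HTTP request, for example due to a heavy traffic load. This indicates a temporary condition that may clear at a later time. Early drafts of HTTP listed this condition as “Service Temporarily Overloaded” under code 502; in the current standard, 502 means Bad Gateway.
Gateway Timeout 504 Code
The Gateway Timeout 504 code is similar to the Internal Error 500, but is used where a server acting as a gateway or proxy was waiting on another service that did not respond in sufficient time. Early drafts of HTTP assigned this condition to code 503.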
Redirection 3XX Codes
There are several Redirection HTTP codes that are used on the Internet. These codes typically indicate automatic action to be taken by the client computer in order to fulfill the HTTP request.
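The automatic action a client takes on a redirect can be sketched with a toy lookup table in place of real servers. Everything here is hypothetical: the `site` dictionary, the `follow_redirects` function, and the `max_hops` limit are assumptions for this example, and only the 301 and 302 codes are handled.

```python
from typing import Dict, Tuple

# Hypothetical site map: URL -> (status code, redirect target or document body).
Site = Dict[str, Tuple[int, str]]


def follow_redirects(site: Site, url: str, max_hops: int = 5) -> Tuple[str, int, str]:
    """Follow 301/302 responses until a final response or the hop limit."""
    for _ in range(max_hops + 1):
        status, value = site[url]
        if status in (301, 302):
            url = value               # the response names the new location
            continue
        return url, status, value     # final URL, status, and body
    # A hop limit protects the client from redirect loops.
    raise RuntimeError("too many redirects")
```

Given a table where `/old` answers 301 with `/new` and `/new` answers 200, the function returns the final URL `/new` together with its status and body. A table that redirects in a circle raises the error instead of looping forever.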
Moved 301 Code
The information requested has been moved to or assigned a new URI and the change is permanent.
Found 302 Code
The information requested is located under a different URL. The redirection may be changed in the future.
Method 303 Code
This code suggests the client computer try another network address to locate the desired resource. A different HTTP method than the “GET” call may be used. The body section of the message will include the parameters for use with the method. This will permit a document to be a pointer to a complex query operation.
Not Modified 304 Code
If the client computer has sent a conditional GET request and access is permitted, but the document has not been modified since the date/time given in the “If-Modified-Since” field of the message, the web server will respond with a 304 status code. The document body will not be transmitted to the client in this case. This feature is designed to permit efficient updates of locally cached information and avoid the overhead of resending documents that have not changed.
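The server-side decision for a conditional GET can be sketched as a comparison between the document’s modification time and the date sent by the client. This is a minimal sketch under stated assumptions: the function name is hypothetical, `last_modified` is assumed to be a timezone-aware datetime, and a header that cannot be parsed is simply ignored.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: datetime,
                           if_modified_since: Optional[str]) -> int:
    """Return 304 if the document is unchanged since the given date, else 200."""
    if if_modified_since is None:
        return 200                        # ordinary GET: send the document
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200                        # unreadable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= since:
        return 304                        # unchanged: no body needs to be sent
    return 200
```

For a document last modified at noon UTC on 1 January 2022, a request carrying `If-Modified-Since: Sat, 01 Jan 2022 12:00:00 GMT` yields 304, while an earlier date yields 200 and the full document.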