In this section, we discuss what occurs behind the scenes when a user requests a web page in a browser. The HTTP protocol allows clients and servers to interact and exchange information in a uniform and reliable manner.
In its simplest form, a web page is nothing more than an XHTML document that describes to a web browser how to display and format the document’s information. XHTML documents normally contain hyperlinks that link to different pages or to other parts of the same page. When the user clicks a hyperlink, the requested web page loads into the user’s web browser. Similarly, the user can type the address of a page into the browser’s address field.
HTTP uses URIs (Uniform Resource Identifiers) to identify data on the Internet. URIs that specify document locations are called URLs (Uniform Resource Locators). Common URLs refer to files, directories or objects that perform complex tasks, such as database lookups and Internet searches. If you know the URL of a publicly available resource or file anywhere on the web, you can access it through HTTP.
Parts of a URL
A URL contains information that directs a browser to the resource that the user wishes to access. Computers that run web server software make such resources available. Let’s examine the components of the URL
http://www.deitel.com/books/downloads.html
The http:// indicates that the resource is to be obtained using the HTTP protocol. The middle portion, www.deitel.com, is the server’s fully qualified hostname, that is, the name of the server on which the resource resides. This computer is usually referred to as the host, because it houses and maintains resources. The hostname www.deitel.com is translated into an IP address, a unique numerical value that identifies the server much as a telephone number uniquely identifies a particular phone line. More information on IP addresses is available at en.wikipedia.org/wiki/IP_address. This translation is performed by a domain name system (DNS) server (a computer that maintains a database of hostnames and their corresponding IP addresses), and the process is called a DNS lookup.
The remainder of the URL (i.e., /books/downloads.html) specifies both the name of the requested resource (the XHTML document downloads.html) and its path, or location (/books), on the web server. The path could specify the location of an actual directory on the web server’s file system. For security reasons, however, the path normally specifies the location of a virtual directory. The server translates the virtual directory into a real location on the server (or on another computer on the server’s network), thus hiding the true location of the resource. Some resources are created dynamically using other information stored on the server computer, such as a database. The hostname in the URL for such a resource specifies the correct server; the path and resource information identify the resource with which to interact to respond to the client’s request.
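The decomposition described above can be sketched with Python’s standard urllib.parse module. This is only an illustration of how the example URL splits into its parts; the variable names are our own.

```python
from urllib.parse import urlsplit

# The example URL discussed in this section.
url = "http://www.deitel.com/books/downloads.html"

parts = urlsplit(url)
scheme = parts.scheme      # protocol used to obtain the resource
hostname = parts.hostname  # fully qualified hostname of the server
path = parts.path          # path and resource name on the server

# Split the path into its (possibly virtual) directory and the document name.
directory, _, resource = path.rpartition("/")

print(scheme)     # http
print(hostname)   # www.deitel.com
print(directory)  # /books
print(resource)   # downloads.html
```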
Making a Request and Receiving a Response
When given a URL, a web browser performs a simple HTTP transaction to retrieve and display the web page found at that address. Figure 21.1 illustrates the transaction, showing the interaction between the web browser (the client side) and the web server application (the server side).
In Fig. 21.1, the web browser sends an HTTP request to the server. The request (in its simplest form) is
GET /books/downloads.html HTTP/1.1
The word GET is an HTTP method indicating that the client wishes to obtain a resource from the server. The remainder of the request provides the path name of the resource (e.g., an XHTML document) and the protocol’s name and version number (HTTP/1.1). The client’s request also contains some required and optional headers.
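As a rough sketch, the text of such a request could be assembled as follows. The helper name build_get_request is hypothetical. In HTTP/1.1 the Host header is the required one; Connection: close is shown here as an example of an optional header. Nothing is sent over the network.

```python
def build_get_request(host: str, path: str) -> bytes:
    """Return the bytes of a minimal HTTP/1.1 GET request (illustrative only)."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, resource path, protocol version
        f"Host: {host}",         # the header that HTTP/1.1 requires
        "Connection: close",     # optional: ask the server to close the connection
        "",                      # blank line marks the end of the headers
        "",
    ]
    # HTTP separates lines with a carriage return plus line feed.
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("www.deitel.com", "/books/downloads.html")
print(request.decode("ascii"))
```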
Any server that understands HTTP (version 1.1) can translate this request and respond appropriately. Figure 21.2 depicts the server responding to a request. The server first responds by sending a line of text that indicates the HTTP version, followed by a numeric code and a phrase describing the status of the transaction. For example,
HTTP/1.1 200 OK
indicates success, whereas
HTTP/1.1 404 Not Found
informs the client that the web server could not locate the requested resource. A complete list of numeric codes indicating the status of an HTTP transaction can be found at www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.
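A client might take the status line apart along these lines. The function parse_status_line is a hypothetical helper; HTTPStatus comes from Python’s standard http module and supplies the standard phrase for each numeric code.

```python
from http import HTTPStatus


def parse_status_line(line: str):
    """Split a status line into protocol version, numeric code and phrase."""
    version, code, phrase = line.split(" ", 2)
    return version, int(code), phrase


version, code, phrase = parse_status_line("HTTP/1.1 404 Not Found")

# Codes in the 200 range indicate success.
succeeded = 200 <= code < 300

print(version, code, phrase, succeeded)
print(HTTPStatus(200).phrase)  # OK
print(HTTPStatus(404).phrase)  # Not Found
```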
The server then sends one or more HTTP headers, which provide additional information about the data that will be sent. In this case, the server is sending an XHTML text document, so one HTTP header for this example would read:
Content-Type: text/html
The information provided in this header specifies the Multipurpose Internet Mail Extensions (MIME) type of the content that the server is transmitting to the browser. MIME is an Internet standard that specifies data formats so that programs can interpret data correctly. For example, the MIME type text/plain indicates that the sent information is text that can be displayed directly, without any interpretation of the content as XHTML markup. Similarly, the MIME type image/jpeg indicates that the content is a JPEG image. When the browser receives this MIME type, it attempts to display the image.
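The way a client might act on the MIME type can be sketched as below. The function describe_content and the strings it returns are purely illustrative, not how any particular browser is implemented. The header parsing uses Python’s standard email.message module, which understands MIME-style headers.

```python
from email.message import Message


def describe_content(content_type_header: str) -> str:
    """Say, in words, what a client might do with content of this MIME type."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    mime_type = msg.get_content_type()  # e.g. "text/html", parameters removed

    if mime_type == "text/html":
        return "parse as markup and render"
    if mime_type == "text/plain":
        return "display the text directly"
    if msg.get_content_maintype() == "image":
        return "decode and display the image"
    return "offer to save the data"


print(describe_content("text/html; charset=utf-8"))
print(describe_content("text/plain"))
print(describe_content("image/jpeg"))
```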
The header or set of headers is followed by a blank line, which indicates to the client browser that the server is finished sending HTTP headers. The server then sends the contents of the requested XHTML document (downloads.html). The client-side browser parses the XHTML markup it receives and renders (or displays) the results. The server normally keeps the connection open to process other requests from the client.
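The role of the blank line can be illustrated by splitting a response into its parts. The response bytes below are a made-up sample (a real downloads.html would be much longer), and the parsing is deliberately simplified.

```python
# A made-up sample response: status line, headers, blank line, then the document.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)

# The first blank line separates the headers from the document contents.
head, _, body = raw.partition(b"\r\n\r\n")
status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")

# Header names are case-insensitive, so store them in lowercase.
headers = {}
for line in header_lines:
    name, _, value = line.partition(":")
    headers[name.strip().lower()] = value.strip()

print(status_line)
print(headers)
print(body)
```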
HTTP get and post Requests
The two most common HTTP request types (also known as request methods) are get and post. A get request typically gets (or retrieves) information from a server. Common uses of get requests are to retrieve an XHTML document or an image, or to fetch search results based on a user-submitted search term. A post request typically posts (or sends) data to a server. Common uses of post requests are to send form data or documents to a server.
An HTTP request often posts data to a server-side form handler that processes the data. For example, when a user performs a search or participates in a web-based survey, the web server receives the information specified in the XHTML form as part of the request. Get requests and post requests can both be used to send form data to a web server, yet each request type sends the information differently.
A get request sends information to the server as part of the URL, e.g., www.google.com/search?q=deitel. In this case search is the name of Google’s server-side form handler, q is the name of a variable in Google’s search form and deitel is the search term. Notice the ? in the preceding URL. A ? separates the query string from the rest of the URL in a request. A name/value pair is passed to the server with the name and the value separated by an equals sign (=). If more than one name/value pair is submitted, each pair is separated by an ampersand (&). The server uses data passed in a query string to retrieve an appropriate resource from the server. The server then sends a response to the client. A get request may be initiated by submitting an XHTML form whose method attribute is set to "get", or by typing the URL (possibly containing a query string) directly into the browser’s address bar. (See Chapter 2 for more information on how various search engines operate and Chapter 4 for an in-depth discussion of XHTML forms.)
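The query-string rules above can be sketched with Python’s standard urllib.parse module. The second parameter, hl, is a hypothetical extra field added only to show the ampersand separator.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# One name/value pair: name and value are joined by an equals sign.
single = urlencode({"q": "deitel"})

# Several pairs are separated by ampersands ("hl" is a hypothetical field).
multiple = urlencode({"q": "deitel", "hl": "en"})

# The ? separates the query string from the rest of the URL.
url = "http://www.google.com/search?" + single

# The server side can recover the pairs from the query string.
received = parse_qs(urlsplit(url).query)

print(single)    # q=deitel
print(multiple)  # q=deitel&hl=en
print(received)  # {'q': ['deitel']}
```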
A post request is specified in an XHTML form by the method "post". The post method sends form data as part of the HTTP message, not as part of the URL. A get request typically limits the query string (i.e., everything to the right of the ?) to a specific number of characters (2083 in IE; more in other browsers), so it is often necessary to send large pieces of information using the post method. The post method is also sometimes preferred because it hides the submitted data from the user by embedding it in an HTTP message. If a form submits several hidden input values along with user-submitted data, the post method might generate a URL like www.searchengine.com/search. The form data still reaches the server and is processed in a similar fashion to a get request, but the user does not see the exact information sent.
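For comparison, the following sketch prepares a post request with Python’s standard urllib.request module without actually sending it. The form field names and values are hypothetical. Note that the form data travels in the body of the HTTP message while the URL carries no query string.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields: one typed by the user, one hidden input.
form = {"q": "deitel", "token": "abc123"}

# The form data is encoded just like a query string, but placed in the body.
body = urlencode(form).encode("ascii")

# Constructing a Request does not contact the server.
req = Request("http://www.searchengine.com/search", data=body, method="POST")
req.add_header("Content-Type", "application/x-www-form-urlencoded")

print(req.get_method())  # POST
print(req.full_url)      # http://www.searchengine.com/search
print(req.data)          # b'q=deitel&token=abc123'
```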
Browsers often cache (save on disk) web pages for quick reloading. If there are no changes between the version stored in the cache and the current version on the web, this speeds up your browsing experience. An HTTP response can indicate the length of time for which the content remains “fresh.” If this amount of time has not elapsed, the browser can load the document from the cache and avoid another request to the server. Otherwise, the browser requests the document from the server again. Thus, the browser minimizes the amount of data that must be downloaded for you to view a web page. Browsers typically do not cache the server’s response to a post request, because the next post might not return the same result. For example, in a survey, many users could visit the same web page and answer a question. The survey results could then be displayed for the user. Each new answer changes the overall results of the survey.
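One common way for a response to state its freshness lifetime is the max-age directive of the Cache-Control header. The sketch below is a greatly simplified version of the freshness check; the function names are hypothetical, and real browsers consider several other headers and directives.

```python
import re


def max_age_seconds(cache_control: str):
    """Return the max-age value in seconds, or None if the directive is absent."""
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else None


def is_fresh(age_seconds: int, cache_control: str) -> bool:
    """True if a cached copy of this age can be reused without a new request."""
    max_age = max_age_seconds(cache_control)
    return max_age is not None and age_seconds < max_age


# A copy cached one minute ago, allowed to stay fresh for an hour: reuse it.
print(is_fresh(60, "public, max-age=3600"))    # True
# The same copy two hours later: request the document again.
print(is_fresh(7200, "public, max-age=3600"))  # False
```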
When you use a web-based search engine, the browser normally supplies the information you specify in an HTML form to the search engine with a get request. The search engine performs the search, then returns the results to you as a web page. Such pages are sometimes cached by the browser in case you perform the same search again.