WWW
The World Wide Web (WWW) is a repository of information linked together from points all
over the world. The WWW has a unique combination of flexibility, portability,
and user-friendly features that distinguish it from other services provided by
the Internet. The WWW project was initiated by CERN (European Laboratory for
Particle Physics) to create a system to handle distributed resources necessary
for scientific research.
1. Architecture
The WWW today is a distributed client/server service, in which a client using a browser
can access a service using a server. However, the service provided is
distributed over many locations called sites.
Each site holds one or more documents, referred to as Web pages. Each Web page can
contain a link to other pages in the same site or at other sites. Suppose a
client needs a document that is stored at site A. It sends a request through
its browser. The request,
among other information, includes the address of the site and the Web page,
called the URL, which we will discuss shortly. The server at site A finds the
document and sends it to the client. When the user views the document, she
finds some references to other documents, including a Web page at site B. The
reference has the URL for the new site. If the user also wants to see this
document, the client sends another request to the new site, and the new
page is retrieved.
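The two-request scenario above can be modeled without any networking. The following is only a toy sketch: the host names `site-a.example` and `site-b.example`, the page contents, and the helper functions are all hypothetical, and a plain dictionary stands in for the servers.

```python
from urllib.parse import urlsplit

# Hypothetical sites: each maps a path to (document text, list of linked URLs).
SITES = {
    "site-a.example": {
        "/index.html": ("Welcome to site A", ["http://site-b.example/page.html"]),
    },
    "site-b.example": {
        "/page.html": ("A page stored at site B", []),
    },
}


def server_lookup(host, path):
    """Play the role of the server at `host`: find the document for `path`."""
    return SITES[host][path]


def client_request(url):
    """Play the role of the browser: split the URL, then ask the right site."""
    parts = urlsplit(url)
    return server_lookup(parts.hostname, parts.path)


# The first request goes to site A.
text_a, links = client_request("http://site-a.example/index.html")
# The page references a document at site B, so a second request follows.
text_b, _ = client_request(links[0])
print(text_a)
print(text_b)
```

A real browser would send each request over the network; the sketch only shows how the URL in a reference tells the client which site to contact next.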
Client (Browser)
A variety of vendors offer commercial browsers that interpret and display a Web document,
and all use nearly the same architecture. Each browser usually consists of
three parts: a controller, client protocol, and interpreters. The controller
receives input from the keyboard or the mouse and uses the client programs to
access the document. After the document has been accessed, the controller uses
one of the interpreters to display the document on the screen. The client
protocol can be a protocol such as FTP or HTTP.
Server
The Web page is stored at the server. Each time a client request arrives, the
corresponding document is sent to the client. To improve efficiency, servers
normally store requested files in a cache in memory; memory is faster to access
than disk. A server can also become more efficient through multithreading or
multiprocessing. In this case, a server can answer more than one request at a
time.
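Both efficiency ideas, an in-memory cache and several threads answering requests, can be sketched in a few lines. This is an illustration under assumptions rather than how any particular Web server is built: the class `CachingDocumentStore`, the file name `index.html`, and the thread count are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class CachingDocumentStore:
    """Serve files from disk, keeping a copy of each requested file in memory."""

    def __init__(self, root):
        self.root = root
        self._cache = {}
        self._lock = Lock()
        self.disk_reads = 0

    def get(self, name):
        # The lock keeps the cache consistent when several threads ask at once.
        with self._lock:
            if name not in self._cache:
                # Cache miss: read from disk once and remember the contents.
                with open(os.path.join(self.root, name), "rb") as f:
                    self._cache[name] = f.read()
                self.disk_reads += 1
            return self._cache[name]


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "index.html"), "wb") as f:
        f.write(b"<HTML><BODY>Hello</BODY></HTML>")

    store = CachingDocumentStore(root)
    # Four worker threads answer eight requests for the same document.
    with ThreadPoolExecutor(max_workers=4) as pool:
        responses = list(pool.map(store.get, ["index.html"] * 8))

print(len(responses), store.disk_reads)
```

Only the first request touches the disk; the remaining requests are answered from memory. Holding one lock around the disk read keeps the sketch simple, although a production server would likely use finer-grained locking.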
Uniform Resource Locator
A client that wants to access a Web page needs the address. To facilitate the access of
documents distributed throughout the world, HTTP uses locators. The uniform
resource locator (URL) is a standard for specifying any kind of information on
the Internet. The URL defines four things: protocol, host computer, port, and
path.
The protocol is the client/server program used to retrieve the document. Many
different protocols can retrieve a document, among them FTP and HTTP. The
most common today is HTTP.
The host is the computer on which the information is located, although the name of the
computer can be an alias. Computers that store Web pages are usually given
alias names that begin with the characters
"www". This is not mandatory, however, as the host can be any name
given to the computer that hosts the Web page. The URL can optionally contain
the port number of the server. If the port is included, it is inserted between
the host and the path, and it is separated from the host by a colon.
The path is the pathname of the file where the information is located. Note that the path
can itself contain slashes that, in the UNIX operating system, separate the
directories from the subdirectories and files.
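The four parts of a URL can be pulled apart with Python's standard `urllib.parse` module. The URL below is hypothetical and chosen only so that all four parts, including the optional port, are present.

```python
from urllib.parse import urlsplit

# Hypothetical URL chosen only for illustration.
url = "http://www.example.com:8080/docs/networks/www.html"
parts = urlsplit(url)

protocol = parts.scheme  # the protocol, before "://"
host = parts.hostname    # the host name (here an alias beginning with "www")
port = parts.port        # the optional port, after the colon
path = parts.path        # the pathname, with slashes between directories
print(protocol, host, port, path)
```

When the URL omits the port, `parts.port` is `None`, and the client falls back to the default port of the protocol.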
2. Web Documents
The documents in the WWW can be grouped into three broad categories: static,
dynamic, and active. The category is based on the time at which the contents of
the document are determined.
2.1 Static Documents
Static documents are fixed-content documents that are created and stored in a server.
The client can get only a copy of the document. In other words, the contents of
the file are determined when the file is created, not when it is used. Of
course, the contents in the server can be changed, but the user cannot change
them. When a client accesses the document, a copy of the document is sent. The
user can then use a browsing program to display the document.
2.1.1 HTML
Hypertext Markup Language (HTML) is a language for creating Web pages. The term markup
language comes from the book publishing industry. Before a book is typeset and
printed, a copy editor reads the manuscript and puts marks on it. These marks
tell the compositor how to format the text. For example, if the copy editor
wants part of a line to be printed in boldface, he or she draws a wavy line
under that part. In the same way, data for a Web page are formatted for
interpretation by a browser.
For example, to display part of a line in boldface, that part is enclosed
between the two tags <B> and </B>, which are instructions for the browser. When the
browser sees these two marks, it knows that the enclosed text must be boldfaced. A
markup language such as HTML allows us to embed formatting instructions in the
file itself. The instructions are included with the text. In this way, any
browser can read the instructions and format the text according to the specific
workstation.
A Web page is made up of two parts: the head and the body. The head is the first part
of a Web page. The head contains the title of the page and other parameters
that the browser will use. The actual contents of a page are in the body, which
includes the text and the tags. Whereas the text is the actual information
contained in a page, the tags define the appearance of the document. Every HTML
tag is a name followed by an optional list of attributes, all enclosed between
less-than and greater-than symbols (< and >). An attribute, if present,
is followed by an equals sign and the value of the attribute. Some tags can be
used alone; others must be used in pairs. Those that are used in pairs are
called beginning and ending tags. The beginning tag can have attributes and
values and starts with the name of the tag. The ending tag cannot have
attributes or values but must have a slash before the name of the tag.
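The tag rules described above can be observed with Python's standard `html.parser` module. This is a small sketch: the sample page, the file name `pic.gif`, and the class `TagLister` are made up for illustration. The page has a head and a body, one pair of tags (<B> and </B>), and one tag that is used alone and carries attributes (<IMG>).

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record every beginning and ending tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute name, value) pairs.
        self.events.append(("begin", tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Ending tags never carry attributes.
        self.events.append(("end", tag, {}))


# Hypothetical page used only for this example.
page = (
    "<HTML><HEAD><TITLE>Sample</TITLE></HEAD>"
    '<BODY><B>Important</B> text and <IMG SRC="pic.gif" ALIGN="middle"></BODY></HTML>'
)
lister = TagLister()
lister.feed(page)
lister.close()
for event in lister.events:
    print(event)
```

The parser reports tag and attribute names in lowercase. The <IMG> tag appears only as a beginning tag because it is used alone, whereas <B> is reported as both a beginning and an ending tag.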
2.2 Dynamic Documents
A dynamic document is created by a Web server whenever a browser requests the
document. When a request arrives, the Web server runs an application program or a script
that creates the dynamic document. The server returns the output of the program
or script as a response to the browser that requested the document. Because a
fresh document is created for each request, the contents of a dynamic document
can vary from one request to another.
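The key property, a fresh document for each request, can be sketched with a function that the server would call once per request. The function name, the request counter, and the page layout are hypothetical.

```python
from datetime import datetime, timezone
from itertools import count

_request_counter = count(1)  # hypothetical per-server request counter


def build_dynamic_document(now=None):
    """Create a fresh HTML document for one request."""
    now = now or datetime.now(timezone.utc)
    n = next(_request_counter)
    return (
        "<HTML><HEAD><TITLE>Status</TITLE></HEAD>"
        f"<BODY>Request number {n}, generated at {now.isoformat()}</BODY></HTML>"
    )


# Two requests produce two different documents.
first = build_dynamic_document()
second = build_dynamic_document()
print(first)
print(second)
```

A static document, by contrast, would be the same bytes on every request until someone edits the file on the server.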
2.2.1 Common Gateway Interface (CGI)
The Common Gateway Interface (CGI) is a technology that creates and handles dynamic
documents. CGI is a set of standards that defines how a dynamic document is
written, how data are input to the program, and how the output result is used.
The term common in CGI indicates that the standard defines a set of rules that
is common to any language or platform. The term gateway here means that a CGI
program can be used to access other resources such as databases, graphical
packages, and so on. The term interface here means that there is a set of
predefined terms, variables, calls, and so on that can be used in any CGI
program.
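A CGI-style program can be sketched as follows. For a GET request, the server passes the query string to the program in the `QUERY_STRING` environment variable, and the program writes a header, a blank line, and then the document to its standard output. The function `run_cgi`, the parameter `name`, and the simulated request are assumptions made for this example. The program is invoked here with an explicit dictionary instead of a real server environment so that it can be run anywhere.

```python
import html
import io
import os
import sys
from urllib.parse import parse_qs


def run_cgi(environ=os.environ, out=sys.stdout):
    """Minimal CGI-style program: read input from the environment, write a document."""
    # For a GET request, the server passes the query string in QUERY_STRING.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]  # "name" is a hypothetical parameter

    # The output starts with a header, then a blank line, then the document.
    out.write("Content-Type: text/html\n\n")
    out.write(f"<HTML><BODY>Hello, {html.escape(name)}!</BODY></HTML>\n")


# Simulate a server invoking the program with the query string "name=Ada".
buffer = io.StringIO()
run_cgi({"QUERY_STRING": "name=Ada"}, buffer)
response = buffer.getvalue()
print(response)
```

Because the input arrives through environment variables and the output goes to standard output, the same contract can be implemented in any language, which is what the term common refers to.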