How We Got Here
The development of XML was
not an epiphany that came to a lone inventor working in isolation, nor was it
conceived of as part of a corporation’s product-development efforts. Rather,
XML is an evolution of data formats that existed previously but solved problems
of different sorts. To understand XML, one needs to understand these formats and how their limitations prevented their widespread adoption.
Standard Generalized Markup Language (SGML)
With its roots originating
all the way back in 1969 and its standardization by the ISO in 1986, SGML is
really the forefather of all markup languages. It introduced the notion that
data processing and document processing could be one and the same thing—but
we’re getting ahead of ourselves here.
Note: SGML is formally
standardized as ISO specification 8879:1986. You can obtain more information
from the World Wide Web Consortium (W3C) Web site at http://www.w3.org/MarkUp/SGML/.
Computers have long been used
for document and text processing. In the early days, computers were used to
assist in document preparation and typesetting. They allowed copy creators and
editors to quickly prototype how a specific document would look prior to its
printing on a traditional printing press. As computing progressed, so did its application in the document-preparation industry. The advent of word processors necessitated the invention of a means to indicate how the content was to be modified for printing. Because software applications at the time were text based, with no graphical capabilities to speak of, the text contained in the documents was “marked up” using textual commands that were later processed by the final printing destination. These so-called markups surrounded the text and explained how it was to be handled for printing, including notations for boldface, underline, font sizing, placement, and other such commands. Word processors didn’t invent markup as a concept; markup is common in document creation and editing. Editors have used markup for decades, if not centuries, to indicate their revisions and changes to text. Word processors merely implemented a way by which markup could be encoded in a computer-based system.
The number and type of such markups proliferated with the number of word processing formats. Markup languages such as troff, Rich Text Format (RTF), and LaTeX were created to meet these needs. An example of LaTeX can be found in Listing 1.1. (This code is from a Web site on LaTeX at http://www.oxy.edu/~jquinn/home/Math400/LaTeX/thesis-example-latexcode.html.) Finally, the development
of graphical WYSIWYG (What You See Is What You Get) systems eliminated the need for
textual markup of documents to indicate their final presentation format.
However, the legacy of markup lives on.
LISTING 1.1 LaTeX Example
\documentclass[11pt]{article}
\setlength{\textwidth}{6in}
\setlength{\textheight}{9in}
\setlength{\oddsidemargin}{0.2in}
\setlength{\evensidemargin}{0.2in}
\setlength{\topmargin}{-.6in}
\begin{document}
\newtheorem{lemma}{Lemma}[section]
\newtheorem{theorem}[lemma]{Theorem}
\newtheorem{corollary}[lemma]{Corollary}
\newtheorem{conjecture}[lemma]{Conjecture}
\newtheorem{proposition}[lemma]{Proposition}
\newtheorem{definition}[lemma]{Definition}
\def\square{{\Box}}
\title{Title For a Sample Comprehensive Paper}
\author{Your Name Here \\Department of Mathematics \\Occidental College \\ \\
{\it Submitted in partial fulfillment of the requirements for the degree}\\ {\sc Bachelor of Arts}}
\maketitle
\begin{abstract}
Every paper needs to begin
with an abstract. This is a brief overview of the entire paper. It should be
independent of the body of the paper (i.e. no referencing things to come). If
you feel a definition is needed to make the ideas here clear, then by all means
include it. A lazy reader should be able to get the entire gist of your work by
reading the abstract to be able to determine if it is worth reading more of the
paper.
\end{abstract}
\section{Introduction}
The introduction serves to
acquaint the reader with your topic and place it in a greater perspective.
Notation and definitions which are used throughout the work should be presented
here. You may find yourself repeating the ideas in
the abstract --- that's okay.
They should be more fleshed out in the introduction.
\section{Main Body}
SGML built upon this markup
history by providing a common format for defining and exchanging markups
between systems that may not share the same markup language inherently. In
1969, IBM sought to simplify the tasks of creating, archiving, searching, and
managing legal documents. Charles Goldfarb headed up this task of creating the
system and defining a format to meet these needs. In the process of doing so,
Goldfarb, along with his coworkers Ed Mosher and Ray Lorie, realized that IBM’s
multiple systems stored their information in different formats. Producing an
application and data format that would cross these systems and produce a
unified result would mean that a standard format would have to be created. The
solution to this set of problems took the form of the Generalized Markup
Language (GML), the initials of which are also the creators’ initials. GML was
designed to provide a standard means for marking up content that could then be
archived, managed, and searched. See Listing 1.2 for an example of an SGML
document.
LISTING 1.2 SGML Example
<!DOCTYPE book [
<!ELEMENT book O O ((title & subject & author & ISBN?), body)>
<!ELEMENT body - O (bodylines+)>
<!ELEMENT bodylines O O (#PCDATA)>
<!ELEMENT (title, subject, author, ISBN) - O (#PCDATA)>
]>
<title>Little Miss Muffet</title>
<subject>Children's fairy tale</subject>
<author>Mother Goose</author>
<body>
<bodylines>Little Miss Muffet</bodylines>
<bodylines>Sat on her tuffet</bodylines>
</body>
SGML also introduced the
notion of a generalized document format. Rather than having proprietary, custom markup languages that could not be exchanged between systems, it established a common means for defining markup. Systems that complied with the SGML specification could communicate with each other, even if competing vendors created them. SGML also brought forth the idea that documents can have custom types that indicate the nature and purpose of the information contained within. Rather than imposing a single, monolithic specification to be used across all industries, SGML conceived that individual industries would be
concerned specifically with the way they represent information. Each of these
industries would be able to maintain a Document Type Definition (DTD) for
itself and thus be able to exchange documents in an even more specific,
standardized manner.
All these features in SGML transformed the simple document into a representation of text content and its associated data. SGML proved, very early on, that document processing and data processing could be one and the same. This idea would be carried forward in the development of its successor formats: XML and HTML.
However, as SGML development progressed, it became increasingly bloated and complicated. Both the
creation and parsing of SGML documents were difficult and complex, and the
various “optional” features of SGML started to bog down its ability to become
widely adopted. By necessity, the SGML specification was pulled and influenced
by many conflicting industry groups, each of which wanted to make sure the
language was able to meet their needs. As a result, the creation of a simple,
generic parser for the language was a difficult proposition, at best.
However, the legacy of SGML
continued to live on, not only in the number of documents created in the
language, but in subsequent formats that borrowed heavily from its creative
direction while attempting to side-step some of its complexities.
Hypertext Markup Language (HTML)
SGML could have continued its
steady growth as the only generalized markup language in use if it weren’t for
the sudden emergence of the Web and its own format for data exchange—the
Hypertext Markup Language (HTML).
Although the Internet has
been around since the late 1960s, it was the development of the Web that truly
brought the Internet into its current prominence and widespread usage. The Web
finally put a visual, interactive, and easy-to-use front end on a network
system that had formerly been dominated by applications such as Telnet, FTP,
and Gopher. The Web provided users a means to easily create repositories of
knowledge that could be linked with one another as well as contain graphical
images and well-formatted layouts. What’s more, the Web was based, in part, on
SGML.
In 1989, Tim Berners-Lee, a physics researcher at the CERN European Nuclear Research Facility, proposed that information collected and produced by the facility could be shared in a more interactive and visual manner. Berners-Lee took a peek at what SGML had to offer on this subject, and upon further exploration, he realized that he could create a simple DTD based on SGML that would allow users to create simple hypertext-linked documents. He named this DTD and subsequent development the Hypertext Markup Language (HTML), a sample of which can be seen in Listing 1.3.
LISTING 1.3 HTML Example
<HTML>
<HEAD>
<TITLE>This is an HTML Hello World!</TITLE>
</HEAD>
<BODY>
<H1>Hello World!</H1>
<FONT SIZE="2">Using a Font Tag, with <B>Boldface</B> and <I>Italics</I></FONT>
</BODY>
</HTML>
However, HTML is nothing like
SGML when it comes to the strictness and complexity of the language. HTML was
developed relatively quickly and was meant to do a fairly simple job. It was
created with simple developers in mind; therefore, “sloppiness” was allowed to
thrive. In fact, this sloppiness may be the very reason why the Web exists in
the first place. Because it was so easy to create HTML documents and browsers,
the format flourished in the vacuum of the Internet. Users were simply craving
a document format that could express their ideas in a visual, linked manner.
HTML met this need.
Because it borrows much of
its functionality from SGML, HTML provides many similar features: the use of
angle-bracketed elements and attributes as well as a structure defined by a DTD
that was independent of display mechanisms. Of course, this latter part became
increasingly fuzzy as the various Internet browser vendors started to battle
over control of the market. In particular, Microsoft and Netscape sought to add
their own proprietary elements to the HTML language that would be understandable only by their respective browser platforms. This, of course, violated a basic tenet of SGML: that the markup language should be standardized and generalized.
In addition, HTML solved only
one part of the SGML realm of problems—namely the presentational and layout
aspects. HTML was aimed squarely at representing information for display on a
browser or on other display devices such as cell phones and handhelds. The language was never intended as a means for storing data and metadata (information that describes data) or for providing a framework for users to exchange data in a structured manner. HTML had separated data processing from document processing.
It soon became clear that a language such as SGML was once again needed on the Internet. HTML was not adequate for the extensible, data-oriented nature of information exchange, and SGML was too complex and not native to the Internet environment.
Electronic Data Interchange
Of course, HTML and SGML were
not the only data formats in existence prior to the emergence of XML. In the
electronic commerce and business communities, another acronym held even more
sway than SGML.
The Transportation Data
Coordinating Committee (TDCC) developed the Electronic Data Interchange (EDI)
format in the early 1970s as a means for transportation industry vendors to
specify transaction sets that enabled electronic processing of purchase orders
and bills. At the time, computing power was concentrated in isolated mainframes
that had low storage capacity and even lower bandwidth capabilities for
exchanging information.
Because freight transactions
were dominated by high-volume, low-dollar transactions, transportation
suppliers were early adopters of EDI standards. Many large carriers and
shippers achieved significant productivity gains by switching their internal,
paper-oriented systems to electronic transactions enabled by EDI.
Because an established message-transport infrastructure, standardized business process rules, and common file formats did not exist in the early years of EDI’s formation, the EDI format carried with it specifications for how the messages were to be exchanged and processed. Before the Internet came into widespread use, EDI
messages were sent across private value-added networks (VANs) that ensured that
transactional messages reached their destination with security, integrity, and
messaging validity, along with receipts that guaranteed the messages were
received. The EDI transaction sets also contained strict business rules on how
the messages were to be handled.
The EDI file format used a
fairly arcane syntax that was unintelligible to most humans. Just looking at
Listing 1.4 is enough to give many of us headaches. The structure was aimed at
efficiency and compactness over flexibility and human readability. As such, EDI
parsers and processors were used to create, read, and manage these files. In
general, two parties that wished to conduct an EDI transaction would need to
enter into a trading agreement, choose a VAN for message delivery, build or buy software to conduct mapping between data formats and EDI messages, and build translators to interpret the sender’s message into the company’s native data format. Each of these operations would have to be accomplished for every new trading partner added to the network. In addition, VANs charge monthly and per-transaction fees for the handling of these messages. It is no wonder that implementation cost and complexity are so high with EDI systems. It is also no wonder that only the large manufacturers were able to afford to participate!
LISTING 1.4 EDI Example
ISA*00* *00* *01*003897733 *12*PARTNER ID*980923*1804*U*00200*000000002*0*T*@
GS*PO*MFUS*PARTNER ID*19980924*0937*3*X*004010
ST*850*0001
BEG*00*SA*4560006385**19980923
CUR*BY*USD
TAX*1-00-123456-6
FOB*DF***02*DDP
ITD*01*ZZ*****45*****NET 45 - Payment due 45 days from Document Date
TD5*Z****Ship via Airborne
N9*L1**NOTE FOLLOWING TEXT
MSG*PLEASE CONFIRM PRICE IF NOT CORRECT.
N9*L1**NOTE FOLLOWING TEXT
MSG*CONTACT JACK WITH QUESTIONS 212-555-1212
N1*BT**92*USA1
N1*BY*ACME HARDWARE CORPORATION*92*MFUS
PER*BD*JOHN DOE
N1*SE*PARTNER COMPANY NAME*92*0010001000
N1*ST*Acme Hardware Corporation*92*0000002924
N3*123 Random Hill Rd
N4*Megalopolis*NY*01429*US
PO1*00010*3600*EA*1.233*CT*BP*123456-123*EC*AM*VP*123456*123
PID*F****STROMBOLI, 4000,XCR-P5
SCH*3300*EA***002*19981101
CTT*1
SE*23*0001
GE*1*2
IEA*1*000000002
Each of the EDI transaction
sets defines which fields of data are contained in a specific transactional message.
The format defines the fields themselves, their order of appearance, and the
length of the information contained within. A number of “implementation
guidelines” are also applied to the transaction sets to assist in the
development of valid EDI messages.
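To make the preceding description a bit more concrete, here is a minimal, illustrative sketch (not part of any EDI toolkit) of how a program might split the text in Listing 1.4 into segments and fields. It assumes only the conventions visible in that listing: one segment per line and asterisks separating the data elements.

# Illustrative only: split X12-style text (as in Listing 1.4) into segments.
# Assumes one segment per line and "*" as the element separator, as shown
# in the listing; real interchanges declare their delimiters in the ISA header.

def parse_edi(text):
    """Return a list of {"id": segment_id, "elements": [field, ...]} dicts."""
    segments = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        fields = line.split("*")
        segments.append({"id": fields[0], "elements": fields[1:]})
    return segments


if __name__ == "__main__":
    sample = "BEG*00*SA*4560006385**19980923\nPER*BD*JOHN DOE"
    for segment in parse_edi(sample):
        # BEG -> ['00', 'SA', '4560006385', '', '19980923']
        # PER -> ['BD', 'JOHN DOE']
        print(segment["id"], "->", segment["elements"])

Even this toy example hints at why EDI tooling is nontrivial: knowing that a given element of a BEG segment is a date, or how long it may be, comes entirely from the transaction-set definitions and implementation guidelines rather than from the data itself.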
The EDI transaction sets were
developed by two separate bodies: the American National Standards Institute’s
(ANSI) Accredited Standards Committee (ASC) X12 and the United Nations
Standards Messages Directory for Electronic Data Interchange for Administration,
Commerce, and Transport (EDIFACT). Whereas ANSI X12 met the needs of North
American commerce users, EDIFACT was focused on meeting more international
needs. Later, the ANSI ASC X12 effort was moved to the Data Interchange
Standards Association (DISA) for ongoing management. As such, the
specifications deviated somewhat and the “standard” nature of EDI was rapidly
degraded.
EDI has been used as the
basis for a number of industry-specific standards efforts. In particular, the
healthcare industry has used EDI to define its Health Level Seven (HL7)
standard, which is in use by most of the world’s hospitals and insurance
companies for exchanging healthcare and health insurance information. In
addition, other groups including automotive, insurance, government, retail, and
grocery industries have looked to EDI as a format on which to base their
business-to-business interactions.
However, many of the supposed
gains that EDI was to deliver were never realized due to the inability of the
electronic applications to eliminate the paper processes necessary to support
the business processes. EDI exhibits the “80/20 rule,” which states that the last 20 percent of a company’s trading partners to be brought onto EDI will represent 80 percent of its savings. The reason for this is simple: The trading partners that still conduct business in paper formats and processes still need to be supported. That means dual and somewhat redundant processes, one electronic and one paper, need to be supported. This is very inefficient in the long run. In addition, EDI was never really able to help small and medium-sized trading partners participate in the electronic commerce game. This is primarily due to EDI’s cost and the complexity of implementation. It was simply too expensive to get all the small-business suppliers to switch from their paper processes to EDI. This meant that the returns for everyone were greatly diminished.
Another of EDI’s problems is
its reliance on fixed transaction sets. The rigidity of these transaction sets
makes EDI somewhat impervious to the natural changes that occur in business
processes and methodologies. This rigidity is reflected in the somewhat-strict
manner in which EDI messages must be processed and the standardization process
by which these transaction sets are defined. Transaction sets have a
well-defined field format and structure. Companies are not free to add their
own data elements or redefine data structures. This has required many users to
implement EDI in a nonstandard manner in order for it to serve their business
needs.
However, the EDI industry
sought to fix many of these shortcomings by embracing the Internet as a means
for transportation, and by relaxing many of the strict processing requirements.
EDI has actually made some significant strides in the past five or so years in
trying to adapt to the rapidly changing business frontier. In this regard, it
is unlikely that EDI is going to disappear entirely. Rather, we may find that
within EDI’s already large community base, its use will solidify. However, as a
means for transporting data in general or as a solution for e-business for the
community at large, EDI has had its day in the sun, and now XML is due to bask
in some of the sunlight.
The investment that many
companies have made in EDI is not going to simply be thrown away, however. Many
companies are looking to leverage their EDI expertise into crafting XML
solutions that take advantage of the EDI infrastructure, business processes,
and architecture. In fact, a number of XML proposals seek to “XML-enable” EDI
by simply replacing the arcane EDI format with XML tags. Others seek to mirror
the transaction sets using a similar XML-based element structure. In any case,
many companies are seeking to soften the transition from EDI to XML-based
systems by utilizing the decades of experience in EDI systems and using this
experience to create robust XML-based systems.
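As a purely illustrative sketch of the “replace the arcane format with XML tags” idea, the following Python fragment maps a few fields from the BEG and N1 segments in Listing 1.4 into XML elements using the standard xml.etree.ElementTree module. The element names are invented for this example and are not drawn from any particular XML/EDI proposal.

# Illustrative only: render a few EDI fields from Listing 1.4 as XML elements.
# The tag names below are hypothetical; they do not come from any specific
# XML/EDI specification.
import xml.etree.ElementTree as ET

purchase_order = ET.Element("PurchaseOrder")

# From "BEG*00*SA*4560006385**19980923"
beg = ET.SubElement(purchase_order, "BeginningSegment")
ET.SubElement(beg, "PurposeCode").text = "00"
ET.SubElement(beg, "TypeCode").text = "SA"
ET.SubElement(beg, "OrderNumber").text = "4560006385"
ET.SubElement(beg, "OrderDate").text = "19980923"

# From "N1*BY*ACME HARDWARE CORPORATION*92*MFUS"
buyer = ET.SubElement(purchase_order, "Party", role="BY")
ET.SubElement(buyer, "Name").text = "ACME HARDWARE CORPORATION"
ET.SubElement(buyer, "IdQualifier").text = "92"
ET.SubElement(buyer, "Id").text = "MFUS"

ET.indent(purchase_order)  # pretty-print; available in Python 3.9+
print(ET.tostring(purchase_order, encoding="unicode"))

The point of such a mapping is not the particular tag names but that the same information becomes self-describing and human readable while remaining machine processable.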