The Promise of XML
Advantages of XML over SGML · Advantages of XML over HTML · Advantages of XML over EDI · Advantages of XML over Databases and Flat Files · Drawbacks to XML · XML-Based Standards
XML-Based Standards
What can XML offer that these other formats have so far been unable to deliver? How will XML make our lives better, make our systems more efficient, lower our costs, and increase our revenues? How will XML make representing, storing, and exchanging data easier than it is with SGML, HTML, or EDI?
Benefits of XML
The very nature of XML is that it is a structured document format that represents not only the information to be exchanged but also the metadata describing its meaning. Most information has structure of some type. For example, information about a book includes the title, author, chapters, body text, and index. In turn, body text contains paragraphs, lines of text, and footnotes. This information needs structure because a document that describes a book must present it in a way that a person or a machine can understand. Author information should not be contained within the index section, and vice versa. Although SGML, as XML’s “parent language,” already provided this functionality, XML has simplified the process of defining and using this metadata.
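The book example can be made concrete with a short sketch using Python’s standard-library `xml.etree.ElementTree`. The element names (`book`, `title`, `author`, `chapter`, `body`, `para`, `footnote`) and the sample values are hypothetical, chosen only to show how element names carry the metadata and nesting carries the structure.

```python
import xml.etree.ElementTree as ET

# A hypothetical book document: element names say what each piece of text is,
# and nesting says what it belongs to.
BOOK_XML = """<book>
  <title>XML in Practice</title>
  <author>Jane Doe</author>
  <chapter number="1">
    <title>Introduction</title>
    <body>
      <para>XML describes data and its meaning.</para>
      <footnote>See the XML 1.0 Recommendation.</footnote>
    </body>
  </chapter>
</book>"""

def summarize(xml_text):
    """Return (book title, author, list of chapter titles)."""
    root = ET.fromstring(xml_text)
    # A bare tag name matches direct children only, so the book's <title>
    # is not confused with a chapter's <title>.
    title = root.findtext("title")
    author = root.findtext("author")
    chapters = [ch.findtext("title") for ch in root.findall("chapter")]
    return title, author, chapters

print(summarize(BOOK_XML))
```

Because the structure is explicit, a program can ask for “the author” rather than “the text on line 3.”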
Although XML is fairly simple in nature, in that a document only needs to follow basic syntax rules to be considered “well-formed,” one of the language’s biggest features is its ability to provide a means for guaranteeing the validity of a document. This means that not only can you send a document to a receiving party but you can also send criteria, in the form of Document Type Definitions (DTDs) or other schema formats, with which the document must comply. For example, criteria may specify that an XML document should contain only the listed set of elements and attributes in a specific order and in given quantities. In this sense, XML documents come with error and validity checking built in. The DTD or schema that an XML document refers to can be used to verify, at the time of document creation, that all the elements are correctly specified and in the correct order. Furthermore, a more advanced validity-guaranteeing mechanism such as XML Schema can help guarantee that the values of the element content itself are valid and fall within acceptable ranges. Documents can be validated at their time of creation or at their time of receipt, and they can be accepted or rejected automatically, without human intervention. At creation time, errors can be fixed before transmission; upon receipt, invalid documents can be sent back to the sender for further human processing, with an exact indication of where the errors occurred.
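As a rough illustration of automated checking with exact error locations, here is a minimal Python sketch. Python’s standard library checks well-formedness only; it does not perform DTD or XML Schema validation, so the quantity-range rule below is a hand-written, hypothetical stand-in for a constraint a schema could declare. The `<order>` and `<item>` vocabulary is also an assumption.

```python
import xml.etree.ElementTree as ET

def check_order(xml_text):
    """Return a list of problems found in a hypothetical <order> document.

    Well-formedness is checked by the parser itself. The quantity rule is a
    hand-written stand-in for a constraint an XML Schema could declare; the
    standard library does not perform DTD or XML Schema validation.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser detected the error
        return [f"not well-formed at line {line}, column {column}"]

    problems = []
    for item in root.iter("item"):
        qty = item.get("quantity", "")
        # Hypothetical business rule: quantity must be an integer from 1 to 100.
        if not qty.isdigit() or not 1 <= int(qty) <= 100:
            problems.append(f"item {item.get('sku')!r}: quantity {qty!r} out of range")
    return problems

print(check_order('<order><item sku="A1" quantity="5"/></order>'))
print(check_order('<order><item sku="B2" quantity="500"/></order>'))
print(check_order('<order>\n<item sku="C3" quantity="1">\n</order>'))
```

The returned messages are the kind of pinpointed feedback that could be sent back to a document’s sender; a real deployment would use a validating parser against the actual DTD or schema.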
Validity-checking software is also very low cost, if not free. Most parsers on the market are available in open-source form and come with validation capabilities built in. Although many of these currently support only DTDs, the move to XML Schema–based validity checking is well under way. Batches of documents can be checked for compliance against a single DTD or schema, or they can be checked against different schemas based on their destination or origin. Although the use of a DTD or schema does not guarantee 100-percent validity, it goes a long way toward ensuring that the vast majority of documents exchanged and received fit an acceptable policy.
One benefit of using XML with DTDs or schemas is that XML editors provide structured editing “for free.” As a developer, how many times have you run a processor on some formatted file only to get a complaint about a syntax error at line 37? Editing software that allows you to enter only valid XML will catch many of these errors as you type them. In addition, editors can automatically generate a form-style interface from a DTD or schema. As a result, XML tools can offer a simpler user interface and eliminate some of the complexity of creating XML documents.
XML takes advantage of existing Internet protocols, so designers choosing XML for their solutions don’t have to create new protocols to transport their documents. Designing a new protocol today may not make sense when well-understood protocols such as HTTP already exist. Using these protocols makes documents more portable across platforms, easier to debug, and easier to qualify and route. In addition, HTTP is well understood, and IT engineers know how to manage HTTP traffic. A new protocol would require defining a new format to go over the wire, which would in turn require identifying new data streams for firewalls, managing the new traffic, and a whole ball of wax that is simply not necessary for a structured data format.
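To show how little is needed to carry XML over an existing protocol, the following sketch wraps a document in an ordinary HTTP POST using Python’s standard `urllib.request`. The endpoint URL and the order document are hypothetical placeholders, and the request is only constructed here, not sent.

```python
import urllib.request

def build_xml_post(url, xml_text):
    """Wrap an XML document in an ordinary HTTP POST request.

    Nothing new goes over the wire: the document is simply a request body
    labeled with an XML media type, so existing proxies, firewalls, and
    logging tools can handle it like any other HTTP traffic.
    """
    return urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "application/xml; charset=utf-8"},
        method="POST",
    )

# Hypothetical endpoint; the request is built but not sent.
req = build_xml_post("https://example.com/orders", "<order><item sku='A1'/></order>")
print(req.get_method(), req.get_header("Content-type"))
# Sending it would be a single call: urllib.request.urlopen(req)
```

Note that `urllib.request` normalizes header names with `str.capitalize()`, which is why the header is looked up as `Content-type`.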
Because XML is a structured document format that shares many of the same processing and parsing requirements as SGML and HTML, plenty of generally available parsers have been built. Many of these parsers are now built into general-purpose browsers and server-side agents. Chapter 2 talks about these various client-side and server-side parsers and processors and explains which tools are available for use today.
In addition, the W3C has created the Document Object Model (DOM) as a general, language-neutral interface through which parsers and processors can access and manipulate an XML document represented as a tree of nodes. As a result, the DOM provides a generic, standard method for processing XML documents. Applications that require XML processing can draw on this wealth of tools and specifications and thus add parsing in a relatively pain-free way. Developers do not have to write new parsers unless they really want to. Many parsers exist in a wide variety of languages, and many of these are free.
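Python’s standard library includes `xml.dom.minidom`, a minimal implementation of the DOM interfaces, which is enough to sketch the tree-based style of processing described above. The `catalog`/`product` vocabulary and both helper functions are hypothetical.

```python
from xml.dom import minidom

DOC = "<catalog><product id='p1'>Widget</product><product id='p2'>Gadget</product></catalog>"

def product_names(xml_text):
    """Walk the DOM tree and collect (id, text) for each <product> element."""
    dom = minidom.parseString(xml_text)
    result = []
    for node in dom.getElementsByTagName("product"):
        # The element's text lives in a child Text node.
        result.append((node.getAttribute("id"), node.firstChild.data))
    return result

def rename_product(xml_text, product_id, new_name):
    """Modify the tree through the DOM API, then serialize it back to XML."""
    dom = minidom.parseString(xml_text)
    for node in dom.getElementsByTagName("product"):
        if node.getAttribute("id") == product_id:
            node.firstChild.data = new_name
    return dom.documentElement.toxml()

print(product_names(DOC))
print(rename_product(DOC, "p2", "Sprocket"))
```

The same calls (`getElementsByTagName`, `getAttribute`, `firstChild`) exist under the same names in DOM implementations for other languages, which is the portability the DOM was designed to provide.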
Another oft-cited benefit of XML is that humans can read and write it, rather than it being produced by applications in a machine-only format. Although many say that XML will be used primarily for machine-to-machine communication and can be created with visual tools that don’t require editing the code directly, experience with HTML has shown that there are numerous occasions when a developer has to “dip in” to the actual document and make adjustments. For this reason, XML is plain text and uses element names that represent actual words or phrases carrying some semantic meaning.
XML represents information and the metadata about that information; it does not specify how the data should be processed, nor does it constrain the mechanisms used to handle the information. This is in contrast to other formats, such as EDI, certain types of text files, and databases, which explicitly require accessing the documents in a specific manner. In those formats, the files themselves often define how the information is to be processed and what requirements systems must meet to make sense of the documents. XML documents, by contrast, simply encode information and its metadata without specifying how the information is to be processed or displayed.
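As a small illustration of this separation, the sketch below has one hypothetical price document and two independent consumers: one formats it for people, the other computes over it. Nothing in the document tells either consumer what to do; the vocabulary and both functions are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The document only describes the data; it says nothing about processing or display.
PRICES = """<prices>
  <price currency="USD" sku="A1">19.99</price>
  <price currency="USD" sku="B2">5.00</price>
</prices>"""

def as_report(xml_text):
    """One consumer: a human-readable text report."""
    root = ET.fromstring(xml_text)
    return "\n".join(
        f"{p.get('sku')}: {p.text} {p.get('currency')}" for p in root.iter("price")
    )

def total(xml_text):
    """Another consumer: a calculation over the same data."""
    root = ET.fromstring(xml_text)
    return round(sum(float(p.text) for p in root.iter("price")), 2)

print(as_report(PRICES))
print(total(PRICES))
```

Either consumer can be replaced or added without changing the document, and the document can gain fields without breaking consumers that ignore them.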
XML’s capability to separate processing from data content is often described as being future-proof or loosely coupled, depending on which end of the marketing spectrum you stand on. Future-proof in this instance means that future changes in the data-exchange layer should not affect the programming layer, and vice versa. Loosely coupled systems allow for “arm’s-length” exchange of information, where one party does not need to know the details of how the other party plans to process the information. Such systems are “loosely coupled” from the existing systems they need to integrate with, and from whatever systems are put in place in the future. This allows the presentation, process, and data layers to change without affecting one another.
Due to XML’s popularity, ease of use, and growing range of tools, the number of individuals and organizations skilled in XML is increasing rapidly. It is becoming considerably easier to find employees and contractors who are familiar with XML, its related standards, and best practices for implementing it in multiple environments. Perhaps one of the best arguments for using XML is that the more people use the language, the better supported it will be and the more capable of meeting your needs. Sometimes the best technologies are simply the ones most in use, regardless of their technological advantages.
Advantages of XML over SGML
Although XML borrows much of its functionality from SGML, it provides a number of distinct advantages. SGML may still be suitable for content and data representation, but the tide of public opinion is definitely shifting in XML’s favor. As such, it makes sense to at least consider XML in place of existing or proposed SGML implementations.
XML permits well-formed documents to be parsed without a DTD, whereas many SGML implementations require a DTD for processing. XML is also much simpler and more regular in its syntax than SGML. The XML specification is very small, includes a bare-bones set of features (rather than a host of optional features that make implementation costs difficult to judge), and avoids some of the stigma associated with the SGML name.
XML was created because implementing SGML directly on the Internet was difficult. SGML simply did too much. One of SGML’s strengths is that it offers significant flexibility to a diverse community of users by providing a wide array of choices, but this resulted in a wide range of syntactic variations for documents and produced a specification that was very difficult for developers to implement. XML 1.0 simplified the specification by eliminating unnecessary flexibility, resulting in a specification that was both powerful and easy to implement. The goal was to meet the needs of the majority of users rather than all of them.
Advantages of XML over HTML
HTML was created to meet a very different need than XML. It is clear that XML will not now, or perhaps ever, completely replace HTML, except in the form of the XML-based version of HTML known as XHTML. HTML was designed as a language to present hyperlinked, formatted information in a Web browser. It has no capability to represent metadata about its content, provide validation, support extensibility by users, or support even the basic needs of e-business. Fundamentally, HTML is intended for consumption by humans, whereas XML is meant for both machine and human consumption.
Advantages of XML over EDI
EDI adoption has been fairly widespread, although mainly among larger businesses. The aggregate cost of EDI implementation and ongoing maintenance can be measured in billions of dollars, and millions of dollars in transactions occur daily using EDI-mediated messages. It would be very difficult, if not impossible, to uproot all this activity and replace it with exclusively XML-based transactions. These businesses have so much money and time invested in ANSI X12/EDI that they will be fairly slow to adopt a new standard, which would necessitate new processing technology, mapping software, and back-end integration. To them, adopting XML would seem to mean discarding existing, working technology in favor of an unproven and still immature one.
However, XML offers a number of clear advantages over EDI, which has long had its time in the sun. XML is a good replacement for EDI because it uses the Internet for data exchange. There have been efforts to provide mechanisms for transporting EDI over the Internet as well, but many of these have not met with much success. Recent efforts have attempted to use Internet protocols such as SMTP, FTP, and HTTP to transport EDI, but it is clear that the format was not originally designed or intended for such use.
Compared to EDI and other electronic commerce and data-interchange standards, XML offers serious cost savings and efficiency enhancements that make implementing XML good for the bottom line. Document exchange and electronic commerce systems have many components: document creation tools, processing components, validity checking, data mapping, back-end integration, access to a communications backbone, security, and other pieces of the commerce puzzle. XML greatly simplifies, if not eliminates, many of these steps.
XML’s built-in validity checking, low-cost parsers and processing tools, mapping based on the Extensible Stylesheet Language (XSL), and use of the Internet keep down much of the cost of the e-commerce chain. In many cases, general XML tools can be found that are not only applicable to the problem at hand but also flexible and very inexpensive. Whereas EDI is a specific domain of knowledge and expertise that comes with a comparable price tag, XML makes use of technology that has been in use for years, if not decades. Systems that take advantage of this wealth of available processing power and know-how can greatly reduce not only their costs but also their time to implementation.
The use of the Internet itself greatly lowers the barrier for small and medium-sized companies that have found EDI too costly to implement. Simple functionality and low-cost tools will go a long way toward helping these companies afford to exchange high-quality, structured documents capable of supporting commercial exchange and back-end integration.
As one XML user puts it, “XML is hip, happening, now.” EDI is perceived as crusty and old, text files are blasé, and databases, though a staple of data storage, keep their data locked in proprietary formats. The idea that XML represents a fresh approach to solving many lingering problems in a flexible manner appeals to many in senior management. In many instances, buying into a new technology requires the approval of senior IT staff, if not of corporate management. With XML’s continuing positive exposure, getting management approval for an XML project is becoming increasingly easy.
Another drawback of EDI and some text file and database formats is that they don’t easily support internationalization and localization. Specifically, in those formats it is difficult to represent information written in a non-Latin alphabet. XML supports these needs inherently, as part of its initial specification.
XML syntax allows international characters from the Unicode standard to be included as content in any XML element. These can then be marked up and included in any XML-based exchange. These internationalization features help overcome one of the early problems of other formats, which caused unnecessary division and conflict between geographies. For example, it is not fair that an English technical manual can be marked up in a given file format if a Japanese manual cannot be formatted likewise. XML sought to solve this problem from the get-go.
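The following Python sketch illustrates the point with the standard `xml.etree.ElementTree` module: a hypothetical `<manual>` element is built once in English and once in Japanese, tagged with the predefined `xml:lang` attribute, serialized to UTF-8 bytes as it might travel between systems, and parsed back unchanged. The element names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree writes attributes in the predefined XML namespace as "xml:...".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def make_manual(title, lang):
    """Build a hypothetical <manual> element whose content may be any Unicode text."""
    manual = ET.Element("manual", {XML_LANG: lang})
    ET.SubElement(manual, "title").text = title
    return manual

en = make_manual("Installation Guide", "en")
ja = make_manual("取扱説明書", "ja")

# Serialize to UTF-8 bytes with an XML declaration, then parse the bytes back.
data = ET.tostring(ja, encoding="utf-8", xml_declaration=True)
roundtrip = ET.fromstring(data)
print(roundtrip.findtext("title"), roundtrip.get(XML_LANG))
```

The same markup and the same code handle both manuals; no separate format or special escaping is needed for the Japanese text.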
Advantages of XML over Databases and Flat Files
XML is a structured document format that includes not only the data but also metadata that describes the data’s content and context. Most text files cannot offer this advantage. They either represent only the information to be exchanged, without metadata, or include metadata in a flat, one-level manner. Common file exchange formats such as comma-delimited and tab-delimited text files merely contain data in predefined positions or delimited fields. Complex file formats such as Microsoft Excel contain more structured information but are machine-readable only and still do not offer the level of structure present in XML.
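The difference is easy to see by converting a comma-delimited file into XML. In the CSV below, a value’s meaning depends only on its column position, so the receiver must already know the field names; in the XML output, each value carries its name with it. The data and the `FIELDS` tuple are hypothetical assumptions, and the sketch uses only Python’s standard `csv`, `io`, and `xml.etree.ElementTree` modules.

```python
import csv
import io
import xml.etree.ElementTree as ET

CSV_TEXT = "A1,Widget,19.99\nB2,Gadget,5.00\n"

# In the CSV file these names appear nowhere; the receiver must already know
# them (a hypothetical agreement assumed here).
FIELDS = ("sku", "name", "price")

def csv_to_xml(csv_text):
    """Attach the implicit column meanings to each value as element names."""
    root = ET.Element("products")
    for row in csv.reader(io.StringIO(csv_text)):
        product = ET.SubElement(root, "product")
        for field, value in zip(FIELDS, row):
            ET.SubElement(product, field).text = value
    return ET.tostring(root, encoding="unicode")

print(csv_to_xml(CSV_TEXT))
```

Here `encoding="unicode"` makes `tostring` return a `str` rather than bytes.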
Relational and object-oriented databases and formats can represent data as well as metadata, but for the most part their formats are not text based. Most databases use a proprietary binary format to represent their information. Other text-based formats do include metadata and are structured hierarchically, but they have not caught on nearly to the extent that XML or even SGML has.
Although text files can be transmitted via e-mail and over the Web, structured formats such as relational and object-oriented databases are not easily accessible over the Internet. Their binary formats and proprietary connection mechanisms get in the way. Gateway software or other mechanisms are often needed to access these formats, and when they are made accessible, it is usually through one particular transport protocol, such as HTTP. Other means of accessing the data, such as e-mail and FTP, are simply not available.
One of the primary issues with alternative file formats and database languages is that their processing tools are custom, proprietary, or expensive. Even when tools are widespread, they are usually specific to the particular file format in question. One of XML’s greatest strengths is that its processing tools have become relatively widespread and inexpensive, if not free.
Drawbacks to XML
One of the most notable and
significant “knocks” against XML is that it’s huge. XML takes up lots of space
to represent data that could be similarly modeled using a binary format or a
simpler text file format. The reason for this is simple: It’s the price we pay
for human-readable, platform-neutral, process-separated, metadata-enhanced,
structured, validated code.
And this space difference is not insignificant. XML documents can be 3 to 20 times as large as a comparable binary or alternate text file representation, and the effects should not be underestimated. At the upper end of that range, 1GB of database information could become as much as 20GB of XML-encoded information. This information then needs to be stored and transmitted over the network, facts that should make computer, storage, and network hardware manufacturers very happy indeed!
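A quick way to see where the overhead comes from is to encode the same records both ways. The sketch below packs hypothetical (id, price) pairs with Python’s `struct` module (a 4-byte integer plus an 8-byte double per record) and also writes them as XML. The exact ratio depends entirely on the data and the tag names chosen, so it is an illustration, not a measurement of XML in general.

```python
import struct
import xml.etree.ElementTree as ET

records = [(1, 19.99), (2, 5.00), (3, 120.50)]  # hypothetical (id, price) rows

# Binary layout: little-endian, 4-byte int + 8-byte double, no padding.
binary = b"".join(struct.pack("<id", rid, price) for rid, price in records)

# XML layout: the same values, each wrapped in descriptive start and end tags.
root = ET.Element("records")
for rid, price in records:
    rec = ET.SubElement(root, "record")
    ET.SubElement(rec, "id").text = str(rid)
    ET.SubElement(rec, "price").text = f"{price:.2f}"
xml_bytes = ET.tostring(root, encoding="utf-8")

print(len(binary), len(xml_bytes), round(len(xml_bytes) / len(binary), 1))
```

Every value is spelled out as text and named twice (start and end tag); that repetition is the price of self-describing, human-readable data.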
Let’s also not forget that computers need to process this information. Large XML documents may need to be loaded into memory before processing, and some XML documents can be gigabytes in size! This can result in sluggish processing, unnecessary reparsing of documents, and otherwise heavy system loads. In addition, much of the “stack” of protocols built on XML requires fairly heavy processing to work as intended. For example, the Simple Object Access Protocol (SOAP), a cross-platform messaging and communication protocol used for remote procedure calls (RPCs) between and within server systems, is a very heavy protocol to manipulate on the fly. The marshalling involved in working with the protocol can make system performance quite poor, because XML is, after all, a text-based format being used to make RPCs between systems. Using XML in this transactional, real-time manner may impose more parsing and processing requirements than the system can handle.
In addition, a problem with many current XML parsers is that they read an entire XML document into memory before processing it. This practice can be disastrous for very large XML documents. XML is not only a data language but, from a parsing perspective, a complicated one. It often increases code complexity, because XML can be more difficult to parse than a simpler data format such as comma- or tab-delimited fields.
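One common mitigation, not described above, is streaming (incremental) parsing. The sketch below uses Python’s standard `xml.etree.ElementTree.iterparse`, which reports each element as its end tag is read; clearing elements once they have been used lets most of their memory be reclaimed instead of holding the whole tree. The `<records>`/`<price>` vocabulary is hypothetical, and a small in-memory document stands in for a large file.

```python
import io
import xml.etree.ElementTree as ET

def sum_prices_streaming(source):
    """Sum every <price> in a document without keeping the full tree in memory.

    iterparse yields each element when its end tag is read; clearing it
    afterwards discards its text and children, which have already been used.
    """
    total = 0.0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "price":
            total += float(elem.text)
        elem.clear()
    return total

# A generated stand-in for a large file: 1,000 records with prices 0.5, 1.5, ...
doc = "<records>" + "".join(
    f"<record><price>{i}.5</price></record>" for i in range(1000)
) + "</records>"
print(sum_prices_streaming(io.BytesIO(doc.encode("utf-8"))))
```

Cleared elements still leave small empty placeholders under the root; for very large inputs, code often also removes processed children from their parent.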
Despite all the added value
in representing data and metadata in a structured manner, some projects simply
don’t require the complexity that XML introduces. In these cases, simple text
files do the job more efficiently. For example, a configuration file that
includes a short list of a few commands and their values doesn’t require a
multilevel, metadata-enhanced file format for its communication. Therefore, one
shouldn’t take the stance that simply because XML contains structure and
metadata it should be used for all
file formatting and document-exchange needs.
Although XML does offer validation technology, it is not currently as sophisticated as many EDI syntax checkers. XML editors often lack the detail and helpfulness found in common EDI editors. Many EDI syntax editors can report error details throughout a document and complete the parsing of the entire document, whereas many XML editors are unable to proceed beyond the first syntax error.
In addition, XML inherits the notorious security issues associated with the Internet, but it also inherits the possible solutions to those problems. As long as a system is designed with security in mind, exchanging XML over the Internet should be fairly problem free.
XML-Based Standards
We have already discussed the advantages of the “ML” in XML, but the “X” presents advantages of its own. Extensibility, as applied to XML, is the ability of the language to be used to define specific vocabularies and metadata. Rather than being fixed to describing a particular set of data, XML, in conjunction with DTDs and schemas, can define any number of documents that together form a language of their own.
Indeed, hundreds, if not thousands, of specific document vocabularies have been created based on XML to meet the different needs of healthcare, manufacturing, user interface design, petroleum refining, and even chess games. Text files and relational database schemas are rigid in that they are meant to represent the information they contain and nothing more. Adding a new set of information to a text file or relational database management system (RDBMS) would be a difficult proposition at best. XML files, especially those created using an “open content model,” can easily be extended by adding elements and attributes. Whole classes of documents can be defined simply by sending a document with a new DTD or schema. Sharing a DTD or schema within a user community results in a joint specification, if not a de facto or explicit standard.
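The consumer side of extensibility can be sketched in a few lines. Below, a hypothetical version-1 reader of a `<contact>` vocabulary asks only for the elements it knows about. When a later revision of the vocabulary adds `<phone>`, the old reader keeps working unchanged. The vocabulary and values are assumptions; schema-level open content models are not shown here.

```python
import xml.etree.ElementTree as ET

def read_contact(xml_text):
    """A version-1 consumer: read the elements it knows and ignore the rest."""
    root = ET.fromstring(xml_text)
    return {"name": root.findtext("name"), "email": root.findtext("email")}

V1 = "<contact><name>Ada</name><email>ada@example.com</email></contact>"
# A later (hypothetical) revision of the vocabulary adds a <phone> element.
V2 = ("<contact><name>Ada</name><email>ada@example.com</email>"
      "<phone>+1-555-0100</phone></contact>")

print(read_contact(V1))
print(read_contact(V2))
```

Because the reader looks up elements by name rather than by position, adding elements does not shift or corrupt the fields it already understands, unlike appending a column to a delimited file.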