Scientific and Engineering
In the early 1990s, the Internet was used mainly by scientific and
educational establishments rather than by commercial entities. The
foundation technologies for the Web, the Hypertext Transfer Protocol (HTTP) and
the Hypertext Markup Language (HTML), were created not for the sake of
e-commerce but to exchange research papers in the field of physics. It
therefore makes sense that XML would be a hotbed of activity for those in
various scientific, mathematical, and engineering fields.
This section touches on two standards that have leveraged XML as their
means of document exchange.
Biotech
The rapid increase in the use and exchange of data in the biological
fields has demanded better ways of representing, storing, and exchanging this
information. The use of information in biology has spawned its own field of
study, bioinformatics, and the recent explosion in genetics research has
likewise required increasing attention to standardizing information storage
and exchange. XML provides the technology to meet many of these needs.
Bioinformatic Sequence Markup Language (BSML)
Just as in every industry that has large data requirements, the
bioinformatics industry faces the challenge of integrating large quantities of
heterogeneous information gathered from different sources and distributed
locally and over the Internet. A bioinformatic sequence is the encoding, as
text, of a string of nucleotides, the chemical building blocks of our DNA.
Individual nucleotides are identified by their bases (adenine, cytosine,
guanine, thymine, and uracil), which are encoded as “a,” “c,” “g,” “t,” and
“u,” respectively. A sequence is an arbitrarily long string of these characters
that corresponds to a particular encoding of genetic material. As researchers
expand their knowledge of a particular organism’s genetic structure, the
exchange of these strings of genetic encoding becomes increasingly important.
As is the case almost everywhere that data is present, XML can facilitate the
discovery process by enabling researchers to integrate and annotate these
sequences. XML also enables the integration of this “genomic” information with
related, or “extragenomic,” information such as literature, images, and
documents that support the particular genetic information being researched.
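The single-letter encoding described above can be checked mechanically before a sequence is placed in an XML document. The following is a minimal Python sketch, not part of BSML or any bioinformatics library: the names `NUCLEOTIDE_CODES`, `invalid_positions`, and `is_valid_sequence` are hypothetical, and the alphabet is assumed to be exactly the five lowercase letters named in the text.

```python
from typing import List

# Assumed alphabet: the five one-letter nucleotide codes named in the text.
NUCLEOTIDE_CODES = frozenset("acgtu")


def invalid_positions(sequence: str) -> List[int]:
    """Return the 0-based positions of characters outside the alphabet."""
    return [
        index
        for index, char in enumerate(sequence.lower())
        if char not in NUCLEOTIDE_CODES
    ]


def is_valid_sequence(sequence: str) -> bool:
    """True if the string is non-empty and uses only the nucleotide codes."""
    return bool(sequence) and not invalid_positions(sequence)
```

A caller might reject a sequence, or report the offending positions, before writing it into a document for exchange.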
Developed by the National Human Genome Research Institute (NHGRI) and
promoted by LabBook, Inc., the Bioinformatic Sequence Markup Language (BSML) is
a proposed XML standard for the communication of bioinformatics data. The BSML
standard is divided into two logical parts: Definitions and Display. The
Definitions section encodes the bioinformatic data, including sequences, sets,
sequence features, analytical outputs, relationships, and annotations. The
optional Display section encodes information for graphic representation of the
bioinformatic data. Multiple users can simultaneously access the same data and
examine different links, files, and sequence views without having to alter the
source documents. In addition, BSML allows users to include multiple
annotations such as documents, tables, charts, and sequence features and graphs
aligned to sequence maps. Although the BSML specification doesn’t require any
specific browser or graphical interpretation technology, LabBook provides a
viewer that is tailored to BSML. LabBook also develops and provides freely
available tools that help create and manipulate BSML files.
The BSML specification’s main goal is to represent genetic sequences and
their graphic display properties. In particular, the specification describes
the features of genetic sequences, represents relationships among sequences and
their features, defines graphic objects that represent sequence features and
relationships, provides representation of the relationships between sequences
and source documents (such as sequence and genetic marker databases), and
defines methods for storing and transmitting encoded sequence and graphic
information. Listing 22.9 shows a sample BSML XML instance.
LISTING 22.9 Sample BSML Instance
<!DOCTYPE Bsml SYSTEM "bsml.dtd">
<Bsml>
  <Definitions>
    <Sequences>
      <Sequence id="SEQ1" title="ECRPOBC" seq-type="dna" units="bp"
                length="12337" shape="linear" strands="2">
      </Sequence>
    </Sequences>
  </Definitions>
  <Display>
    <Page>
      <View id="VEW1" seqref="SEQ1"> </View>
    </Page>
  </Display>
</Bsml>
Even though LabBook has wrapped commercial products around the standard,
BSML remains in the public domain and is supported by the LabBook efforts.
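To illustrate how the Display part refers back to the Definitions part, the sketch below reads an instance shaped like Listing 22.9 with Python's standard `xml.etree.ElementTree` module and pairs each View with the Sequence named by its `seqref` attribute. It is an illustration only, not LabBook tooling: the function name `views_with_sequences` is hypothetical, the element paths are assumed from the listing rather than from the BSML DTD, and the DOCTYPE line is left out so the sample does not depend on `bsml.dtd` being available.

```python
import xml.etree.ElementTree as ET

# Sample modeled on Listing 22.9 (DOCTYPE omitted; empty elements self-closed).
BSML_SAMPLE = """<Bsml>
  <Definitions>
    <Sequences>
      <Sequence id="SEQ1" title="ECRPOBC" seq-type="dna" units="bp"
                length="12337" shape="linear" strands="2"/>
    </Sequences>
  </Definitions>
  <Display>
    <Page>
      <View id="VEW1" seqref="SEQ1"/>
    </Page>
  </Display>
</Bsml>"""


def views_with_sequences(bsml_text):
    """Pair each Display view id with the attributes of the sequence it references."""
    root = ET.fromstring(bsml_text)
    # Index the sequences in the Definitions part by their id attribute.
    sequences = {
        seq.get("id"): seq
        for seq in root.iterfind("./Definitions/Sequences/Sequence")
    }
    pairs = []
    for view in root.iterfind("./Display/Page/View"):
        seq = sequences.get(view.get("seqref"))  # None if the reference dangles
        pairs.append((view.get("id"), dict(seq.attrib) if seq is not None else None))
    return pairs


for view_id, attributes in views_with_sequences(BSML_SAMPLE):
    print(view_id, attributes)
```

Because a view carries only a reference, several views can point at the same sequence without copying its data, which is consistent with the text's point that users can examine different views without altering source documents.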
Chemistry
As with biological information, chemistry and materials information also
needs to be exchanged. This is especially vital in the pharmaceutical,
materials processing, plastics, petroleum, and other industries that rely on
accurate chemical information to perform their tasks adequately. However, as in
other industries, these processes have historically been dominated by paper
rather than electronic interchange. Various chemistry industry specifications,
such as the Chemical Markup Language covered next, hope to change this by
providing a deep level of specification for chemical properties as well as the
vocabularies required for chemical industry interchange.
Chemical Markup Language
The foundations of the Chemical Markup Language (CML, more officially known
as XML-CML) can be traced all the way back to the original days of HTML, when
the Internet was frequented mainly by academics rather than individuals and
corporations. The original concept was to provide a platform-neutral means of
exchanging information regarding chemical compositions. Originally formatted as
an SGML DTD, CML moved toward XML soon after that language’s development in
1996. Subsequently, CML became one of the first acknowledged domain-specific
DTDs published for XML.
CML itself doesn’t cover the entire spectrum of possibilities in the
chemical industry. Rather, it focuses on representing molecules, which the CML
Web site defines as “discrete entities representable by a formula and usually a
connection table.” CML further specifies a hierarchy for compound molecules,
such as clathrates and macromolecules, reactions, and macromolecular
structures/sequences. In addition, CML “has no specific support for
physicochemical concepts but can support labeled numeric data types of several
sorts, which can cover a wide range of requirements. It allows quantities and
properties to be specifically attached to molecules, atoms, or bonds.”
In many respects, CML forms a common basis for most chemical-domain XML
vocabularies in much the same way that MathML forms the basis for many
mathematical and scientific-domain XML vocabularies. CML also draws on a number
of other XML specifications, including Resource Description Framework (RDF),
XHTML, SVG, PlotML, MathML, Dublin Core, and XML Schema, as its schema base.
CML supports spectra and other instrumental output, crystallography,
organic and inorganic molecules, physicochemical quantities (including units),
MO calculations, macromolecules (such as sequence protein and ligand), molecular
hyperglossaries (including text and molecules), and hyperlinks. CML
accomplishes this by specifying a core set of elements: molecule, which
describes a connected set of atoms; bond, which describes a link between atoms
within a molecule; atomArray and bondArray, which provide containers for atoms
and bonds; and electron, which provides details of electrons in atoms, bonds,
and molecules.
Also specified are macromolecular, reaction, crystallography, and formula
elements to describe the interaction of these various core elements.
Macromolecular elements include sequence, which describes a macromolecular
sequence, and feature, which describes features in a sequence. Reactions are
specified by means of reaction, which describes a reaction that contains
molecules and the links between them. Crystallography and formulas are
described by crystal and formula, respectively: crystal describes the
crystallographic unit cell and symmetry, with fractional coordinates for atoms,
and formula provides a container for representing arbitrary chemical formulas
as a text string with a convention attribute. Listing 22.10 shows a sample CML
document.
LISTING 22.10 Sample CML Document
<molecule convention="MDLMol" id="adrenalin" title="EPINEPHRINE">
  <date day="22" month="11" year="1995">
  </date>
  <atomArray>
    <atom id="a1">
      <string builtin="elementType">C</string>
      <float builtin="x2">-0.2969</float>
      <float builtin="y2">0.8979</float>
    </atom>
    <atom id="a2">
      <string builtin="elementType">C</string>
      <float builtin="x2">-0.2969</float>
      <float builtin="y2">-0.6121</float>
    </atom>
    <atom id="a14">
      <string builtin="elementType">H</string>
      <float builtin="x2">2.144</float>
      <float builtin="y2">2.8844</float>
    </atom>
  </atomArray>
  <bondArray>
    <bond id="b1">
      <string builtin="atomRef">a1</string>
      <string builtin="atomRef">a2</string>
      <string builtin="order">1</string>
    </bond>
    <bond id="b2">
      <string builtin="atomRef">a1</string>
      <string builtin="atomRef">a3</string>
      <string builtin="order">2</string>
    </bond>
    <bond id="b14">
      <string builtin="atomRef">a4</string>
      <string builtin="atomRef">a14</string>
      <string builtin="order">1</string>
      <string builtin="stereo">H</string>
    </bond>
  </bondArray>
</molecule>
<reaction title="Diels-Alder cycloaddition" id="simple_rxn_1"
          convention="stepwise">
  <string title="description"> Simple example of
    an A + B -> C reaction. See source for further information.
  </string>
  <float title="yield" units="%">88</float>
  <string title="notes">taken from Vollhardt and Schore</string>
  <list title="reactionStep" id="simple_s_1">
    <string title="description">cycloaddition</string>
    <float title="yield" convention="%">88</float>
    <string title="notes">one step</string>
    <link title="reactant" href="simple_mol_reactant1" id="simple_lk_1"/>
    <link title="reactant" href="simple_mol_reactant2" id="simple_lk_2"/>
    <link title="reagent" id="simple_lk_3">
      <integer title="index">1</integer>
      <string title="solvent">Acetonitrile</string>
      <string title="temperature" convention="degC">100</string>
      <string title="duration" convention="hours">3</string>
      <string title="notes">reflux</string>
    </link>
    <link title="reagent" id="simple_lk_4">
      <integer title="index">2</integer>
      <string title="notes">workup</string>
    </link>
    <link title="product" href="simple_mol_product" id="simple_lk_5"/>
    <!-- also catalyst, intermediate, transition state as needed -->
  </list>
</reaction>
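The molecule part of Listing 22.10 can be turned into the atom list and connection table that the CML definition of a molecule refers to. The following Python sketch uses only the standard `xml.etree.ElementTree` module and is a reading aid rather than a CML library: the function name `read_molecule` is hypothetical, the sample is a shortened two-atom version of the listing, and the `builtin` attribute names are taken from the listing rather than checked against the CML DTD.

```python
import xml.etree.ElementTree as ET

# Shortened molecule modeled on Listing 22.10: two atoms and one bond.
CML_SAMPLE = """<molecule convention="MDLMol" id="adrenalin" title="EPINEPHRINE">
  <atomArray>
    <atom id="a1">
      <string builtin="elementType">C</string>
      <float builtin="x2">-0.2969</float>
      <float builtin="y2">0.8979</float>
    </atom>
    <atom id="a2">
      <string builtin="elementType">C</string>
      <float builtin="x2">-0.2969</float>
      <float builtin="y2">-0.6121</float>
    </atom>
  </atomArray>
  <bondArray>
    <bond id="b1">
      <string builtin="atomRef">a1</string>
      <string builtin="atomRef">a2</string>
      <string builtin="order">1</string>
    </bond>
  </bondArray>
</molecule>"""


def read_molecule(cml_text):
    """Return (atoms, bonds) where atoms maps id -> (element, x, y)
    and bonds is a list of (atom_id_1, atom_id_2, order) tuples."""
    root = ET.fromstring(cml_text)
    atoms = {}
    for atom in root.iterfind("./atomArray/atom"):
        element = atom.findtext("string[@builtin='elementType']")
        x = float(atom.findtext("float[@builtin='x2']"))
        y = float(atom.findtext("float[@builtin='y2']"))
        atoms[atom.get("id")] = (element, x, y)
    bonds = []
    for bond in root.iterfind("./bondArray/bond"):
        # Each bond names its two end atoms with atomRef strings.
        refs = [s.text for s in bond.iterfind("string[@builtin='atomRef']")]
        order = int(bond.findtext("string[@builtin='order']"))
        bonds.append((refs[0], refs[1], order))
    return atoms, bonds


atoms, bonds = read_molecule(CML_SAMPLE)
print(atoms)
print(bonds)
```

The full listing is abridged, so some bonds there refer to atoms that are not shown (for example a3 and a4). Code that processes real documents would need to handle atom references that do not resolve.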