History of Bioinformatics
Bioinformatics has emerged as a scientific discipline that applies computing science and technology to the analysis and management of biological data. All this began when Ingram demonstrated that sickle cell haemoglobin and normal haemoglobin are homologous, differing by a single amino acid substitution. This led to comparisons of other proteins with similar biological functions. As more and more proteins were sequenced, databases became necessary to allow quick comparison using computational software. With the advent of rapid nucleic acid sequencing techniques, a large number of sequences started accumulating, which again required computing facilities.
In 1962, Zuckerkandl and Pauling proposed a new approach to studying evolutionary relationships using sequence variability. This initiated a new field called 'molecular evolution'. The approach was based on the observation that functionally related, or homologous, protein sequences were similar. Subsequently, sequence comparison, analysis of functional relatedness and inference of evolutionary relationships became possible. Margaret Dayhoff observed that protein sequences vary during evolution according to certain patterns. She noted that:
• amino acids were not replaced at random but were altered with specific preferences. For example, amino acids with similar physico-chemical characteristics tended to replace one another.
• some amino acids, such as tryptophan, were generally not replaced by any other amino acid.
• based on several homologous sequences, a point accepted mutation (PAM) matrix could be developed.
This laid the foundation for subsequent work on sequence comparison using quantitative approaches.
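Dayhoff's observations can be made concrete with a small counting exercise. The following is a minimal sketch, not Dayhoff's actual procedure: it assumes pairs of closely related, already aligned, gap-free protein sequences, tallies the replacements seen between them, and computes a simplified relative mutability (changes divided by occurrences) for each amino acid. The function names and the toy sequences are hypothetical, chosen for illustration only.

```python
from collections import Counter


def _check_pair(seq1: str, seq2: str) -> None:
    # Assumption of this sketch: alignments are gap-free and of equal length.
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")


def count_accepted_mutations(aligned_pairs):
    """Count replacements observed between aligned homologous sequences.

    Each pair is stored in sorted order, so A->B and B->A share one entry:
    without an ancestral sequence the direction of change is unknown.
    """
    counts = Counter()
    for seq1, seq2 in aligned_pairs:
        _check_pair(seq1, seq2)
        for a, b in zip(seq1, seq2):
            if a != b:
                counts[tuple(sorted((a, b)))] += 1
    return counts


def relative_mutability(aligned_pairs):
    """Fraction of the occurrences of each amino acid that were replaced.

    A simplified, hypothetical stand-in for Dayhoff's relative mutability.
    """
    changed = Counter()
    occurrences = Counter()
    for seq1, seq2 in aligned_pairs:
        _check_pair(seq1, seq2)
        for a, b in zip(seq1, seq2):
            occurrences[a] += 1
            occurrences[b] += 1
            if a != b:
                changed[a] += 1
                changed[b] += 1
    return {aa: changed[aa] / occurrences[aa] for aa in occurrences}


# Hypothetical toy alignments (not real protein data).
pairs = [("ACDWK", "ACEWR"), ("GWLIV", "GWLVV")]
print(count_accepted_mutations(pairs))
print(relative_mutability(pairs)["W"])  # 0.0: tryptophan is never replaced here
```

A full PAM construction goes further: Dayhoff's group inferred ancestral sequences on phylogenetic trees, scaled the counts to one accepted point mutation per 100 residues (PAM1), and extrapolated to larger evolutionary distances such as PAM250.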
The National Biomedical Research Foundation (NBRF) compiled the first comprehensive collection of macromolecular sequences in the "Atlas of Protein Sequence and Structure", published from 1965 to 1978 under the editorship of Margaret O. Dayhoff. Dayhoff and her research group pioneered computer methods for comparing protein sequences, for detecting distantly related sequences and duplications within sequences, and for inferring evolutionary histories from alignments of protein sequences.
In 1980, a data library was established at the European Molecular Biology Laboratory (EMBL) to collect, organize, and distribute nucleotide sequence data and related information. Its successor today is the European Bioinformatics Institute (EBI), located at Hinxton, U.K. In the USA, the National Center for Biotechnology Information (NCBI) was established in 1988 as a primary information databank and provider, and the DNA Data Bank of Japan (DDBJ) was also initiated during the 1980s. The Protein Information Resource (PIR) was established in 1984 by the National Biomedical Research Foundation (NBRF) as a resource to assist researchers in the identification and interpretation of protein sequence information. Today, all these databanks collaborate closely and exchange data on a regular basis.
As sequence data began to accumulate rapidly, more powerful sequence analysis software was needed. In parallel, a firm mathematical basis was required for developing algorithms. Scientists from mathematics, biology, and computer science entered the emerging field of bioinformatics.
Through their wide distribution networks, the databanks are very important sources for all researchers interested in asking fundamental questions in biology. Thus, a primary aim of bioinformatics is to disseminate scientifically investigated knowledge for the benefit of the research community. Other aims include the development of software for data analysis.
The word "bioinformatics" is a combination of "biology" and "informatics". Once it became clear that biological polymers, such as nucleic acids and proteins, can be transformed into sequences of digital symbols, informatics approaches could be applied to their analysis. Moreover, only a limited set of letters is required to represent the nucleotide and amino acid monomers. It is the digital nature of these data that differentiates genetic data from many other types of biological data and has allowed bioinformatics to flourish. Another key point is that the use of sequence data relies upon an underlying reductionist approach: sequence implies structure, which in turn implies function. The subsequent sections describe these activities in detail.
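The digital nature of sequence data is easy to demonstrate. Since only four letters are needed for nucleotides, each base fits in two bits (the 20 standard amino acids would need five, as 2^5 = 32). The sketch below packs a DNA string into an integer and unpacks it again; the particular code assignment and the function names `encode` and `decode` are illustrative assumptions, not a standard format.

```python
# Hypothetical 2-bit code for each nucleotide (an arbitrary illustrative choice).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index i holds the base whose code is i


def encode(seq: str) -> int:
    """Pack a DNA string into an integer, two bits per base.

    Raises KeyError for any character other than A, C, G or T.
    """
    value = 0
    for base in seq.upper():
        value = (value << 2) | CODE[base]
    return value


def decode(value: int, length: int) -> str:
    """Unpack `length` bases from an integer produced by encode()."""
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 0b11])  # read the lowest two bits
        value >>= 2
    return "".join(reversed(bases))


print(encode("ACGT"))  # 27
print(decode(encode("GATTACA"), 7))
```

The length has to be stored separately because leading "A" bases encode as zero bits and would otherwise be lost.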
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.