Human Genome Project (HGP)
The international Human
Genome Project was launched in 1990. It was a mega project that took 13
years to complete. The human genome is about 25 times larger than the genome of
any organism sequenced before it and was the first vertebrate genome to be
completed. The human genome contains approximately 3 × 10^9 bp. The HGP
was closely associated with the rapid development of a new area in biology
known as bioinformatics.
The main goals of the Human
Genome Project are as follows:
Identify all the genes (approximately 30000) in human DNA.
Determine the sequence of the three billion chemical base pairs
that make up human DNA.
Store this information in databases.
Improve tools for data analysis.
Transfer related technologies to other sectors, such as industry.
Address the ethical, legal and social issues (ELSI) that may arise
from the project.
The methodologies of the
Human Genome Project involved two major approaches. One approach focused on
identifying all the genes that are expressed as RNA (ESTs, Expressed Sequence
Tags). The other approach was sequence annotation: the whole genome, containing
all the coding and non-coding sequences, was sequenced first, and functions
were later assigned to different regions of the sequence. For sequencing, the
total DNA from a cell is isolated, converted into random fragments of
relatively small size and cloned in suitable hosts using specialized vectors.
This cloning amplifies each DNA fragment so that it can subsequently be
sequenced with ease. Bacteria and yeast are the two commonly used hosts, and
the vectors are called BAC (Bacterial Artificial Chromosomes) and YAC (Yeast
Artificial Chromosomes). The fragments are sequenced using automated DNA
sequencers that work on the principle of a method developed by Frederick
Sanger. The sequences are then arranged in order on the basis of their
overlapping regions, using specialized computer programs. These sequences were
subsequently annotated and assigned to each chromosome. The genetic and
physical maps of the genome were constructed using information on polymorphism
of restriction endonuclease recognition sites and on some repetitive DNA
sequences called microsatellites. A later method for sequencing even longer
fragments is shotgun sequencing, which relies on supercomputers and has
replaced the traditional sequencing methods.
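The idea of arranging sequenced fragments by their overlapping regions can be illustrated with a toy example. The sketch below is only an illustration under simplifying assumptions, not the software used by the Human Genome Project: it assumes error-free reads from a single strand, no repeats, and no fragment fully contained in another. The function names `overlap_length` and `greedy_assemble`, the `min_overlap` parameter and the example fragments are all hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Repeatedly merge the pair of fragments that share the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; keep the pieces separate
        # Join the two fragments, writing the shared region only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three made-up fragments of a made-up 20-base sequence, given out of order.
fragments = ["AGCCTAGGA", "ATGCGTACG", "TACGTTAGCC"]
assembled = greedy_assemble(fragments)
print(assembled)
```

Real assemblers must also cope with sequencing errors, both DNA strands and the repeated sequences that make up much of the human genome, which is why a simple greedy merge like this one is only a teaching aid.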
Although the human genome
contains 3 billion nucleotide bases, the DNA sequences that encode proteins
make up only about 1 to 2% of the genome.
An average gene consists of 3000 bases, the largest known human
gene being dystrophin with 2.4 million bases.
About 50% of the genome is derived from transposable
elements such as LINE and Alu sequences.
Genes are distributed over 24 chromosomes. Chromosome 19 has the
highest gene density. Chromosome 13 and the Y chromosome have the lowest gene
density.
The chromosomal organization of human genes shows diversity.
There may be 35000-40000 genes in the genome, and almost 99.9%
of the nucleotide bases are exactly the same in all people.
Functions for over 50 percent of the discovered genes are unknown.
Less than 2 percent of the genome codes for proteins.
Repeated sequences make up a very large portion of the human genome.
Repetitive sequences have no direct coding functions but they shed light on
chromosome structure, dynamics and evolution (genetic diversity).
Chromosome 1 has 2968 genes, whereas the Y chromosome has 231 genes.
Scientists have identified about 1.4 million locations where
single-base DNA differences (SNPs, single nucleotide polymorphisms,
pronounced ‘snips’) occur in humans. Identification of SNPs
is helpful in finding chromosomal locations for disease-associated sequences
and tracing human history.
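To make the idea of a single-base difference concrete, the sketch below compares two short sequences and reports each position where they differ. It is a minimal illustration, assuming the two sequences are already aligned and of equal length. The function name `find_snps` and both example sequences are hypothetical. A real SNP is a variation found across a population rather than between just two sequences.

```python
def find_snps(reference, sample):
    """Return (position, reference_base, sample_base) for each single-base difference.

    Positions are 1-based. Both sequences must already be aligned.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


# Two made-up 20-base sequences that differ at two positions.
reference = "ATGCGTACGTTAGCCTAGGA"
sample = "ATGCGTACGATAGCCTAGGC"
snps = find_snps(reference, sample)
for position, ref_base, alt_base in snps:
    print(f"position {position}: {ref_base} -> {alt_base}")
```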
The mapping of human
chromosomes makes it possible to examine a person’s DNA and identify genetic
abnormalities. This is extremely useful in diagnosing diseases and in providing
genetic counselling to those planning to have children. This kind of
information would also create possibilities for new gene therapies. Besides
providing clues to understanding human biology, learning about the DNA
sequences of non-human organisms can lead to an understanding of their natural
capabilities, which can be applied towards solving challenges in healthcare,
agriculture, energy production and environmental remediation. An important
advantage will be a new era of molecular medicine, characterized by looking
into the most fundamental causes of disease rather than treating the symptoms.
Once genetic sequences become easier to determine, some people may
attempt to use this information for profit or for political power.
Insurance companies may refuse to insure people at ‘genetic risk’
and this would save the companies the expense of future medical bills incurred
by ‘less than perfect’ people.
Another fear is that attempts are being made to “breed out”
certain genes of people from the human population in order to create a ‘perfect