SURVEY
OF THE HUMAN GENOME
The sequence of the human genome
is 2.9 × 109 base pairs
(2.9 Gbp or gigabase pairs) in length. If the sequence were typed onto paper,
at about 3000 letters per page, it would fill 1 million pages of text. This
extraordinary amount of information is encoded by the sequence of just four
bases, cytosine, adenosine, guanine, and thymine. Most people expected the
human genome sequence to reveal the actual number of genes found in human
beings. In reality the massive amount of sequence needs sophisticated
interpretation in order to determine how many genes it contains. The best
estimates so far predict only 25,000 genes, but the number may be more or less.
Of the identified genes, we only know the function of around 50%. More than 40%
of the predicted human proteins are similar in structure to proteins in
organisms such as fruit flies or worms.
The genome sizes and estimated gene numbers are given for several organisms in Table 8.1. Note that plants presently hold the record for gene numbers, although they have less total DNA than higher animals. Among the animals, the ciliate protozoan Paramecium has less DNA but more genes than any multicellular animal sequenced so far. The largest bacterial genomes have more genes than the smaller eukaryotic genomes. An example is Streptomyces, famous as the source of many antibiotics. The smallest bacterial genomes, such as Mycoplasma, have fewer than 500 genes, although because these bacteria are parasitic they rely on the eukaryotes they infect for many metabolites.
In eukaryotic DNA, genes
encompass thousands or even millions of base pairs, most of which are actually
introns that are spliced out of the mRNA transcript. For example, the gene for
dystrophin (defective in Duchenne’s muscular dystrophy) is 2.4 million base
pairs long, and some of its introns are 100,000 base pairs or more in length.
In contrast, the coding sequence, consisting of multiple exons, is about 3000
base pairs. In such situations, it is not always easy to find coding sequences
among all the noncoding DNA. On the one hand, this may result in genes (or
individual exons) being completely missed. On the other hand, widely separated
exons that are in reality parts of a single coding sequence may be interpreted
as separate genes.
Another confounding factor in determining the number of genes is the presence of pseudogenes. These are duplicated copies of real genes that are defective and no longer expressed. They may be found next to the original, or they may be far away, on different chromosomes. Determining whether or not a “gene” is a pseudogene or genuine may be difficult with sequence data alone. Often the expression of a particular region of DNA must be confirmed by finding corresponding mRNA transcripts. DNA microarrays are a popular approach to confirming whether or not a gene is expressed (see later discussion).
The number of genes also
hinges on how we define a gene. In addition to the 25,000 protein-encoding
genes, there are a thousand or more genes that encode nontranslated RNA. The
ribosomal RNA and transfer RNA genes are the most familiar of these. However, a
variety of other small RNA molecules are involved in splicing of mRNA and in
the regulation of gene expression. Other sequences of DNA may not even be
transcribed, yet are nonetheless important. Should these also be regarded as
genes?
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2024 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.