IDENTIFICATION, LOCATION, AND CLONING OF DEFECTIVE GENES
The nature of mutations and many of the techniques used in analyzing them have already been discussed in previous pages. Furthermore, the human genome has now been fully sequenced and in principle the DNA sequence is available for all of the genes responsible for observed hereditary defects. In practice, connecting a particular set of symptoms with a specific gene is not always so straightforward. Here we will outline some general approaches to identifying the genes responsible for hereditary defects and finding their location on the human chromosomes. Confirmation of identity normally involves cloning and further genetic analysis. After this general discussion we will consider two examples in more detail—cystic fibrosis and Duchenne’s muscular dystrophy.
One way to identify genes responsible for hereditary defects is to analyze the symptoms and then make an informed guess as to what kinds of proteins are likely to be involved. Possible candidate genes are then chosen from the list of characterized genes and investigated further. This approach is therefore sometimes called candidate cloning. Because relatively few human genes have been characterized, this method is rarely successful. However, the recent vast increase in genomics and proteomics research is revealing the functions of many mammalian genes, so this method may become more valuable in the near future.
An improved variant of this approach comes from using model organisms, in particular mice. The vast majority of human genes have homologs in mice. Moreover, unlike humans, mice may be directly used for genetic experimentation. As described under Transgenic Animals, it is possible to make mice in which both copies of any chosen gene have been artificially inactivated.
Such knockout mice are then examined for symptoms. Several major programs are now in progress to systematically make mice with knockout mutations affecting every one of the 25,000 or so mammalian genes. Eventually, this information should allow many human genes to be matched with possible symptoms.
Functional cloning begins with a known protein that is suspected of involvement in a hereditary disorder. The amino acid sequence of the protein is determined. Nowadays this would most likely be done by mass spectrometry of peptide fragments generated from the protein by protease digestion. The protein sequence is then used to deduce the possible coding sequences of the gene, and an oligonucleotide probe is synthesized. The probe may be used to screen a cDNA library by hybridization. Alternatively, the probe can be linked to a solid support and used to pull out a specific mRNA molecule from a pool of cellular mRNA. In the latter case, a single specific cDNA is made from the purified mRNA.
The complete cDNA is then sequenced to confirm that the DNA matches the original protein.
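The probe-design step can be illustrated with a short sketch. Because most amino acids are encoded by more than one codon, a given peptide can correspond to many different DNA sequences, and a probe is usually easier to design against a stretch of protein with few alternatives. The Python below is a minimal sketch under stated assumptions: the function names `degeneracy` and `best_window`, the window length, and the example peptide are all hypothetical and chosen only for illustration; the codon counts are those of the standard genetic code.

```python
# Number of synonymous codons per amino acid (standard genetic code).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide: str) -> int:
    """Count the distinct DNA sequences that could encode the peptide."""
    total = 1
    for amino_acid in peptide.upper():
        total *= CODON_COUNTS[amino_acid]
    return total


def best_window(protein: str, length: int = 6) -> tuple[int, str, int]:
    """Return (start, peptide, degeneracy) for the least degenerate window.

    Ties are resolved in favor of the earliest window.
    """
    if len(protein) < length:
        raise ValueError("protein is shorter than the requested window")
    scored = [
        (degeneracy(protein[i:i + length]), i, protein[i:i + length])
        for i in range(len(protein) - length + 1)
    ]
    deg, start, peptide = min(scored)
    return start, peptide, deg


if __name__ == "__main__":
    # Hypothetical peptide fragment, not taken from any real protein.
    print(best_window("LSRMWKDLSR", length=4))
```

Stretches rich in methionine and tryptophan (one codon each) give the least degenerate probes, whereas leucine, serine, and arginine (six codons each) are best avoided. A real design would also consider probe length, melting temperature, and codon usage, none of which this sketch attempts.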
The gene must then be localized to a specific region of a particular chromosome. This may be done by screening a set of radiation hybrid cells or by hybridization using a DNA probe with a fluorescent label (i.e., by FISH). Cloned DNA from the target region, carried on a vector capable of carrying large inserts, such as a cosmid or YAC, is then screened to narrow down the location.
Positional cloning is used when the nature of the gene product is unknown. In this case the disease gene must be mapped at least approximately by a genetic approach before further DNA-based screening can proceed. The easiest cases are those in which there is a major chromosomal abnormality, such as a deletion, inversion, or translocation that may be visualized under the light microscope. This may localize the defect to a specific band on a particular chromosome. Alternatively, linkage studies on individuals from families afflicted by the inherited defect may locate the damaged gene close to other genetic markers. These other markers may be known genes, but more often they will be RFLPs, VNTRs, or other sequence polymorphisms.
Such genetic mapping can localize a gene to around 1000 kb. This length of DNA may contain anywhere from 10 to 50 genes, depending on how crowded that region of the genome is. DNA from the suspect region is then cloned, as described earlier for functional cloning. However, in the case of positional cloning we have no previously identified protein that can be used to check for the corresponding gene. Therefore, the hereditary defect must be identified at the DNA level. The suspect DNA may be scanned for the presence of functional genes by a variety of approaches:
(a) The presence of open reading frames indicates a possible coding sequence.
Note that in higher organisms, the coding sequence will typically be fragmented into several exons separated by noncoding introns. These introns may be very long and frequently account for more of the overall length of the gene than the exons.
(b) CpG (or CG) islands are often found upstream of the transcribed regions in vertebrate DNA. These are GC-rich regions whose methylation state is often used for regulatory purposes. They may be identified by the presence of multiple cut sites for restriction enzymes whose recognition sequences consist solely of C and G (e.g., HpaII, which cuts at C/CGG when the site is unmethylated).
(c) Coding DNA tends to evolve more slowly than noncoding DNA. Consequently, coding DNA from one animal will often hybridize to DNA from a range of related organisms while noncoding DNA does not. Zoo blots, in which a probe is hybridized to DNA samples from several different species, are often used to identify coding DNA.
(d) Messenger RNA extracted from those tissues most severely affected by a genetic disease should contain significant levels of mRNA derived from the gene responsible for the defect. Hybridization can be used to see if candidate DNA sequences match those in the mRNA pool. (This assumes that the gene in question is expressed at a reasonably high level. This will usually be true for genes encoding structural proteins and enzymes but not for those encoding regulatory proteins. Note also that the mRNA should be isolated from a healthy person because the defective gene might not be transcribed in patients suffering from the defect.)
(e) Ultimately, sequencing of DNA from healthy and affected individuals should show a difference—if the suspected gene is truly responsible for the hereditary defect.
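Approaches (a) and (b) lend themselves to simple computational screens once the suspect region has been sequenced. The Python below is a minimal sketch, not a gene-finding method: `find_orfs` scans only the three forward reading frames for ATG-to-stop stretches and ignores introns entirely, so it will miss or truncate most genes of higher organisms. `cpg_stats` reports the GC fraction, the number of HpaII recognition sites (CCGG), and the observed/expected CpG ratio, a commonly used statistic that is an assumption of this sketch rather than something taken from the text. The function names, the `min_codons` threshold, and the test sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) of ATG...stop ORFs on the forward strand.

    Positions are 0-based; end is exclusive and includes the stop codon.
    min_codons counts the codons before the stop, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


def cpg_stats(dna: str) -> dict:
    """Summarize CpG-related features of a DNA sequence."""
    dna = dna.upper()
    n = len(dna)
    c_count, g_count = dna.count("C"), dna.count("G")
    cpg_count = dna.count("CG")
    return {
        "gc_fraction": (c_count + g_count) / n if n else 0.0,
        # Observed CpG count divided by the count expected from base
        # composition alone: (C count * G count) / sequence length.
        "cpg_obs_exp": (
            cpg_count * n / (c_count * g_count)
            if c_count and g_count else 0.0
        ),
        "hpaii_sites": dna.count("CCGG"),
    }


if __name__ == "__main__":
    # Short made-up sequences for illustration only.
    print(find_orfs("CCATGAAACCCGGGTAACC"))
    print(cpg_stats("CCGGCGCG"))
```

In practice such screens only flag candidate regions. The remaining approaches in the list, especially comparison of sequences from healthy and affected individuals, are still needed to confirm that a candidate is the gene responsible.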