Molecular Biology

Chapter 24 Outline

 

Sequencing whole genomes:

·  Genomes that have been sequenced (T24.1); Number of base pairs in the yeast, E. coli, Drosophila and human genomes.

·  Vectors for large-scale genome projects; Yeast Artificial Chromosomes (YACs) have telomeres, origins of replication, centromeres and an insertion site for foreign DNA (F24.2); YAC benefits are that they can contain > 1 Mb (million bases) of DNA and they can be maintained in yeast; YAC drawbacks are that it is hard to insert foreign DNA, they are hard to isolate, they are unstable and they can scramble inserts; Bacterial Artificial Chromosomes (BACs) are vectors based on the F plasmid of E. coli that carry inserts of foreign DNA (F24.3); BACs avoid the problems inherent in YACs, but have a smaller insert capacity (up to 300,000 bp).

·  The human genome was sequenced using two strategies: The publicly funded strategy was to make a detailed genetic and physical map, then sequence the clones comprising the physical map; The private Celera venture would make clones and sequence them, then piece together the sequence based on overlaps between clones.
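The idea of piecing sequence together from overlaps can be illustrated with a toy example. The following is only a minimal sketch of a greedy overlap merge, not the algorithm used by either project; the function names, the `min_len` parameter and the short reads are all hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest suffix/prefix overlap."""
    seqs = list(reads)
    while len(seqs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: the remaining sequences are separate contigs
        merged = seqs[best_i] + seqs[best_j][best_n:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)] + [merged]
    return seqs


# Hypothetical reads that overlap by 4 bases each
reads = ["ATGCGTAC", "GTACCTGA", "CTGATTAG"]
contigs = greedy_assemble(reads)
print(contigs)
```

Real genomes contain repeated sequences that defeat a simple greedy merge like this one, which is one reason a map of ordered clones is useful as a scaffold.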

·  Mapping the human genome; Restriction Fragment Length Polymorphisms (RFLPs) can be used as genetic markers because they differ between individuals; RFLPs are detected by digesting DNA with restriction enzymes and hybridizing it with probes to specific regions of DNA (F24.4); Variable Number Tandem Repeats (VNTRs) are more polymorphic than simple RFLPs, so they vary in size in more people; They are based on differences in the number of tandem repeats in minisatellites, but they tend to be found mainly near the ends of chromosomes; Sequence Tagged Sites (STSs) are 60-1000 bp sequences that can be detected by PCR; STSs used in mapping include microsatellites, which are tandem repeats of 2-4 bp units that vary between people and can be found anywhere on the chromosome; Recombination between microsatellites can be used for mapping; STSs can be used to assemble a set of overlapping clones, called a contig, containing a gene of interest (F24.6); Radiation Hybrid Mapping can be used to generate maps between STSs over longer distances than single BACs; For radiation hybrid mapping, DNA in human cells is fragmented by irradiation and the cells are fused to hamster cells, so that each hybrid cell contains only a few human chromosome fragments; The more often two STSs are found together in hybrid cells, the closer together they are likely to be.
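The reasoning behind radiation hybrid mapping (markers that are retained together more often are likely closer together) can be sketched with a simple co-retention score. This is an illustration under stated assumptions: the hybrid panel, the STS names and the scoring function are hypothetical, and real radiation hybrid maps use statistical models rather than this raw fraction.

```python
from itertools import combinations


def co_retention(hybrids: list[set[str]], a: str, b: str) -> float:
    """Fraction of hybrids retaining at least one of the two markers that retain both."""
    either = sum(1 for h in hybrids if a in h or b in h)
    both = sum(1 for h in hybrids if a in h and b in h)
    return both / either if either else 0.0


# Hypothetical panel: each set lists the STSs detected by PCR in one hybrid cell line
hybrids = [
    {"STS1", "STS2"},
    {"STS1", "STS2", "STS3"},
    {"STS3"},
    {"STS1", "STS2"},
    {"STS2", "STS3"},
]

scores = {
    pair: co_retention(hybrids, *pair)
    for pair in combinations(["STS1", "STS2", "STS3"], 2)
}
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

In this made-up panel STS1 and STS2 are retained together most often, so they would be placed closest together, with STS3 nearer to STS2 than to STS1.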

·  Shotgun sequencing the human genome (F24.7); A library of 300,000 BAC clones, each containing 150,000 bp of sequence, is made and each clone is sequenced at its ends to produce sequence tags (STCs) every ~5000 bp; The clones are fingerprinted with restriction enzymes to determine their size and whether they are scrambled; This more conservative strategy was abandoned in favor of generating 35 billion bp of sequence and putting the sequence together using computer algorithms that search for overlapping sequences.
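The numbers in this strategy can be checked with a little arithmetic. The sketch below assumes a human genome of about 3 billion bp (a figure not stated in this outline) and assumes each BAC is sequenced at both ends; the variable names are illustrative only.

```python
# Assumption: human genome size of ~3 billion bp (not given in the outline)
GENOME_BP = 3_000_000_000

bac_clones = 300_000
bac_insert_bp = 150_000

# Total DNA in the BAC library and how many times over it covers the genome
library_bp = bac_clones * bac_insert_bp
bac_coverage = library_bp / GENOME_BP

# Assumption: one sequence tag from each end of every BAC
stc_count = bac_clones * 2
stc_spacing_bp = GENOME_BP / stc_count

# Whole-genome shotgun: 35 billion bp of raw sequence
shotgun_bp = 35_000_000_000
shotgun_coverage = shotgun_bp / GENOME_BP

print(f"BAC library coverage: {bac_coverage:.0f}x")
print(f"Average STC spacing: {stc_spacing_bp:.0f} bp")
print(f"Shotgun coverage: {shotgun_coverage:.1f}x")
```

Under these assumptions the library covers the genome about 15 times, the end sequences fall on average every 5000 bp (matching the figure above), and 35 billion bp corresponds to roughly 12-fold coverage.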

·  What we have learned from the human genome sequence: Based on Chromosome 22, (1) There are several gaps in the sequence due to unclonable or unsequenceable regions (T24.2), (2) There are 679 annotated genes (i.e. known genes, related genes, predicted genes and pseudogenes), (3) Coding regions account for a tiny fraction of the genome (introns = 39%, exons = 3% and repeated sequences = 41%) (T24.3), (4) Recombination rate varies across the chromosome (F24.8), (5) The chromosome has several local and long-range duplications, and (6) Large pieces of chromosome 22 are conserved in mice, based on homologous genes (i.e. they are syntenic) (F24.9); Chromosome 21 has also been sequenced; it has a much lower gene density (only ~225 genes) and is syntenic to mouse chromosome 10.

·  Both public and private groups estimate ~30,000 genes in the human genome, which is only about twice the number in fruit flies or nematodes; The expression of the human genome (e.g. splicing) is more complex, thus producing perhaps 100,000 different proteins.

 

Functional Genomics:

·  Expression of all genes can be monitored using microarray technologies; Microarrays are glass slides that contain sequences of some or all genes in an organism that can be hybridized to probes to determine the state of gene expression; DNA microarrays can be produced by affixing ~1 nL of DNA to a spot on a slide (F24.10) or by synthesizing oligonucleotides representing some or all genes directly on a slide (F24.11); These DNA "chips" are then hybridized with fluorescently labeled mRNA/cDNA from one or more tissues and the signal is detected and compared to determine the levels of gene expression (F24.12).
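Comparing signals between two samples is commonly expressed as a log2 ratio per spot. The sketch below is one way this comparison might be done, not a method given in the chapter; the gene names, intensity values, pseudocount and two-fold threshold are all hypothetical choices.

```python
import math


def log2_ratio(test: float, reference: float, pseudocount: float = 1.0) -> float:
    """log2 of test/reference intensity; the pseudocount avoids division by zero."""
    return math.log2((test + pseudocount) / (reference + pseudocount))


def classify(ratio: float, threshold: float = 1.0) -> str:
    """Label a spot as up, down or unchanged using a two-fold (log2 = 1) cutoff."""
    if ratio >= threshold:
        return "up"
    if ratio <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical fluorescence intensities: (test tissue, reference tissue)
spots = {
    "geneA": (799.0, 99.0),
    "geneB": (99.0, 799.0),
    "geneC": (499.0, 499.0),
}

for gene, (test, ref) in spots.items():
    ratio = log2_ratio(test, ref)
    print(gene, round(ratio, 2), classify(ratio))
```

Using a log scale makes an eight-fold increase (+3) and an eight-fold decrease (-3) symmetric around zero, which is why it is a common convention.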

·  Once a gene has been mapped to a small region of the genome, it must be identified; Genes can be identified by first finding exons within the region via exon trapping (F24.14) or by detecting unmethylated CpG islands (which mark transcribed genes) using the HpaII restriction enzyme.
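HpaII recognizes the sequence CCGG and cuts it only when the CpG is unmethylated, so clusters of HpaII sites in a CpG-rich region hint at a CpG island. The following is a rough sketch of how candidate regions might be screened in a known sequence; the observed/expected CpG ratio is a commonly used measure that is not part of this outline, the example sequences are made up, and methylation status cannot be read from the sequence alone.

```python
def hpaii_sites(seq: str) -> list[int]:
    """0-based start positions of every CCGG (HpaII recognition site) in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find("CCGG")
    while start != -1:
        positions.append(start)
        start = seq.find("CCGG", start + 1)
    return positions


def cpg_obs_exp(seq: str) -> float:
    """Observed CpG count divided by the count expected from C and G content."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


# Hypothetical sequences: one CpG-rich, one CpG-poor
island_like = "AACCGGTTCCGGCGCGCCGG"
bulk_like = "AATTCAGGTACTGAACTTGA"

print(hpaii_sites(island_like), round(cpg_obs_exp(island_like), 2))
print(hpaii_sites(bulk_like), round(cpg_obs_exp(bulk_like), 2))
```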

·  Huntington's Disease (HD) is a dominant progressive nerve disorder that does not cause symptoms until later in life; Researchers used RFLPs to test a Venezuelan family that had a high incidence of HD through 7 generations; One RFLP marker (G8) was tightly linked to the HD phenotype and had two HindIII polymorphisms giving four haplotypes (F24.15); The G8 probe is hybridized to DNA from the affected families to determine their haplotypes (F24.16); The haplotypes are compared to the HD phenotypes and for the Venezuelan family the mutant HD gene is associated with haplotype C (F24.17); The G8 probe is on chromosome 4; Using exon trapping they identified a transcript that had an unusual repeat of 23 CAG glutamine codons; The number of repeats is correlated with the disease: 11-34 copies have no incidence of HD, but >38 copies are associated with the disease; The severity and age of onset correlate with the number of repeats.
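The repeat-number rule can be made concrete by counting consecutive CAG codons in a sequence and applying the thresholds given above. This is a minimal sketch: the function names and the test sequence are hypothetical, and repeat counts between 35 and 38 (or below 11) are left unclassified because the outline does not assign them.

```python
import re


def longest_cag_run(seq: str) -> int:
    """Number of CAG codons in the longest uninterrupted run of CAG repeats."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_repeats(n: int) -> str:
    """Apply the thresholds from the outline: 11-34 unaffected, >38 associated with HD."""
    if 11 <= n <= 34:
        return "no incidence of HD"
    if n > 38:
        return "associated with HD"
    return "not classified in the outline"


# Hypothetical transcript fragment with 23 CAG codons, as in the identified transcript
fragment = "ATG" + "CAG" * 23 + "CCG"
count = longest_cag_run(fragment)
print(count, classify_repeats(count))
```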

·  Single Nucleotide Polymorphisms (SNPs) are single nucleotide differences among individuals; These polymorphisms can be used to identify genes for human diseases and polygenic traits such as intelligence, or to make correlations with responses to certain drugs (pharmacogenomics).

·  Bioinformatics is the building and manipulation of biological databases (genomic DNA, protein, ESTs, etc.) to put together information on what genes are expressed, when they are expressed, where they are expressed, what they do and what other genes are needed for them to carry out their function.

·  DNA chip technology promises to provide information on the transcriptome in different cells as a measure of gene expression; Proteomics is the identification and analysis of proteins and their patterns of expression; Proteins are identified by separation on 2D gels, but many proteins are either too hydrophobic or too low in abundance to detect by 2D gel electrophoresis; For proteins that can be detected on 2D gels, partial sequence can be obtained by matrix-assisted laser desorption-ionization time-of-flight (MALDI-TOF) mass spectrometry; If this is done for an organism whose genome sequence is known, the genes corresponding to these sequences can be identified.