GENOMICS, MUTATIONS AND THE INTERNET:
The Naming and Use of Parts
Charles R. Scriver and Piotr M. Nowacki
- Address: CR Scriver
- deBelle Laboratory
- McGill University-Montreal Children's Hospital Research Institute
- 2300 Tupper Street
- Montreal, QC H3H 1P3, Canada
- tel. 514-934-4417
- fax. 514-934-4329
- e-mail mc77@musica.mcgill.ca
Summary
Mutations are the source of genetic variation and diversity; by their effect, some are neutral, others are pathogenic. In contemporary genetics, mutations appear at the interface between genomics (structural and functional) and genetics (heredity), where they serve gene discovery and mapping (genomics) and generate challenges to modify their phenotypic effects (medical genetics). Assuming the human genome harbours 80,000 transcribed genes, each possessing at least 100 different (germ-line) alleles in a typical population, how then to record and recover data on at least 8 million human alleles? Bioinformatics is the essential resource to create the corresponding accessible digital libraries (genomic and locus-specific mutation databases) for this purpose, a goal to which The HUGO Mutation Database Initiative (Science 279:10-11, 1998) aspires. Guidelines now exist for naming alleles (Hum Mut 11:1-3, 1998). The principles behind the practice are illustrated by PAHdb, a prototype locus-specific mutation database (NAR 26:220-225, 1998), and by prototype genomic mutation databases (HGMD (NAR 26:285-287, 1998); the EBI mutation database; and OMIM).
Science is an assault on ignorance (Ridley, 1991). Its legacies are arrays of concepts, databases and technologies. Ignorance is a powerful entity and understanding is its counterpart. Coleridge in his philosophical mode (adapted apparently in this case from Schelling) observed that until he could understand another writer's ignorance, he remained ignorant of the author's understanding (Coleridge, 1817). His point of view serves science equally well.
I. THE GENOME PROJECT
It was fashionable during the early stages of The Human Genome Project to criticize it as mindless technology. The critics had missed the point because, like most people, they actually knew rather little about genomes, human or otherwise. Accordingly, the Genome Project was truly an assault on ignorance, and it was science; it was feasible because it was being served by a new generation of technology. If he had been here to observe, Coleridge might have said: "I understand our ignorance of genomes and until the project is complete, I remain ignorant of their understanding".
The Human Genome Project (how homocentric the name for a project that embraces genomes both human and non-human) is now entering the final stages of its structural phase. Venter and colleagues (Venter et al. 1998) propose that "shotgun sequencing" of the human genome will produce an accurate ordered nucleotide sequence covering more than 99.9% of the human genome by the year 2001, by which time all human genes will probably have been mapped to precise positions on chromosomes. A major legacy of this assault on our ignorance will be a comprehensive human genome database generated by analysis of multiple individual human genomes (Venter et al. 1998) displaying the obligatory allelic variation that characterizes biological species.
Mapping and sequencing will bring to a close "structural genomics" in the Genome Project. Meanwhile, this phase will have described the genomes of many other self-replicating organisms, each interesting in its own way, with corresponding insights on the evolutionary genetics behind different life forms. The first product of the whole-genome shotgun assault on self-replicating organisms was the genome of Haemophilus influenzae (Fleischmann et al. 1995). Other genomes quickly succumbed (Fraser et al. 1995; Bult et al. 1996; Tomb et al. 1997; Klenk et al. 1997; Fraser et al. 1997; Smith et al. 1997; Deckert et al. 1998), and most recently it was the genome of Treponema pallidum (Fraser et al. 1998) which, when expressed in its villainous spirochete, was capable of reducing Lord Randolph Churchill and Friedrich Nietzsche to mumbling incoherence and Franz Schubert to agonized silence.
Function as well as structure.
In the post-structural era of genomics, there is the prospect of "functional genomics" (Lander, 1996; Hieter and Boguski, 1997; Fields, 1997); the term may be new but it describes the old domain we know as "physiology" (and "pathology"). The relevance of functional genomics was highlighted when the initial structural phase of the Yeast (Saccharomyces cerevisiae) Genome Project came to completion (Goffeau et al. 1996; Oliver, 1996; Miklos and Rubin, 1996). Yeast is an ideal "model" organism enabling the analysis of gene function along with genome cross-referencing for the analysis of genes mutated in human disease (Bassett, Jr. et al. 1997). In this regard, "expression genomics" becomes particularly relevant; it includes systematic analysis and documentation of gene expression in other organisms (transgenics), and it reveals the patterns of host gene expression in organs, tissues and cell types during development, and at maturity of the organism (Strachan et al. 1997).
Among the most admired of "model organisms" is Homo sapiens. Francois Jacob (Jacob, 1982) reminds us that the human genome, like all others, is the product of evolutionary tinkering. Ernst Mayr (Mayr, 1982) knows that Homo sapiens, like all other living organisms, has emergent properties where the functioning whole is more than the sum of its genomic parts. These thoughtful gentlemen have pointed us toward the domains of physiology and homeostasis. Accordingly we need many resources to understand functions and their evolutionary origins; those resources include, for example: model organisms (Goodfellow, 1997), the above-mentioned cross-referenced genomic database for yeast and man (Bassett, Jr. et al. 1997), "The Oxford grid" describing syntenic regions of homology in mouse and human structural genomes (Searle et al. 1989; Searle et al. 1994); and the bioinformatic tools to identify clusters of orthologous groups (COGs) consisting of individual orthologous sets of proteins (Tatusov et al. 1997) and orthologous sets of paralogs (Henikoff et al. 1997).
It is no accident that Victor McKusick displayed the Oxford Grid in recent editions of the Catalogs of Mendelian Inheritance in Man (McKusick, 1994); nor that he is a co-author of early papers on comparative genomics in yeast and man (Tugendreich et al. 1994); nor that he has maintained a human genomic database both in print and on line (OMIM). The ultimate focus of the McKusick-style genomic database is a "neoVesalian human anatomy" (Scriver, 1976), both normal and morbid, the latter highlighted by Mendelian variation (McKusick, 1986; McKusick, 1987). Mutation has so often been the means to reveal a particular (human) locus.
Genetics is the study of inheritance, genomics is the study of genomes (Goodfellow, 1997), and mutations are studied at the interface between genetics and genomics (Cotton et al. 1998; Scriver et al. 1998). The tools for mutation detection, in both scanning and diagnostic modes, are improving steadily (Cotton, 1993; Grompe, 1993; Cotton, 1997). The result is that the rate of mutation discovery, in the human genome, for example, exceeds the ability of the print literature to keep pace. Online digital databases are tools for biological taxonomy and they will serve mutation repositories equally well (Cotton et al. 1998; Scriver et al. 1998). "In silico genetics" is the latest step in the journey of modern human genetics where cytogenetics, somatic cell genetics, molecular genetics, and transgenics are earlier technological milestones along the way (VA McKusick, personal communication).
The term "mutation" has many meanings; here it means "allelic variant", where an allele is a unique change in a nucleotide sequence in the DNA molecule. It might be a "pathogenic" allele (disease-causing or phenotype modifying) or it might be "neutral", without apparent effect on phenotype. A pathogenic allele is likely to be disadaptive and to occur at low frequency in the population; a neutral allele is likely to be polymorphic and by definition to occur at a frequency > 0.01.
If there are 80,000 human genes (a reasonable estimate) (Fields et al. 1994), and if each gene harbours at least 100 alleles (of any type, again a reasonable number based on current experience), then the human genome will contain at least 8 million different germline alleles, and more if somatic mutations are included. How does one capture, record and distribute information of this magnitude, in just one genome (for Homo sapiens), let alone for all other genomes of equivalent interest? There can be no other approach but the one available in informatics. But first there is the problem of nomenclature. All taxonomy requires the naming of parts, and chromosomes, genes and alleles now have their conventions.
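The estimate above is simple arithmetic, and a short sketch makes its two assumptions explicit. The figures are those quoted in the text; the function and variable names are illustrative only.

```python
# Back-of-envelope estimate using the two figures quoted in the text.
GENES_IN_GENOME = 80_000   # estimated number of human genes (Fields et al. 1994)
ALLELES_PER_GENE = 100     # assumed minimum number of germline alleles per gene


def minimum_allele_count(genes: int, alleles_per_gene: int) -> int:
    """Lower bound on the number of distinct germline alleles to catalogue."""
    return genes * alleles_per_gene


total = minimum_allele_count(GENES_IN_GENOME, ALLELES_PER_GENE)
print(f"{total:,} germline alleles")
```

Either input could be revised without changing the argument: the product remains far too large for the print literature.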
Nomenclature
(The beginning of wisdom is calling things by their right name - Chinese proverb; cited in White (White et al. 1997)).
Chromosomes. The chromosomal constitution of an individual is named and described according to the International System of Cytogenetic Nomenclature (Mitelman, 1995). It describes the diploid number, the sex chromosome constitution, and the variation in number and structure of chromosomes. The location of an individual gene is assigned to a banded region on an arm of a particular chromosome.
Genes. How to name genes has been the topic of two recent workshops. A search for standards in naming homologous genes in different organisms occupied the authors of one report (Blake et al. 1997); it contains a useful list of URLs for primary databases about various organisms and for various purposes. The particular problem of naming human genes is addressed in the other report (White et al. 1997); it describes: i) general rules for naming genes, including requirements for a gene symbol, style of symbol, name for a known gene, and naming of arbitrary genes and loci; ii) guidelines for symbol construction by taking into account hierarchical symbols, gene families and series, homologies with other species, along with preferred abbreviations for different species (e.g. for Homo sapiens, use HSA), genes identified only by their sequence information, genes with known protein or enzyme products (for EC numbers, see Nomenclature Committee of the International Union of Biochemistry and Molecular Biology), and genes for clinical disorders which can be taken into consideration when constructing the symbol. Investigators, authors, reviewers, and editors can do mutual service if we all become familiar with these guidelines and use them.
Alleles. Nomenclature for alleles (mutations) is the subject of recently published guidelines (Antonarakis and the Nomenclature Working Group, 1998). EBI proposes a controlled vocabulary for databases to describe an allele and its components, for example, a vocabulary for DNA sequence, codon change and amino acid substitution for a missense allele.
The use of nomenclature guidelines for naming alleles is strongly recommended.
The systematic approach to naming alleles is centered on the change in nucleotide sequence. In this system, the number comes first and the letter follows. The number indicates the nucleotide by its position in the DNA sequence; the nucleotide sequence is numbered off its reference sequence, which can be retrieved from the corresponding database. The letters in the name represent the wildtype and substituted nucleotide respectively. The systematic name for a major PKU-causing allele is c.1222C->T, where c. indicates that the nucleotide sequence is the cDNA, available under accession number U49897 in the GenBank database.
When the reference nucleotide sequence is not accessible (or its use for naming the allele is not preferred), the mutation can be described by an alternative name. The "trivial name" is used as a convenience despite the ambiguities it will entertain (Antonarakis and the Nomenclature Working Group, 1998), but in some cases its use has become a firm convention (e.g. the ΔF508 CFTR allele). The trivial name "R408W" describes a mutation in the human PAH gene corresponding to the systematic name given above (c.1222C->T); here the first letter (R), preceding the number, indicates the wildtype amino acid (arginine), the number is the codon, and the second letter (W) is the amino acid substitution (tryptophan). Table 1 summarizes how the most prevalent allele in the human gene for hepatic phenylalanine hydroxylase would be coded according to existing guidelines for context (the species and corresponding genome), chromosome, locus, gene, allele, gene product, and associated disease (with its OMIM number).
The example in Table 1 is a simple one for the naming of parts. More complex mutations raise problems that are still under consideration (Blake et al. 1997; Antonarakis and the Nomenclature Working Group, 1998); see also http://www.2.ebi.ac.uk/mutations/recommendations/naming.html and http://www.ncbi.nlm.nih.gov/collab/FT/index.html
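The two naming styles can be read mechanically, as the following minimal sketch suggests. It handles only simple substitutions written in the form used in this text (c.1222C->T and R408W); the class and function names are hypothetical, and the codon calculation assumes that nucleotide 1 of the cDNA reference sequence is the first base of the initiation codon.

```python
import re
from typing import NamedTuple


class SystematicName(NamedTuple):
    position: int   # nucleotide position in the cDNA reference sequence
    wildtype: str   # wildtype nucleotide
    variant: str    # substituted nucleotide


class TrivialName(NamedTuple):
    wildtype_residue: str  # one-letter code of the wildtype amino acid
    codon: int
    variant_residue: str   # one-letter code of the substituted amino acid


# Only simple cDNA substitutions ("c." prefix) are covered by this sketch.
_SYSTEMATIC = re.compile(r"^c\.(\d+)([ACGT])->([ACGT])$")
_TRIVIAL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_systematic(name: str) -> SystematicName:
    match = _SYSTEMATIC.match(name)
    if match is None:
        raise ValueError(f"not a simple cDNA substitution: {name!r}")
    position, wildtype, variant = match.groups()
    return SystematicName(int(position), wildtype, variant)


def parse_trivial(name: str) -> TrivialName:
    match = _TRIVIAL.match(name)
    if match is None:
        raise ValueError(f"not a simple amino acid substitution: {name!r}")
    wildtype, codon, variant = match.groups()
    return TrivialName(wildtype, int(codon), variant)


def codon_number(cdna_position: int) -> int:
    """Codon containing a cDNA position (assumes position 1 starts codon 1)."""
    return (cdna_position - 1) // 3 + 1


systematic = parse_systematic("c.1222C->T")
trivial = parse_trivial("R408W")
print(systematic, trivial, codon_number(systematic.position))
```

Under the stated numbering assumption, nucleotide 1222 falls in codon 408, which agrees with the trivial name R408W.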
Mutation Databases
Upon completion of the structural phase of genomics, and when the functional phase is well underway, alleles that either serve as markers for loci or modify the function of genes will remain important entities in biological taxonomy. Documentation of alleles in genomes will then be a parallel activity equal in importance to that of structural and functional genomics (our prediction). It will be a responsibility of the HUGO Mutation Database Initiative for which RGH Cotton has been the driving force (Cotton et al. 1998; Scriver et al. 1998).
The intended outcome of the HUGO initiative will be an omnifarious public database centered on the genome of a particular species containing a record of all known alleles (germline and somatic) and their biological significance. Progress with the initiative is being recorded at Cotton's dedicated website.
The ultimate genomic mutation database will be an annotated genomic nucleotide sequence in which the biology intrinsic to the sequence, and its significance in any other aspect, will be documented. This "modest proposal" has elsewhere been offered as a role for HUGO, in the case of the human genome (Little, 1998), anticipated by the announcement of HUGO MDI (Cotton et al. 1998), and evident in several initiatives to create human genomic mutation databases already underway:
- The OMIM disease/gene Database
- The SWISS-PROT database, which provides annotated mutant sequences
As examples of annotated genomic mutation databases, each of the above offers a different approach. OMIM limits itself to documenting the first 25 alleles (mainly pathogenic, and with a few exceptions as to the number), but it provides pointers to the corresponding locus-specific mutation databases, when they exist; OMIM is a respected source of information about Mendelian phenotypes and the corresponding source literature. HGMD (Cooper et al. 1998) documents mutations with their published or reported sources along with additional information about phenotypes. The EBI mutation webpages offer recommendations for database design and provide links to many resources. None of these databases is sufficient by itself for needs in medical genetics, each is a complementary resource and, at the present time, all benefit from the presence of annotated locus-specific mutation databases. The latter serve genomic needs when they are linked to genomic databases or when there is a search engine that can parse data from the locus-specific database and deposit them in the genomic counterpart. Meantime, locus-specific mutation databases support particular needs of corresponding user groups, which in several cases include both investigators and patients. Note that the relevance of nomenclature is here made apparent: without systematic nomenclature, there can be no universal merging of data from different databases to create an integrated and annotated view of all mutations in a particular genome.
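The dependence of merging on nomenclature can be illustrated with a small sketch. The record layout, database labels and function name below are hypothetical and are not taken from any of the databases named above; the point is only that records pool correctly when both sources use the same systematic name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Record = Dict[str, str]
Key = Tuple[str, str]  # (gene symbol, allele name)


def merge_by_allele_name(databases: Dict[str, Iterable[Record]]) -> Dict[Key, List[str]]:
    """Map each (gene, allele name) to the sorted list of databases reporting it."""
    merged: Dict[Key, Set[str]] = defaultdict(set)
    for database_name, records in databases.items():
        for record in records:
            merged[(record["gene"], record["allele"])].add(database_name)
    return {key: sorted(names) for key, names in merged.items()}


# Illustrative records only: the same PAH allele entered under two names.
locus_specific = [{"gene": "PAH", "allele": "c.1222C->T"}]
genomic = [
    {"gene": "PAH", "allele": "c.1222C->T"},  # systematic name: merges
    {"gene": "PAH", "allele": "R408W"},       # trivial name: stays separate
]

merged = merge_by_allele_name({"locus_specific": locus_specific, "genomic": genomic})
for key, sources in merged.items():
    print(key, sources)
```

The entry recorded under its trivial name is treated as a different allele, although it describes the same change; a shared systematic name avoids that duplication.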
Locus-specific mutation databases.
The present need for locus-specific mutation databases is illustrated by the following statistic given to the authors by David Cooper in Feb. 1998. At the time, HGMD contained data on over 12,500 different alleles in 692 different human genes; the database was increasing by over 2500 new entries annually; 93% of the loci documented in HGMD described fewer than 25 alleles per locus and none of these was supported by a locus-specific mutation database. Hence the relevance of HGMD. On the other hand, 45% of known alleles in the human genome were being documented and annotated in locus-specific databases, the majority of which were linked by pointers with HGMD (and also to EBI and OMIM). Some locus-specific mutation databases contain hundreds of alleles each annotated with auxiliary data. The locus-specific databases contain a vast array of annotations and are valuable in their own right. A directory of mutation databases will appear in the forthcoming 8th ed. of MMBID and is presently available as it develops on line.
Design and Content. Databases are created; they develop, evolve, decay and redevelop. The process can be complicated. A document serves as the formal record of it and becomes a separate component of the database, providing a mechanism for continuity and longevity. The curator is responsible for the document which, at a minimum, provides a textual printout of tables and fields if the database is relational in design, and a listing of objects and descriptors for other database types (example document). The curator (i.e. editor) is also responsible for accuracy of content.
Content of a mutation database comprises entities and attributes. An entity is a real-world concept, such as "mutation". An attribute, such as the "name" of the mutation, describes the entity. A database contains as many entities as are required, with the corresponding descriptors, to meet the needs of the user group. There is need for a core group of entities and a minimum but essential degree of standardization; as mentioned above, compatibility between databases (genomic and locus-specific) requires standardized nomenclature for alleles.
As for the essential core of information, the DNA sequence has a context, a core of data which includes species, name of the gene, and reference nucleotide sequence. The core of entities is "mutation" and "source of the information"; mutation and source of data can be linked when there is a unique identifier for each mutation in the database. Whereas an objective of any system of mutation description is to reconstruct alignment of the variant nucleotide sequence on the reference sequence, its annotation expands logically when the core data include not only nucleotide change in DNA but the transcript change in mature messenger RNA and the change in the polypeptide (Lehvaslaiho et al. 1998).
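As a minimal sketch of this core, the context, the mutation and its sources can be written as simple record types. The class names, field names and the identifier format are assumptions made for illustration; they are not the PAHdb or EBI schema, and the citation is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Context:
    species: str
    gene: str
    reference_sequence: str  # e.g. a database accession number


@dataclass(frozen=True)
class Source:
    citation: str  # where the mutation was reported


@dataclass
class Mutation:
    mutation_id: str                      # unique identifier linking mutation and sources
    context: Context
    dna_change: str                       # change in the nucleotide sequence
    rna_change: Optional[str] = None      # change in the mature mRNA, if recorded
    protein_change: Optional[str] = None  # change in the polypeptide, if recorded
    sources: List[Source] = field(default_factory=list)


pah = Context(species="Homo sapiens", gene="PAH", reference_sequence="GenBank U49897")
record = Mutation(
    mutation_id="PAH-example-1",  # hypothetical identifier
    context=pah,
    dna_change="c.1222C->T",
    protein_change="R408W",
)
record.sources.append(Source(citation="placeholder citation"))
print(record)
```

The optional fields mirror the suggestion that annotation expands from the DNA change to the transcript and the polypeptide when those data are available.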
Design of a mutation database begins by asking questions about who and what the database will serve: What are the needs of its curator and user group? How can it be made compatible with other mutations databases? Design continues first by listing (and describing) the entities to be recorded in the database; then by listing the descriptors to be attached to the entities. If the database is to be relational, a sketch of the relationships between the entities may help; only entities, not attributes, share relationships; Figure 1 depicts an entity relationship diagram.
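A relational design of this kind can be sketched in a few lines. The example uses Python's built-in sqlite3 module purely for illustration (it is not the database management system used for PAHdb), and the table and column names are hypothetical. Two entities, "mutation" and "source", are related through a linking table, while their attributes are ordinary columns.

```python
import sqlite3

# Entities become tables; attributes become columns; only entities are related.
SCHEMA = """
CREATE TABLE mutation (
    mutation_id     INTEGER PRIMARY KEY,
    systematic_name TEXT NOT NULL UNIQUE,
    trivial_name    TEXT
);
CREATE TABLE source (
    source_id INTEGER PRIMARY KEY,
    citation  TEXT NOT NULL
);
CREATE TABLE mutation_source (
    mutation_id INTEGER NOT NULL REFERENCES mutation(mutation_id),
    source_id   INTEGER NOT NULL REFERENCES source(source_id),
    PRIMARY KEY (mutation_id, source_id)
);
"""


def build_demo_database() -> sqlite3.Connection:
    """Create an in-memory database holding one illustrative record."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(SCHEMA)
    connection.execute("INSERT INTO mutation VALUES (1, 'c.1222C->T', 'R408W')")
    connection.execute("INSERT INTO source VALUES (1, 'placeholder citation')")
    connection.execute("INSERT INTO mutation_source VALUES (1, 1)")
    connection.commit()
    return connection


connection = build_demo_database()
rows = connection.execute(
    "SELECT m.systematic_name, s.citation "
    "FROM mutation AS m "
    "JOIN mutation_source AS ms ON ms.mutation_id = m.mutation_id "
    "JOIN source AS s ON s.source_id = ms.source_id"
).fetchall()
print(rows)
```

The linking table allows one mutation to have many sources and one source to report many mutations, which is the usual reason for keeping "source" as an entity in its own right.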
EBI provides detailed recommendations for design of mutation databases. Further discussion can be found in emerging HUGO MDI guidelines. Development and deployment of digital mutation databases requires enabling software and a database management system. One approach to design and deployment of a relatively large locus-specific mutation database (PAHdb) is described elsewhere by the present authors (Nowacki et al. 1998); see also PAHdb and its Newsletter available from the authors. PAHdb is linked to a complementary annotated mutation database at SWISS-PROT.
Genomic and locus-specific mutation databases both exist at the present time, and a unified approach is being organized under the Mutation Database Initiative (Cotton et al. 1998). Others (Lehvaslaiho et al. 1998) have proposed a model, which in its ideal form would be a single public human mutation database curated at one or more institutions, serving data on allelic variation in all human genes and their homologues. The curatorial task would be shared between the genomic and locus-specific databases, and the result would be structured data when there is agreement on nomenclature, content and basic design. As a first step, the EBI group (Lehvaslaiho et al. 1998) has parsed over 30 publicly available human databases for somatic and germline mutations, analyzed them for common data types and made them available through a common user interface. The interface for this approach is provided by the Sequence Retrieval System (SRS), a tool to index, view and link independent databases (Etzold et al. 1996). Data presented under the SRS system reflect the exact contents in source databases at the time of acquisition with the addition of the relevant context information.
Contents of source databases can be broken up into searchable fields. The relevance of creating fields common to different databases becomes apparent when one is attempting to create unified access to mutation databases. Nine categories of information essential (or at least useful) for unified access have been identified (Lehvaslaiho et al. 1998).
Proof of Pathogenicity. Alleles are either pathogenic or neutral. What is the evidence that a mutation affects phenotype (constitutionally or under particular conditions) and how does one provide reliable annotation? First, it is necessary to deal with artefacts and ambiguities; then to settle on criteria that the mutation is a likely cause of phenotype variation (Cotton and Scriver, 1998). Questions arise most often with missense alleles.
Artefacts include errors introduced by PCR; every "new" mutation should be confirmed on a second PCR product. A variant allele may be found but it may not be the allele responsible for the variant phenotype; the whole functional gene should be analyzed; extent of the DNA sequence analyzed should be stated and efficiency of the detection method known. Ambiguity can also be resolved by in vitro expression analysis; see (Waters et al. 1998).
Criteria for pathogenicity of an allele depend on the information at hand, the results of expression analysis and the relative importance of the mutation type. Information at hand includes mutation type; for example, those producing "functional hemizygosity" (Guldberg et al. 1995) are likely to modify phenotype when combined with homologous alleles in autosomal recessive traits. Segregation analysis should reveal consistent association with the variant phenotype. Missense mutations that affect conserved amino acids in the polypeptide product are assumed to have greater functional significance and are more likely to be pathogenic. Frequency of the allele on a panel of 100 normal chromosomes should be stated. A mutation that is polymorphic is less likely to have a significant effect on phenotype; nonetheless, it may be a modifier of gene expression.
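The frequency criterion can be stated as a short calculation. The sketch below assumes the definition used earlier in this paper (a polymorphic allele occurs at a frequency greater than 0.01); the function names are illustrative, and the counts in the usage lines are invented for the example.

```python
POLYMORPHISM_THRESHOLD = 0.01  # frequency above which an allele is called polymorphic


def allele_frequency(observed: int, chromosomes_screened: int) -> float:
    """Fraction of screened chromosomes that carry the allele."""
    if chromosomes_screened <= 0:
        raise ValueError("at least one chromosome must be screened")
    if not 0 <= observed <= chromosomes_screened:
        raise ValueError("observed count must lie between 0 and the panel size")
    return observed / chromosomes_screened


def is_polymorphic(frequency: float) -> bool:
    return frequency > POLYMORPHISM_THRESHOLD


# Invented counts on a panel of 100 normal chromosomes.
absent = allele_frequency(0, 100)
common = allele_frequency(3, 100)
print(absent, is_polymorphic(absent))
print(common, is_polymorphic(common))
```

On a panel of 100 chromosomes a single observation gives a frequency of exactly 0.01, which does not exceed the threshold; a panel of this size therefore yields only a coarse estimate.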
Information about mutability of the gene in question is also useful. Alastair Brown provides a web-based program.
The present authors used a program available from Michael Krawczak (Cooper and Krawczak, 1993) to analyze predicted mutability in the exonic nucleotide sequence of PAH (Byck et al. 1997).
Comment
We have introduced the reader to several URLs and references about bioinformatic resources for mutation databases. Let it be known that one of the authors (CRS) not so long ago had to learn what "URL" means and still can't operate a computer, while the other author (PMN) enjoys the rapid steady acquisition of essential expertise. Our advice for the novice (and at some time and in some particular way, each of us is a novice) is to: i) work with a colleague with expertise to complement our own; ii) take generic guidance from articles (such as (Harper, 1995)) and journals (such as Trends Guide to the Internet/Elsevier). Meantime, on-line mutation databases have put their intellectual property in the public domain; what to do to protect that property is an issue that has not escaped notice (Scriver et al. 1998; Gardner and Rosenbaum, 1998).
Acknowledgements
We thank Lynne Prevost, our curatorial colleague, who first created PAHdb in her word processor long ago. Dick Cotton, Heikki Lehvaslaiho and Victor McKusick, among many, have been stimulating colleagues. This work has been supported in part by the Medical Research Council (Canada), the Networks of Centres of Excellence (Canadian Genetic Diseases Network), the Fonds de la recherche en santé du Québec (Réseau de Médecine Génétique Appliquée) and the Interuniversity Institute for Population Research (IREP).
Bibliography
- Antonarakis SE, and the Nomenclature Working Group (1998) Recommendations for a nomenclature system for human gene mutations. Human Mutation 11:1-3
- Bassett DE, Jr., Boguski MS, Spencer F, Reeves R, Kim S, Weaver T, Hieter P (1997) Genome cross-referencing and XREFdb: Implications for the identification and analysis of genes mutated in human disease. Nature Genetics 15:339-344
- Blake JA, Davisson MT, Eppig JT, Maltais LJ, Povey S, White JA, Womack JE (1997) A report on the International nomenclature workshop held May 1997 at the Jackson Laboratory, Bar Harbor, Maine, USA. Genomics 45:464-468
- Bult CJ, White O, Olsen GJ, et al. (1996) Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273:1058-1073
- Byck S, Tyfield L, Carter K, Scriver CR (1997) Prediction of multiple hypermutable codons in the human PAH gene: Codon 280 contains recurrent mutations in Quebec and other populations. Hum Mut 9:316-321
- Coleridge ST (1817) Biographia Literaria. London
- Cooper DN, Ball EV, Krawczak M (1998) The human gene mutation database. Nucleic Acids Res 26:285-287
- Cooper DN, Krawczak M (1993) Human Gene Mutation. BIOS Scientific Publishers, Oxford, pp. 141-144
- Cotton RGH (1993) Current methods of mutation detection. Mutation Research 285:125-144
- Cotton RGH (1997) Mutation Detection. Oxford University Press, New York
- Cotton RGH, McKusick VA, Scriver CR (1998) The HUGO Mutation Database Initiative. Science 279:10-11
- Cotton RGH, Scriver CR (1998) Proof of "Disease-Causing" Mutation. Human Mutation 12:1-3
- Deckert G, Warren PV, Gaasterland T, Young WG, Lenox AL, Graham DE, Overbeek R, Snead MA, Keller M, Aujay M, Huber R, Feldman FA, Short JM, Olsen GJ, Swanson RV (1998) The complete genome of the hyperthermophilic bacterium Aquifex aeolicus. Nature 392:353-358
- Etzold T, Ulyanov A, Argos P (1996) SRS: Information retrieval system for molecular biology data banks. Methods Enzymol 266:114-128
- Fields C, Adams MD, White O, Venter JC (1994) How many genes in the human genome? Nature Genetics 7:345-346
- Fields S (1997) The future is function. Nature Genetics 15:325-327
- Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496-512
- Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Fleischmann RD, Bult CJ, Kerlavage AR, Sutton G, Kelley JM, et al. (1995) The minimal gene complement of Mycoplasma genitalium. Science 270:397-403
- Fraser CM, Casjens S, Huang WM, et al. (1997) Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390:580-586
- Fraser CM, Norris SJ, Weinstock GM, et al. (1998) Complete genome sequence of Treponema pallidum, the syphilis spirochete. Science 281:375-388
- Gardner W, Rosenbaum J (1998) Database protection and access to information. Science 281:786-787
- Goffeau A, Barrell BG, Bussey H, et al. (1996) Life with 6000 Genes. Science 274:546-567
- Goodfellow P (1997) A celebration and a farewell. Nature Genetics 16:209-210
- Grompe M (1993) The rapid detection of unknown mutations in nucleic acids. Nature Genetics 5:111-117
- Guldberg P, Mikkelsen I, Henriksen KF, Lou HC, Guttler F (1995) In vivo assessment of mutations in the phenylalanine hydroxylase gene by phenylalanine loading. Characterization of seven common mutations. Eur J Pediatr 154:551-556
- Harper R (1995) World Wide Web resources for the biologist. Trends in Genet 11:223-228
- Hieter P, Boguski M (1997) Functional Genomics: It's All How You Read It. Science 278:601-602
- Henikoff S, Greene EA, Pietrokovski S, Bork P, Attwood TK, Hood L (1997) Gene Families: The Taxonomy of Protein Paralogs and Chimeras. Science 278:609-614
- Jacob F (1982) The Possible and the Actual. (The Jessie and John Danz lectures). University of Washington Press, Seattle and London
- Klenk HP, Clayton RA, Tomb JF, et al. (1997) The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature 390:364-370
- Lander ES (1996) The new genomics: global views of biology. Science 274:536-539
- Lehvaslaiho H, Ashburner M, Etzold T (1998) Unified access to mutation databases. Trends in Genetics 14:205-206
- Little P (1998) Human genome annotation - a possible role for HUGO? Nature Genetics 19:222
- Mayr E (1982) The growth of biological thought: Diversity, evolution and inheritance. Harvard University Belknap Press, Cambridge, MA, pp. 63-67
- McKusick VA (1986) The morbid anatomy of the human genome. Medicine 65:1-33
- McKusick VA (1987) Toward a complete map of the human genome. Genomics 1:103-106
- McKusick VA (1994) Mendelian Inheritance in Man: A Catalog of Human Genes and Genetic Disorders. 11th ed. Johns Hopkins University Press, Baltimore
- Miklos GLG, Rubin GM (1996) The role of the genome project in determining gene function: insights from model organisms. Cell 86:521-529
- Mitelman F (1995) ISCN - 1995: An International System for Human Cytogenetic Nomenclature. S Karger, Basel
- Nowacki P, Byck S, Prevost L, Scriver CR (1998) PAH Mutation Analysis Consortium Database: 1997. Prototype for relational locus-specific mutation databases. Nucleic Acids Res 26:220-225
- Oliver S (1996) From DNA sequence to biological function. Nature 379:597-600
- Ridley M (1991) A survey of science: The edge of ignorance. The Economist 1-22
- Scriver CR (1976) Genetics: Voyage of discovery for everyman. (Presidential Address). Pediat Res 10:865-872
- Scriver CR, Nowacki PM, Cotton RG (1998) The HUGO Mutation Database Initiative. Genome Digest 5:8-11
- Searle AG, Peters J, Lyon MF, Hall JG, Evans EP, Edwards JH, Buckle VJ (1989) Chromosome maps of man and mouse. IV. Annals of Human Genetics 53:89-140
- Searle AG, Edwards JH, Hall JG (1994) Mouse homologues of human hereditary disease. J Med Genet 31:1-19
- Smith DR, Doucette-Stamm LA, Deloughery C, et al. (1997) Complete genome sequence of Methanobacterium thermoautotrophicum deltaH: functional analysis and comparative genomics. J Bacteriol 179:7135-7155
- Strachan T, Abitbol M, Davidson D, Beckmann JS (1997) A new dimension for the human genome project: towards comprehensive expression maps. Nature Genetics 16:126-132
- Tatusov R, Koonin EV, Lipman DJ (1997) A Genomic Perspective on Protein Families. Science 278:631-637
- Tomb JF, White O, Kerlavage AR, et al. (1997) The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388:539-547
- Tugendreich S, Bassett DE, Jr., McKusick VA, Boguski MS, Hieter P (1994) Genes conserved in yeast and humans. Human Molecular Genetics 3:1509-1517
- Venter JC, Adams MD, Sutton GG, Kerlavage AR, Smith HO, Hunkapiller M (1998) Shotgun sequencing of the human genome. Science 280:1540-1542
- Waters PJ, Parniak MA, Nowacki P, Scriver CR (1998) In vitro expression analysis of mutations in phenylalanine hydroxylase: Linking genotype to phenotype and structure to function. Human Mutation 11:4-17
- White JA, McAlpine PJ, Antonarakis S, et al. (1997) Guidelines for human gene nomenclature (1997). Genomics 45:468-471
Figure Legend.
An entity relationship diagram describing a module in PAHdb, a relational locus-specific mutation database (Nowacki et al. 1998). This portion of a much larger diagram (Nowacki PM. Thesis. McGill Univ. 1998) describes a single entity in this locus-specific mutation database. An entity is a thing in the real world with its own existence; the entity shown here (in the rectangular box) is "mutation". Entities have attributes to describe them; attributes correspond to fields in data tables and here they are the descriptors of mutations. The attributes shown here (in the ellipses) are linked to the entity by faint lines. In the relational model, only entities are related. In the diagram, "mutation" is connected to other entities (not shown) by heavy lines. (For the complete ER diagram, visit the website listed below; the textual tables and field listings for this entity and the others will also be found in the document.) PAHdb has a modular design and each module contains an entity with its attributes; modules in PAHdb contain the following entities: mutation, reference, nucleotide sequence, association (with population, geographic region, relative frequency), polymorphic haplotype background, expression analysis in vitro (human and rat). The database management system is Visual FoxPro 5.0, Windows NT. For further information about PAHdb tables, see http://www.debelle.mcgill.ca/pahdb/docu/.
Table 1. THE NAMING OF PARTS: AN ILLUSTRATION
ENTITY: NAME (and SYMBOL)
- Species: H. sapiens (HSA)
- Chromosome: 12
- Locus: 12q24.1
- Gene (Symbol): Phenylalanine Hydroxylase (PAH)
- Reference Sequence: cDNA, U49897 (GenBank)
- Allele: c.1222C->T (systematic); R408W (trivial)
- Product: PAH (EC 1.14.16.1)
- Disease: PKU and non-PKU HPA
- OMIM #: 261600
- On-line db
Addendum:
Since this paper was presented to the SSIEM Membership at the 1998 Annual Meeting (York Univ. UK), the following should be mentioned:
Software developed by Heikki Lehvaslaiho can be accessed at http://www2.ebi.ac.uk/cgi-bin/mutations/check.cgi