Neurogenomics: at the intersection of neurobiology and genome sciences


Mark S Boguski & Allan R Jones
Mark S. Boguski and Allan R. Jones are at the Allen Institute for Brain Science, 551 N. 34th Street, Seattle, Washington 98103, USA.

Correspondence should be addressed to Mark S. Boguski (markbo@alleninstitute.org).


Neurogenomics is the study of how the genome as a whole contributes to the evolution, development, structure and function of the nervous system. It includes investigations of how genome products (transcriptomes and proteomes) vary in time and space. Neurogenomics differs markedly from the application of genome sciences to other systems, particularly in the spatial category, because anatomy and connectivity are paramount to our understanding of function in the nervous system. We focus here on some of the influences of genomics and its associated technologies on neuroscience. We discuss comparative genomics, gene expression atlases of the brain, network genetics and applications to behavioral phenotypes, and consider the culture, organization and funding of genome-scale projects.


In the years immediately before and after the Human Genome Project (HGP) began, the biology community at large viewed the project with considerable skepticism. As Princeton University president Shirley Tilghman recalled1, there were three main criticisms: first, that the sequence would not be interpretable; second, that generating the sequence would be boring production work; and third, that the 15-year project would divert resources away from the most creative and imaginative way to do science, that is, in small groups consisting of a principal investigator and a number of students and postdoctoral fellows. However, by 1996, when large amounts of transcriptome data from humans and large amounts of genomic sequence from 'model organisms' had become available, most biologists had changed their minds. As Tilghman put it1, "A genome enthusiast was a genome critic that just got a hit in the EST database"2. (Expressed sequence tags, or ESTs, are cDNA subsequences that are generated rapidly to inventory the transcribed components of a genome—mostly protein-encoding mRNAs.)

By 1996, the first large-scale transcript map of the human genome, containing about 15,000 genes, was published3 and made available on the still-nascent World Wide Web, thus setting the stage for the accelerated positional cloning of hundreds of disease genes, including many mutant alleles underlying neurodegenerative and neuropsychiatric disorders. By 1998, the genomic sequence of the first metazoan organism, Caenorhabditis elegans, was completed, and the enormous ramp-up to produce the first complete draft of the human genome in the next three years was undertaken4. But at that time, eight years into both the HGP and the 'Decade of the Brain', genomics seemed to have had little impact on neuroscience: a perspective in the journal Neuron, marking the end of the first ten years of that journal and the sixth decade of modern neuroscience5, mentioned neither the HGP nor the genomics of any other organism (although this article did point out that progress had been made in what might be called the neurogenetics of Huntington and Alzheimer disease). It was not until 2002 that a consortium of three institutes of the US National Institutes of Health (NIMH, NIDA, NINDS) held a meeting to discuss and set priorities for molecular neurobiology in the 'postgenomic' era (http://www.drugabuse.gov/MeetSum/Postgenomic.html).

Currently there are nearly 1,000 completed or ongoing genome projects representing an impressive array of species. This creates a wealth of new data for comparative genomics studies and for launching new 'model organisms' for biomedical research. Comparative studies of the human and mouse genomes (with experimental follow-up), for example, are expected to contribute enormously to our understanding of physiology, behavior and psychiatric disease6, 7. A draft of the chimpanzee genome has recently become available and has already led to some intriguing observations pertaining to the evolution of speech, hearing and language in the human lineage8, 9.

Here we consider some of the "lessons learned and promises kept"1 of genomics10, and consider how this knowledge might be applied to neurobiological problems. We describe factors that characterize a genomic approach to a problem, some functional genomics applications to studying gene expression in the brain, as well as comparative genomics and 'network genetics'. We conclude with some important considerations of the culture, organization and funding of genome-scale projects.

Revolutions and evolutions
One of the technological revolutions that made genomics and the HGP possible was the Nobel prize-winning innovation of 'rapid' DNA sequencing11 in 1975. However, it took roughly the next two decades for the evolution and development of this technology (and others, such as the polymerase chain reaction or PCR) to progress to the point when another revolutionary period, the genome era, could begin. This second revolution was largely a matter of the rate of data generation (throughput) and the almost unimaginable scales that could consequently be achieved. Today entire genomes can be rapidly sequenced and interpreted, whereas in 1977 the cloning and sequencing of a single mRNA (rabbit β-globin) merited the cover and three separate papers in an issue of the journal Cell (vol. 10, issue 4): one for the 5' region, one for the coding sequence and one for the 3' region. One of the important 'promises kept' by the HGP was the 'freedom to do biology'. This meant that biologists would no longer have to spend their time laboriously producing reagents (such as clones and sequences) in their own laboratories. Instead, genomics has created vast databases of basic information and corresponding physical resources that allow investigators to focus on research in their fields.

Functional genomics refers to the development and application of global (genome-wide or system-wide) experimental approaches to assess gene function by making and using genome-scale information and reagents, and is characterized by high-throughput or large-scale experimental methodologies combined with statistical and computational analysis of the results12. Functional genomics enables scientists to ask entirely new types of questions that require the analysis of large numbers of a system's components simultaneously, for example to enhance the understanding of complex physiological functions13.

Strategies for deploying genomics technologies and analyzing the resultant data also had to evolve. For example, it was not obvious that incomplete and inaccurate, but inexpensive and rapidly generated survey data (such as ESTs) could be very useful and greatly expand and accelerate hypothesis-driven, follow-up studies. The practice of systematically collecting data, even before knowing precisely all of the ways in which it might be used, has been referred to as 'discovery-driven' science and has even led to some reconsiderations of biological epistemology14, 15. It took some time for researchers pursuing hypothesis-driven versus discovery-driven approaches to realize that they were allies and not antagonists.

Genomics, particularly high-volume DNA sequencing, has proven the value of discovery-driven science. For example, ESTs16 were initially viewed as a kind of illegitimate offspring of the HGP17, but they have provided at least six generations of utility over the past dozen years, including gene discovery18, gene mapping19, the design and construction of microarrays20, exon detection and alternative splicing analysis21 and discovery of an important class of SNPs (single-nucleotide polymorphisms) in coding sequences22, 23. EST sequencing also remains a mainstay method for transcriptome analysis. (See applications to the nervous system below.)

Potential perils in the application of genome-scale technologies
When new technologies become available, many investigators are understandably eager to try them out (and journals are eager to publish the results). Examples include comparative sequence analysis14, 24 in the early-to-mid-1990s and DNA microarrays25, 26 in the late 1990s. Such periods can be perilous because the limits and optimal use of the technologies may not be well understood, from experimental design to data analysis and interpretation27, 28. Reproducibility may even be a question because the expense of and/or limited access to the technology may lead to some experiments being under-controlled, or insufficiently validated by other research groups. Moreover, new statistical methods and analytic tools are often required to properly interpret the results and are usually unavailable or incomplete when the first data appears. Kohane and colleagues27 provide a number of cautionary tales derived from published studies that used comprehensive genomic measurement technologies; misinterpretation and over-interpretation of results are apparently distressingly common. Mirnics and Pevsner29 (p. 434–439 of this issue) describe many of the difficulties in applying microarray technology to brain tissue. Proteomics technologies are now undergoing their early deployments, with associated growing pains28. Both technologies are limited by non-standardized and non-homogeneous starting materials removed from their anatomical context.

Large-scale, high-throughput biology requires an industrial mindset and a different level of management than does a more traditional research project environment. Strict standardization of protocols and procedures is essential, along with rigorous definition of and adherence to quality control and quality assurance standards. Although these principles are applicable to large-scale projects in any field, they may be particularly challenging when applied to neurobiology because of the anatomical and physiological complexity of many of the experimental systems.

As a final caution, discovery-driven projects usually generate prodigious amounts of data. However, as Geoffrey Duyk of TPG Ventures puts it, confusing throughput with output often blurs the distinction between data and knowledge25. For example, it is very unlikely that generating large repositories of mass spectrometry-based proteomics data will yield the same value as high-volume DNA sequencing28.

Comparative (evolutionary) genomics and genome annotation
The field of comparative genomics was born from the comprehensive and systematic cross-species comparisons of gene products30 with the underlying assumption that aspects of function can be inferred from evolutionary conservation of DNA sequence. Such studies led to powerful new connections for investigating the biochemistry of human disease genes, even in model organisms as simple as yeast31. However, the recent completion of the mouse genome portends unprecedented access to highly informative models of human somatic and psychological diseases6, 7.

With 'excess' sequencing capacity in the wake of the HGP, the National Human Genome Research Institute (NHGRI) and its advisors devised and implemented a process for selecting and sequencing additional genomes. This served to enhance the interpretation of the human genome by comparative genomic analysis, to fill in major phylogenetic gaps in our sampling of extant genomes in the biosphere, and to create new model organisms. This program, dubbed GRASP (for Genome Resources and Sequencing Priorities), has been operating since early 2002 and reviews white-paper proposals (www.genome.gov/page.cfm?pageID=10002154) from the community. Among the organisms selected to date, the honeybee Apis mellifera was chosen, in part, for its ability to learn and engage in complex social behaviors: a complete knowledge of the genome is expected to lead to rapid advances in honeybee genetics and its application to the analysis of learning and behavioral phenotypes (www.genome.gov/11509819).

Publications describing the human genome sequence included extensive bioinformatics-derived annotations of various features, including genes and gene products, often with only inferred or hypothetical functions: it was quickly noted that experimental validation of these computational predictions would be valuable32. Large-scale anatomical and developmental transcriptome mapping, combined with comparative genomics analysis, has the potential to both substantiate hypothetical (predicted) genes and lead to a more focused pursuit of their functions.

Transcriptional profiling of the mammalian brain
The term 'transcriptome' was coined in 1995 by Velculescu and colleagues33 in the context of applying the technique of serial analysis of gene expression (SAGE)34 to the discovery, description and cataloging of mRNA transcripts in yeast. However, transcriptome analysis has a long and interesting history of applications in neuroscience. The greater transcriptional complexity of the brain compared with other organs was first addressed by hybridization analysis in the 1970s, leading to the claim that one-third of the mammalian genome was exclusively dedicated to brain function35, 36. About a decade later, it was possible to summarize the structural properties of 39 'brain-specific' mRNAs37, although it was still necessary to remind readers that "the brain is made of proteins" and that "proteins are encoded by mRNAs".

The first 'genome era' approach to this question appeared in 1992, when one of the earliest applications of EST sequencing identified 2,375 genes expressed in the human brain18. This EST survey approach later formed the basis of the NIMH-NINDS 'Brain Molecular Anatomy Project' (BMAP, www.nimh.nih.gov/grants/0006-cbd1.cfm), and EST surveys continue to be used to study neural tissue transcriptomes. More recently, SAGE and microarray technologies have been applied to transcriptional profiling of single cells in retina38 and neuronal progenitors39, respectively. Ultimately, however, gene expression profiling in the central nervous system may culminate in detailed anatomical atlases of gene expression in the developing and adult brain40, 41, 42.

According to Cowan43, new methods for anatomical analysis that were first developed in the 1970s led to a renaissance in morphologic studies of the nervous system. Gene expression profiling of cell populations in the brain has the potential to lead to another quantum advance characterized by a new understanding of the number of and relationships among cell types as defined by what portion of the genome is uniquely expressed in them. New insights into function may be gained by analyzing ensembles of transcriptional activities. The full potential of this approach, however, will only be realized by genome-scale, discovery-driven research.

One genome-scale approach to studying the neuroanatomy of gene expression is micro-dissection of particular brain structures followed by mRNA isolation from cells and analysis on microarrays44. This method is, in principle, quantitative, but anatomical resolution and fidelity are limited, and confusion may arise from the combinations of different cell types in a dissected specimen.

Another approach is high-throughput in situ hybridization analysis40. This method can ideally deliver cellular resolution and anatomical fidelity. A major new effort, based on this approach, to create an expression atlas of the adult mouse brain was recently announced45 (see www.brainatlas.org).

The NINDS GENSAT project is creating a gene expression atlas of the mouse CNS based upon individual genes that have been inserted into the mouse genome using bacterial artificial chromosomes (BACs)41. This method is relatively expensive and time-consuming, but yields transgenic mice that may be used for follow-on studies. Other BAC-based technologies are also in use46.

All these approaches have their strengths and weaknesses. Sensitivity, specificity, reproducibility, quantifiability, throughput and cost are all important factors. Although substantial challenges will have to be met, the ultimate goals of anatomical fidelity, cellular resolution and connectivity seem increasingly achievable through the application of genomics approaches and technologies.

'Network genetics' and neurobehavioral phenotypes
Comparative genomics is ideally capable of elucidating interspecies similarities and differences in genome structure and function, but it is intraspecies variation (polymorphism) among genes that underlies critical differences in phenotypes such as susceptibility to disease. The merging of genomics and statistical genetics47, 48 may lead to a revolution in our ability to decipher the genetic basis of complex phenotypes, including common, polygenic diseases and complex behavior. Not only can we now examine variation on a genome scale through high-throughput genotyping technologies, but we can also use the outputs of functional genomics technologies (such as microarray-based transcriptional profiles29) as quantitative phenotypes or traits to assess genotype–phenotype correlations. The power of these approaches, combined with new strategies for high-throughput behavioral phenotyping49, could lead to major advances in our understanding of the molecular-genetic basis of complex behavior and neuropsychiatric disease.

Gene expression profiles observed in segregating populations have refined clinical traits into subtypes that are under the control of different genetic loci47. Similarly, gene expression patterns in recombinant inbred strains of mice correlate with neurobehavioral phenotypes48. Although this approach might be applied to the analysis of any complex trait, it is particularly exciting to imagine that human psychiatric disorders could be accurately diagnosed by biochemical means and broken down into various sub-phenotypes that have different contributing or causative genes. This prospect has enormous implications for the more effective use of existing therapeutics as well as the identification of new drug targets and development of novel therapies.
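As a purely illustrative sketch of what it means to treat a transcript level as a quantitative trait, the following Python fragment correlates marker genotypes with the expression of a single transcript across a panel of strains. Everything in it is an assumption made for illustration: the data are simulated, the function names (pearson_r, simulate_strains, rank_markers) are hypothetical, and a simple Pearson correlation stands in for the more elaborate statistical genetics methods used in the cited studies47, 48.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def simulate_strains(n_strains, effect, noise_sd, seed=0):
    """Simulate one marker and one transcript (synthetic data, not real).

    Each strain is assumed homozygous for parental allele 0 or 1, as in a
    recombinant inbred panel. Expression is a hypothetical baseline of 10.0
    plus an allele effect plus Gaussian noise.
    """
    rng = random.Random(seed)
    genotypes = [rng.choice((0, 1)) for _ in range(n_strains)]
    expression = [10.0 + effect * g + rng.gauss(0.0, noise_sd) for g in genotypes]
    return genotypes, expression


def rank_markers(marker_genotypes, expression):
    """Rank markers by absolute correlation with one transcript's expression."""
    scored = [(name, pearson_r(g, expression)) for name, g in marker_genotypes.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    genotypes, expression = simulate_strains(n_strains=40, effect=3.0, noise_sd=0.5)
    print(f"genotype-expression correlation r = {pearson_r(genotypes, expression):.2f}")
```

In a real study the same scan would be repeated for thousands of transcripts and markers, which is why appropriate correction for multiple testing, omitted here, becomes essential.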

The culture, organization and funding of genome-scale projects
A recently published study from the Institute of Medicine and the National Research Council explores a new paradigm of biomedical research: hypothesis-generating, discovery-driven science enabled by technology advances that allow for high-throughput data collection and analysis50. This new paradigm—of which the HGP is the most obvious but far from only example—has raised many questions about how such projects should be initiated, funded, organized, managed, staffed and evaluated.

In the wake of the HGP, various NIH institutes struggled with how to support research that depends on new high-throughput or large-scale and expensive technologies. A case in point is the National Heart, Lung and Blood Institute (NHLBI), which approached this challenge with the concept of Programs in Genomic Applications, or PGAs. The goals of this program are to develop information, tools and resources to link genes to biological function on a genomic scale and to provide workshops, courses and visiting scientist programs to facilitate the training of researchers in the use of the data and related technologies developed by the PGAs51. NIH neuroscience institutes have supported some genome-wide initiatives, such as large-scale mutagenesis and phenotypic screening52 and national microarray centers (http://arrayconsortium.cnmcresearch.org/NINDS/home.do).

In these examples, the NIH provides essentially all of the support. But there is another model in which a consortium of government and industry organizations provides important resources to the community. In the past several years, NHGRI has initiated and led consortia such as the International Haplotype Map (HapMap) Project53 and the SNP (single nucleotide polymorphism) Consortium54 to create maps and databases of human genetic variation to accelerate the study of genes associated with complex, polygenic diseases.

Historically, this trend began in 1993 with a project called the Merck Gene Index55, 56, which pioneered the concept of government–industry collaboration to provide, for the community, a genome-scale resource of pre-commercial ('precompetitive') biological data. In this case, Merck & Co. provided financial support for random, high-throughput EST sequencing of the human, and subsequently, the mouse, transcriptomes at the Washington University Genome Sequencing Center, which then submitted the data to the new dbEST division2 of the NIH-sponsored GenBank database for unrestricted public access. The physical cDNA clones corresponding to the EST sequences were also available for unrestricted use56.

Some of the technological but also sociological and funding challenges of neuroscience in the 'information age' are coming to the forefront57, 58, 59. For example, the sharing of primary data, even prior to publication, is now the norm in genomics and functional genomics studies, but there is apparently a reluctance to adopt this practice in the neuroscience community60. Some outmoded aspects of academic culture and reward systems with respect to discovery-driven science are significant obstacles57, 60. However, genome-scale projects are too resource-intensive to justify duplication in multiple laboratories. Public and even commercial support of such projects will favor data access policies that promote the broadest possible uses of the data.

Conclusions and future prospects
The historical trends toward large-scale research projects and government–industry collaborations have culminated in a new 'NIH Roadmap' that will result in new models for interdisciplinary team science and novel public–private partnerships61. It is important to begin thinking about how this trend can advance the neurosciences, both basic and applied. The time seems ripe for a new and vigorous exploration of the frontier between neurobiology and genome sciences.

What might be some of the paths forward? The first, and perhaps the most obvious, will be the application of functional genomics technologies to existing model systems for obtaining more global views of gene expression in the nervous system. Transcriptome surveys and analyses and large-scale neuroanatomical mapping of gene expression are already underway and may define a new functional anatomy of the brain. Major advances in the diagnosis and treatment of psychiatric diseases could result if brain gene expression patterns could be developed into quantitative traits and correlated with genetic variation and neurobehavioral phenotypes.

Other advances will come from the application of comparative genomics studies focused on neurobiology and behavioral genetics. The development of new model organisms (such as the honeybee and zebrafish62) for neuroscience research is being accelerated by extensive genome sequencing and the application of comparative and functional genomics technologies.

Soon perhaps, neurogenomics may be able to begin filling in what has been called "the gene gap"63, that is, the curious lack of correspondence between genome size and organismal and behavioral complexity. It remains a mystery, for example, why human beings, with only about 20% more genes than a soil nematode (24,000 versus 19,000 genes), can develop a brain with 100 billion neurons and 10^15 connections, whereas C. elegans has only 302 neurons with about 7,000 synapses64. Neurogenomics could perhaps even serve as a new bridge between reductionist and more holistic approaches to the analysis of the nervous system and lead to a new kind of systems neuroscience akin to the re-conceptualization of 'systems biology' in the genome era.
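The scale of this mismatch can be made concrete with a few lines of arithmetic. The sketch below simply divides the human figures quoted above by the corresponding C. elegans figures, reading the human connection count as 10^15. The dictionary and function names are hypothetical, and the inputs are the rounded estimates given in the text rather than independent measurements.

```python
# Rounded estimates quoted in the text (human connections read as 10^15).
HUMAN = {"genes": 24_000, "neurons": 100e9, "connections": 1e15}
WORM = {"genes": 19_000, "neurons": 302, "connections": 7_000}


def fold_differences(a, b):
    """Return a[key] / b[key] for every quantity in a."""
    return {key: a[key] / b[key] for key in a}


if __name__ == "__main__":
    for key, fold in fold_differences(HUMAN, WORM).items():
        print(f"{key}: {fold:.3g}-fold")
```

On these figures the gene counts differ by a factor of only about 1.26 (equivalently, the nematode has roughly one-fifth fewer genes than the human count), whereas neuron numbers differ by more than 3 x 10^8 and connection numbers by more than 10^11.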