National Biospecimen Network and Public Health *
Two essential elements contribute to the risk of all disease, including cancer: genetic factors and
environmental exposures. The completion of the first draft of the human genome will allow
researchers to characterize genotypes to fine degrees of detail, and identify genetic traits
associated with individuals. However, much of the genetic variation that is associated with
cancer risk appears to modify risk only in the presence of environmental variability, both for
exposures that increase risk and exposures that decrease risk. Rates of disease differ 10- to
200-fold between geographic locations around the world, and by up to 10-fold within the same
geographic location over a 50-year period. These
differences cannot be explained by differences in genes, but only by differences in exposures,
modified by the interaction between genetic variation and such exposures.
There exists a broad array of exposures that may influence the presence or absence of disease in
humans. Accurately characterizing and measuring many environmental exposures is difficult,
but the epidemiologic community has extensive experience in doing so. The
human species has adapted to a wide variety of different environments, cultures, diets (both
marginal and excessive), microorganisms and parasites, toxic exposures, and bad habits. There
are thus a wide variety of susceptibilities to, and protections against, these exposures.
Accordingly, an evaluation of variations in environmental exposures is necessary, along with
measurements of genetic variation, to give a true picture of the causes of disease.
In addition, the characterization of the disease phenotypes is still problematic, with a myriad of
classification schemes for different organs and systems that range from precise molecular
characterization to vague syndromes. Exactness in description of disease phenotypes is necessary
to identify the reasons for increased or decreased disease risk. The greater the degree of precision
of phenotypic classification, the higher the likelihood of being able to reduce susceptibility or
increase resistance (prevention), to detect early disease, and to treat at the earliest opportunity.
The key to obtaining phenotypic precision is detailed outcome data. As more outcomes
accumulate with time, and as classification schemes improve, the opportunities to define and
redefine homogeneous phenotypic subsets will improve.
To attempt to establish the complete pattern of human disease susceptibility and resistance, and
to identify more precise phenotypes, what is needed is a study of a very large number of
ethnically diverse individuals who are well characterized genetically, whose exposures are well
mapped, and whose illness pattern and mortality can be monitored. This cohort, labeled the “Last
Cohort,” would examine the impact of exposures on causes and rates of disease, and study the
interaction of these exposures with genetic variation. Blood samples would be collected from
healthy individuals, along with detailed information about each individual, including behaviors
(e.g., diet, smoking, exercise), medical history, reproductive history, family history,
demographics, and geographical location. Over time, a certain proportion of these healthy
individuals (see below) would develop diseases. With the passage of years, researchers would
use the blood samples collected from the healthy donors, plus the additional exposure data on
these donors, to go back and look for early markers of the disease. The Last Cohort, therefore,
represents the opportunity to learn more about etiology (both genetic and environmental) of very
tightly defined disease entities; about early detection (serial blood specimens will provide
opportunities to establish proteomic marker profiles, both cross-sectionally and as they change
over time); and, ultimately, about prevention.
The size of the cohort is determined by the degree of human genetic variation, the degree of
variation in exposures, and the size of the specific sets of outcomes to be identified. It is estimated
that approximately 500,000 Last Cohort participants would be needed to achieve the desired
results. After 6 years of follow-up, for instance, a healthy cohort of this size aged 50 to 75 would
experience about 40,000 cancers and 55,000 deaths.
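To put these figures in proportion, the arithmetic can be sketched as follows. This is only an illustration built from the numbers quoted above; the function names are hypothetical, and the person-years calculation assumes that every participant contributes the full 6 years of follow-up, which the text does not state.

```python
# Figures quoted in the text; nothing here is new data.
COHORT_SIZE = 500_000      # Last Cohort participants
FOLLOW_UP_YEARS = 6        # follow-up period in years
EXPECTED_CANCERS = 40_000  # cancers expected during follow-up
EXPECTED_DEATHS = 55_000   # deaths expected during follow-up


def cumulative_proportion(events: int, cohort_size: int) -> float:
    """Fraction of the starting cohort that experiences the event."""
    return events / cohort_size


def crude_rate_per_100k_person_years(events: int, cohort_size: int, years: int) -> float:
    """Crude event rate per 100,000 person-years.

    Assumption (not from the source): every participant contributes the
    full follow-up period, so person-years = cohort_size * years. Deaths
    and dropouts shorten real person-time, so this slightly understates
    the true rate.
    """
    person_years = cohort_size * years
    return events / person_years * 100_000


if __name__ == "__main__":
    for label, events in (("cancers", EXPECTED_CANCERS), ("deaths", EXPECTED_DEATHS)):
        share = cumulative_proportion(events, COHORT_SIZE)
        rate = crude_rate_per_100k_person_years(events, COHORT_SIZE, FOLLOW_UP_YEARS)
        print(f"{label}: {share:.1%} of cohort, about {rate:,.0f} per 100,000 person-years")
```

Under these assumptions, roughly 8% of the cohort would be diagnosed with cancer and 11% would die within the 6 years, corresponding to crude rates of about 1,333 cancers and 1,833 deaths per 100,000 person-years.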
The value, over the long term, would be a disease classification system (derived in particular
from other work that the National Biospecimen Network [NBN] would facilitate) that would
allow researchers to divide cancer types into subsets, based on molecularly defined
homogeneous phenotypes. A large, long-term epidemiologic study such as this, tied to genomics
and proteomics with linkage to prediagnostic serum and data and the capacity to follow up for
outcomes of interest, would help researchers answer a number of questions:
- What is the association between specific exposures and molecularly defined disease risk?
(Paradigm: Smoking and lung cancer)
- What is the association between specific allele variants and disease risk? (Paradigm:
Adenomatous polyposis coli (APC) gene truncation mutation and colon cancer)
- What aspects of the interaction between exposure and genetic variants influence disease
risk? (Paradigm: Folate/5,10-methylenetetrahydrofolate reductase (MTHFR) variants and
colon polyps)
- What aspects of the proteome profile distinguish those with and without disease? (Paradigm: Prostate-specific
antigen (PSA) and prostate cancer)
How does this relate to the NBN? The most difficult and expensive part of establishing such a
cohort from scratch is collecting fresh specimens for proteomics, mRNA expression, etc., in
order to define the outcomes as precisely as possible. The setting up of the NBN specifically to
collect fresh tissues for these and other purposes markedly enhances the ability to ensure the
collection of such specimens from the cohort members. Further, the establishment of the Last
Cohort means that, in addition to providing specimens for drug development, sub-classifying
outcomes, etc., the NBN will greatly increase both our understanding of causes and our capacity
to develop serum markers for early detection.
If the right technology is put in place over the next 5 to 10 years to do high-throughput genomic
sequencing (for susceptibility and resistance) and proteomics (for screening) on very large
populations, but an infrastructure is not in place to best exploit these gains, a significant
opportunity will be lost. The NBN could provide this infrastructure to the Last Cohort study. The
NBN would provide, by virtue of its biorepositories, extensive material capture, systematic
collection strategies, a centralized approach to ethical issues, and long-term follow-up. For the
Last Cohort, the NBN could provide, in addition, at a subset of its collection sites, the linkage to
prediagnostic exposure data and serum needed to undertake the studies such a cohort would
allow.
One way to conceptualize this is to imagine that the specimens collected by the NBN would be
derived from a virtual population cohort that will remain undefined. The Last Cohort proposal
would establish a real cohort so that it represents a very well-defined subset of this virtual cohort
(see Figure 7-1).
Figure 7-1. The Last Cohort
Admittedly, this would increase the cost of the NBN (estimates are a two- to three-fold increase),
which plans to collect specimens from far fewer than 500,000 individuals. It is worth noting,
however, that the number of end-point specimens collected for the Last Cohort will be modest
(about 40,000 over the first 6 years, for instance); the additional costs arise from the recruitment
of the cohort and the collection of baseline data and blood samples. The Last Cohort presents a
unique opportunity to understand the causes of human disease, its prevention and early detection,
and should be considered.