This manuscript was prepared as part of the Environmental
Epidemiology Planning Project of the Health Effects Institute, September
1990 - September 1992.
Introduction
The environment, for most epidemiologists, comprises everything that
is not genetic; so diet, smoking, and even exercise are considered environmental
factors. Environmental epidemiology, however, has a more restricted connotation,
referring to those environmental factors that are outside the immediate
control of the individual. Smoking, therefore, would not be a factor included
in environmental epidemiology, but the effects of tobacco smoke put into
the air by others would be. Other exposures of interest to environmental
epidemiologists include air pollution, water pollution, and occupational
exposure to physical and chemical agents.
The spread of infectious agents through water, foods, or other environmental
media could be seen as part of environmental epidemiology, but this area
has long been claimed by infectious disease epidemiologists and does not
suffer from most of the methodologic problems facing environmental epidemiologists.
Although there are areas of overlap between infectious disease and environmental
epidemiology, such as the suspension of exotic pathogens in indoor air or
the possibility of environmentally spread oncogenic viruses, environmental
epidemiologists usually do not concern themselves with infectious agents.
Environmental epidemiology comprises the study of more than just physical
and chemical agents, however. Rising health consciousness is a social phenomenon,
and concern about the health of the environment itself, as well as its effect
on us and other species, is a growing preoccupation among scientists and
nonscientists alike. Psychosocial factors are increasingly important concerns
in environmental epidemiology research: Studies of populations living near
electric power lines or nuclear power plants can be neither conducted
nor interpreted properly without a clear assessment of the role of the public's
perception of environmental health risks. In some instances the psychologic
reaction of the public may be a major component of the effect of an environmental
influence; in others, the ability to conduct a study at all, and the way
in which it should be conducted, are influenced profoundly by publicity
and public response.
Why make a distinction between environmental exposures that can be controlled
by the individual and those that are beyond his or her control? Those exposures
that are beyond individual control are typically exposures that affect many
individuals simultaneously and for which individual exposure may be difficult
to measure. These conditions frequently lend themselves to what some epidemiologists
call ecologic research, using aggregate rather than individual data. Those
environmental studies that do have individual people as subjects often have
distinctive methodologic features that derive from the nature of the exposure.
It is as much these methodologic distinctions as the subject matter itself
that warrant the use of a special term for environmental epidemiology. Furthermore,
the most important research gaps in the area of environmental epidemiology
may be methodologic problems.
Exposure Assessment
Atop the list of methodologic problems is the problem of exposure assessment,
a problem that extends through all of epidemiologic research but is a towering
obstacle in environmental epidemiology. Routine practice has been to use
crude measures that are only tenuously related to the actual exposure experienced.
Working in a plant, for example, has often been used as an indicator for
occupational exposures that are varied in kind and intensity within the
plant. Community-based sampling of air or water has been used commonly to
approximate individual exposure in many studies. Indeed, in ecologic research,
data may be aggregated over geographic units as large as continents. Any use of
externally derived information as a proxy for individual exposure introduces
measurement error that will affect the analysis. For exposures such as electromagnetic
fields, which vary strikingly over short distances, measuring an individual's
exposure by proxy measures is bound to result in substantial errors. For
many exposures, a crucial part of the assessment includes the personal history.
Such information is formidable to obtain after the fact and can be obtained
prospectively only with gargantuan effort. These problems in exposure assessment
are compounded by the low prevalence of putative high-risk exposures
to environmental agents and the low frequency of many of the outcomes
of interest.
The long induction time likely to intervene between the presumed causal
action of many environmental agents and the resulting appearance of disease
aggravates the difficulties of exposure assessment. With a long time interval
between exposure and disease, the investigator must either conduct a long,
expensive prospective study or rely on retrospective measurement of the
exposure information. Retrospective measurement is feasible for certain
types of exposure, such as occupational exposures for which adequate employment
records and industrial hygiene evaluations exist, or smoking, for which the
smoker's memory usually holds a reasonably good record of the
exposure. For some exposures, such as ionizing radiation, medical records
and employment information may give partial information on the amount and
timing of exposure; but assessing the amount of exposure may involve considerable
guesswork, making retrospective evaluations less informative. For certain
unrecorded and imperceptible exposures, such as electromagnetic fields,
retrospective evaluation can at best be highly indirect.
Better methods of assessing environmental exposures are a high priority
for the future. One hope has been to find exposure biomarkers, which ideally
might serve as built-in biologic dosimeters, to measure the biologic record
of past exposure on the individual. An attraction of biomarkers is the theoretical
concept that if a chronic exposure can affect disease risk, there must be
a biological footprint somewhere in the organism that mediates the
causal action. The use of biomarkers can overcome measurement error that
stems from an individual's incorrect recall or lack of awareness of an exposure.
The use of biomarkers also can bypass exposure assessment errors arising
from variation in individual absorption or metabolism of exposures by focusing
on a later step in the causal chain. Chromosomal abnormalities among long-lived
lymphocytes have been used in this way to assess the health effects of radiation
in the studies of the Hiroshima and Nagasaki cohorts. Another example of
this use of biomarkers is the possibility of using measurement of DNA adducts
to assess the effects of tobacco smoke in target tissues, a method that
may prove to be much more accurate than asking subjects about their smoking
habits.
An additional approach to refining exposure measurement is to use multiple
measures of exposure routinely until we find exposure measures that reflect
the exposure as completely as the research problem demands. Replicate measures
of exposure also can curb measurement uncertainty. The effect of residual
uncertainty can be quantified by sensitivity analyses that explore the implications
of errors in exposure assessment.
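The gain from replicate measures can be illustrated with a minimal sketch. It assumes the classical additive error model (independent, nondifferential error of constant variance), which is an assumption made here for illustration and is not specified in the text; the function name and the variances are hypothetical. Under that model, a regression slope estimated from the mean of k replicate readings is expected to be attenuated by the factor computed below.

```python
def attenuation_factor(var_true, var_error, n_replicates=1):
    """Expected ratio of the observed to the true regression slope when
    exposure is measured as the mean of n_replicates independent readings.

    Assumes the classical additive error model; this is an illustrative
    assumption, not a method taken from the text.
    """
    if var_true <= 0 or var_error < 0 or n_replicates < 1:
        raise ValueError("need var_true > 0, var_error >= 0, n_replicates >= 1")
    # Averaging k independent readings divides the error variance by k.
    return var_true / (var_true + var_error / n_replicates)


if __name__ == "__main__":
    # Hypothetical variances: error as large as the true between-person spread.
    for k in (1, 2, 4, 8):
        print(k, round(attenuation_factor(1.0, 1.0, k), 3))
```

Evaluating the factor over a range of assumed error variances is one simple form of the sensitivity analysis described above.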
What are the priority areas for improving methods of exposure assessment
in environmental epidemiology? The following areas are those that should
command the highest attention [These recommendations are discussed in greater
detail in the paper by Hatch and Thomas (1)]: a) development
of dosimetric models using a combination of direct measurement, biological
markers, and questionnaire data, and the development of new strategies for
historical dose reconstruction of environmental exposures; b) development
of sensitivity analysis and other approaches to estimating dose uncertainty,
including methodology for validation substudies; and c) development
of methods to measure covariates more accurately.
Study Design
The range of epidemiologic study designs comprises true experiments with
randomized assignment of study subjects to intervention groups, as well
as nonexperimental studies in which randomization cannot be relied upon
to equalize the distorting effect of confounding factors related to both
the exposure and the outcome. Randomized assignment of individuals into
groups with different environmental exposures generally is impractical,
if not unethical; community intervention trials for environmental exposures
have been conducted, although seldom (if ever) with random assignment. Furthermore,
the benefits of randomization are heavily diluted when the number of randomly
assigned units is small, as when communities rather than individuals are
randomized. Thus, environmental epidemiology consists nearly exclusively
of nonexperimental epidemiology. Ideally, such studies use individuals as
the unit of measurement; but often environmental data are available only
for groups of individuals, and investigators turn to so-called ecologic
studies to learn what they can.
The most basic epidemiologic study design, which includes experimental
studies, is the cohort study. In a cohort study, a population is characterized
as to its exposure to an agent of interest, and this population is then
followed to measure the rate of occurrence of one or more types of disease
events within variously defined exposure cohorts. Cohort studies may be
entirely prospective, in which case they are expensive and usually last
a long time, or they may be partially or completely retrospective, in which
case they are shorter and cheaper but typically have to rely on data collected
before the research plan was concocted. Case-control studies, although they
have been described as backward cohort studies involving a comparison of
exposure distributions in cases and controls, may be better conceptualized
as streamlined cohort studies: They involve sampling the base population,
or some facsimile of it, to learn the distribution of exposure within it,
enabling the investigator to estimate the relative rate of disease occurrence
within each exposure cohort. The sampling is usually a big cost-saver. It
comes at a reasonable price: only relative rates of disease occurrence are
calculable, unless the sampling fractions are known. If the sampling fractions
are known, the case-control study can provide estimates of the absolute
disease rates. Like cohort studies, case-control studies can be retrospective
or prospective.
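The arithmetic behind this point can be sketched briefly. The example assumes that controls are sampled from the person-time of the base population independently of exposure, so that the exposure odds ratio estimates the rate ratio; the counts, function names, and parameter names are hypothetical.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Exposure odds ratio; estimates the rate ratio when controls are
    sampled from the base population independently of exposure."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


def absolute_rates(exp_cases, unexp_cases, exp_controls, unexp_controls,
                   case_fraction, controls_per_person_year):
    """Recover absolute rates when both sampling fractions are known.

    case_fraction: proportion of all cases in the base that were enrolled.
    controls_per_person_year: controls sampled per person-year of the base.
    Both parameter names are hypothetical.
    """
    # Scale the sampled counts back up to the full base population.
    exp_person_years = exp_controls / controls_per_person_year
    unexp_person_years = unexp_controls / controls_per_person_year
    rate_exposed = (exp_cases / case_fraction) / exp_person_years
    rate_unexposed = (unexp_cases / case_fraction) / unexp_person_years
    return rate_exposed, rate_unexposed


if __name__ == "__main__":
    counts = (40, 60, 20, 80)  # hypothetical 2x2 table
    print("relative rate estimate:", odds_ratio(*counts))
    r1, r0 = absolute_rates(*counts, case_fraction=1.0,
                            controls_per_person_year=0.001)
    print("absolute rates:", r1, r0, "ratio:", r1 / r0)
```

The ratio of the two recovered rates equals the odds ratio, which shows that the sampling fractions add information about absolute rates without changing the relative estimate.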
Ecologic studies differ from the basic cohort study in that individual
exposure levels are not measured, or such exposure information, if it is
measured, is not linked to disease occurrence at the individual level. The
unit of statistical analysis is typically a geographic area, such
as a census tract, county, or state. For each group or region, we can estimate
the distribution of individual exposures or at least the average exposure
level, and we can estimate overall disease rates, but we do not have measurements
of both exposure level and disease status on the same individuals that would allow us to estimate
directly the joint distribution of the two variables. Therefore, it is impossible
to get direct estimates of the rate of disease in exposed and unexposed
populations from ecologic data; indirect estimates must be obtained. The
indirect estimation of effects in ecologic studies, and fundamental
concerns such as the control of confounding, are replete with
complications that make ecologic studies a highly specialized methodologic
area in epidemiology. The need to conduct such studies emanates primarily
from the basic difficulty of obtaining high-quality data on environmental
exposures and covariates.
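One common form of indirect estimation can be sketched as follows. It is a simplified illustration, not a method described in the text: it assumes that each group's disease rate is a linear function of the proportion of the group that is exposed, that there is no confounding or effect modification across groups, and that an unweighted least-squares fit is adequate. The function name and the data are hypothetical.

```python
def ecologic_rate_ratio(proportion_exposed, disease_rate):
    """Fit rate = a + b * proportion_exposed by unweighted least squares and
    return (unexposed_rate, exposed_rate, rate_ratio).

    The fitted line is extrapolated to proportion 0 (everyone unexposed)
    and proportion 1 (everyone exposed).
    """
    n = len(proportion_exposed)
    if n != len(disease_rate) or n < 2:
        raise ValueError("need at least two groups with paired values")
    mean_p = sum(proportion_exposed) / n
    mean_r = sum(disease_rate) / n
    sxx = sum((p - mean_p) ** 2 for p in proportion_exposed)
    if sxx == 0:
        raise ValueError("proportion exposed does not vary across groups")
    sxy = sum((p - mean_p) * (r - mean_r)
              for p, r in zip(proportion_exposed, disease_rate))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_p
    if intercept <= 0:
        raise ValueError("fitted unexposed rate is not positive")
    unexposed_rate = intercept          # fitted rate at proportion 0
    exposed_rate = intercept + slope    # fitted rate at proportion 1
    return unexposed_rate, exposed_rate, exposed_rate / unexposed_rate


if __name__ == "__main__":
    # Hypothetical regions: proportion exposed and overall disease rate.
    p = [0.1, 0.2, 0.4, 0.5]
    r = [0.0012, 0.0014, 0.0018, 0.0020]
    print(ecologic_rate_ratio(p, r))
```

The estimate depends on extrapolating beyond the observed range of exposure proportions, which is one reason the assumptions behind such analyses deserve scrutiny.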
The challenge posed by environmental epidemiology cannot be answered
simply by conducting larger and more expensive studies; the special problems
inherent in this area of research may call for new types of study designs
intended to address these problems. One example is the idea of conducting
a two-stage study in which exposure and disease information are collected
in the first stage, and covariate information is collected on a subset of
subjects in the second stage. This study design should be useful when covariate
information is expensive relative to information on exposure and disease.
The results from stage one estimate a crude effect, and the information
in stage two is used to estimate the effect adjusted for covariates. Covariate
information is collected most efficiently in case-control studies, and therefore,
we can look forward to seeing more two-stage studies in which the second
stage of the investigation is a case-control study.
Another type of study that merits attention is one that focuses on intermediate
steps in the causal path to disease. Such studies could give information
about the relation between acute and chronic effects and provide some results
much earlier than more traditional studies. Surveillance systems may be
worthwhile so that selection and reporting biases can be avoided. As mentioned
above, clearer understanding of the use and conduct of validation substudies
is another important priority in study design. Theoretical work is needed
on the validity of estimates from ecologic analyses to understand the relative
importance of various assumptions and how departures from these assumptions
affect the estimates. Understanding of the interaction of genes and environment
will have to grow rapidly to keep pace with the information explosion about
the genome. All these areas are fertile ground for more theoretical work
on epidemiologic study designs.
Data Analysis
For studies on individuals with information on important confounders
and little measurement error for the confounders, exposure, and outcome
variables, the analytic methodology to assess exposure effects while controlling
for confounding is reasonably well developed. Methods exist to control for
confounding and to assess the exposure effect even when the exposures and
confounding factors have complicated variations over time. Where analytic
problems exist in environmental epidemiology research, they usually
result from a lack of information on confounding variables or from measurement errors
in confounders, exposure, or outcome variables. Such problems are the major
sources of bias in environmental epidemiology research, although bias also
arises from the same sources that affect all nonexperimental epidemiology,
such as selection biases and information biases. Biases can arise in any
study from the use of inappropriate mathematical models in an analysis;
but this is a particularly important problem in ecologic studies, because
they rely on aggregate data. The often-assumed linear relation between exposure
and disease risk may not correspond to the biologic relation between exposure
and disease. Ecologic studies also suffer from biases that distort the estimation
of exposure effects because of heterogeneity of exposure status within population
aggregates.
Measurement error usually has been taken into account by assuming a value
for misclassification probabilities and recalculating effect estimates based
on the assumed value, thus allowing a type of sensitivity analysis. Usually
the misclassification probabilities are known only from estimates based on limited
data. A methodologic priority for data analysis is the development of methods
to take account of uncertainty in the assumed values for misclassification
probabilities, thus progressing from a sensitivity analysis to a more direct,
corrected estimation of exposure effects that incorporates measurement error
and the attached uncertainty.
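The conventional sensitivity analysis described above can be sketched for a case-control table. The example assumes nondifferential misclassification of a binary exposure, with the same assumed sensitivity and specificity in cases and controls; the counts and the function name are hypothetical. It back-calculates the table that would have been observed without misclassification and recomputes the odds ratio.

```python
def corrected_odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls,
                         sensitivity, specificity):
    """Odds ratio after correcting observed counts for assumed exposure
    misclassification (same sensitivity and specificity in both groups)."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")

    def true_exposed(observed_exposed, total):
        # Solves observed = Se * true + (1 - Sp) * (total - true) for true.
        return (observed_exposed - (1.0 - specificity) * total) / denom

    n_cases = exp_cases + unexp_cases
    n_controls = exp_controls + unexp_controls
    a = true_exposed(exp_cases, n_cases)
    c = true_exposed(exp_controls, n_controls)
    b = n_cases - a
    d = n_controls - c
    if min(a, b, c, d) <= 0:
        raise ValueError("assumed values are incompatible with the observed counts")
    return (a * d) / (b * c)


if __name__ == "__main__":
    observed = (45, 55, 24, 76)  # hypothetical observed 2x2 table
    for se in (0.7, 0.8, 0.9):
        for sp in (0.85, 0.9, 0.95):
            print(se, sp, round(corrected_odds_ratio(*observed, se, sp), 2))
```

Looping over a grid of assumed values shows how strongly the corrected estimate depends on them. It does not propagate the uncertainty in those values, which is the step the text identifies as a priority.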
Another important need is improved methods for the analysis of ecologic
studies, especially with regard to controlling confounding. It would be
useful to develop methods to control confounding in aggregate-data studies
using information from surveys on individuals. Such approaches would call
for corresponding innovation in data analysis.
Studies of multiple exposures face the formidable task of separating
effects of interactions from variations in the induction periods and dose-response
curves of different exposures. There is a need for analysis methods that
simultaneously account for interactions, induction periods, and dose-response
in a parsimonious fashion.
The difficulty and expense of epidemiologic research on environmental
problems force attention toward methods for aggregating results over a
set of studies when appropriate. While many critics of meta-analysis rightly
object to the pooling of inherently noncomparable work, no one argues that
literature reviews are undesirable. It seems reasonable to review published
work as objectively and quantitatively as possible. Meta-analysis should
be thought of simply as a "quantitative literature review," as
Greenland has called it (2). Meta-analyses should rely on the principle
that the primary comparisons from which effect estimates are derived should
be made within each study proper and then given appropriate statistical
treatment, in terms of adjustment and weighting, to combine results across
studies. Better methods are needed for adjusting the individual study-specific
results to reduce bias before combining them with other results, especially to
take account of errors in exposure assessment that differ across studies.
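The weighting step can be illustrated with a minimal sketch of fixed-effect, inverse-variance pooling of study-specific relative risks. This is one common choice, adopted here as an assumption rather than taken from the text, and it does not perform the bias adjustment called for above. The function name and the study values are hypothetical, and each study is assumed to report a 95% confidence interval that is symmetric on the log scale.

```python
import math


def pooled_relative_risk(studies, z=1.96):
    """Fixed-effect inverse-variance pooling on the log scale.

    studies: iterable of (relative_risk, ci_lower, ci_upper) tuples.
    Returns (pooled_rr, pooled_lower, pooled_upper).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rr, lower, upper in studies:
        log_rr = math.log(rr)
        # Standard error recovered from the width of the interval.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        if se <= 0:
            raise ValueError("each interval must have upper > lower")
        weight = 1.0 / se ** 2
        weighted_sum += weight * log_rr
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no studies supplied")
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * pooled_se),
            math.exp(pooled_log + z * pooled_se))


if __name__ == "__main__":
    # Hypothetical study results: (RR, lower 95% limit, upper 95% limit).
    studies = [(1.8, 1.1, 2.9), (1.3, 0.9, 1.9), (2.2, 1.0, 4.8)]
    print(pooled_relative_risk(studies))
```

Each comparison is made within a study, and only the resulting estimates are combined, in keeping with the principle stated above.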
Risk Assessment
Some people believe that we now live in a chemical soup that implacably
erodes our health, while others believe that we have engineered an environment
that protects us from most of the important health risks that otherwise
would have been our fate. In either case, however, it is clear that assessing
the risks of our technological world is becoming more complex.
The complexity is compounded by the intricacy of the public policy issues
relating to environmental epidemiology, involving economic, political, and
social concerns that must be taken into account along with the health consequences
of environmental exposure. Perhaps the broadest and most important methodologic
problem in environmental epidemiology is the problem of how environmental
epidemiology should be used in relation to other sources of information
to address these public policy issues. How many studies, and of what type,
are needed before policy should be promulgated? What are the implications
of publication bias (resulting from a failure to publish studies that do
not show a relation between environmental exposures and health problems)?
How should animal studies be weighed in relation to epidemiologic studies?
What role should the public take in the conduct of research and risk assessment?
The answers to these questions are important to us as citizens, but they
are usually seen to be outside the scope of our work as scientists. This
set of questions should be another priority for methodologic research.