The Second NIEHS Predictive-Toxicology Evaluation Experiment: 30 Chemical Carcinogenicity Bioassays
Environmental Health Perspectives 104, Supplement 5, October 1996

The NIEHS Predictive-Toxicology Evaluation Project: Chemcarcinogenicity Bioassays
Douglas W. Bristol,1 Joseph T. Wachsman,2 and Arnold Greenwell1
1Cancer Biology Group, Laboratory of Environmental Carcinogenesis and Mutagenesis, Division of Intramural Research, 2Environmental Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina
Abstract
The Predictive-Toxicology Evaluation (PTE) project conducts collaborative experiments that subject the performance of predictive-toxicology (PT) methods to rigorous, objective evaluation in a uniquely informative manner. Sponsored by the National Institute of Environmental Health Sciences, it takes advantage of the ongoing testing conducted by the U.S. National Toxicology Program (NTP) to estimate the true error of models that have been applied to make prospective predictions on previously untested, noncongeneric-chemical substances. The PTE project first identifies a group of standardized NTP chembioassays that are either scheduled to be conducted or ongoing but not yet complete. The project then announces and advertises the evaluation experiment, disseminates information about the chembioassays, and encourages researchers from a wide variety of disciplines to publish their predictions in peer-reviewed journals, using whatever approaches and methods they feel are best. A collection of such papers is published in this Environmental Health Perspectives Supplement, providing readers the opportunity to compare and contrast PT approaches and models within the context of their prospective application to an actual-use situation. This introduction to the collection summarizes the predictions made and the final results obtained for the 44 chemcarcinogenesis bioassays of the first PTE experiment (PTE-1) and presents information that identifies the 30 chemcarcinogenesis bioassays of PTE-2, along with a table of prediction sets that have been published to date. It also provides background about the origin and goals of the PTE project, outlines the special challenge associated with estimating the true error of models that aspire to predict open-system behavior, and summarizes what has been learned to date. -- Environ Health Perspect 104(Suppl 5):1001-1010 (1996)
Key words: predictive toxicology, carcinogenesis, decision support, hazard identification, activity classification, risk assessment, pattern recognition, human heuristic, expert system, machine learning, artificial intelligence
This paper summarizes the results of the NIEHS Predictive Toxicology Evaluation Project. Manuscript received August 14, 1996; manuscript accepted August 15, 1996.
The authors thank R. Tennant, J. Spalding, S. Stasiewicz, and J. Ashby for initiating the predictive-toxicology evaluation project, along with Dr. J. M. Parry, Editor of Mutagenesis, who supported publication of papers in the first experiment. We also thank the editor, Dr. T. Goehl, and staff of Environmental Health Perspectives Supplements, for publishing this collection of predictive-toxicology papers.
Address correspondence to Dr. D. W. Bristol, NIEHS, B3-09, PO Box 12233, Research Triangle Park, NC 27709. Telephone: (919) 541-2756. Fax: (919) 541-0696. E-mail: bristol@niehs.nih.gov
Abbreviations used: CAS RN, Chemical Abstracts Service Registry Number; CE, clear evidence; E, equivocal; EE, equivocal evidence; EQV, equivocal overall classification; LOE, level of evidence; N, none; NE, no evidence; NEG, negative overall classification; NIEHS, National Institute of Environmental Health Sciences; NP, no prediction made; NTP, U.S. National Toxicology Program; P, positive; POS, positive overall classification; PT, predictive-toxicology; PTE, Predictive Toxicology Evaluation; PTE-1/PTE-2, first or second PTE experiment; QSAR, quantitative structure-activity relationship; SE, some evidence; STT, short-term tests; TR, Technical Report; W+, weakly positive; W+/U, weak positive or uncertain probability for being positive.
Definitions
Chembioassay. An experiment or study that involves the exposure of a whole-animal test system to a test article and that is conducted according to a standardized protocol so that the range and magnitude of biological responses that characterize an end point activity, such as carcinogenicity, may be observed; the test system for the U.S. National Toxicology Program (NTP) studies normally utilizes both genders of one rat and one mouse strain; the test article is usually a well-characterized organic chemical, inorganic compound, mineral, polymer, or mixture.
Level of evidence (LOE). NTP assigns a LOE to each sex-species, chemical-carcinogenicity experiment, as defined in each NTP Technical Report (TR). These are CE, clear evidence; SE, some evidence; EE, equivocal evidence; NE, no evidence; for older studies, they are P, positive; E, equivocal; and N, none.
Overall LOE. The LOEs assigned to the individual sex-species experiments are combined into a classification for the overall bioassay study, using the following algorithm: a) If the LOE for one or more of the experiments is CE, SE, or P, then the overall classification is positive (POS); b) If the LOE for all of the experiments is NE or N, then the overall classification is negative (NEG); c) If the LOE for one or more of the experiments is EE or E and the LOE for the other experiments is NE or N, then the overall classification is equivocal (EQV); d) Experiments classified as inadequate study (IS) are given no consideration in arriving at the overall LOE classification.
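The overall-LOE algorithm can be sketched in a few lines of Python. This is a minimal illustration rather than NTP code: the function name `overall_loe` and the set names are hypothetical, NE and N are treated as the negative levels and EE and E as the equivocal ones (following the LOE definitions), and raising an error when every experiment is an inadequate study is an assumption, since the definition does not cover that case.

```python
from typing import Iterable

# LOE codes grouped as in the definitions (newer and older NTP terminology).
POSITIVE = {"CE", "SE", "P"}
EQUIVOCAL = {"EE", "E"}
NEGATIVE = {"NE", "N"}
INADEQUATE = "IS"


def overall_loe(levels: Iterable[str]) -> str:
    """Combine per-experiment LOE codes into POS, EQV, or NEG."""
    codes = [loe.strip().upper() for loe in levels]

    # Rule d: inadequate studies are given no consideration.
    codes = [loe for loe in codes if loe != INADEQUATE]

    known = POSITIVE | EQUIVOCAL | NEGATIVE
    unknown = [loe for loe in codes if loe not in known]
    if unknown:
        raise ValueError(f"unrecognized LOE code(s): {unknown}")
    if not codes:
        # Assumption: the definition is silent on a study with no adequate experiments.
        raise ValueError("no adequate experiments to classify")

    # Rule a: any positive experiment makes the study positive.
    if any(loe in POSITIVE for loe in codes):
        return "POS"
    # Rule c: no positives, at least one equivocal, the rest negative.
    if any(loe in EQUIVOCAL for loe in codes):
        return "EQV"
    # Rule b: every remaining experiment is negative.
    return "NEG"
```

For example, a study with LOEs of EE, NE, NE, and IS for its four sex-species experiments would be classified EQV under this sketch, because the inadequate study is ignored and no experiment is positive.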
Need for Predictive-Toxicology Models
The NTP conducts standardized chembioassays in rodents to identify and characterize exposures to substances that may be associated with carcinogenic or other toxicological effects on human health (1). Current regulations require that safety testing be performed in connection with the development of new chemicals or new uses of known chemicals. However, before the advent of such regulations, more chemicals came into use than can ever be tested using conventional methods. At the present time, society in general, and the discipline of toxicology in particular, face the parallel tasks of performing safety evaluations that support the development of new chemical uses before human exposures are permitted and assessing the potential hazard posed by exposures to chemicals that lack safety evaluations. This situation creates an urgent need to develop PT models that
- generate predictions of known reliability or accompanied by a confidence-level estimate
- identify hazardous-chemical exposures more rapidly at a lower cost than current procedures
- apply to all types of test articles, including organic, inorganic, polymeric, mineral, and mixtures
- provide information that supports sound decision making for the effective and efficient management of laboratory animal testing that is still needed by regulatory and chemical development programs
- refine and reduce reliance on the use of large numbers of laboratory animals in the conduct of chembioassays
- accelerate the performance of risk assessments and the conduct of research and development programs.
Goals of Predictive-Toxicology Research
The development of models that reliably identify the hazard for untested chemical substances, of any type, using attribute values that can be computed or obtained with minimum testing time and cost is widely recognized to be the most immediate goal of PT research.
The return of information and overall value of an NTP bioassay increases when it is included in a PTE experiment because each prediction made about its outcome represents an additional hypothesis that is tested by the bioassay. Thus, in addition to characterizing the toxicity of individual chemicals (i.e., identify hazard), standardized bioassay tests also stimulate PT research by providing both learning sets for the development of models and the means to subject model performance to hypothesis testing.
Another, less perceived, aspect of PT research has potential value that far exceeds the generation of reliable predictions per se. Some PT models are based on pattern-recognition analysis of a learning set (2-8). The learning set is a database that includes a representative number and range of classified cases, where the chembioactivity of each case towards a particular toxicity end point has been determined by standardized testing. Each classified case in the learning set is represented by a corresponding array of values on attributes, selected to reflect various aspects of either or both biological factors and chemical structure that may influence activity. Although "data-mining" by pattern-recognition analysis can be limited by the availability of suitable learning sets, it represents a new approach that has great potential to help discover and confirm the key factors and relationships that govern the various, multifactorial, mechanistic pathways and determine toxic effects. Thus, the ultimate value and most important goal of PT research may lie in the development of its potential to help identify, characterize, and understand the various mechanisms or modes of action that determine the type and level of response observed when biological systems are exposed to chemicals. Because PT research can confirm existing hypotheses regarding mechanisms and stimulate the formation of new ones (9), it is complementary to and synergistic with the conduct of mechanistic studies.
The discovery aspect of PT research may also lead to an important refinement in the use of quantitative structure-activity-relationship (QSAR) models. A classical, extrathermodynamic QSAR approach (10,11) can only be applied to model chembioactivities governed by a unique mechanistic pathway, i.e., where chembioactivity is controlled by a single rate-limiting step. This limits the legitimate application of each different QSAR model to untested chemicals that can be expected to be processed under the control of the same mechanism for which the QSAR was developed. When faced with selecting a QSAR model to study the mechanistic behavior of an untested chemical, there is no legitimate way to determine which of the many available might apply most appropriately. This uncertainty would be eliminated by the development of PT models that not only predict the activity expected for an untested chemical but also indicate the mechanistic pathway that governs it. Thus, the output of such PT models would serve to guide the selection of QSAR models that may be used legitimately to elucidate mechanistic details and gain understanding that fosters better interpretation of the activity predicted.
Evaluation of Predictive-Toxicology Models
The advantages offered by PT research are clear; however, difficult problems remain that involve both model development and acceptance issues (12). A recent, definitive study of difficulties associated with the model confirmation problem (9) reports:
Verification, validation, and confirmation of numerical models of natural systems is impossible. This is because natural systems are never closed and because model results are always non-unique. Models can be confirmed by the demonstration of agreement between observation and prediction, but confirmation is inherently partial. Complete confirmation is logically precluded by the fallacy of affirming the consequent and by incomplete access to natural phenomena. Models can only be evaluated in relative terms, and their predictive value is always open to question. The primary value of models is heuristic.
This important publication explains why it is impossible to establish confidence limits on boundaries of the feature space spanned by a PT model, which might otherwise be used to guide and restrict its application to legitimate cases. Also, because the boundaries of PT models are inexact, the legitimate range of application for PT models will always be uncertain, to some extent. The complex nature of the model confirmation problem presents a perplexing challenge to both developers and potential users; to gain acceptance and fulfill their promise, PT models must demonstrate performance accuracy that earns the confidence of would-be users.
PT-model evaluations based on cross-validation techniques (13) provide useful feedback during development of a model by analysis of a learning set of classified cases, but alone they cannot show whether high classification accuracy reflects genuine predictive power or is a sign of model brittleness due to overlearning, which leads to low prediction accuracy for unclassified cases.
The PTE Project
Overview
This project enlists the interdisciplinary resources of the entire PT community in the conduct of experiments that rigorously determine the extent to which predictions, made prospectively, agree with experimental observation. It provides objective, experimentally determined estimates for true error of model performance. It creates unique opportunities for the user and model-developer communities to jointly assess the strengths and weaknesses of various PT models and to evaluate the principles and ideas underpinning their development. More specifically, the PTE project:
- identifies test sets of bioassays that focus predictive-toxicology research efforts on a common goal and thereby provides a means for the rigorous, experimental evaluation of PT models;
- provides information on NTP test results, as well as samples of test chemicals, to the research community;
- encourages involvement of researchers from diverse disciplines to promote the application of a wide range of alternative approaches to solving this difficult problem and to maximize the yield of what can be learned from the comparative evaluation experiments;
- disseminates information about predictions generated to encourage rigorous evaluation of PT-model performance through publication of manuscripts and sponsorship of conferences.
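The comparison of prospective predictions with final bioassay results can be illustrated with a short Python sketch. It is not the scoring procedure used by the PTE project: the function name `prospective_error`, the chemical identifiers, and the example data are hypothetical, and leaving out NP (no prediction made) entries and bioassays with an EQV overall classification is an assumption made only to keep the example simple.

```python
from typing import Mapping, Tuple


def prospective_error(
    predictions: Mapping[str, str],
    outcomes: Mapping[str, str],
) -> Tuple[int, int, float]:
    """Return (cases scored, cases in agreement, observed error rate)."""
    scored = 0
    agree = 0
    for chemical, predicted in predictions.items():
        observed = outcomes.get(chemical)
        # Assumption: skip unfinished bioassays, NP entries, and EQV outcomes.
        if observed is None or predicted == "NP" or observed == "EQV":
            continue
        scored += 1
        if predicted == observed:
            agree += 1
    if scored == 0:
        raise ValueError("no scorable prediction/outcome pairs")
    return scored, agree, 1.0 - agree / scored


# Hypothetical data: overall classifications for four finished bioassays
# and one published prediction set.
outcomes = {"chem_A": "POS", "chem_B": "NEG", "chem_C": "POS", "chem_D": "EQV"}
predictions = {"chem_A": "POS", "chem_B": "POS", "chem_C": "NP", "chem_D": "NEG"}

print(prospective_error(predictions, outcomes))  # (2, 1, 0.5)
```

Because every prediction is published before the bioassay result is known, an error rate computed this way is not biased by the model having seen the test cases during development, which is the property that distinguishes it from accuracy measured on a learning set.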
Origin
Tennant and Ashby (2) completed an extensive review of results from NTP standardized tests, to evaluate putative correlations between attributes for chemical substructure features and short-term test (STT) results, often used by toxicologists, because they were thought to carry information of value for predicting chemcarcinogenesis. They used heuristic techniques to analyze a large and uniform learning set, which eventually included 301 classified NTP chemcarcinogenesis bioassays, plus values on attributes obtained from various STT for mutagenicity, the most informative of which was the Salmonella assay (14), Ashby structural-alert assignments, histopathology results from subchronic toxicity and chronic carcinogenicity studies, plus values on ancillary attributes possibly related to chemcarcinogenesis. After publishing the last in a series of papers (2), the authors were confident that some of the knowledge gained by their in-depth analysis had relevance to the prediction of chemcarcinogenesis. They subjected their new heuristic rules and relationships to the most rigorous test possible by publishing prospective predictions for the outcome of 44 NTP chemcarcinogenesis bioassays being tested by the NTP (3). With the support and cooperation of the editor of Mutagenesis, others were invited to publish sets of predictions, basing them on the methods they preferred (15). A variety of researchers responded and the original set of published predictions evolved to become PTE-1.
Figure 1 illustrates how Tennant et al. used the NTP-standardized testing program to first develop their human-heuristic PT model and then to evaluate the accuracy of its performance. The flow diagram identifies the basic components needed to develop and evaluate PT models and indicates the type, source, and flow of information typical of what might be used to generate prospective predictions and organize a PTE experiment.
Figure 1. NTP standardized testing fuels the engine that powers the PTE project and drives learning in the young science of chemical toxicology.
The "Tox testing" module in Figure 1 represents the engine that drives learning in toxicology, because it is the primary source of phenomenological observations, the foundation for learning in science. Standardized toxicity testing fosters the healthy growth and maturation of this relatively young discipline (12) by providing learning sets that support the development of models and theories. It is important to use learning sets that include a sufficient number and variety of classified cases to adequately represent the uncertain number of multifactorial, mechanistic pathways that are associated with a complex toxicity endpoint like chemcarcinogenicity.
Figure 2 illustrates how a fully evaluated and confirmed PT model simplifies, when testing, learning, comparing, and modifying steps are no longer needed. A fully confirmed model needs only a few basic components to generate reliable predictions about hazard associated with exposure to untested chemicals. Information generated by the model is interpreted and used with confidence by decision makers.
Figure 2. Flow chart of the essential components of a fully confirmed PT model.
PTE-1: Prediction Sets, Final Bioassay Results, and Workshop Conclusions
Final results for the 44 NTP bioassays that made up PTE-1 are presented in Table 1. The sets of predictions generated by PTE-1 are listed in Table 2. Several papers evaluating various aspects of the PTE-1 experiment have already been published. We hope that this compilation of PTE-1 prediction sets accompanied by presentation of the final results for all 44 of the PTE-1 bioassays will inspire the publication of more papers that involve analyses of the results from this experiment to extend what has already been learned.
During 1993 the NIEHS conducted an international workshop to evaluate what had been learned from the PTE-1 collaboration. Broad consensus was evoked during discussions on some points while widely different opinions were heard on others. The workshop reached two main conclusions (16). First, SAR-based models do not perform as accurately as models that utilize biological attributes and, second, models that used multiple attributes to represent the chemcarcinogenicity endpoint performed better than models that were based on one or two attributes.
PTE-2: 30 Chemcarcinogenesis Bioassays and 17 Prediction Sets
Table 3 identifies the 30 NTP chemcarcinogenesis bioassays incorporated into PTE-2. This table includes the 2-D structure of each test article. The SMILES code for each test chemical is also included for those who might want to generate 3-D structures or compute physicochemical property values for them.
Table 3.
Table 4 tabulates the 17 sets of predictions published as part of PTE-2 to date. It provides a rapid overview of the predictions published for any of the 30 chemcarcinogenesis bioassays.
Table 4.
Support Provided to Foster Participation in PTE Experiments
The primary purpose of a PTE experiment is to learn by focusing the intellectual resources of different research groups on a common problem. When the set of test cases for a PTE experiment is reasonably representative for the end point activity, the overall learning potential for an evaluation experiment is influenced more by the number and variety of models applied to generate predictions than by the number of test-set bioassays. Therefore, it is important that as many predictors participate as possible.
The original announcement for PTE-2 (17) made available a package of comprehensive information that was distributed by mail or fax. Early in 1996, a page for the PTE Project was established on the Internet, as a link to the NIEHS home page. It provides updates about the current status of the PTE-2 experiment and access to NTP database information that is of particular interest to PTE participants; the more important Internet addresses include:
References
1. Huff JE, Haseman JK, Rall DP. Scientific concepts, value, and significance of chemical carcinogenesis studies. Ann Rev Pharmacol Toxicol 31:621-652 (1991).
2. Ashby J, Tennant RW. Definitive relationships among chemical structure, carcinogenicity, and mutagenicity for 301 chemicals tested by the US NTP. Mutation Res 257:229-306 (1991).
3. Tennant RW, Spalding J, Stasiewicz S, Ashby J. Prediction of the outcome of rodent carcinogenicity bioassays currently being conducted on 44 chemicals by the National Toxicology Program. Mutagenesis 5:3-14 (1990).
4. Bahler DR, Bristol DW. The induction of rules for predicting chemical carcinogenesis in rodents. In: Proceedings, First International Conference on Intelligent Systems for Molecular Biology, 6-9 July 1993, Bethesda, Maryland (Hunter L, Searls D, Shavlik J, ed), Menlo Park, CA:MIT Press, 1993;29-37.
5. Bristol DW, Bahler D. Predicting bioactivity for complex endpoints. In: Proceedings, Toxicology Forum Annual Summer Meeting, 10-14 July 1995, Aspen, Colorado. Fairfax, VA:CASET Assoc Ltd, 1996; 129-55.
6. King RD, Srinivasan A. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environ Health Perspect 104(Suppl 5):000-000 (1996).
7. Lee Y, Buchanan BG, Rosenkranz HS. Carcinogenicity predictions for a group of 30 chemicals undergoing rodent cancer bioassays based on rules derived from subchronic organ toxicities. Environ Health Perspect 104(Suppl 5):000-000 (1996).
8. Marchant C. Prediction of rodent carcinogenicity using the DEREK system for thirty chemicals being tested by the National Toxicology Program. Environ Health Perspect 104(Suppl 5):000-000 (1996).
9. Oreskes N, Shrader-Frechette K, Belitz K. Verification, validation, and confirmation of numerical models in the earth sciences. Science 263:641-646 (1994).
10. Franke R. Theoretical Drug Design Methods, Pharmacochem. Lib 7. Rev English translation. Amsterdam:Elsevier, 1984.
11. Kubinyi H. QSAR: Hansch Analysis and Related Approaches. New York:VCH Publishers, 1993.
12. Bristol DW. Summary and recommendations: activity classification and structure-activity relationship modeling for human health risk assessment of toxic substances. Toxicol Lett 79:265-280 (1995).
13. Weiss SM, Kulikowski CA. Computer Systems That Learn. San Mateo, CA:Morgan Kaufmann, 1993.
14. Haseman J, Zeiger E, Shelby M, Margolin B, Tennant RW. Predicting rodent carcinogenicity from four in vitro genetic toxicity assays: an evaluation of 114 chemicals studied by the NTP. J Am Stat Assn 85:964-71 (1990).
15. Parry JM. Editorial. Mutagenesis 5:89 (1990).
16. Wachsman JT, Bristol DW, Spalding J, Shelby M, Tennant RW. Predicting Chemical Carcinogenesis in Rodents: A Meeting Report. Environ Health Perspect 101:444-445 (1993).
17. NIEHS. Strategies for predicting chemical carcinogenesis in rodents. Science 264:146 (1994).
18. Bootman J. Speculations on the rodent carcinogenicity of 30 chemicals currently under evaluation in rat and mouse bioassays organised by the US National Toxicology Program. Environ Mol Mutagen 27:237-243 (1996).
19. Jones TD, Easterly CE. A RASH analysis of National Toxicology Program data: predictions for 30 compounds to be tested in rodent carcinogenesis experiments. Environ Health Perspect 000-000
20. Zhang YP, Sussman N, Macina OT, Rosenkranz HS, Klopman G. Prediction of the carcinogenicity of a second group of organic chemicals undergoing carcinogenicity testing. Environ Health Perspect 000-000
21. Benigni R, Andreoli C, Zito R. Prediction of rodent carcinogenicity of further 30 chemicals bioassayed by the U.S. National Toxicology Program. Environ Health Perspect 000-000
22. Kerckaert GA, Brauninger R, LeBoeuf RA, Isfort RJ. Use of the Syrian hamster embryo cell transformation assay for carcinogenicity prediction of chemicals currently being tested by the NTP in rodent bioassays. Environ Health Perspect 000-000
23. Purdy R. A mechanism-mediated model for carcinogenicity, model content and prediction of the outcome of rodent carcinogenicity bioassays currently being conducted on 25 organic chemicals. Environ Health Perspect 000-000
24. Moriguchi I, Hirano H, Hirono S. Prediction of the rodent carcinogenicity of organic compounds from their chemical structures using the FALS method. Environ Health Perspect 000-000
25. Lewis DFV, Ioannides C, Parke DV. COMPACT and molecular structure in toxicity assessment: a prospective evaluation of 30 chemicals currently being tested for rodent carcinogenicity by the NCI/NTP. Environ Health Perspect 000-000
26. Tennant RW, Spalding J. Predictions for the outcome of rodent carcinogenicity bioassays: identification of trans-species carcinogens and noncarcinogens. Environ Health Perspect 000-000
27. Ashby J. Prediction of rodent carcinogenicity for 30 chemicals. Environ Health Perspect 000-000
28. Huff J, Weisburger E, Fung V. Multicomponent criteria for predicting carcinogenicity: 30 NTP chemicals. Environ Health Perspect 000-000
Last Update: March 24, 1998