The Second NIEHS Predictive-Toxicology Evaluation Experiment: 30 Chemical Carcinogenicity Bioassays
Environmental Health Perspectives 104, Supplement 5, October 1996
Prediction of the Rodent Carcinogenicity of Organic Compounds from Their Chemical Structures Using the FALS Method
Ikuo Moriguchi,1 Hiroyuki Hirano,2 and Shuichi Hirono1
1School of Pharmaceutical Sciences, Kitasato University, Tokyo, Japan; 2Zeria Pharmaceutical Co., Ltd., Tokyo, Japan
Abstract
Fuzzy adaptive least-squares (FALS), a pattern recognition method recently developed in our laboratory for correlating structure with activity rating, was used to generate quantitative structure-activity relationship (QSAR) models on the carcinogenicity of organic compounds of several chemical classes. Using the predictive models obtained from the chemical class-based FALS QSAR approach, the rodent carcinogenicity or noncarcinogenicity of a group of organic chemicals currently being tested by the U.S. National Toxicology Program was estimated from their chemical structures. -- Environ Health Perspect 104(Suppl 5):1051-1058 (1996)
Key words: QSAR, FALS, rodent carcinogenicity, predictive models
This paper is part of the NIEHS Predictive Toxicology Evaluation Project. Manuscript received 15 February 1996; manuscript accepted 30 May 1996.
Address correspondence to Dr. Ikuo Moriguchi, School of Pharmaceutical Sciences, Kitasato University, Shirokane, Minato-ku, Tokyo 108, Japan. Telephone: 03-3444-6161. Fax: 03-3440-5246.
Abbreviations used: FALS, fuzzy adaptive least squares; QSAR, quantitative structure-activity relationship; IARC, International Agency for Research on Cancer; NTP, National Toxicology Program; NIEHS, National Institute of Environmental Health Sciences.
Introduction
Predicting carcinogenicity has become important for regulatory decision making and ecotoxicity assessment. Prediction from chemical structure alone is especially desirable, since it can be applied even when a test compound is unavailable or does not exist. Approaches using correlative methods for noncongeneric chemicals were reviewed by Richard (1), who found that published prediction accuracies exceeded 90%, whereas prospective prediction accuracies were below 70%. Moreover, worse results were reported for a prospective prediction of rodent carcinogenicity using a variety of quantitative structure-activity relationship (QSAR) approaches (2). Further studies are required to improve predictive reliability.
We have recently developed fuzzy adaptive least-squares (FALS) (3,4), a pattern recognition method for correlating structure with activity rating, and applied the method to a noncongeneric structure-carcinogenicity correlation (5). Ideally, rational preclassification of compounds based on possible carcinogenic mechanisms should be extensively investigated to enhance the predictive accuracy of noncongeneric QSAR approaches. Unfortunately, knowledge of the molecular mechanisms of carcinogenicity is still insufficient for this purpose. In this study, therefore, a rough chemical classification was adopted to generate the predictive models. Using carcinogenicity data from the International Agency for Research on Cancer (IARC) (6) and the National Toxicology Program (NTP) (7,8) as training sets, FALS QSAR models for eight chemical classes were generated. Based on these models, prospective predictions of the rodent carcinogenicity of 25 organic chemicals designated by the National Institute of Environmental Health Sciences (NIEHS) were made.
Methods
FALS Methodology
FALS is a nonparametric pattern classifier. It formulates QSAR in a single discriminant function irrespective of the number of activity rating classes, as:
$$Z = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_p x_p$$ [1]
In this equation, $x_k$ is the kth descriptor (k = 1, 2, ..., p) for structures, $w_k$ (k = 0, 1, 2, ..., p) is a weight coefficient, and $Z$ is the discriminant score. A novel feature of FALS is that the degree to which each compound belongs to its activity class is given by a fuzzy membership function (9). In FALS, a bell-shaped membership function for each activity class is assumed to give the membership grade for the class members.
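As a minimal sketch of Equation 1, the discriminant score can be computed as a weighted sum of the descriptors plus an intercept. The function name and the numerical weights and descriptor values below are hypothetical and serve only as an illustration; they are not taken from the models in this paper.

```python
from typing import Sequence


def discriminant_score(weights: Sequence[float], descriptors: Sequence[float]) -> float:
    """Return Z = w0 + w1*x1 + ... + wp*xp.

    weights[0] is the intercept w0; weights[1:] pair with the descriptors.
    """
    if len(weights) != len(descriptors) + 1:
        raise ValueError("expected exactly one more weight than descriptors")
    return weights[0] + sum(w * x for w, x in zip(weights[1:], descriptors))


# Hypothetical weights (w0, w1, w2) and descriptor values (x1, x2).
z = discriminant_score([0.2, 0.5, -0.3], [1.0, 2.0])
```

The score Z by itself carries no class label; it is interpreted relative to the class boundary through the membership functions.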
In the simplest case, in which the number of activity rating classes is only two, e.g., carcinogenic/noncarcinogenic dichotomization as in this study, the membership function, M(Z), for each activity class is given as:
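For carcinogenic activity,
$$M(Z) = \frac{1}{1 + \left\{ \dfrac{Z - \mathrm{Boundary}}{0.1} - 1 \right\}^{4}} \quad \text{when } Z < \mathrm{Boundary} + 0.1, \qquad \text{otherwise } M(Z) = 1$$ [2]
For noncarcinogenic activity,
$$M(Z) = \frac{1}{1 + \left\{ \dfrac{\mathrm{Boundary} - Z}{0.1} - 1 \right\}^{4}} \quad \text{when } Z > \mathrm{Boundary} - 0.1, \qquad \text{otherwise } M(Z) = 1$$ [3]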
In these equations, Boundary takes the value of (n1-n2)/(n1+n2), where n1 and n2 are the numbers of noncarcinogens and carcinogens, respectively, in the training set. The calculated value of M(Z) is the membership grade.
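The two membership functions and the boundary can be sketched as follows. This sketch assumes that the piecewise conditions are Z < Boundary + 0.1 for carcinogens and Z > Boundary - 0.1 for noncarcinogens, which is the reading under which each function is continuous and reaches 1 at the edge of its range. The function names and the `width` parameter (fixed at 0.1 in the text) are illustrative choices, not part of the original method description.

```python
def boundary(n_noncarcinogens: int, n_carcinogens: int) -> float:
    """Boundary = (n1 - n2) / (n1 + n2), with n1 noncarcinogens and n2 carcinogens."""
    return (n_noncarcinogens - n_carcinogens) / (n_noncarcinogens + n_carcinogens)


def membership_carcinogenic(z: float, b: float, width: float = 0.1) -> float:
    """Membership grade of a carcinogen with discriminant score z (Equation 2)."""
    if z >= b + width:
        return 1.0  # well on the carcinogenic side of the boundary
    return 1.0 / (1.0 + ((z - b) / width - 1.0) ** 4)


def membership_noncarcinogenic(z: float, b: float, width: float = 0.1) -> float:
    """Membership grade of a noncarcinogen with discriminant score z (Equation 3)."""
    if z <= b - width:
        return 1.0  # well on the noncarcinogenic side of the boundary
    return 1.0 / (1.0 + ((b - z) / width - 1.0) ** 4)


# Hypothetical training set: 60 noncarcinogens and 40 carcinogens.
b = boundary(60, 40)
grade_at_boundary = membership_carcinogenic(b, b)
```

With these definitions, a compound whose score lies exactly on the boundary receives a grade of 0.5 in either class, and the grade falls toward 0 as the score moves further to the wrong side.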
The weight coefficients in the discriminant function are generated by an adaptive least-squares iteration so as to maximize the sum of the membership grades over the set of compounds. The resulting discriminant functions, each with a different set of descriptors, are validated by leave-one-out prediction. The discriminant function with a scientifically reasonable set of structural descriptors that gives the best leave-one-out prediction is finally adopted as the QSAR model. The FALS methodology has been described on a number of occasions (3-5).
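The leave-one-out step can be illustrated with a short sketch. The adaptive least-squares iteration itself is not specified in enough detail here to reproduce, so the fitter below is a deliberately simple stand-in (a midpoint threshold on a single descriptor) and is not FALS; the function names and the toy data are hypothetical. Only the hold-one-out loop reflects the validation procedure named in the text.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# (descriptor vector, label) with label 1 = carcinogen, 0 = noncarcinogen
Sample = Tuple[Sequence[float], int]
Classifier = Callable[[Sequence[float]], int]


def leave_one_out(samples: List[Sample], fit: Callable[[List[Sample]], Classifier]) -> float:
    """Refit on all compounds but one, predict the held-out one, and
    return the fraction of held-out compounds classified correctly."""
    hits = 0
    for i, (x, label) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        classify = fit(training)
        hits += int(classify(x) == label)
    return hits / len(samples)


def fit_midpoint(training: List[Sample]) -> Classifier:
    """Stand-in fitter (NOT FALS): threshold the first descriptor at the
    midpoint between the two class means."""
    pos = mean(x[0] for x, y in training if y == 1)
    neg = mean(x[0] for x, y in training if y == 0)
    cut = (pos + neg) / 2.0
    sign = 1.0 if pos >= neg else -1.0
    return lambda x: int(sign * (x[0] - cut) > 0)


# Toy one-descriptor data set, for illustration only.
toy = [([0.1], 0), ([0.2], 0), ([0.3], 0), ([0.8], 1), ([0.9], 1), ([1.0], 1)]
accuracy = leave_one_out(toy, fit_midpoint)
```

In the procedure described above, the same loop would be run for each candidate descriptor set, and the set giving the best leave-one-out result would be retained.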
Database and Chemical Classes
A database including a total of 586 compounds listed in Table 1 was used for the training sets. The compounds had been designated as carcinogenic or noncarcinogenic by IARC (6) and/or NTP (7,8) based upon evaluation of rodent test data. If the two agencies' carcinogenicity/noncarcinogenicity assignments differed for any given compound, the NTP designation was adopted. Compounds giving equivocal evidence of carcinogenicity were not used. Inorganic and metallo-organic chemicals, polymers, and mixtures were also excluded from the training sets.
Table 1.
The chemical classification was designed to be broad enough to permit a reasonable number of training compounds to fall into each class for generation of statistically significant QSAR models. With particular reference to the chemical features of the compounds to be predicted, the following eight chemical classes were investigated: class 1, hydrocarbons (39 compounds); class 2, heterocyclics (185 compounds); class 3, nitro and nitroso compounds and N-oxides (98 compounds); class 4, halides (152 compounds); class 5, alcohols, phenols, and ethers (160 compounds); class 6, carbonyl compounds (205 compounds); class 7, nonaromatic amines (25 compounds); and class 8, oxygenated sulfur compounds (52 compounds). An individual compound can appear in several classes according to its chemical structure. 2,3,5,6-Tetrachloro-4-nitroanisole, for example, appears in classes 3, 4, and 5.
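The multiple-class assignment can be sketched as a lookup from structural features to class numbers. The feature flag names below are hypothetical labels invented for this illustration, and the sketch assumes the features of a compound have already been identified by some other means; only the class numbering follows the list above.

```python
from typing import Dict, List, Set

# Hypothetical feature flags mapped to the eight class numbers in the text.
FEATURE_TO_CLASS: Dict[str, int] = {
    "hydrocarbon": 1,
    "heterocycle": 2,
    "nitro_nitroso_or_n_oxide": 3,
    "halide": 4,
    "alcohol_phenol_or_ether": 5,
    "carbonyl": 6,
    "nonaromatic_amine": 7,
    "oxygenated_sulfur": 8,
}


def assign_classes(features: Set[str]) -> List[int]:
    """Return every chemical class a compound falls into, in ascending order."""
    unknown = features - set(FEATURE_TO_CLASS)
    if unknown:
        raise ValueError(f"unrecognized feature flags: {sorted(unknown)}")
    return sorted(FEATURE_TO_CLASS[f] for f in features)


# 2,3,5,6-Tetrachloro-4-nitroanisole: nitro group, chlorine atoms, ether oxygen.
classes = assign_classes({"nitro_nitroso_or_n_oxide", "halide", "alcohol_phenol_or_ether"})
```

For the example compound this returns classes 3, 4, and 5, matching the assignment given in the text.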
Structural Descriptors
Three kinds of variables--continuous variables, discrete variables, and indicator variables--were investigated as candidate descriptors. Molecular weight, hydrophobic constant (log P), and its squared value were used as continuous variables. The log P (octanol/water) values used were calculated using the revised version (10) of our simple method (11,12). Discrete variables were defined as the number of specific atoms, bonds, functional groups, and specific ring and chain structures. The upper values of the discrete variables other than the number of specific atoms and bonds were empirically set at 3.0 so as to avoid possible overestimation for polyfunctional structures. Indicator variables were defined as 1 for the presence and 0 for the absence of any kind of structural or physicochemical features considered to be contributing to carcinogenicity.
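A rough sketch of how the three kinds of descriptors might be assembled for one compound is given below. The function name, the dictionary layout, and all descriptor names and values are hypothetical; the only rules taken from the text are that log P enters together with its square, that discrete counts other than atom and bond counts are capped at 3.0, and that indicator variables are coded as 1 or 0.

```python
from typing import Dict

DISCRETE_CAP = 3.0  # upper value for discrete variables other than atom/bond counts


def build_descriptors(
    mol_weight: float,
    log_p: float,
    atom_bond_counts: Dict[str, int],
    group_counts: Dict[str, int],
    indicators: Dict[str, bool],
) -> Dict[str, float]:
    """Assemble continuous, discrete, and indicator descriptors for one compound."""
    d: Dict[str, float] = {"MW": mol_weight, "logP": log_p, "logP^2": log_p ** 2}
    for name, n in atom_bond_counts.items():
        d[name] = float(n)  # atom and bond counts are left uncapped
    for name, n in group_counts.items():
        d[name] = min(float(n), DISCRETE_CAP)  # avoid overweighting polyfunctional structures
    for name, present in indicators.items():
        d[name] = 1.0 if present else 0.0
    return d


# Hypothetical compound, for illustration only.
desc = build_descriptors(
    mol_weight=300.0,
    log_p=2.0,
    atom_bond_counts={"n_Cl": 4},
    group_counts={"n_hydroxyl": 5},
    indicators={"has_epoxide": False, "has_aromatic_nitro": True},
)
```

Here the five hydroxyl groups are recorded as 3.0 while the four chlorine atoms keep their full count, which reflects the distinction the text draws between the two kinds of discrete variables.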
Results and Discussion
Generation of Predictive Models
The FALS analyses were performed for carcinogenic/noncarcinogenic dichotomization using eight sets of data, one for each chemical class. As a result, eight satisfactory equations containing from 5 to 25 descriptors (Moriguchi et al., unpublished data) were derived. They are listed in Table 2.

Table 2.
Descriptors with positive coefficients are usually considered to contribute positively to the estimate of carcinogenicity, whereas descriptors with negative coefficients contribute negatively. However, this interpretation does not always hold across chemical classes. Moreover, strictly speaking, these coefficients cannot be used to make general inferences about the contribution of each fragment within a variety of structures. They are valid only when used in the context of the present multidimensional model within each chemical class.
The results of recognition and leave-one-out prediction for the eight QSAR models are shown in Table 3. The values of the mean membership grade were fairly good, ranging from 0.860 to 0.949 in recognition and from 0.783 to 0.923 in leave-one-out prediction. The false-negative rate ranged from 1.6 to 5.8% in recognition and from 3.1 to 8.0% in leave-one-out prediction. These equations were then used for the carcinogenicity prediction of 25 organic chemicals.
Prospective Prediction of the Organic Chemicals
The second NIEHS Predictive-Toxicology Evaluation Project involves the rodent carcinogenicity of 30 chemicals consisting of 25 organic and 5 inorganic compounds. The five inorganic compounds were omitted from our FALS prediction because sufficient carcinogenicity data for inorganic chemicals were not available for generating predictive QSAR models. The prediction of the 25 organic compounds was performed using the QSAR models for the eight chemical classes listed in Table 2. Salts such as scopolamine hydrobromide trihydrate and sodium xylenesulfonate were treated as undissociated forms. The results are shown in Table 4.
Table 4.
Based on their chemical features, compounds 1 (scopolamine) and 2 (codeine) fall into three chemical classes, and compounds 5 (tetrahydrofuran), 10 (D&C Yellow No. 11), 13 (1-chloro-2-propanol), 14 (diethanolamine), 15 (phenolphthalein), 18 (furfuryl alcohol), 19 (primaclone), 24 (oxymetholone), and 26 (emodin) fall into two chemical classes. When the estimates from two or three QSAR models disagreed, we evaluated the compound as "equivocal." Among the 25 organic chemicals, 14 were predicted to be positive, 5 equivocal, and 6 negative for carcinogenicity. More detailed predictions by the correlative method are thought to be unreliable, because data on the mechanisms and sites of tumor formation across a wide variety of chemicals are insufficient for generating statistically significant QSAR models.
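The rule for combining estimates from several class models can be sketched in a few lines. The function name and the boolean encoding (True for a carcinogenic estimate) are assumptions made for this illustration.

```python
from typing import Sequence


def combine_class_predictions(predictions: Sequence[bool]) -> str:
    """Combine per-class model estimates (True = carcinogenic) into one call.

    Agreement gives "positive" or "negative"; any disagreement gives "equivocal".
    """
    if not predictions:
        raise ValueError("at least one class model estimate is required")
    if all(predictions):
        return "positive"
    if not any(predictions):
        return "negative"
    return "equivocal"


# Hypothetical estimates for a compound that falls into three chemical classes.
call = combine_class_predictions([True, False, True])
```

A compound that falls into a single class simply inherits the estimate of that class model, so it can never be scored as equivocal under this rule.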
In these predictions, mutagenicity and subchronic toxicity test data were not considered. Prediction based on QSAR models can be performed very quickly and at very low cost, and it can be used even when the test compound does not exist. Unfortunately, the first round of this exercise showed that the results of the correlative methods were not very good (2). The predictive power of correlative methods depends strongly on the quality and quantity of the training set data. Sufficient high-quality data covering a large variety of chemical structures, together with the use of mechanism-based descriptors, should enhance the prospective prediction accuracy of QSAR approaches.
References
1. Richard AM. Application of SAR methods to non-congeneric data bases associated with carcinogenicity and mutagenicity: issues and approaches. Mutat Res 305:73-97 (1994).
2. Hileman B. "Expert intuition" tops in test of carcinogenicity prediction. Chem Eng News 71(25):35-37 (1993).
3. Moriguchi I, Hirono S, Matsushita Y, Liu Q, Nakagome I. Fuzzy adaptive least squares applied to structure-activity and structure-toxicity correlations. Chem Pharm Bull 40:930-934 (1992).
4. Moriguchi I, Hirono S, Liu Q, Nakagome I. Fuzzy adaptive least squares and its application to structure-activity studies. Quant Struct-Act Relat 11:325-331 (1992).
5. Moriguchi I, Liu Q, Hirano H, Hirono S. Noncongeneric structure-toxicity correlation using fuzzy adaptive least-squares. In: Classical and Three-Dimensional QSAR in Agrochemistry, ACS Symposium Series 606 (Hansch C, Fujita T, eds). Washington:American Chemical Society Books, 1995;141-152.
6. Soderman J, ed. CRC Handbook of Identified Carcinogens and Noncarcinogens, Vol. 1. Boca Raton, FL:CRC Press, 1982.
7. Ashby J, Tennant RW. Definitive relationships among chemical structure, carcinogenicity and mutagenicity for 301 chemicals tested by the U.S. NTP. Mutat Res 257:229-306 (1991).
8. Gold LS, Manley NB, Slone TH, Garfinkel GB, Ames BN, Rohrbach L, Stern BR, Chow K. Sixth plot of the carcinogenic potency database: results of animal bioassays published in the general literature 1989 to 1990 and by the National Toxicology Program 1990 to 1993. Environ Health Perspect 103(Suppl 8):3-122 (1995).
9. Novak V. Fuzzy Sets and Their Applications. Bristol:Adam Hilger, 1989;222-234.
10. Moriguchi I. Development of fuzzy adaptive least-squares and its uses in quantitative structure-activity relationships. Yakugaku Zasshi 115:805-822 (1995).
11. Moriguchi I, Hirono S, Liu Q, Nakagome I, Matsushita Y. Simple method of calculating octanol/water partition coefficient. Chem Pharm Bull 40:127-130 (1992).
12. Moriguchi I, Hirono S, Nakagome I, Hirano H. Comparison of reliability of log P values for drugs calculated by several methods. Chem Pharm Bull 42:976-978 (1994).