Environmental Health Perspectives (EHP) is a monthly journal of peer-reviewed research and news on the impact of the environment on human health. EHP is published by the National Institute of Environmental Health Sciences and its content is free online. Print issues are available by paid subscription.
Environmental Health Perspectives Volume 112, Number 16, November 2004 Open Access
Using Decision Forest to Classify Prostate Cancer Samples on the Basis of SELDI-TOF MS Data: Assessing Chance Correlation and Prediction Confidence

Weida Tong,1 Qian Xie,2 Huixiao Hong,2 Hong Fang,2 Leming Shi,1 Roger Perkins,2 and Emanuel F. Petricoin3

1Center for Toxicoinformatics, Division of Biometry and Risk Assessment, and 2Bioinformatics Group, National Center for Toxicological Research, Jefferson, Arkansas, USA; 3Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, Bethesda, Maryland, USA

Abstract
Class prediction using "omics" data is playing an increasing role in toxicogenomics, diagnosis/prognosis, and risk assessment. These data are usually noisy and represented by relatively few samples and a very large number of predictor variables (e.g., genes of DNA microarray data or m/z peaks of mass spectrometry data). These characteristics make it important to assess potential random correlation and overfitting of noise for any classification model based on omics data. We present a novel classification method, decision forest (DF), for class prediction using omics data. DF combines the results of multiple heterogeneous but comparable decision tree (DT) models to produce a consensus prediction. The method is less prone to overfitting of noise and chance correlation. A DF model was developed to predict the presence of prostate cancer using a proteomic data set generated from surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS). The degree of chance correlation and the prediction confidence of the model were rigorously assessed by extensive cross-validation and randomization testing. Comparison of model prediction with imposed random correlation demonstrated the biologic relevance of the model and the reduction of overfitting in DF. Furthermore, one of two confidence levels (high or low) was assigned to each prediction, and most misclassifications fell in the low-confidence region. For the high-confidence predictions, the model achieved 99.2% sensitivity and 98.2% specificity. The model also identified a list of significant peaks that could be useful for biomarker identification. DF should be equally applicable to other omics data such as gene expression data or metabolomic data. The DF algorithm is available upon request. Environ Health Perspect 112:1622-1627 (2004). doi:10.1289/txg.7109 available via http://dx.doi.org/ [Online 5 August 2004]
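
The consensus and confidence ideas in the abstract can be illustrated with a minimal Python sketch. This is not the authors' DF implementation, which the abstract does not specify. It assumes that each tree outputs a probability for the cancer class, that the consensus is the plain mean of those probabilities, and that a prediction counts as high confidence when the mean falls outside a hypothetical 0.3 to 0.7 band. The function names, class labels, and thresholds are all illustrative assumptions.

```python
from statistics import mean


def consensus_predict(tree_probs, low=0.3, high=0.7):
    """Combine per-tree probabilities of the 'cancer' class into one call.

    Assumptions (not taken from the abstract): the consensus is the plain
    mean of the tree probabilities, and the low/high cut-offs are
    illustrative defaults.
    """
    p = mean(tree_probs)
    label = "cancer" if p >= 0.5 else "noncancer"
    # Predictions near 0.5 are the ones the trees disagree on most.
    confidence = "high" if (p <= low or p >= high) else "low"
    return p, label, confidence


def sensitivity_specificity(truth, predicted, positive="cancer"):
    """Return (sensitivity, specificity); assumes both classes occur in truth."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Usage: three hypothetical trees scoring one sample.
prob, label, confidence = consensus_predict([0.9, 0.8, 0.95])
print(label, confidence)
```

Restricting `sensitivity_specificity` to the samples whose confidence is "high" mirrors the way the abstract reports performance separately for high-confidence predictions.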


Address correspondence to W. Tong, Center for Toxicoinformatics, Division of Biometry and Risk Assessment, NCTR, 3900 NCTR Rd., HFT020, Jefferson, AR 72079 USA. Telephone: (870) 543-7142. Fax: (870) 543-7662. E-mail: wtong@nctr.fda.gov

The authors declare they have no competing financial interests.

Received 22 March 2004; accepted 5 August 2004.


The full version of this article is available for free in HTML or PDF formats.