Reproducibility-optimized test statistic for ranking genes in microarray studies

IEEE/ACM Trans Comput Biol Bioinform. 2008 Jul-Sep;5(3):423-431. doi: 10.1109/tcbb.2007.1078.

Abstract

A principal goal of microarray studies is to identify the genes showing differential expression under distinct conditions. In such studies, the selection of an optimal test statistic is a crucial challenge, which depends on the type and amount of data under analysis. While previous studies on simulated or spike-in datasets do not provide practical guidance on how to choose the best method for a given real dataset, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene- anking statistic directly from the data. In comparison with existing ranking methods, the reproducibilityoptimized statistic shows good performance consistently under various simulated conditions and on Affymetrix spike-in dataset. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given dataset without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibilityoptimization procedure is not limited to detecting differential expression only but could be extended to a wide range of other applications as well.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Data Interpretation, Statistical
  • Gene Expression Profiling / methods*
  • Genes / genetics*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Reproducibility of Results
  • Sensitivity and Specificity