Over-optimism in bioinformatics: an illustration

Abstract: In statistical bioinformatics research, different optimization mechanisms potentially lead to “over-optimism” in published papers. The present empirical study illustrates these mechanisms through a concrete example from an active research field. The investigated sources of over-optimism include the optimization of the data sets, of the settings, of the competing methods and, most importantly, of the method's characteristics. We consider a “promising” new classification algorithm that turns out to yield disappointing results in terms of error rate, namely linear discriminant analysis incorporating prior knowledge about gene functional groups through an appropriate shrinkage of the within-group covariance matrix. We quantitatively demonstrate that this disappointing method can artificially seem superior to existing approaches if we “fish for significance”. We conclude that, if the improvement of a quantitative criterion such as the error rate is the main contribution of a paper, the superiority of new algorithms should be validated using “fresh” validation data sets. The R code and preprocessed versions of the data sets, as well as additional files, can be downloaded from http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020-professuren/boulesteix/overoptimism/, so that the study is completely reproducible.
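
The selection effect behind “fishing for significance” can be illustrated with a small simulation. This is a minimal sketch, not the authors' R code: it assumes a hypothetical new method whose true error rate (0.2) is identical to that of an existing competitor, test sets of 50 samples, and a researcher who tries 10 data sets or settings and reports only the best observed error. The function names `simulated_error` and `fishing_experiment` and all parameter values are illustrative assumptions.

```python
import random


def simulated_error(true_error, n_test, rng):
    """Observed error rate of a classifier with the given true error on n_test i.i.d. test samples."""
    mistakes = sum(rng.random() < true_error for _ in range(n_test))
    return mistakes / n_test


def fishing_experiment(true_error=0.2, n_test=50, n_tries=10, n_reps=2000, seed=1):
    """Compare an honest single evaluation with reporting the best of n_tries evaluations.

    All defaults are hypothetical values chosen for illustration only.
    """
    rng = random.Random(seed)
    honest, fished = [], []
    for _ in range(n_reps):
        # Honest protocol: one pre-specified data set / setting, result reported as is.
        honest.append(simulated_error(true_error, n_test, rng))
        # Over-optimistic protocol: try n_tries data sets / settings, keep the minimum.
        fished.append(min(simulated_error(true_error, n_test, rng) for _ in range(n_tries)))
    return sum(honest) / n_reps, sum(fished) / n_reps


honest_mean, fished_mean = fishing_experiment()
print(f"honest: {honest_mean:.3f}  best-of-10: {fished_mean:.3f}")
```

Because the minimum of several noisy error estimates is biased downward, the best-of-10 average falls clearly below the true rate of 0.2 even though the method is no better than its competitor, whereas the honest single evaluation stays close to 0.2. Only an evaluation on fresh data, chosen before looking at the results, removes this bias.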

Cited literature: 32 references

https://hal-supelec.archives-ouvertes.fr/hal-00514107
Contributor: Karine El Rassi
Submitted on: Wednesday, September 1, 2010 - 11:35:26 AM
Last modification on: Wednesday, July 10, 2019 - 7:14:02 PM
Long-term archiving on: Thursday, December 2, 2010 - 2:43:19 AM

File

Jelizarow2010.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00514107, version 1

Citation

Monika Jelizarow, Vincent Guillemot, Arthur Tenenhaus, K. Strimmer, Anne-Laure Boulesteix. Over-optimism in bioinformatics: an illustration. Bioinformatics, Oxford University Press (OUP), 2010, 26, pp.1990-1998. ⟨hal-00514107⟩
