Impact of Microarray Preprocessing Techniques in Unraveling Biological Pathways

Authors: Stephen Sonis, Enrique J. DeAndres-Galiana, Juan Lewis Fernandez-Martinez, Leorey N. Saligan
Published: December 01, 2016
Journal: Journal of Computational Biology


To better understand how microarray preprocessing (normalization) techniques affect the analysis of biological pathways in the prediction of chronic fatigue (CF) following radiation therapy, this study compared the lists of predictive genes found using Robust Multiarray Averaging (RMA) and the Affymetrix MAS5 method with the list obtained from raw data (without any preprocessing). First, we modeled a spiked-in data set in which the differentially expressed genes were known and spiked in at different known concentrations. This showed that the precision achieved by different gene ranking methods was higher on preprocessed data than on raw data. The results of the spiked-in experiment were then extrapolated to the CF data set to perform learning and blind validation. RMA and MAS5 yielded different sets of discriminatory genes, which had higher predictive accuracy in the learning phase but lower predictive accuracy in the blind validation phase. This suggests that the genetic signatures generated with either preprocessing technique are not generalizable. The pathways found using the raw data set better described what is known a priori about CF. In addition, RMA produced more reliable pathways than MAS5. Understanding the strengths of these two preprocessing techniques in phenotype prediction is critical for precision medicine. In particular, this article concludes that biological pathways might be better unraveled by working with raw expression data. Moreover, the predictive gene profiles generated by RMA and MAS5 should be interpreted with caution. This is an important conclusion with high translational impact that should be confirmed in other disease data sets.
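The abstract does not say which gene ranking methods were compared on the spiked-in data. The following is a minimal sketch of how the precision of a ranking can be measured when the truly changed genes are known. It assumes a Fisher's-ratio score as a hypothetical stand-in ranking method and uses a made-up toy expression table, not data from the study. Precision at k is the fraction of the top-k ranked genes that are known spike-ins. The function names (`fisher_ratio`, `rank_genes`, `precision_at_k`) are illustrative only. In a comparison like the one described, the same scoring would be run on the RMA, MAS5, and raw expression matrices.

```python
from statistics import mean, pvariance


def fisher_ratio(a, b):
    """Fisher's ratio for one gene: (mean_a - mean_b)^2 / (var_a + var_b).

    Hypothetical stand-in ranking score; not necessarily the method used in the paper.
    """
    denom = pvariance(a) + pvariance(b)
    if denom == 0:
        # Constant within both classes: perfectly separated if the means differ.
        return float("inf") if mean(a) != mean(b) else 0.0
    return (mean(a) - mean(b)) ** 2 / denom


def rank_genes(expr, labels):
    """Return gene names sorted from most to least discriminatory.

    expr: dict mapping gene name -> list of expression values (one per sample)
    labels: list of 0/1 class labels, in the same sample order as the values
    """
    scores = {}
    for gene, values in expr.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = fisher_ratio(a, b)
    return sorted(scores, key=lambda g: scores[g], reverse=True)


def precision_at_k(ranking, truly_changed, k):
    """Fraction of the top-k ranked genes that are known spike-ins."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(g in truly_changed for g in ranking[:k]) / k


# Toy data (hypothetical): two spiked-in genes and two unchanged genes,
# three samples per class.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "spike1": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "spike2": [2.0, 2.1, 1.9, 4.0, 4.2, 3.9],
    "noise1": [3.0, 3.5, 2.8, 3.1, 3.4, 2.9],
    "noise2": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
}
spiked_in = {"spike1", "spike2"}

ranking = rank_genes(expr, labels)
print(ranking)                                # ['spike1', 'spike2', 'noise1', 'noise2']
print(precision_at_k(ranking, spiked_in, 2))  # 1.0
```

Because the spike-in set is known, precision at k rewards a ranking only for placing truly changed genes near the top. This is why a spiked-in experiment can compare preprocessing choices on ground truth before moving to a clinical data set.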
