Abstract
This article proposes nonparametric inference procedures for analyzing microarray gene expression data that are reliable, robust, and simple to implement. They are conceptually transparent and require no special-purpose software. The analysis begins by normalizing gene expression data in a unique way. The resulting adjusted observations consist of gene-treatment interaction terms (representing differential expression) and error terms. The error terms are considered to be exchangeable, which is the only substantial assumption. Thus, under a family null hypothesis of no differential expression, the adjusted observations are exchangeable and all permutations of the observations are equally probable. The investigator may use the adjusted observations directly in a distribution-free test method or use their ranks in a rank-based method, where the ranking is taken over the whole data set. For the latter, the essential steps are as follows:
1. Calculate a Wilcoxon rank-sum difference or a corresponding Kruskal-Wallis rank statistic for each gene.
2. Randomly permute the observations and repeat the previous step.
3. Independently repeat the random permutation a suitable number of times.
Under the exchangeability assumption, the permutation statistics are independent random draws from a null cumulative distribution function (c.d.f.) approximated by the empirical c.d.f. Reference to the empirical c.d.f. tells if the test statistic for a gene is outlying and, hence, shows differential expression. This feature is judged by using an appropriate rejection region or computing a p-value for each test statistic, taking into account multiple testing. The distribution-free analog of the rank-based approach is also available and has parallel steps which are described in the article. The proposed nonparametric analysis tends to give good results with no additional refinement, although a few refinements are presented that may interest some investigators.
The implementation is illustrated with a case application involving differential gene expression in wild-type and knockout mice in response to an E. coli lipopolysaccharide (LPS) endotoxin treatment, relative to a baseline untreated condition.
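
The rank-based steps listed in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the article's implementation: it assumes the data are already normalized into a genes-by-samples array of adjusted observations, it assumes a two-group design, it takes the "rank-sum difference" to be the difference in mean ranks between groups, and it pools the permutation statistics of all genes into one empirical null distribution. The function names (`global_ranks`, `rank_mean_difference`, `permutation_pvalues`) are hypothetical, and any multiple-testing adjustment is left out.

```python
import numpy as np


def global_ranks(x):
    """Rank every entry of x jointly over the whole data set (ties get average ranks)."""
    flat = np.asarray(x, dtype=float).ravel()
    _, inverse, counts = np.unique(flat, return_inverse=True, return_counts=True)
    upper = np.cumsum(counts)                # highest rank held by each distinct value
    avg_rank = upper - (counts - 1) / 2.0    # average rank within each tie group
    return avg_rank[inverse].reshape(np.shape(x))


def rank_mean_difference(ranks, is_treated):
    """Per-gene difference in mean rank, treated minus untreated (assumed statistic)."""
    return ranks[:, is_treated].mean(axis=1) - ranks[:, ~is_treated].mean(axis=1)


def permutation_pvalues(adjusted, is_treated, n_perm=200, seed=0):
    """Two-sided permutation p-values from a pooled empirical null distribution."""
    rng = np.random.default_rng(seed)
    is_treated = np.asarray(is_treated, dtype=bool)
    ranks = global_ranks(adjusted)

    # Step 1: statistic for each gene on the observed arrangement.
    observed = rank_mean_difference(ranks, is_treated)

    # Steps 2 and 3: permute all observations, recompute, and repeat n_perm times.
    null_stats = np.empty((n_perm, ranks.shape[0]))
    for b in range(n_perm):
        shuffled = rng.permutation(ranks.ravel()).reshape(ranks.shape)
        null_stats[b] = rank_mean_difference(shuffled, is_treated)

    # Empirical null: pooled absolute statistics (pooling is an assumption here).
    pooled = np.sort(np.abs(null_stats.ravel()))
    n_at_least = pooled.size - np.searchsorted(pooled, np.abs(observed), side="left")
    p_values = (n_at_least + 1) / (pooled.size + 1)  # add-one keeps p-values above zero
    return observed, p_values
```

Calling `permutation_pvalues(data, treated_mask)` with a genes-by-samples array and a boolean mask over samples returns the observed statistics and unadjusted p-values; a multiple-testing procedure would still need to be applied to the p-values before declaring genes differentially expressed.
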
| Original language | English |
|---|---|
| Pages (from-to) | 783-797 |
| Journal | Journal of Biopharmaceutical Statistics |
| Volume | 15 |
| Issue number | 5 |
| Publication status | Published - 2005 |
Subject classification (UKÄ)
- Cardiology and Cardiovascular Disease
Free keywords
- rank methods
- normalization
- nonparametric methods
- multiple testing
- microarray
- gene expression
- false discovery rate
- distribution-free
- exchangeable random variables
- SAM
- statistical analysis