In bioinformatics, genome-wide experiments look for important biological differences between two groups at a large number of locations in the genome. Often, the final analysis focuses on a p-value-based ranking of locations, which might then be investigated further in follow-up experiments. However, this strategy may rank small effects with low p-values (for example, because they are estimated very precisely) more favourably than larger, more scientifically important effects. Bayesian ranking techniques may offer a solution to this problem, provided that a good prior for the collective distribution of effect sizes is available.
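To make the ranking problem concrete, the sketch below compares two hypothetical locations under a simple normal approximation: location A has a tiny but very precisely estimated effect, and location B has a much larger but noisier one. The names, numbers and the `two_sided_p` helper are illustrative assumptions, not data or code from this work. Because the p-value depends only on the ratio of estimate to standard error, A outranks B even though B's effect is ten times larger.

```python
import math


def two_sided_p(estimate, se):
    """Two-sided p-value for H0: effect = 0, assuming estimate/se ~ N(0, 1)."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2))


# Hypothetical locations: (name, estimated effect, standard error)
locations = [
    ("A", 0.2, 0.02),  # tiny effect, very precisely estimated (z = 10)
    ("B", 2.0, 0.5),   # large effect, noisier estimate (z = 4)
]

# Rank by smallest p-value versus by largest absolute effect.
by_p = sorted(locations, key=lambda loc: two_sided_p(loc[1], loc[2]))
by_effect = sorted(locations, key=lambda loc: -abs(loc[1]))

print([name for name, _, _ in by_p])       # ['A', 'B']
print([name for name, _, _ in by_effect])  # ['B', 'A']
```

Neither ordering is ideal on its own: ranking by raw effect size ignores how noisy each estimate is, which is the gap a Bayesian ranking with a well-chosen prior aims to close.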
We develop an Empirical Bayes ranking algorithm that uses the marginal distribution of the data over all locations to estimate an appropriate prior. In simulations and analyses of real datasets, we demonstrate favourable performance compared with ordering by p-value and with several other competing ranking methods. The algorithm is computationally efficient and can be used to rank all genomic locations, or to rank a subset of locations pre-selected via traditional FWER/FDR methods in a two-stage analysis.
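The general idea of estimating a prior from the marginal distribution of the data can be illustrated with a deliberately simple normal-normal model. This is an assumed textbook setup, not the EBrank algorithm: each estimate is taken to satisfy x_i | θ_i ~ N(θ_i, s_i²) with prior θ_i ~ N(0, τ²), so marginally x_i ~ N(0, τ² + s_i²) and τ² can be estimated by the method of moments. Locations are then ranked by the absolute posterior mean. The helper names and data below are hypothetical.

```python
def estimate_prior_variance(estimates, ses):
    """Method-of-moments estimate of tau^2 from the marginal x_i ~ N(0, tau^2 + s_i^2).

    Since E[x_i^2] = tau^2 + s_i^2, mean(x^2) - mean(s^2) is unbiased for tau^2;
    the result is truncated at zero.
    """
    n = len(estimates)
    mean_sq = sum(x * x for x in estimates) / n
    mean_var = sum(s * s for s in ses) / n
    return max(0.0, mean_sq - mean_var)


def posterior_means(estimates, ses):
    """Posterior means E[theta_i | x_i] = tau^2 / (tau^2 + s_i^2) * x_i."""
    tau2 = estimate_prior_variance(estimates, ses)
    means = []
    for x, s in zip(estimates, ses):
        # With tau^2 = 0 the prior is a point mass at zero, so everything shrinks to 0.
        shrink = tau2 / (tau2 + s * s) if tau2 > 0 else 0.0
        means.append(shrink * x)
    return means


# Hypothetical locations: C has the largest raw estimate but is very noisy.
names = ["A", "B", "C"]
estimates = [0.2, 2.0, 3.0]
ses = [0.02, 0.5, 3.0]

post = posterior_means(estimates, ses)
eb_rank = [name for _, name in sorted(zip(post, names), key=lambda t: -abs(t[0]))]
print(eb_rank)  # ['B', 'C', 'A']
```

Here the p-value ordering would be A, B, C and the raw-effect ordering C, B, A; shrinkage pulls C's noisy estimate strongly towards zero while leaving B's largely intact. Ranking by absolute posterior mean is only one possible criterion, and a single normal prior is a crude stand-in for the richer prior estimation described in this work.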
An R package, EBrank, implementing the ranking algorithm is available on CRAN.
Supplementary data are available at Bioinformatics online.