TP 4: Identifying differentially expressed genes with limma

In this TP, you will get some practice using the Bioconductor package limma. It implements the moderated t-statistic and the B-statistic, which you can use to rank genes for differential expression. As usual, make sure you read the help documentation for each function you do not already know.

The limma User's Guide is extremely useful, and you will probably want to refer to it often (not just today, but throughout the rest of the course, including the exam). The latest version can be found at http://bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf. You will be reading several sections of it today. Start by reading the brief Introduction beginning on p. 5, and also Sections 8.1 and 8.2. You will be analyzing the Affymetrix E. coli and estrogen experiments referred to in the Introduction. Sections 9.1, 9.2 and 9.5 are useful for the parameterization and corresponding design matrix for two-condition (E. coli) and factorial (estrogen) experiments.
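As a rough illustration of what a design matrix looks like for the two kinds of experiment, here is a minimal sketch using only base R. The sample labels (control/treated, ctl/trt, early/late) and the array counts are hypothetical and are not taken from the actual data sets; the user guide sections above describe the parameterizations to use for the real analyses.

```r
# Two-condition experiment: 8 hypothetical arrays, 4 per group.
group <- factor(rep(c("control", "treated"), each = 4),
                levels = c("control", "treated"))
# Treatment-contrast parameterization: an intercept (control mean)
# plus one column for the treated-minus-control difference.
design_two <- model.matrix(~ group)
print(design_two)

# 2 x 2 factorial experiment: 8 hypothetical arrays, 2 per combination.
combo <- factor(rep(c("ctl_early", "trt_early", "ctl_late", "trt_late"), each = 2),
                levels = c("ctl_early", "trt_early", "ctl_late", "trt_late"))
# Group-means parameterization: one column per treatment/time combination.
design_fac <- model.matrix(~ 0 + combo)
colnames(design_fac) <- levels(combo)
print(design_fac)
```

With the group-means form, comparisons of interest are built afterwards as contrasts between columns; with the treatment-contrast form, the second coefficient is already the comparison of interest.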

You might want to skim through the chapter on Statistics for Differential Expression (Chapter 13, p. 60). Over the next few weeks, this material should start to make more sense.

The function lmFit fits a linear model to each gene separately. Following it with eBayes gives the moderated t- and B-statistics. Make sure that you look at the structure of the object you create with these functions (called fit in the user guide). To get the names of all components of fit, type names(fit). The B-statistic is contained in the lods component. Do not worry just yet about what 'fdr' (false discovery rate) means; we will learn more about this on Friday when we cover multiple hypothesis testing.
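To see what the fitted object contains before working with the real data, the following sketch runs lmFit and eBayes on a small simulated matrix. The gene names, array names, group labels and effect size are all made up for illustration; only the limma calls themselves mirror what you will do in the exercises.

```r
library(limma)

set.seed(1)
# Simulated log-expression values: 100 hypothetical genes on 6 arrays.
expr <- matrix(rnorm(100 * 6), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("array", 1:6)))
# Shift the first five genes upward in arrays 4-6 so that they are truly DE.
expr[1:5, 4:6] <- expr[1:5, 4:6] + 3

# Two groups of three arrays each (hypothetical labels A and B).
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

fit <- lmFit(expr, design)  # one linear model per gene
fit <- eBayes(fit)          # adds moderated t, p-values and B-statistics

print(names(fit))                  # all components of the fit object
print(head(fit$t[, "groupB"]))     # moderated t-statistics for B versus A
print(head(fit$lods[, "groupB"]))  # B-statistics (log-odds of DE)
```

Calling str(fit) on the result is another quick way to inspect its structure.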

E. coli data

Here you will work through Example 17.1 (p. 98 of the user guide). We do not have the CEL files, but there is a Bioconductor package that contains these data as an AffyBatch.

Begin by starting R, then install and load the package ecoliLeucine as well as limma. Also compute RMA values, so that you have the data matrix that will be analyzed for differentially expressed (DE) genes:

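# biocLite() has been retired; current Bioconductor releases install packages with BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("limma", "affy", "ecoliLeucine"))
library(affy)              # provides rma()
library(ecoliLeucine)
library(limma)
data(ecoliLeucine)         # loads the AffyBatch object
eset <- rma(ecoliLeucine)  # RMA expression values
pData(eset)                # sample annotation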

Now you can continue the example from the top of p. 99.

Estrogen data

Follow Example 17.2 for practice in analyzing a factorial experiment. Any necessary packages that you have not already installed can be found on the Bioconductor website. The analysis should follow straightforwardly from the example.
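For orientation, here is a minimal sketch of one common way to extract comparisons from a 2 x 2 factorial design in limma, using makeContrasts and contrasts.fit on simulated data. The group names (ctl_early and so on), the contrast names and the data are hypothetical, and this is not necessarily the parameterization used in Example 17.2; follow the user guide for the real analysis.

```r
library(limma)

set.seed(3)
# Simulated log-expression values: 100 hypothetical genes on 8 arrays.
expr <- matrix(rnorm(100 * 8), nrow = 100,
               dimnames = list(paste0("gene", 1:100), paste0("array", 1:8)))

# Two arrays for each of the four treatment/time combinations.
lev <- c("ctl_early", "trt_early", "ctl_late", "trt_late")
combo <- factor(rep(lev, each = 2), levels = lev)
design <- model.matrix(~ 0 + combo)  # one column per combination
colnames(design) <- lev

fit <- lmFit(expr, design)

# Treatment effect at each time, and the difference between the two effects.
cont <- makeContrasts(
  trt_early   = trt_early - ctl_early,
  trt_late    = trt_late - ctl_late,
  time_by_trt = (trt_late - ctl_late) - (trt_early - ctl_early),
  levels = design
)
fit2 <- eBayes(contrasts.fit(fit, cont))

# Top genes for the treatment effect at the late time point.
print(topTable(fit2, coef = "trt_late", number = 5))
```

Because the data here are pure noise, no gene should stand out; the point is only the shape of the workflow.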

Well, if you have made it this far, you have done a lot of work! Do not worry about writing a lab report this time, but print out a table of the top 50 most DE genes and bring it to class next week.
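One possible way to produce such a table is with topTable, sketched below on simulated data. The data, the coefficient name groupB and the output file name top50_DE_genes.txt are hypothetical; substitute the fit object and coefficient from your own analysis.

```r
library(limma)

set.seed(2)
# Simulated log-expression values: 200 hypothetical genes on 6 arrays.
expr <- matrix(rnorm(200 * 6), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("array", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)
fit <- eBayes(lmFit(expr, design))

# Top 50 genes for the B-versus-A comparison, ranked by the B-statistic.
top50 <- topTable(fit, coef = "groupB", number = 50, sort.by = "B")
print(head(top50))

# Save as a tab-delimited text file that can be opened and printed.
write.table(top50, file = "top50_DE_genes.txt", sep = "\t",
            quote = FALSE, col.names = NA)
```

The resulting table has one row per gene, with the log fold change, average expression, moderated t, p-value, adjusted p-value and B-statistic as columns.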