Additional Guidelines for Exam

General

The length of your final exam will be limited to 9 pages MAXIMUM, plus a 1-page list of differentially expressed genes. (This limit does not include your R code.) For each page over the limit, I will deduct 1/2 point from your final course grade. Therefore, you need to be selective about what you include.

Each page of your report should be numbered. Margins should be 2.5 cm on all four sides (top, bottom, left, right). The font size should be at least 12 pt.

Figures should be reasonably sized. Each figure should have meaningful axis labels that are large enough to read. If a figure shows multiple lines, a legend should be included. Each figure should also be numbered and have a descriptive caption.

Tables should also be numbered and contain a caption describing what is presented.

It is very important to give clear, but brief, descriptions and explanations for your analyses. The overall reasoning you are using should be clear. Remember, you are trying to show me that you know what you are doing here – I will not assume that you do unless you demonstrate it.

Your exam must be your own work. You may not talk to or consult with anyone other than me about your exam. (This does not apply to the practice exam.) You must use your own words in your report – DO NOT COPY from ANY other sources.

 
In the case of plagiarism (plagiat) or any other fraud, I will immediately notify the Vice-president of Academic Affairs, and will apply the most severe consequence possible, up to and including EXPULSION.

 

 

General – Statistical analyses

For Affymetrix GeneChips, you will base statistical analyses on summary expression values (RMA) for each probe set (NOT on normalized single probe values).

You must write out explicitly any model that you use in your analysis, making sure to define any notation. Any model can be parameterized in different ways, but you should use only one version of the model. If you include more than one model for a single problem you will not receive full credit.
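To illustrate what "writing out the model explicitly" can look like, here is a sketch for a hypothetical two-group comparison (treated vs. control). The design and all notation are assumptions chosen for illustration only; your own model must match the design of your experiment.

```latex
y_{gi} = \alpha_g + \beta_g x_i + \varepsilon_{gi}, \qquad
\mathrm{E}(\varepsilon_{gi}) = 0, \quad
\mathrm{Var}(\varepsilon_{gi}) = \sigma_g^2
```

Here y_{gi} is the log2 summary expression value for probe set g on chip i, x_i is 1 if chip i is from the treated group and 0 if it is from the control group, \alpha_g is the mean log2 expression of probe set g in the control group, and \beta_g is the log2 fold change between groups. Under this parameterization, differential expression for probe set g corresponds to testing H_0: \beta_g = 0.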

A brief description of the multiple testing adjustment that you use should be included. You should also explain (briefly) the importance of adjusting for multiple tests.
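As a small illustration of what an adjustment does, the following base R sketch applies the Benjamini-Hochberg method with p.adjust. The p-values are invented for the example (in a real analysis there would be one per probe set), and Benjamini-Hochberg is only one possible choice of method.

```r
# Hypothetical raw p-values, one per probe set (made up for illustration)
raw_p <- c(0.0002, 0.0011, 0.0090, 0.0300, 0.0450, 0.2000, 0.5100, 0.8800)

# Benjamini-Hochberg adjustment, which controls the false discovery rate
adj_p <- p.adjust(raw_p, method = "BH")

# Number of probe sets below a 0.05 cutoff before and after adjustment
n_raw <- sum(raw_p < 0.05)
n_adj <- sum(adj_p < 0.05)

print(data.frame(raw_p = raw_p, adj_p = adj_p))
```

Comparing n_raw with n_adj shows that fewer probe sets pass the same cutoff once the number of tests is taken into account.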

You should explain the statistic you are using to rank genes. Typically, you would use the moderated t-statistic (mod-t, in absolute value), the adjusted p-value, or the B-statistic (B-stat).

You should also explain your choice of threshold for determining differential expression. Some possibilities include: a particular value of the B-stat (e.g. 0, or some other reasonable value depending on the context); a threshold based on a specific number of genes for follow-up (e.g. 20, 50, etc.); or a threshold based on an adjusted p-value (e.g. 0.05, 0.01, etc.).
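The three kinds of threshold can give different gene lists, which is why the choice needs to be justified. The base R sketch below applies each rule to a hypothetical results table; the probe set names, column names, and all values are invented for illustration and do not come from any real analysis.

```r
# Hypothetical results table; all names and values are made up
res <- data.frame(
  probe_set = c("ps1", "ps2", "ps3", "ps4", "ps5", "ps6"),
  mod_t     = c(8.1, -6.4, 4.9, -3.2, 2.1, 0.4),
  adj_p     = c(0.0004, 0.003, 0.02, 0.09, 0.31, 0.88),
  B         = c(6.2, 3.5, 0.8, -1.1, -3.0, -5.6),
  stringsAsFactors = FALSE
)

# Rank by the absolute value of the moderated t-statistic
res <- res[order(abs(res$mod_t), decreasing = TRUE), ]

# Rule 1: B-statistic above 0
by_B <- res$probe_set[res$B > 0]

# Rule 2: adjusted p-value below 0.05
by_adj_p <- res$probe_set[res$adj_p < 0.05]

# Rule 3: a fixed number of top-ranked genes for follow-up (here 2)
by_count <- head(res$probe_set, 2)
```

In this toy table the first two rules happen to select the same probe sets, while the fixed-count rule selects fewer; with real data the lists will generally differ.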

There should be some graphical display of characteristics of differentially expressed genes compared to other genes (e.g. average MA plot, volcano plot, etc.). You should not include too many plots here. It is nice to have the genes labeled in a meaningful way.

Quality assessment: You should make meaningful plots, but remember not to include too many plots of the same thing. You can look at them all in your analysis, but be selective in what you include in your report. Many of you included both histograms and boxplots of single-chip PM intensities; these give virtually identical information. If you need to show this information, please choose just one of these. (Usually boxplots are easier to interpret here.) You should also make sure to include relevant image plots. Typically these would be of the robust regression weights or residuals and NOT of the original intensities. HINT: the plot should be white/green or blue/red, NOT mostly black. It is also a good idea to include NUSE boxplots. Again, remember to explain the purpose of each plot. You should say whether or not you need to exclude any chips, which chips you are excluding, and explain your reasoning.

Normalization: Here, you will want to use RMA, computed using either median polish (affy package) or robust regression (affyPLM package). You should give a brief description (in your own words) of what that means, including the model and the steps for obtaining a final summary expression measure.

Gene list: You should include a single-page table of the top differentially expressed (DE) genes according to your criterion. If more than 50 genes meet your criterion, list only the 50 most DE genes.

Reproducible R code: You should include as PLAIN (ASCII) TEXT the R code that I can use to reproduce your analyses. (Please note that '.rtf' is NOT plain text.) I should be able to copy/paste this code into R easily and get the same results as you did. (If you are using knitr, you can send the .Rnw or .Snw file instead; if you are using RMarkdown, you can send the .Rmd file.)