Statistics for genomic data analysis
Weekly outline
To contact me (24 hours a day):
email: darlene.goldstein@epfl.ch
tel/sms/whatsapp/signal: 079 427 2501
skype: darlenegoldstein
Course format:
Although we are currently allowed to attend the course on campus, ALL LECTURES ARE ALSO AVAILABLE AS RECORDINGS (from a different year), since some of you may be unable to attend in person from time to time. Please watch the lecture before the office hours or lab time so that you can ask any additional questions then.
Course / Lab times:
Most weeks we will have lecture Thursday 8.00 - 10.00 and lab 10.15 - 12.00, both in MA B1 11. Some weeks we will have a shorter lab time and start lecture later.
Office hours:
I will be available for your questions each week on Thursday 12.00 - 13.00 (after class) in my office MA B1 477, and also by appointment (in person or on Zoom).
Course language:
This course is given in English, but feel free to speak in either English or French.
Organization:
Your course grade will be based on 2 short TP reports (1/2 point each) and an individual report (AT MOST 10 pages, not including references or large figures; 5 points). In the individual report you will analyze genomic (microarray) data, with 2 tasks: to identify genes that are differentially expressed between 2 conditions, and to carry out a cluster analysis to identify (potentially novel) subgroups.
The purpose of this course is to help you learn something without too much stress!! The TP reports get credit as long as you turn in a reasonable effort. You can also submit the individual report twice: a preliminary version, which will receive comments according to the posted criteria, then a final version, due at the end of the semester, where you can incorporate the comments. Only the final version will count towards your course grade. The deadlines will be posted on the course Moodle page.
In order to give you time to work on your reports, toward the end of the course there will be no in-person lectures and mainly optional topics. These 'extra' topics are NOT required; there are slides (and possibly videos) in case you are interested. There is no penalty for not following them.
Resources:
A useful book for both statistics and R: A Handbook of Statistical Analyses Using R, 3rd edition. Torsten Hothorn and Brian S. Everitt. CRC Press.
Some resources to get you started with R, RStudio and R Markdown:
-
Repository of R packages; you can download R from here. Also a good source of documentation (see 'Contributed' under the Documentation heading).
-
RStudio makes R easier to use. It includes a code editor, debugging & visualization tools. Choose the free desktop version corresponding to your computer and operating system.
-
Tutorials and examples for reproducible research using R:
-
Accessible from your EPFL account
Week 1: Molecular biology and technology background
VIDEO ONLY - NO IN-PERSON MEETING
Week 2: Quantifying expression for Affy chips (RMA); IDE: Identifying differentially expressed (DE) genes
-
You do not need to read all of this!! But Chapter 1 might be helpful for a better understanding of the molecular biology background and the biotechnological aspects of Affymetrix GeneChips and experiments. For a more detailed explanation of the RMA background adjustment, see pages 16-21.
-
This corresponds to chapter 3 of the Bioconductor Case Studies book.
-
If the link in the TP to DAFLcel.zip does not work for you, you can download it here.
-
Week 3: Quality assessment for Affy chips; robust regression and affyPLM
-
Co-authored by your fearless leader (me!!)
-
CLASS AND LAB CANCELLED
-
Week 5: Experimental design; linear modeling
-
Venables + Ripley, MASS ch. 6 (especially 6.2, 6.7) (MASS = Modern Applied Statistics with S)
-
Week 6: Hypothesis testing review; multiple testing; permutation test
-
Venables + Ripley, MASS ch. 4
-
If you are having problems setting margins in a LaTeX document, have a look at this: it shows all of the layout parameters on a page.
-
Week 7: Cluster analysis
-
To get comments on this practice exam, please deposit your draft by 1 May (any time).
-
Week 8: Classification (optional)
NOTE: NO LECTURE OR LAB THIS WEEK: time to work on TP 7 (practice exam)
NO OFFICE HOURS THIS WEEK - if you have questions, please email me
-
Optional classification activity
-
Optional classification activities
-
Week 9: Annotation; Gene set testing (optional)
NOTE: NO LECTURE OR LAB THIS WEEK: time to work on TP 7 (practice exam)
NO OFFICE HOURS THIS WEEK - if you have questions, please email me
-
Easter holiday
NO LECTURE OR LAB THIS WEEK
NO OFFICE HOURS THIS WEEK
-
Week 10: Introduction to sequencing data, RNA-seq; generalized linear models (GLMs) (optional)
NOTE: NO LECTURE OR LAB THIS WEEK: time to work on TP 7 (practice exam)
NO OFFICE HOURS THIS WEEK - if you have questions, please email me
-
Please deposit your practice exam here.
-
Week 11: Sequence data; DE for RNA-seq data (optional)
NOTE: NO LECTURE OR LAB THIS WEEK: time to work on TP 7 (practice exam)
NO OFFICE HOURS THIS WEEK - if you have questions, please email me
-
Week 12: Genetic association studies/GWAS (optional)
NOTE: NO LECTURE OR LAB THIS WEEK: time to work on TP 7 (practice exam)
NO OFFICE HOURS THIS WEEK - if you have questions, please email me
-
Week 13: Miscellaneous topics (optional)
NOTE: NO LECTURE OR LAB THIS WEEK: time to work on exam
NO OFFICE HOURS THIS WEEK - if you have questions, please email me