Weekly outline

    • To contact me (any time, 24/7):
      email: darlene.goldstein@epfl.ch
      tel/sms/whatsapp/signal: 079 427 2501
      skype: darlenegoldstein

      Course format: 

      Although we are currently allowed to attend the course on campus, ALL LECTURES WILL ALSO BE PRERECORDED, since some of you may be unable to attend in person from time to time. Please follow the lecture before the 'office hours' or lab time so that you can ask any additional questions then.

      Office hours: I will be available for your questions every Friday 12.00-13.00 in my office MA B1 477, and also by appointment.

      The lab time is Tuesday 16.00-18.00 in CO3. I understand that there will be some class conflicts, so I will try to find a new room for those of you who are unable to attend at the assigned time. For the first week, in any case, please try to attend either my office hours or the lab time, and we will figure out a solution to any conflict.

      Course language: 

      This course is given in English, but feel free to speak in either English or French.

      Resources:
      A useful book for both statistics and R:

      • A Handbook of Statistical Analyses Using R, 3rd edition. Torsten Hothorn and Brian S. Everitt. CRC Press.
      Some resources to get you started with R, RStudio and R Markdown:
    • Repository of R packages; you can download R from here. It is also a good source of documentation (see 'Contributed' under the Documentation heading).
    • RStudio makes R easier to use. It includes a code editor, debugging & visualization tools. Choose the free desktop version corresponding to your computer and operating system.
    • Tutorials and examples for reproducible research using R:
    • Forum for students to find group members. Once you have formed a group, please send me 1 email containing the names of all group members. As a reminder, your group can contain 1-4 persons.
  • 19 February

    Week 1: Course organization, reproducible research, confidence interval / hypothesis testing review

    Organization: you will write a short group report (~ 5-7 pages; a 'group' can be 1-4 persons), a short group article critique (1 page, it can be in question/answer format), and a longer individual report (up to ~ 7-10 pages). The 2 reports will be about data analyses you carry out. The group data set will be assigned to you. For the individual report, you can choose a topic from a list that I will provide once we have covered all the eligible topics in lecture. I will announce when you can email me your choice, so please do not send me an email earlier than that. Once you email me your choice, I will assign you a data set on that topic.

    The purpose of this course is to help you to learn something without too much stress!! That is why you can do each of the 2 reports twice: a preliminary version, which will receive comments according to posted criteria, then a final version, where you can incorporate the comments, due at the end of the semester. Only the final version will count towards your course grade. The deadlines will be posted on the course moodle page.

    For the article critique, you will get the 1/2 point (full credit) as long as you submit it by the deadline - you don't need to do a preliminary version.

    In order to give you time to work on your reports, there will be no in-person lectures and mainly optional topics toward the end of the course. These 'extra' topics are NOT required; there are slides (and possibly videos) in case you are interested. There is no penalty associated with not following them.

    NOTE: this first week's LECTURE is ONLINE ONLY; there will be NO in-class lecture. Please come to the lab meeting in CO3 on Tuesday afternoon for the EDA presentation and to get started with R and RStudio.

    Grading

    • 1/2 point: short report 1 (either regression or anova), can be in a group of up to 4 people
    • 1/2 point: short critique on a scientific article (will be assigned to you), can be in a group of up to 4 people
    • 5 points: individual analysis report (your choice among a number of topics)
  • 26 February

    Week 2: Linear regression modeling

    You can already email me your groups (1 email per group); remember, each group can contain 1-4 persons. Each group will be assigned to analyze EITHER a regression data set OR an anova data set.
  • 4 March

    Week 3: Experimental design, Analysis of variance (anova)

    Report 1: (initial/preliminary deadline Friday 12 April - any time)

    The purpose of this assignment is to give you practice writing a scientific report. Report writing is an extremely important skill, regardless of whether you continue in an academic career, in government or in industry.

    You should analyze your data in an appropriate manner (either like lab week 2 for regression or lab week 3 for anova, or a combination if you have both factor and continuous explanatory variables) and write a short report, ~ 5 pages (7 pages max).

    The goal is NOT to replicate the analysis presented in the paper corresponding to the data set, so don't worry if you do something different, or obtain results that are different from the paper when you are doing the same thing that the paper seems to describe. YOU are in charge of the analyses you carry out !!

    Please submit your report as a .pdf file (NOT .DOC, etc.) in the moodle assignment space, 1 per group. The spaces will be labeled R1, R2, A1, A2, for regression problems 1-2 and anova problems 1-2. Your file should be named XX-##.pdf, where XX is your assigned problem (either R1, R2, A1, or A2) and ## is your group number.

    Your report should contain a short background/intro to the problem (including the aim of the original study), a presentation of the results of your statistical analyses, including exploratory data analysis, model fitting and final model, along with a short discussion of any shortcomings of the final model, and your conclusions. Include relevant graphics and tables, but DO NOT include any raw R code or output (you will be penalized for this if you do). Your graphs should be 'pretty': if you copy/paste a graph from the screen, it will most likely appear blurry (png file), and you will be penalized for this. It is easiest to include nice-looking graphs if you save a pdf version and use R Markdown, but this is not the only way.

    Please use 12 point size and margins of 2.5 cm. Please remember to number each page at the bottom (including page 1). Inside the top margin of each page, please include the surnames of each group member (separated by commas).

    Do not include a cover page, abstract, table of contents, or EPFL logo, and do not exceed 7 pages (not including any references) or you will be penalized.

    Your report will also be graded based on language use and overall presentation. (It can be in either English or French.)

    As a reminder, this report counts for 1/2 point (out of 6) of your course grade.

    The initial deadline (12 April, any time) is for your preliminary report. The final version is due by 30 June (any time).

    If you turn in your report before the initial deadline, we will be able to comment on it and you can revise it before the final deadline. If you need to turn it in later, that's ok; I should still have enough time to comment on it for you, so... NO STRESS !!!!!

    When you email me with the names of your group members I will send you the dataset (after Lab 3).
  • 11 March

    Week 4: Model selection
  • 18 March

    Week 5: CLASS CANCELLED - no lecture / no lab this week
  • 25 March

  • 31 March - 6 April - PÂQUES / EASTER

    NO CLASS OR LAB - PÂQUES / EASTER

  • 8 April

    Week 7: Survival analysis 

    Second assignment: This assignment is a statistical critique of a published paper. Your report can be written either as a full review or in a question/answer format, simply responding to each question. Your report should not be more than 1 page.

    You can turn in this report any time before the final deadline - 30 June 2024. You will get full credit (i.e. 1/2 point toward your course grade) for turning in a reasonable effort.

    There is a deposit slot near the bottom of the course moodle page for you to submit your report.

    Groups who worked on regression problems:
    L1: http://www.jcancer.org/v09p1421.htm

    Groups who worked on anova problems:
    L2: https://www.sciencedirect.com/science/article/pii/S1743919118307337

    A guide sheet (study assessment questions) is uploaded to help you to address statistical issues.

    The file contains a longer list of questions to consider when evaluating a study in your future career. As a guide for your 2nd assignment report, please make sure that you respond particularly to the following: (numbers in parentheses represent points out of 6)

    (1) 1. Briefly give the biomedical background for the paper. What question/hypothesis is being investigated?

    (1) 2. What data are collected (include how many individuals, what variables, inclusion / exclusion criteria for the study)?

    (1) 3. What analyses were carried out? Are these analyses appropriate for the problem?

    (1) 4. What other analyses should have been done (or might have been done but not shown)? Explain.

    (1) 5. Is there any mention of power of the analyses? How would you go about trying to estimate power? 
    (NOTE: you do NOT have to actually give power estimates, just say how you might go about it.)

    (1) 6. What conclusions do the authors draw? Are these conclusions substantiated by the results? Explain.

  • 12 April - Deposit Prelim Report 1

    Please deposit only 1 report per group.
    • Please deposit your first group assignment here if you did Regression problem 1, as a pdf file named R1-##, where ## is your group number. The preliminary due date is 12 April (any time).
    • Please deposit your first group assignment here if you did Regression problem 2, as a pdf file named R2-##, where ## is your group number. The preliminary due date is 12 April (any time).
    • Please deposit your first group assignment here if you did Anova problem 1, as a pdf file named A1-##, where ## is your group number. The preliminary due date is 12 April (any time).
    • Please deposit your first group assignment here if you did Anova problem 2, as a pdf file named A2-##, where ## is your group number. The preliminary due date is 12 April (any time).
    • You can turn in any LATE prelim Report 1 here (R1 / R2 / A1 / A2).
  • 15 April

    Week 8: Discrete data analysis, contingency tables, 2x2 tables; data visualization; asymptotic and exact tests
  • 22 April

    Week 9: Genetic association studies, genome-wide association studies (GWAS); principal components analysis, multiple hypothesis testing

    NOTE: There will be NO CLASS today.

    This week's labs are OPTIONAL and there will be NO LAB MEETING; you might want to have a look at them though if you choose to do a GWAS as your individual report.
  • 22 April - Individual Report Topic Choice

    Choose your individual project topic from one of the following:
    • survival analysis
    • logistic regression
    • generalized linear model (other than logistic, e.g. Poisson)
    • discrete data / contingency table analysis
    • genome-wide association study (GWAS)

    and EMAIL ME your choice (please follow the email instructions in the announcement). I will then send you a dataset for analysis (or you can start working on the GWAS tutorial if you are doing a GWAS).

    Your final report should be ~7-10 pages (absolute maximum, not including references; fewer pages is better if you can be concise).


    The preliminary deadline is Friday 17 May (any time); we will give you feedback within 1-2 weeks after that. You should then have a few more weeks to work on it before the final deadline of 30 June (any time).

    NOTE: As a reminder, you MUST work on this individual analysis and report ALONE. Your analysis and report should represent YOUR OWN WORK. DO NOT COMMUNICATE WITH ANYONE in ANY WAY about this project. If you have ANY question or problem, please ask ONLY ME and NOT anyone else.

    I will consider ANY violation of this policy as PLAGIARISM (PLAGIAT) and will report any suspicion of plagiarism/plagiat to the Vice-présidence académique – Affaires juridiques. I have reported previous students who have been sanctioned for violating this rule, so please DO NOT TEST ME ON THIS.

    If you have ANY questions, please don't hesitate to ask ME and ONLY ME. Do not risk your course grade or your EPFL career by asking or communicating with any other student.
  • 29 April

    Week 10: Clinical trials (OPTIONAL)

    NO CLASS OR LAB - time to work on reports; if you have any questions, please visit me during office hours or make an appointment with me.

  • 6 May

    Week 11: Meta-analysis (OPTIONAL)

    NO CLASS OR LAB - time to work on reports; if you have any questions, please visit me during office hours or make an appointment with me.

  • 13 May

    Week 12: Introduction to mixed-effects models (OPTIONAL)

    NO CLASS OR LAB - time to work on reports; if you have any questions, please visit me during office hours or make an appointment with me.

  • 17 May - Deposit Prelim Report 3 (Individual)

    Please name your preliminary report as follows:
          lastname-topic-prelim.pdf
    (for example, if I were doing survival analysis, my report would be named
    goldstein-survival-prelim.pdf).

    As a reminder, the possible topic names are:
                logistic; glm; survival; discrete; gwas.
  • 20 May - HOLIDAY - NO CLASS OR LAB

    Week 13: Monday 20 May - NO CLASS (férié / holiday); NO LAB THIS WEEK.

  • 27 May

  • Group Report 2 - Deposit slots