Foundations of Data Science
Weekly outline
Summary
We discuss a set of topics that are important for understanding modern data science but are typically not taught in an introductory ML course. In particular, we discuss fundamental ideas and techniques from probability, information theory, and signal processing.
Content
This class presents basic concepts of Information Theory and Signal Processing and their relevance to emerging problems in Data Science and Machine Learning.
A tentative list of topics covered is:
- Information Measures
- Multi-Arm Bandits
- Detection and Estimation
- Distribution Estimation, Property Testing, and Property Estimation
- Exponential Families
- Signal Representations
- Compression and Dimensionality Reduction
- Information Measures and Generalization Error
Materials
- Lecture Notes (Version Sept 5). Note: Check for updates on a semi-regular basis.
Additional Material:
- T. M. Cover and J. A. Thomas, Elements of Information Theory, Second Edition. New York: Wiley, 2006. (The full PDF is available via the EPFL library.)
- T. Lattimore and C. Szepesvári, Bandit Algorithms
Schedule
Classes:
- Tuesday 11:15-13:00 (CE 1 104)
- Thursday 17:15-19:00 (INF 1)
- Tuesday 13:15-15:00 (CE 1 104)
ED Discussion Forum
- We will use the ED Discussion Forum for this class. Everyone is strongly encouraged to make the most of this!
- Ask questions!
- Answer questions!
- The class staff will check the forum on Monday afternoon and on Thursday afternoon.
SWITCHtube Channel
We will not make new video recordings this year. You can access the videos from a couple of years ago. The content is largely the same, but the order of the topics is slightly different.
Grading
- If you do not hand in your final exam, your overall grade will be NA.
- Otherwise, your grade will be determined based on the following weighted average: 10% for the Homework (specifically, 9% for the graded Homework sets and 1% for your activity on the ED forum), 30% for the Midterm Exam, 60% for the Final Exam.
- The Midterm Exam will take place on Thursday, November 14, 2024, 17:15-19:00
- The Final Exam will take place at some point between January 13, 2025 and February 1, 2025.
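As a rough illustration of the weighted average described above, here is a minimal sketch. The function name `overall_grade` and its arguments are hypothetical, and it assumes all four components are reported on the same numeric scale; the page does not specify the scale or any rounding rule, so none is applied.

```python
from typing import Optional, Union

# Weights taken from the grading rules above.
W_HOMEWORK_SETS = 0.09  # graded Homework sets
W_FORUM = 0.01          # activity on the ED forum
W_MIDTERM = 0.30        # Midterm Exam
W_FINAL = 0.60          # Final Exam


def overall_grade(
    homework_sets: float,
    forum: float,
    midterm: float,
    final: Optional[float],
) -> Union[float, str]:
    """Hypothetical helper: weighted average of the four components.

    Assumes every score is on the same scale. `final=None` stands for
    a final exam that was not handed in, which yields "NA".
    """
    if final is None:
        return "NA"
    return (
        W_HOMEWORK_SETS * homework_sets
        + W_FORUM * forum
        + W_MIDTERM * midterm
        + W_FINAL * final
    )


if __name__ == "__main__":
    # Made-up example scores, purely for illustration.
    print(overall_grade(homework_sets=80, forum=100, midterm=70, final=90))
    print(overall_grade(homework_sets=80, forum=100, midterm=70, final=None))
```

Because the Final Exam carries 60% of the weight, it dominates the result; the Homework sets and forum activity together move the overall grade by at most a tenth of the scale.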
Sept 10: Introduction and Probability Review
Sept 12: Information Measures
Sept 17: Information Measures
Sept 19: Information Measures
Sept 24: Information Measures
Sept 26: Multi-Arm Bandits
Oct 1: Multi-Arm Bandits
Oct 3: Multi-Arm Bandits
Oct 8: Multi-Arm Bandits
Oct 10: Detection and Estimation
Oct 15: Detection and Estimation
Oct 17: Parameter Estimation, Fisher Information, Cramér-Rao Lower Bound
Fall Break - no class, no exercise session
Oct 29: Distribution Estimation
Oct 31: Distribution Estimation
Nov 5: Distribution Estimation
Nov 7: Property Testing
Nov 12: Exponential Families
Nov 14: Midterm Exam
Nov 19: Exponential Families
Nov 21: Exponential Families
Nov 26: Signal Representations: Linear Algebra Review++ (chapter 3)
Nov 28: Signal Representations: Fourier & Hilbert
Dec 3: Signal Representations: Time-Frequency perspective
Dec 5: Compression: Dimensionality Reduction (PCA and Random Projections)
Dec 10: Compression: Dimensionality Reduction (Random Projections), then classic data compression
Dec 12: Compression: Classic data compression
Dec 17: Information-theoretic perspective on Generalization of Learning Algorithms
Dec 19: Overview of the class, followed by Outlook