Foundations of Data Science
Weekly outline
Summary
We discuss a set of topics that are important for understanding modern data science but that are typically not taught in an introductory ML course. In particular, we discuss fundamental ideas and techniques from probability, information theory, and signal processing.
Content
This class presents basic concepts of Information Theory and Signal Processing and their relevance to emerging problems in Data Science and Machine Learning.
A tentative list of topics covered is:
- Information Measures
- Multi-arm Bandits
- Detection and Estimation
- Distribution Estimation, Property Testing, and Property Estimation
- Exponential Families
- Signal Representations
- Compression and Dimensionality Reduction
- Information Measures and Generalization Error
Materials
- Lecture Notes will be posted here before the beginning of the class.
Additional Material:
- T. M. Cover and J. A. Thomas, Elements of Information Theory, Second Edition. New York: Wiley, 2006. (The full PDF is available via the EPFL library.)
- T. Lattimore and C. Szepesvári, Bandit Algorithms
Schedule
Note: Our schedule deviates slightly from what is shown on IS-Academia.
- Tuesdays:
- 11:15-12:30, BC 01 (Lecture)
- 12:30-13:15, Lunch Break
- 13:15-14:30, BC 01 (Lecture)
- 14:30-15:00, BC 01 (Solve HW Problem 1 together)
- Wednesdays:
- 13:15-15:00, GC B3 30 (Exercises)
- Exception: Wed, Sept 24: 13:15-15:00 Lecture
ED Discussion Forum
- We will use the ED Discussion Forum for this class. Everyone is strongly encouraged to make the most of this!
- Ask questions!
- Answer questions!
- The class staff will check the forum on Monday afternoon and on Thursday afternoon.
SWITCHtube Channel
We will not make new video recordings this year. You can access the videos from a couple of years ago. The content is largely the same, but the order of the topics is slightly different.
Grading
- If you do not hand in your final exam, your overall grade will be NA.
- Otherwise, your grade will be determined based on the following weighted average: 10% for the Homework, 30% for the Midterm Exam, 60% for the Final Exam.
- The Midterm Exam will take place on Wednesday, November 12, 2025, 13:15-15:00.
- The Final Exam will take place at some point between January 12, 2026 and January 31, 2026.
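As an illustration of the grading rule above, the weighted average can be sketched as follows. This is a minimal sketch and not an official grade calculator: the function name `overall_grade`, the use of `None` to stand in for NA, and the assumption that all three scores are on the same scale are choices made for this example, and no rounding rule is applied because none is stated here.

```python
from typing import Optional

# Weights as stated in the grading rule: 10% Homework, 30% Midterm, 60% Final.
WEIGHTS = {"homework": 0.10, "midterm": 0.30, "final": 0.60}


def overall_grade(homework: float, midterm: float,
                  final: Optional[float]) -> Optional[float]:
    """Weighted average of the three components (hypothetical helper).

    Returns None (standing in for NA) when no final exam was handed in.
    All scores are assumed to be on the same scale.
    """
    if final is None:
        return None
    return (WEIGHTS["homework"] * homework
            + WEIGHTS["midterm"] * midterm
            + WEIGHTS["final"] * final)
```

For example, with scores 6.0 (Homework), 4.0 (Midterm), and 5.0 (Final) on a common scale, the weighted average is 0.6 + 1.2 + 3.0 = 4.8.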
Sept 9: Introduction and Probability Review
Sept 10: Exercise Session (Homework 1)
Sept 16: Information Measures
Sept 17: Exercise Session (Homework 2)
Sept 23: Information Measures
Sept 24: Information Measures (Lecture, exceptionally)
Sept 30: Information Measures
Oct 1: Exercise Session (Homework 2)
Oct 7: Multi-Arm Bandits
Oct 8: Exercise Session (Homework 3)
Oct 14: Multi-Arm Bandits
Oct 15: Exercise Session (Homework 3)
No class, no exercise session
Oct 28: Distribution Estimation
Oct 29: Exercise Session (Homework 4)
Nov 4: Distribution Estimation
Nov 5: Exercise Session (Homework 4)
Nov 11: Property Testing
Nov 12, 13:15-15:00: Midterm Exam
Nov 18: Exponential Families
Nov 19: Exercise Session (Homework 5)
Nov 25: Signal Representations
Nov 26: Exercise Session (Homework 6)
Dec 2: Signal Representations
Dec 3: Exercise Session (Homework 6)
Dec 9: Compression: Dimensionality Reduction (Random Projections), then classic data compression
Dec 10: Exercise Session (Homework 7)
Dec 16: Information-Theoretic Perspective on Generalization of Learning Algorithms
Dec 17: Exercise Session (Homework 7)