COV878: Special Module in Machine Learning

Instructor: Adarsh Barik
Credit: 1 (1-0-0)
Semester 1 (2025-2026)

COV878 is a 1-credit special module in machine learning, run as a seminar-style course that explores the statistical foundations of machine learning, specifically in the finite-sample regime. The primary objective is to rigorously analyze the performance of learning algorithms by examining their generalization ability, sample complexity, and convergence properties.

Who should take the course?

This course is intended for (post)graduate and advanced undergraduate students interested in the theoretical foundations of machine learning. PhD students specializing in machine learning are particularly encouraged to integrate the concepts learned here into their research. A basic level of mathematical maturity is expected; students with concerns about their background should consult the instructor.

Expected background

A basic understanding of probability theory, linear algebra, and calculus is required for this course. Prior experience or familiarity with machine learning is also beneficial, though not mandatory.

Learning outcomes

Upon completing this course, students will be familiar with key concepts and proof techniques from statistical learning theory, information theory, and optimization, enabling them to:

  1. Derive generalization bounds for learning problems (see the worked example after this list)

  2. Analyze the sample complexity of learning algorithms

  3. Understand convergence rates of learning algorithms
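
As a concrete instance of the first two outcomes, recall the classical generalization bound for a finite hypothesis class $\mathcal{H}$ and a loss bounded in $[0,1]$; this is the standard derivation via Hoeffding’s inequality and a union bound (see, e.g., Wainwright 2019), sketched here for flavor rather than as the course’s official treatment:

\[
\Pr\Big[\exists\, h \in \mathcal{H} : R(h) - \widehat{R}_n(h) > \epsilon\Big] \;\le\; |\mathcal{H}|\, e^{-2 n \epsilon^2},
\]

so, setting the right-hand side to $\delta$, with probability at least $1 - \delta$ every $h \in \mathcal{H}$ satisfies

\[
R(h) \;\le\; \widehat{R}_n(h) + \sqrt{\frac{\log(|\mathcal{H}|/\delta)}{2n}},
\]

where $R$ denotes the population risk and $\widehat{R}_n$ the empirical risk over $n$ i.i.d. samples. Read as a sample-complexity statement, $n \ge \log(|\mathcal{H}|/\delta)/(2\epsilon^2)$ samples suffice for a uniform deviation of at most $\epsilon$.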

Content

This course will primarily focus on the non-asymptotic analysis of learning algorithms. Key topics include concentration bounds, empirical risk minimization, PAC-Bayes theory, Rademacher and Gaussian complexity measures, Karush-Kuhn-Tucker (KKT) conditions, primal-dual witness techniques, convergence rates, restricted strong convexity, and Fano’s inequality, among others.
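
To make the non-asymptotic flavor concrete, the following minimal Python sketch (assuming NumPy; illustrative only, not part of the course materials) empirically compares the tail probability of a sample mean with the bound promised by Hoeffding’s inequality:

import numpy as np

# Empirical check of Hoeffding's inequality for i.i.d. Bernoulli(1/2) samples:
# P(|mean - p| >= t) <= 2 * exp(-2 * n * t^2) for variables bounded in [0, 1].
rng = np.random.default_rng(0)
n, t, trials = 100, 0.1, 100_000

samples = rng.binomial(1, 0.5, size=(trials, n))
deviations = np.abs(samples.mean(axis=1) - 0.5)

empirical = (deviations >= t).mean()  # observed tail probability
bound = 2 * np.exp(-2 * n * t**2)     # Hoeffding upper bound

print(f"empirical tail probability: {empirical:.4f}")
print(f"Hoeffding bound:            {bound:.4f}")  # bound dominates the tail

With n = 100 and t = 0.1 the bound equals 2e^{-2} ≈ 0.27 and comfortably dominates the observed tail, illustrating that the guarantee holds at every finite n, though it need not be tight.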

Course plan

Grading

Scribe: 20%
Paper summary report: 20%
Presentation: 40%
In-class participation: 20%

Scribe

Paper Summary Report

The paper summary report should summarize and critically assess the chosen paper. The report must include the following sections:

  1. Summary of the contributions

  2. Strengths

  3. Weaknesses

  4. Main results

  5. Challenges and technical innovations in the analysis

  6. Open questions

In-class participation

Collaboration and use of AI Tools

Lectures and Some Important Deadlines

Lecture    | Topic                                    | Remark
Lecture 1  | Markov’s Inequality and Applications     |
Lecture 2  | Hoeffding’s Inequality and Applications  |
Lecture 3  | Fano’s Inequality and Applications       |
Lecture 4  | (TBD)                                    |
Lecture 5  | (TBD)                                    |
Lecture 6  | (TBD)                                    |
Lecture 7  | (TBD)                                    | Review paper: title and abstract due
Lecture 8  | (TBD)                                    |
Lecture 9  | (TBD)                                    |
Lecture 10 | (TBD)                                    |
Lecture 11 | (TBD)                                    |
Lecture 12 | (TBD)                                    | Paper summary report due
Lecture 13 | Final Presentation                       |
Lecture 14 | Final Presentation                       |

Scribe notes are due within one week of the corresponding lecture.

Reference Reading Materials

Cover, Thomas M., and Joy A. Thomas. 1999. Elements of Information Theory. John Wiley & Sons.

Duchi, John, and Yoram Singer. 2009. “Efficient Online and Batch Learning Using Forward Backward Splitting.” The Journal of Machine Learning Research 10: 2899–2934.

Germain, Pascal, Alexandre Lacasse, Francois Laviolette, Mario Marchand, and Jean-Francis Roy. 2015. “Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm.” Journal of Machine Learning Research (JMLR).

Kakade, Sham M., Karthik Sridharan, and Ambuj Tewari. 2008. “On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization.” Advances in Neural Information Processing Systems 21.

McAllester, David. 2007. “Generalization Bounds and Consistency.” Predicting Structured Data, 247–61.

Negahban, Sahand N., Pradeep Ravikumar, Martin J. Wainwright, and Bin Yu. 2012. “A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers.” Statistical Science 27 (4): 538–57.

Raskutti, Garvesh, Martin J. Wainwright, and Bin Yu. 2010. “Restricted Eigenvalue Properties for Correlated Gaussian Designs.” The Journal of Machine Learning Research 11: 2241–59.

Tropp, Joel A. 2012. “User-Friendly Tail Bounds for Sums of Random Matrices.” Foundations of Computational Mathematics 12 (4): 389–434.

Wainwright, Martin J. 2009. “Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using L1-Constrained Quadratic Programming (Lasso).” IEEE Transactions on Information Theory 55 (5): 2183–2202.

Wainwright, Martin J. 2019. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press.