COV878 is a 1-Credit Special Module in Machine Learning. This seminar-style course explores the statistical foundations of machine learning, specifically in the finite-sample regime. The primary objective is to rigorously analyze the performance of learning algorithms by examining their generalization ability, sample complexity, and convergence properties.
This course is intended for (post)graduate and advanced undergraduate students interested in the theoretical foundations of machine learning. PhD students specializing in machine learning are particularly encouraged to integrate the concepts learned here into their research. A basic level of mathematical maturity is expected; students with concerns about their background should consult the instructor.
A basic understanding of probability theory, linear algebra, and calculus is required for this course. Prior experience or familiarity with machine learning is also beneficial, though not mandatory.
Upon completing this course, students will be familiar with key concepts and proof techniques from statistical learning theory, information theory, and optimization, enabling them to:
Derive generalization bounds for learning problems (an example bound is sketched after this list)
Analyze the sample complexity of learning algorithms
Understand convergence rates of learning algorithms
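To give a flavor of the first two outcomes, here is a classical example, included purely for illustration and not part of the formal course requirements: for a finite hypothesis class, Hoeffding's inequality combined with a union bound already yields a uniform generalization bound and a sample-complexity estimate.

```latex
% Illustrative sketch, assuming bounded losses in [0,1], true risk R(h),
% empirical risk \hat{R}_n(h) over n i.i.d. samples, and a finite class \mathcal{H}.
\[
  \Pr\Big[\exists\, h \in \mathcal{H}:\ \big|R(h) - \hat{R}_n(h)\big| > \epsilon\Big]
  \;\le\; 2\,|\mathcal{H}|\, e^{-2n\epsilon^{2}}.
\]
% Setting the right-hand side to \delta: with probability at least 1 - \delta,
% simultaneously for all h in \mathcal{H},
\[
  \big|R(h) - \hat{R}_n(h)\big| \;\le\; \sqrt{\frac{\log\big(2|\mathcal{H}|/\delta\big)}{2n}},
\]
% i.e., a sample complexity of n = O(\epsilon^{-2}\log(|\mathcal{H}|/\delta))
% suffices for \epsilon-accurate risk estimates uniformly over the class.
```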
This course will primarily focus on the non-asymptotic analysis of learning algorithms. Key topics include concentration bounds, empirical risk minimization, PAC-Bayes theory, Rademacher and Gaussian complexity measures, Karush-Kuhn-Tucker (KKT) conditions, primal-dual witness techniques, convergence rates, restricted strong convexity, and Fano’s inequality, among others.
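As a concrete taste of the last listed tool, Fano's inequality turns mutual-information calculations into lower bounds on the error of any estimator. The sketch below states a standard form (see Cover and Thomas in the references); it is an illustration, not a result specific to this course.

```latex
% Standard form of Fano's inequality (illustrative sketch).
% X is uniform on a set of size M; \hat{X} is any estimate of X based on an
% observation Y, so that X -> Y -> \hat{X} forms a Markov chain.
\[
  \Pr\big[\hat{X} \neq X\big] \;\ge\; 1 - \frac{I(X;Y) + \log 2}{\log M}.
\]
% Reading: if the data Y carry little information about X (small I(X;Y)),
% then no estimator can identify X reliably; this is the engine behind many
% minimax lower bounds of the kind covered in the lectures.
```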
Deliverables from the instructor: This will be a seminar-style 1-credit module. Each topic will be covered over one to two lectures. Students are expected to do background reading on the topics covered in class. The plan is to cover most of the topics before the mid-term (2-3 hours a week for the first 5-6 weeks).
Deliverables from the students:
Each student is expected to review a paper, prepare a written summary, and deliver a classroom presentation. While the paper summary and presentation are due after the mid-term, students are encouraged to begin working on them from the outset of the course.
Each student will also be responsible for scribing one lecture; the scribed notes are due within one week of the lecture date.
There will be no traditional exams. Final grades will be determined by the quality of the paper summary, presentation, lecture scribing, and class participation.
Attendance is mandatory and will count towards class participation.
Component | Weight |
---|---|
Scribe | 20% |
Paper summary report | 20% |
Presentation | 40% |
In-class participation | 20% |
Each student will be assigned a lecture to scribe.
Scribing must be done in LaTeX. The required template file can be found here. Students can also refer to the LaTeX source code for the first lecture.
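The linked template is authoritative; the sketch below is only a rough illustration of what a scribed lecture's LaTeX source might look like (the title, packages, and macros are placeholders, not the actual template).

```latex
% Minimal illustrative sketch; the official course template takes precedence.
\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,amsthm}
\newtheorem{theorem}{Theorem}

\title{COV878 Lecture 1: Markov's Inequality and Applications}
\author{Scribe: Your Name}

\begin{document}
\maketitle

\begin{theorem}[Markov's inequality]
If $X \ge 0$ almost surely and $t > 0$, then $\Pr[X \ge t] \le \mathbb{E}[X]/t$.
\end{theorem}

\begin{proof}
Since $t\,\mathbf{1}\{X \ge t\} \le X$ pointwise, taking expectations gives
$t \Pr[X \ge t] \le \mathbb{E}[X]$.
\end{proof}

\end{document}
```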
The paper summary report must be organized under the following section headers (an illustrative skeleton follows the list):
Summary of the contributions
Strengths
Weaknesses
Main results
Challenges and technical innovations in the analysis
Open questions
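If you prepare the report in LaTeX like the scribed notes (a suggestion, not a stated requirement), a minimal skeleton mirroring the required headers might look like:

```latex
% Illustrative skeleton only; the required headers are exactly those listed above.
\documentclass[11pt]{article}
\title{Paper Summary: (paper title here)}
\author{Your Name}
\begin{document}
\maketitle
\section{Summary of the contributions}
\section{Strengths}
\section{Weaknesses}
\section{Main results}
\section{Challenges and technical innovations in the analysis}
\section{Open questions}
\end{document}
```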
The grade for in-class participation will be based on both attendance and active engagement in class discussions.
The first five minutes of each lecture will be devoted to reviewing the summary and unsolved exercises from the previous lecture, providing an opportunity for students to contribute to the discussion.
Collaboration and discussion among students are strongly encouraged. However, all deliverables (including the scribed lecture, paper summary report, and presentation) will be graded individually. You may acknowledge your collaborators by including their names in the acknowledgement section.
Plagiarism in any form (including the use of AI tools) is strictly prohibited. Any violation of this policy will result in a score of 0 for the entire course.
Lecture | Topic | Remark |
---|---|---|
Lecture 1 | Markov’s Inequality and Applications | |
Lecture 2 | Hoeffding’s Inequality and Applications | |
Lecture 3 | Fano’s Inequality and Applications | |
Lecture 4 | (TBD) | |
Lecture 5 | (TBD) | |
Lecture 6 | (TBD) | |
Lecture 7 | (TBD) | Review paper – title and abstract due |
Lecture 8 | (TBD) | |
Lecture 9 | (TBD) | |
Lecture 10 | (TBD) | |
Lecture 11 | (TBD) | |
Lecture 12 | (TBD) | Paper summary report due |
Lecture 13 | Final Presentation | |
Lecture 14 | Final Presentation | |
Scribed notes are due within one week of the corresponding lecture date.
Cover, Thomas M., and Joy A. Thomas. 1999. Elements of Information Theory. John Wiley & Sons.
Duchi, John, and Yoram Singer. 2009. “Efficient Online and Batch Learning Using Forward Backward Splitting.” Journal of Machine Learning Research 10: 2899–2934.
Germain, Pascal, Alexandre Lacasse, François Laviolette, Mario Marchand, and Jean-Francis Roy. 2015. “Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm.” Journal of Machine Learning Research 16.
Kakade, Sham M., Karthik Sridharan, and Ambuj Tewari. 2008. “On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization.” Advances in Neural Information Processing Systems 21.
McAllester, David. 2007. “Generalization Bounds and Consistency.” In Predicting Structured Data, 247–261.
Negahban, Sahand N., Pradeep Ravikumar, Martin J. Wainwright, and Bin Yu. 2012. “A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers.” Statistical Science 27 (4): 538–557.
Raskutti, Garvesh, Martin J. Wainwright, and Bin Yu. 2010. “Restricted Eigenvalue Properties for Correlated Gaussian Designs.” Journal of Machine Learning Research 11: 2241–2259.
Tropp, Joel A. 2012. “User-Friendly Tail Bounds for Sums of Random Matrices.” Foundations of Computational Mathematics 12 (4): 389–434.
Wainwright, Martin J. 2009. “Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using ℓ1-Constrained Quadratic Programming (Lasso).” IEEE Transactions on Information Theory 55 (5): 2183–2202.
Wainwright, Martin J. 2019. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press.