COV878 is a 1-Credit Special Module in Machine Learning. This seminar-style course explores the statistical foundations of machine learning, specifically in the finite-sample regime. The primary objective is to rigorously analyze the performance of learning algorithms by examining their generalization ability, sample complexity, and convergence properties.
This course is intended for (post)graduate and advanced undergraduate students interested in the theoretical foundations of machine learning. PhD students specializing in machine learning are particularly encouraged to integrate the concepts learned here into their research. A basic level of mathematical maturity is expected; students with concerns about their background should consult the instructor.
A basic understanding of probability theory, linear algebra, and calculus is required for this course. Prior experience or familiarity with machine learning is also beneficial, though not mandatory.
Upon completing this course, students will be familiar with key concepts and proof techniques from statistical learning theory, information theory, and optimization, enabling them to:
Derive generalization bounds for learning problems (an example bound is sketched after this list)
Analyze the sample complexity of learning algorithms
Understand convergence rates of learning algorithms
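To give a flavor of the first two outcomes, here is a classical example, included purely for illustration and not part of the formal course requirements: for a finite hypothesis class, Hoeffding's inequality combined with a union bound already yields a uniform generalization bound and a sample-complexity estimate.

```latex
% Illustrative sketch, assuming bounded losses in [0,1], true risk R(h),
% empirical risk \hat{R}_n(h) over n i.i.d. samples, and a finite class \mathcal{H}.
\[
  \Pr\Big[\exists\, h \in \mathcal{H}:\ \big|R(h) - \hat{R}_n(h)\big| > \epsilon\Big]
  \;\le\; 2\,|\mathcal{H}|\, e^{-2n\epsilon^{2}}.
\]
% Setting the right-hand side to \delta: with probability at least 1 - \delta,
% simultaneously for all h in \mathcal{H},
\[
  \big|R(h) - \hat{R}_n(h)\big| \;\le\; \sqrt{\frac{\log\big(2|\mathcal{H}|/\delta\big)}{2n}},
\]
% i.e., a sample complexity of n = O(\epsilon^{-2}\log(|\mathcal{H}|/\delta))
% suffices for \epsilon-accurate risk estimates uniformly over the class.
```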
This course will primarily focus on the non-asymptotic analysis of learning algorithms. Key topics include concentration bounds, empirical risk minimization, PAC-Bayes theory, Rademacher and Gaussian complexity measures, Karush-Kuhn-Tucker (KKT) conditions, primal-dual witness techniques, convergence rates, restricted strong convexity, and Fano’s inequality, among others.
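As a concrete taste of the last listed tool, Fano's inequality turns mutual-information calculations into lower bounds on the error of any estimator. The sketch below states a standard form (see Cover and Thomas in the references); it is an illustration, not a result specific to this course.

```latex
% Standard form of Fano's inequality (illustrative sketch).
% X is uniform on a set of size M; \hat{X} is any estimate of X based on an
% observation Y, so that X -> Y -> \hat{X} forms a Markov chain.
\[
  \Pr\big[\hat{X} \neq X\big] \;\ge\; 1 - \frac{I(X;Y) + \log 2}{\log M}.
\]
% Reading: if the data Y carry little information about X (small I(X;Y)),
% then no estimator can identify X reliably; this is the engine behind many
% minimax lower bounds of the kind covered in the lectures.
```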
Deliverables from the instructor: This will be a seminar-style 1-credit module. Each topic will be covered over one to two lectures. Students are expected to do background reading on the topics covered in class. The plan is to cover most of the topics before the mid-term (2-3 hours a week for the first 5-6 weeks).
Deliverables from the students:
Each student is expected to review a paper, prepare a written summary, and deliver a classroom presentation. While the paper summary and presentation are due after the mid-term, students are encouraged to begin working on them from the outset of the course.
Each student will also be responsible for scribing one lecture; the scribed notes are due within one week of the lecture date.
There will be no traditional exams. Final grades will be determined by the quality of the paper summary, presentation, lecture scribing, and class participation.
Attendance is mandatory and will count towards class participation.
Component | Weight |
---|---|
Scribe | 20% |
Paper summary report | 20% |
Presentation | 40% |
In-class participation | 20% |
Each student will be assigned a lecture to scribe.
Scribing must be done in LaTeX. The required template file can be found here. Students can also refer to the LaTeX source code for the first lecture.
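The linked template is authoritative; the sketch below is only a rough illustration of what a scribed lecture's LaTeX source might look like (the title, packages, and macros are placeholders, not the actual template).

```latex
% Minimal illustrative sketch; the official course template takes precedence.
\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,amsthm}
\newtheorem{theorem}{Theorem}

\title{COV878 Lecture 1: Markov's Inequality and Applications}
\author{Scribe: Your Name}

\begin{document}
\maketitle

\begin{theorem}[Markov's inequality]
If $X \ge 0$ almost surely and $t > 0$, then $\Pr[X \ge t] \le \mathbb{E}[X]/t$.
\end{theorem}

\begin{proof}
Since $t\,\mathbf{1}\{X \ge t\} \le X$ pointwise, taking expectations gives
$t \Pr[X \ge t] \le \mathbb{E}[X]$.
\end{proof}

\end{document}
```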
The paper summary report must be organized under the following section headers (an illustrative skeleton follows the list):
Summary of the contributions
Strengths
Weaknesses
Main results
Challenges and technical innovations in the analysis
Open questions
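If you prepare the report in LaTeX like the scribed notes (a suggestion, not a stated requirement), a minimal skeleton mirroring the required headers might look like:

```latex
% Illustrative skeleton only; the required headers are exactly those listed above.
\documentclass[11pt]{article}
\title{Paper Summary: (paper title here)}
\author{Your Name}
\begin{document}
\maketitle
\section{Summary of the contributions}
\section{Strengths}
\section{Weaknesses}
\section{Main results}
\section{Challenges and technical innovations in the analysis}
\section{Open questions}
\end{document}
```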
The grade for in-class participation will be based on both attendance and active engagement in class discussions.
The first five minutes of each lecture will be devoted to reviewing the summary and unsolved exercises from the previous lecture, providing an opportunity for students to contribute to the discussion.
Collaboration and discussion among students are strongly encouraged. However, all deliverables (including the scribed lecture, paper summary report, and presentation) will be graded individually. You may acknowledge your collaborators by including their names in the acknowledgement section.
Plagiarism in any form (including the use of AI tools) is strictly prohibited. Any violation of this policy will result in a score of 0 for the entire course.
Lecture | Topic | Remark |
---|---|---|
Lecture 1 | Markov’s Inequality and Applications | |
Lecture 2 | Hoeffding’s Inequality and Applications | |
Lecture 3 | Fano’s Inequality and Applications | |
Lecture 4 | (TBD) | |
Lecture 5 | (TBD) | |
Lecture 6 | (TBD) | |
Lecture 7 | (TBD) | Review paper – title and abstract due |
Lecture 8 | (TBD) | |
Lecture 9 | (TBD) | |
Lecture 10 | (TBD) | |
Lecture 11 | (TBD) | |
Lecture 12 | (TBD) | Paper summary report due |
Lecture 13 | Final Presentation | |
Lecture 14 | Final Presentation | |
Scribed notes are due within one week of the corresponding lecture date.
Cover, Thomas M., and Joy A. Thomas. 1999. Elements of Information Theory. John Wiley & Sons.
Duchi, John, and Yoram Singer. 2009. “Efficient Online and Batch Learning Using Forward Backward Splitting.” Journal of Machine Learning Research 10: 2899–2934.
Germain, Pascal, Alexandre Lacasse, François Laviolette, Mario Marchand, and Jean-Francis Roy. 2015. “Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm.” Journal of Machine Learning Research 16.
Kakade, Sham M., Karthik Sridharan, and Ambuj Tewari. 2008. “On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization.” Advances in Neural Information Processing Systems 21.
McAllester, David. 2007. “Generalization Bounds and Consistency.” In Predicting Structured Data, 247–261.
Negahban, Sahand N., Pradeep Ravikumar, Martin J. Wainwright, and Bin Yu. 2012. “A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers.” Statistical Science 27 (4): 538–557.
Raskutti, Garvesh, Martin J. Wainwright, and Bin Yu. 2010. “Restricted Eigenvalue Properties for Correlated Gaussian Designs.” Journal of Machine Learning Research 11: 2241–2259.
Tropp, Joel A. 2012. “User-Friendly Tail Bounds for Sums of Random Matrices.” Foundations of Computational Mathematics 12 (4): 389–434.
Wainwright, Martin J. 2009. “Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using ℓ1-Constrained Quadratic Programming (Lasso).” IEEE Transactions on Information Theory 55 (5): 2183–2202.
Wainwright, Martin J. 2019. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press.