Optimization for Machine Learning
COL870/COL8385 is a 3-credit Special Topics course in Machine Learning. The course will cover topics in optimization in both offline and online settings. The material will be motivated throughout by applications to modern machine learning problems, and will include both foundational ideas and advanced topics.
This course is intended for both (post)graduate and undergraduate students interested in the optimization foundations of machine learning. A basic level of mathematical maturity is expected; students with concerns about their background should consult the instructor.
A fundamental understanding of linear algebra, calculus, and probability theory is required for this course. Prior experience or familiarity with machine learning is also beneficial, though not mandatory.
Upon completing this course, students will be familiar with key concepts in optimization, enabling them to:
Understand fundamental concepts such as convex sets, convex functions, and optimality criteria for optimization problems
Understand first-order optimization algorithms and analyze their convergence properties
Gain familiarity with the foundational concepts in online learning
Key topics include convex sets and functions, conjugates, subdifferentials, primal and dual problem formulations, strong and weak duality, minimax characterizations, and optimality conditions including the Karush-Kuhn-Tucker (KKT) criteria. The course will also cover first-order optimization methods such as gradient descent, stochastic gradient descent (SGD), accelerated gradient techniques, subgradient methods, and Frank-Wolfe algorithms. Additionally, foundational concepts in online learning will be explored, including online gradient descent, online mirror descent, Follow-The-Regularized-Leader (FTRL), and parameter-free algorithms.
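As a small illustration of the first-order methods listed above, here is a minimal gradient descent sketch on a convex quadratic. The objective, step size, and iteration count are illustrative choices for this example, not material taken from the course:

```python
import numpy as np

def gradient_descent(grad, x0, step_size, n_steps):
    """Fixed-step gradient descent: x_{t+1} = x_t - eta * grad(x_t)."""
    x = x0
    for _ in range(n_steps):
        x = x - step_size * grad(x)
    return x

# Example objective: f(x) = 0.5 * x^T A x - b^T x with A positive definite,
# so f is convex and its unique minimizer solves A x = b.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([3.0, 1.0])
grad_f = lambda x: A @ x - b  # gradient of f

# Step size 0.2 is below 2/L (L = 3, the largest eigenvalue of A),
# which guarantees convergence for this quadratic.
x_hat = gradient_descent(grad_f, np.zeros(2), step_size=0.2, n_steps=200)
print(x_hat)  # close to the minimizer x* = (1, 1)
```

The same loop structure reappears throughout the course: SGD replaces the exact gradient with a stochastic estimate, and online gradient descent applies the update to a sequence of changing loss functions.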
| Component | Weight |
| --- | --- |
| Midsemester Exam | 30% |
| Final Exam | 30% |
| Assignments | 25% |
| Scribe | 10% |
| In-class participation | 5% |
A minimum of 75% attendance and marks equivalent to a grade of B or above are required for an audit pass.
Each student will be assigned a lecture to scribe.
Scribing must be done in LaTeX, and I will provide the required style files and template.
Scribed notes will be due within one week of the lecture date.
The grade for in-class participation will be based on both attendance and active engagement in class discussions.
The first five minutes of each lecture will be devoted to reviewing the summary and unsolved exercises from the previous lecture, providing an opportunity for students to contribute to the discussion.
Collaboration and discussion among students are strongly encouraged. However, all deliverables will be graded individually. You may acknowledge your collaborators by including their names in the acknowledgement section.
Plagiarism in any form (including the use of AI tools) is strictly prohibited. Any violation of this policy will result in a score of 0 for the entire course.
Convex Optimization. S. Boyd and L. Vandenberghe. Cambridge University Press, Cambridge, 2004.
Introductory Lectures on Convex Optimization: A Basic Course. Y. Nesterov. Kluwer, 2004.
Numerical Optimization. J. Nocedal and S. J. Wright, Springer Series in Operations Research, Springer-Verlag, New York, 2006 (2nd edition).
A Modern Introduction to Online Learning. Francesco Orabona. arXiv preprint arXiv:1912.13213 (2019).