HKUST

MATH 5470: Statistical Machine Learning
Fall 2025


Course Information

Synopsis

This course covers several topics in statistical machine learning, including supervised learning (linear regression and classification), model assessment and selection (subset selection, ridge, lasso, and principal component regression), tree-based methods (bagging, random forests, and boosting), support vector machines, and an introduction to deep learning with convolutional neural networks.


Prerequisite: Some preliminary coursework in (statistical) machine learning, applied statistics, or deep learning will be helpful.

Instructors:

Yuan Yao

Time and Place:

Tu 6:30-9:20pm, Lecture Theater D (LTD), HKUST

References

An Introduction to Statistical Learning, with Applications in R (ISLR), by James, Witten, Hastie, and Tibshirani

ISLR-python, by Jordi Warmenhoven.

ISLR-Python: Labs and Applied, by Matt Caudill.

Manning: Deep Learning with Python, by Francois Chollet [GitHub source in Python 3.6 and Keras 2.0.8]

MIT: Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

Tutorials: preparation for beginners

Python-Numpy Tutorials by Justin Johnson

scikit-learn Tutorials: An Introduction to Machine Learning in Python

Jupyter Notebook Tutorials

PyTorch Tutorials

Deep Learning: Do-it-yourself with PyTorch, a course at ENS

TensorFlow Tutorials

MXNet Tutorials

Theano Tutorials

The Elements of Statistical Learning (ESL). 2nd Ed. By Hastie, Tibshirani, and Friedman

statlearning-notebooks, by Sujit Pal: Python implementations of the R labs for the StatLearning: Statistical Learning online course from Stanford, taught by Profs. Trevor Hastie and Rob Tibshirani.

Homework and Projects:

TBA (To Be Announced)

Schedule

Date Topic Instructor Scribe
09/02/2025, Tue Lecture 01: A Historic Overview of AI and Statistical Machine Learning. [ slides (pdf) ] Y.Y.
09/09/2025, Tue Lecture 02: Supervised Learning: linear regression and classification [ slides ] Y.Y.
09/11/2025, Thu Seminar.
  • Title: PKU Quest: AI-Powered Math Education Practice at Peking University [ announcement ] [ slides ]
  • Speaker: Leheng Chen and Zihao Liu, Peking University
  • Time: Thursday Sep 11, 2025, 3:30pm
  • Venue: Room 2612B (near Lift 31 & 32)
  • Abstract: The advent of Generative AI necessitates a paradigm shift in higher education, calling for new, diverse models of interaction between students, teachers, and AI. In response to this challenge, Peking University has developed PKU Quest, an AI-assisted platform designed to explore these new pedagogical frontiers. PKU Quest focuses on optimizing for the unique demands of mathematics education, and has developed the "Math Tutor," a tool specifically designed for math problem-solving support. Instead of providing direct answers, the Math Tutor engages students in a heuristic and exploratory dialogue, guiding them to develop independent thinking and problem-solving skills. This application has now been implemented across all foundational mathematics courses at Peking University. This presentation will share our journey in developing PKU Quest, discussing the motivations, challenges, and practical outcomes of what we consider a first step in exploring the vast potential of AI in education.
  • Bio: Leheng Chen is a Ph.D. student at the Beijing International Center for Mathematical Research (BICMR), Peking University, advised by Professor Bin Dong. He has broad interests in the application of artificial intelligence. Previously, he explored research directions in AI for Science, such as thermodynamic modeling and foundation models for partial differential equations, with his work published in Physical Review E and at an ICLR Workshop. He has since shifted his research focus to the practical application of AI in Education, where he designed and developed "PKU Quest," an AI-assisted teaching and learning platform for Peking University.
    Zihao Liu (Leo) is a Ph.D. student in Applied Mathematics and Artificial Intelligence at the School of Mathematical Sciences, Peking University. His interests span the application of AI to education and scientific understanding, with recent work focusing on improving the pedagogical effectiveness of AI-powered educational agents and building benchmark datasets for evaluating AI capabilities. As the founder and lead developer of PKU Quest and AKIS (AI Knowledge Intelligent Solution), he focuses on the practical deployment of AI-in-education systems and has helped design and develop “AIBOOKS,” an intelligent digital-textbook platform, and “Math Tutor,” a guided problem-solving assistant for students. He is deeply committed to advancing the integration of AI and education.
Y.Y.
09/16/2025, Tue Lecture 03: Model Assessment and Selection: Subset, Ridge, Lasso, and PCR [ slides ]
    [ Seminar ]
  • Speaker: QRT guest speakers [ poster ]
Y.Y.
09/23/2025, Tue Lecture 04: Project 1 [ pdf ], delivered via Canvas Zoom; the on-site class is cancelled due to Typhoon Signal No. 8.
    [ Reference ]:
  • Kaggle: Home Credit Default Risk [ link ]
  • Kaggle: M5 Forecasting - Accuracy, Estimate the unit sales of Walmart retail goods. [ link ]
Y.Y.
09/27/2025, Sat Lecture 05: Decision Trees, Bagging, Random Forests, and Boosting [ YY's slides ]
Y.Y.
09/30/2025, Tue Lecture 06: Support Vector Machines [ YY's slides ]
    [Reference]:
  • To view the .ipynb files below, you may try [ Jupyter NBViewer ]
  • Python Notebook for Support Vector Machines [ svm.ipynb ]

  • Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro. The Implicit Bias of Gradient Descent on Separable Data. [ arXiv:1710.10345 ]. ICLR 2018. Gradient descent on logistic regression converges in direction to the max-margin classifier; a toy numerical illustration is sketched after this entry.
  • Matus Telgarsky. Margins, Shrinkage, and Boosting. [ arXiv:1303.4172 ]. ICML 2013. An earlier paper showing that gradient descent on the exponential/logistic loss leads to the max margin.
  • Yuan Yao, Lorenzo Rosasco and Andrea Caponnetto. On Early Stopping in Gradient Descent Learning. Constructive Approximation, 2007, 26 (2): 289-315. [ link ]
  • Jingfeng Wu, Peter L. Bartlett, Jason D. Lee, Sham M. Kakade, and Bin Yu. Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization. [ arXiv:2509.17251 ]
Y.Y.
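
[ Note ]: The implicit-bias result of Soudry et al. cited above can be checked numerically. Below is a minimal sketch, assuming synthetic 2D separable data and a hand-picked step size (an illustration only, not part of the course materials): plain gradient descent on the unregularized logistic loss lets ||w|| grow without bound while the direction w/||w|| drifts toward a max-margin direction, approximated here by a large-C linear SVM from scikit-learn.

# Implicit bias of gradient descent on the logistic loss (Soudry et al., 2018):
# on separable data the normalized iterate approaches a max-margin direction.
# Assumptions: synthetic 2D data, hand-picked step size and iteration counts.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)
X = X + 0.5 * y[:, None] * np.array([1.0, 0.5])      # push classes apart: data is linearly separable

# Reference max-margin direction: a large-C linear SVM (approximate hard margin).
svm = LinearSVC(C=1e4, fit_intercept=False, max_iter=100000).fit(X, y)
w_svm = svm.coef_.ravel() / np.linalg.norm(svm.coef_)

# Plain gradient descent on the unregularized logistic loss.
w, eta = np.zeros(2), 0.1
for t in range(1, 100001):
    margins = y * (X @ w)
    grad = -(X * (y / (1.0 + np.exp(np.minimum(margins, 50.0))))[:, None]).mean(axis=0)
    w -= eta * grad
    if t in (100, 1000, 10000, 100000):
        w_dir = w / np.linalg.norm(w)
        angle = np.degrees(np.arccos(np.clip(w_dir @ w_svm, -1.0, 1.0)))
        print(f"iter {t:>6}: ||w|| = {np.linalg.norm(w):7.2f}, angle to SVM direction = {angle:.2f} deg")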
10/11/2025, Sat Lecture 06+: Advanced topics on early-stopping regularization.
    [ Seminar ]
  • Title: A Statistical View on Implicit Regularization: Gradient Descent Dominates Ridge [ slides ]
  • Speaker: Dr. Jingfeng WU, UC Berkeley
  • Time: 10:30am, LTF
  • Abstract: A key puzzle in deep learning is how simple gradient methods find generalizable solutions without explicit regularization. This talk discusses the implicit regularization of gradient descent (GD) through the lens of statistical dominance. Using linear regression as a clean proxy, we present three surprising findings. First, GD dominates ridge regression: with comparable regularization, the excess risk of GD is always within a constant factor of ridge, but ridge can be polynomially worse even when tuned optimally. Second, GD is incomparable with online stochastic gradient descent (SGD). While it is known that for certain problems GD can be polynomially better than SGD, the reverse is also true: we construct problems, inspired by benign overfitting theory, where optimally stopped GD is polynomially worse. Finally, GD dominates SGD for a significant subclass of problems -- those with fast and continuously decaying covariance spectra -- which includes all problems satisfying the standard capacity condition. This is joint work with Peter Bartlett, Sham Kakade, Jason Lee, and Bin Yu.
  • Bio: Jingfeng Wu is a postdoctoral fellow at the Simons Institute for the Theory of Computing at UC Berkeley. His research focuses on deep learning theory, optimization, and statistical learning. He earned his Ph.D. in Computer Science from Johns Hopkins University in 2023. Prior to that, he received a B.S. in Mathematics (2016) and an M.S. in Applied Mathematics (2019), both from Peking University. In 2023, he was recognized as a Rising Star in Data Science by the University of Chicago and UC San Diego.
Y.Y.
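
[ Note ]: A rough numerical companion to the comparison described in the abstract above, as a minimal sketch under stated assumptions (synthetic Gaussian design with a polynomially decaying spectrum, and the heuristic matching lambda ≈ 1/(eta·t) between the ridge penalty and the stopping time; an illustration only, not material from the talk). It reports the population excess risk of early-stopped gradient descent next to that of ridge regression at the matched penalty.

# Early-stopped gradient descent vs. ridge regression on a synthetic
# overparameterized linear model (illustration only; assumptions: Gaussian
# design with decaying spectrum, heuristic penalty matching lambda = 1/(eta*t)).
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 100, 400, 0.5
eigs = 1.0 / np.arange(1, d + 1) ** 2                 # polynomially decaying covariance spectrum
X = rng.normal(size=(n, d)) * np.sqrt(eigs)           # rows ~ N(0, diag(eigs))
w_star = rng.normal(size=d)
y = X @ w_star + sigma * rng.normal(size=n)

def excess_risk(w):
    # Population excess risk E[(x^T (w - w_star))^2] under the diag(eigs) covariance.
    return float(np.sum(eigs * (w - w_star) ** 2))

eta = 0.5 * n / np.linalg.norm(X, 2) ** 2             # step size below 1/L for the empirical loss
w_gd = np.zeros(d)
for t in range(1, 2001):
    w_gd -= eta * (X.T @ (X @ w_gd - y) / n)          # gradient step on (1/2n)||Xw - y||^2
    if t in (10, 100, 1000, 2000):
        lam = 1.0 / (eta * t)                         # heuristically matched ridge penalty
        w_ridge = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
        print(f"t={t:>5}: GD excess risk {excess_risk(w_gd):.4f} | ridge(lambda=1/(eta*t)) {excess_risk(w_ridge):.4f}")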
10/14/2025, Tue Lecture 07: An Introduction to Convolutional Neural Networks [ YY's slides ] and OpenReview Submission Instructions for Project 1 [ slides ]
Y.Y.

by YAO, Yuan.