| Date | Topic | Instructor | Scriber |
| 09/02/2025, Tue |
Lecture 01: A Historical Overview of AI and Statistical Machine Learning. [ slides (pdf) ]
|
Y.Y. |
|
| 09/09/2025, Tue |
Lecture 02: Supervised Learning: linear regression and classification [ slides ]
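[ Illustrative code (Python) ]:
A minimal sketch of the two settings in this lecture on synthetic data: ordinary least squares solved in closed form with numpy, and a scikit-learn logistic-regression classifier on two Gaussian clusters. All sizes, coefficients, and data below are made-up illustrative choices, not course material.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Linear regression: closed-form least squares on synthetic data
    n, d = 200, 3
    X = rng.normal(size=(n, d))
    beta_true = np.array([1.5, -2.0, 0.5])
    y = X @ beta_true + 0.1 * rng.normal(size=n)
    X_aug = np.hstack([np.ones((n, 1)), X])                # prepend an intercept column
    beta_hat = np.linalg.lstsq(X_aug, y, rcond=None)[0]    # least-squares solution
    print("OLS estimate (intercept first):", np.round(beta_hat, 3))

    # Classification: logistic regression on two Gaussian clusters
    X_cls = np.vstack([rng.normal(loc=-1.0, size=(100, 2)),
                       rng.normal(loc=+1.0, size=(100, 2))])
    y_cls = np.array([0] * 100 + [1] * 100)
    clf = LogisticRegression().fit(X_cls, y_cls)
    print("logistic regression training accuracy:", clf.score(X_cls, y_cls))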
|
Y.Y. |
|
| 09/11/2025, Thu |
Seminar.
- Title: PKU Quest: AI-Powered Math Education Practice at Peking University [ announcement ] [ slides ]
- Speaker: Leheng Chen and Zihao Liu, Peking University
- Time: Thursday Sep 11, 2025, 3:30pm
- Venue: Room 2612B (near Lift 31 & 32)
- Abstract:
The advent of Generative AI necessitates a paradigm shift in higher education, calling for new, diverse models of interaction between students, teachers, and AI. In response to this challenge, Peking University has developed PKU Quest, an AI-assisted platform designed to explore these new pedagogical frontiers. PKU Quest focuses on optimizing for the unique demands of mathematics education, and has developed the "Math Tutor," a tool specifically designed for math problem-solving support. Instead of providing direct answers, the Math Tutor engages students in a heuristic and exploratory dialogue, guiding them to develop independent thinking and problem-solving skills. This application has now been implemented across all foundational mathematics courses at Peking University.
This presentation will share our journey in developing PKU Quest, discussing the motivations, challenges, and practical outcomes of what we consider a first step in exploring the vast potential of AI in education.
- Bio:
Leheng Chen is a Ph.D. student at the Beijing International Center for Mathematical Research (BICMR), Peking University, advised by Professor Bin Dong. He has broad interests in the application of artificial intelligence. Previously, he explored research directions in AI for Science, such as thermodynamic modeling and foundation models for partial differential equations, with his work published in Physical Review E and at an ICLR Workshop. He has since shifted his research focus to the practical application of AI in Education, where he designed and developed "PKU Quest," an AI-assisted teaching and learning platform for Peking University.
Zihao Liu (Leo) is a Ph.D. student in Applied Mathematics and Artificial Intelligence at the School of Mathematical Sciences, Peking University. His interests span the application of AI to education and scientific understanding, with recent work focusing on improving the pedagogical effectiveness of AI-powered educational agents and building benchmark datasets for evaluating AI capabilities. As the founder and lead developer of PKU Quest and AKIS (AI Knowledge Intelligent Solution), he focuses on the practical deployment of AI-in-education systems and has helped design and develop "AIBOOKS," an intelligent digital-textbook platform, and "Math Tutor," a guided problem-solving assistant for students. He is deeply committed to advancing the integration of AI and education.
|
Y.Y. |
|
| 09/16/2025, Tue |
Lecture 03: Model Assessment and Selection: Subset, Ridge, Lasso, and PCR [ slides ]
[ Seminar ]
- Speaker: QRT guest speakers [ poster ]
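[ Illustrative code (Python) ]:
A small sketch of the model-selection theme of this lecture: choosing the ridge and lasso regularization strength by cross-validation on synthetic sparse data with scikit-learn. The data model and the alpha grid are arbitrary illustrative choices, not course material.

    import numpy as np
    from sklearn.linear_model import RidgeCV, LassoCV

    rng = np.random.default_rng(0)
    n, d = 100, 20
    X = rng.normal(size=(n, d))
    beta = np.zeros(d)
    beta[:3] = [3.0, -2.0, 1.5]          # sparse ground truth: only 3 active coefficients
    y = X @ beta + 0.5 * rng.normal(size=n)

    ridge = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(X, y)   # leave-one-out CV over the alpha grid
    lasso = LassoCV(cv=5, random_state=0).fit(X, y)            # 5-fold CV over lasso's own alpha path

    print("ridge alpha chosen by CV:", ridge.alpha_)
    print("lasso alpha chosen by CV:", lasso.alpha_)
    print("lasso nonzero coefficients:", int(np.sum(lasso.coef_ != 0)))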
|
Y.Y. |
|
| 09/23/2025, Tue |
Lecture 04: Project 1 [ pdf ], delivered via Zoom (through Canvas); the on-site class is cancelled due to Typhoon Signal No. 8.
[ Reference ]:
- Kaggle: Home Credit Default Risk [ link ]
- Kaggle: M5 Forecasting - Accuracy, Estimate the unit sales of Walmart retail goods.
[ link ]
|
Y.Y. |
|
| 09/27/2025, Sat |
Lecture 05: Decision Tree, Bagging, Random Forests and Boosting [ YY's slides ]
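[ Illustrative code (Python) ]:
A brief scikit-learn sketch comparing a bagged ensemble of trees (random forest) with boosting (gradient-boosted trees) on a synthetic classification task; the dataset and hyperparameters are illustrative only and are not taken from the lecture.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)       # bagging of deep trees
    gb = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)   # boosting of shallow trees

    print("random forest test accuracy:    ", rf.score(X_te, y_te))
    print("gradient boosting test accuracy:", gb.score(X_te, y_te))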
|
Y.Y. |
|
| 09/30/2025, Tue |
Lecture 06: Support Vector Machines [ YY's slides ]
[Reference]:
- To view .ipynb files below, you may try [ Jupyter NBViewer ]
- Python Notebook for Support Vector Machines
[ svm.ipynb ]
- Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro. The Implicit Bias of Gradient Descent on Separable Data.
[ arXiv:1710.10345 ]. ICLR 2018. Shows that gradient descent on logistic regression converges to the max-margin direction (see the sketch after this reference list).
- Matus Telgarsky. Margins, Shrinkage, and Boosting. [ arXiv:1303.4172 ]. ICML 2013. An earlier paper showing that gradient descent on the exponential/logistic loss leads to max margin.
- Yuan Yao, Lorenzo Rosasco and Andrea Caponnetto. On Early Stopping in Gradient Descent Learning. Constructive Approximation, 2007, 26 (2): 289-315.
[ link ]
- Jingfeng Wu, Peter L. Bartlett, Jason D. Lee, Sham M. Kakade, and Bin Yu. Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization. [ arXiv:2509.17251 ]
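[ Illustrative code (Python) ]:
To accompany the Soudry et al. reference above, here is a small numpy sketch (not from the paper) of the implicit-bias phenomenon: on linearly separable data, plain gradient descent on the logistic loss keeps growing the weight norm while the normalized direction stabilizes and its minimum margin on the training data increases. The data, step size, and iteration counts are made-up illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)

    # Two well-separated Gaussian clusters with labels in {-1, +1}
    X = np.vstack([rng.normal(loc=+3.0, size=(50, 2)),
                   rng.normal(loc=-3.0, size=(50, 2))])
    y = np.array([+1] * 50 + [-1] * 50)

    w = np.zeros(2)
    lr = 0.1
    for t in range(1, 20001):
        margins = y * (X @ w)
        # gradient of the average logistic loss: -mean_i sigma(-m_i) * y_i * x_i
        grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        w -= lr * grad
        if t in (100, 1000, 20000):
            w_dir = w / np.linalg.norm(w)
            print(f"step {t:6d}  ||w|| = {np.linalg.norm(w):6.2f}  "
                  f"min margin of w/||w|| = {np.min(y * (X @ w_dir)):.3f}")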
|
Y.Y. |
|
| 10/11/2025, Sat |
Lecture 06+: Advanced topics on early stopping regularization.
[ Seminar ]
- Title: A Statistical View on Implicit Regularization: Gradient Descent Dominates Ridge [ slides ]
- Speaker: Dr. Jingfeng WU, UC Berkeley
- Time: 10:30am, LTF
- Abstract: A key puzzle in deep learning is how simple gradient methods find generalizable solutions without explicit regularization. This talk discusses the implicit regularization of gradient descent (GD) through the lens of statistical dominance. Using linear regression as a clean proxy, we present three surprising findings.
First, GD dominates ridge regression: with comparable regularization, the excess risk of GD is always within a constant factor of ridge, but ridge can be polynomially worse even when tuned optimally. Second, GD is incomparable with online stochastic gradient descent (SGD). While it is known that for certain problems GD can be polynomially better than SGD, the reverse is also true: we construct problems, inspired by benign overfitting theory, where optimally stopped GD is polynomially worse. Finally, GD dominates SGD for a significant subclass of problems -- those with fast and continuously decaying covariance spectra -- which includes all problems satisfying the standard capacity condition.
This is joint work with Peter Bartlett, Sham Kakade, Jason Lee, and Bin Yu.
- Bio:
Jingfeng Wu is a postdoctoral fellow at the Simons Institute for the Theory of Computing at UC Berkeley. His research focuses on deep learning theory, optimization, and statistical learning. He earned his Ph.D. in Computer Science from Johns Hopkins University in 2023. Prior to that, he received a B.S. in Mathematics (2016) and an M.S. in Applied Mathematics (2019), both from Peking University. In 2023, he was recognized as a Rising Star in Data Science by the University of Chicago and UC San Diego.
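[ Illustrative code (Python) ]:
A toy numpy experiment (not from the talk) in the spirit of the abstract's first claim: on an overparameterized linear-regression instance, compare the best excess risk attained along the gradient-descent path (early stopping) with the best excess risk of ridge regression over a penalty grid. The data model, step size, and grids are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, sigma = 50, 100, 0.5
    beta_star = rng.normal(size=d) / np.sqrt(d)
    X = rng.normal(size=(n, d))
    y = X @ beta_star + sigma * rng.normal(size=n)

    def excess_risk(beta):
        # population excess risk E[(x^T (beta - beta_star))^2] for isotropic Gaussian x
        return float(np.sum((beta - beta_star) ** 2))

    # Gradient descent on 0.5 * ||X beta - y||^2, tracking the risk along the path
    lr = 1.0 / np.linalg.norm(X, 2) ** 2          # 1 / smoothness constant, for stability
    beta, gd_risks = np.zeros(d), []
    for t in range(2000):
        beta = beta - lr * (X.T @ (X @ beta - y))
        gd_risks.append(excess_risk(beta))

    # Ridge regression over a grid of penalties
    ridge_risks = [excess_risk(np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y))
                   for lam in np.logspace(-4, 4, 50)]

    print("best early-stopped GD excess risk:", min(gd_risks))
    print("best ridge excess risk:           ", min(ridge_risks))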
|
Y.Y. |
|
| 10/14/2025, Tue |
Lecture 07: An Introduction to Convolutional Neural Networks [ YY's slides ] and OpenReview Submission Instructions for Project 1 [ slides ]
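[ Illustrative code (Python) ]:
A minimal numpy sketch of the basic building block of CNNs: sliding a 3x3 filter over a grayscale image with stride 1 and no padding ("valid" cross-correlation). The image and filter values are made up for illustration.

    import numpy as np

    def conv2d_valid(image, kernel):
        # Slide `kernel` over `image` with stride 1 and no padding ("valid")
        H, W = image.shape
        kH, kW = kernel.shape
        out = np.zeros((H - kH + 1, W - kW + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
        return out

    image = np.arange(36, dtype=float).reshape(6, 6)
    edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)    # crude vertical-edge detector
    print(conv2d_valid(image, edge_filter).shape)      # -> (4, 4)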
|
Y.Y. |
|
| 10/28/2025, Tue |
Lecture 08: An Introduction to Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Attention and Transformer [ slides ]
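[ Illustrative code (Python) ]:
A short numpy sketch of scaled dot-product attention, the core operation inside the Transformer: softmax(Q K^T / sqrt(d_k)) V. The token count, dimension, and random inputs are illustrative only (single head, no masking, no learned projections).

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # softmax(Q K^T / sqrt(d_k)) V
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                    # (n_q, n_k) similarity scores
        scores -= scores.max(axis=-1, keepdims=True)       # subtract row max for numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
        return weights @ V

    rng = np.random.default_rng(0)
    n, d = 5, 8                         # 5 tokens, model dimension 8
    Q = rng.normal(size=(n, d))
    K = rng.normal(size=(n, d))
    V = rng.normal(size=(n, d))
    print(scaled_dot_product_attention(Q, K, V).shape)     # -> (5, 8)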
|
Y.Y. |
|
| 11/04/2025, Tue |
Lecture 09: Transformer and Applications [ slides ]
[ Seminar ]
- Title: Transformers As Statisticians: Provable In-Context Learning with In-Context Algorithm Selection. [ slides ] [ video ]
- Speaker: Prof. Song MEI, University of California at Berkeley.
- Abstract:
Neural sequence models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) abilities: they can perform new tasks when prompted with training and test examples, without any parameter update to the model.
This work first provides a comprehensive statistical theory for transformers performing ICL. Concretely, we show that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression, Lasso, learning generalized linear models, and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Using an efficient implementation of in-context gradient descent as the underlying mechanism, our transformer constructions admit mild size bounds and can be learned with polynomially many pretraining sequences.
Building on these "base" ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving in-context algorithm selection, akin to what a statistician can do in real life: a single transformer can adaptively select different base ICL algorithms, or even perform qualitatively different tasks, on different input sequences, without any explicit prompting of the right algorithm or task. We establish this in theory by explicit constructions and also observe the phenomenon experimentally. In theory, we construct two general mechanisms for algorithm selection, with concrete examples: pre-ICL testing and post-ICL validation. As an example, we use the post-ICL validation mechanism to construct a transformer that can perform nearly Bayes-optimal ICL on a challenging task: noisy linear models with mixed noise levels. Experimentally, we demonstrate the strong in-context algorithm selection capabilities of standard transformer architectures.
- Bio: Song Mei is an Assistant Professor in the Department of Statistics and the Department of Electrical Engineering and Computer Sciences at UC Berkeley. In June 2020, he received his Ph.D. from Stanford University, advised by Prof. Andrea Montanari. Song's research is motivated by data science and AI, and lies at the intersection of statistics, machine learning, information theory, and computer science. His current research interests include language models and diffusion models, theory of deep learning, theory of reinforcement learning, high-dimensional statistics, quantum algorithms, and uncertainty quantification. Song received a Sloan Research Fellowship in 2025 and an NSF CAREER Award in 2024.
- Reference: Yu Bai, Fan Chen, Huan Wang, Caiming Xiong, and Song Mei. Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection. NeurIPS, 2023 (Oral). [ arXiv:2306.04637 ]
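[ Illustrative code (Python) ]:
The sketch below is not the paper's transformer construction; it is a plain numpy analogue of the post-ICL validation idea described in the abstract: fit several base algorithms (here, ridge regression at several regularization levels, with lam = 0 being least squares) on part of the in-context examples, score them on a held-out slice of the prompt, and use the winner to predict at the query. The task distribution, split, and candidate grid are made-up illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_ctx = 5, 40
    noise = rng.choice([0.05, 1.0])                   # task drawn with a random noise level
    beta = rng.normal(size=d)
    X = rng.normal(size=(n_ctx, d))
    y = X @ beta + noise * rng.normal(size=n_ctx)
    x_query = rng.normal(size=d)

    # Split the in-context examples into a "fit" part and a "validation" part
    X_fit, y_fit = X[:30], y[:30]
    X_val, y_val = X[30:], y[30:]

    def ridge(lam):
        # ridge estimator on the "fit" portion of the prompt (lam = 0 gives least squares)
        return np.linalg.solve(X_fit.T @ X_fit + lam * np.eye(d), X_fit.T @ y_fit)

    candidates = {f"ridge(lam={lam})": ridge(lam) for lam in [0.0, 0.1, 1.0, 10.0]}

    # Post-ICL validation: score each candidate on the held-out in-context examples
    val_err = {name: np.mean((X_val @ b - y_val) ** 2) for name, b in candidates.items()}
    best = min(val_err, key=val_err.get)
    print("selected base algorithm:", best)
    print("query prediction:", x_query @ candidates[best])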
|
Y.Y. |
|
| 11/11/2025, Tue |
Lecture 10: An Introduction to Reinforcement Learning with Applications [ slides ] and Final Project Initialization [ project2.pdf ]
[ Reference ]:
- Google DeepMind's Deep Q-learning playing Atari Breakout:
[ youtube ]
- To view .ipynb files below, you may try [ Jupyter NBViewer ]
- Deep Q-Learning PyTorch Tutorial: [ link ] (a minimal tabular Q-learning sketch follows this reference list)
- A Tutorial of Reinforcement Learning for Quantitative Trading:
[ Tutorial ]
[ Replicate ]
- FinRL: Deep Reinforcement Learning for Quantitative Finance
[ GitHub ]
- Reinforcement Learning and Supervised Learning for Quantitative Finance: [ link ]
- Prof. Michael Kearns, University of Pennsylvania, Algorithmic Trading and Machine Learning, Simons Institute at Berkeley [ link ]
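[ Illustrative code (Python) ]:
This is not the Deep Q-Network from the tutorial above; it is a minimal tabular Q-learning sketch on a made-up 6-state chain, just to show the update rule Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). The environment, hyperparameters, and tie-breaking rule are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)

    # A tiny deterministic chain: states 0..5, actions 0 = left, 1 = right.
    # Reaching state 5 yields reward +1 and ends the episode.
    n_states, n_actions, goal = 6, 2, 5
    alpha, gamma, eps = 0.5, 0.9, 0.1

    Q = np.zeros((n_states, n_actions))
    for episode in range(200):
        s = 0
        while s != goal:
            if rng.random() < eps:
                a = int(rng.integers(n_actions))                             # explore
            else:
                a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))      # greedy, random tie-break
            s_next = max(s - 1, 0) if a == 0 else min(s + 1, goal)
            r = 1.0 if s_next == goal else 0.0
            target = r + gamma * np.max(Q[s_next]) * (s_next != goal)        # no bootstrap past the goal
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next

    print("greedy action per non-terminal state:", np.argmax(Q[:goal], axis=1))  # expect all 1s (move right)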
[ Kaggle Contests ]:
- Kaggle: Home Credit Default Risk [ link ]
- Kaggle: M5 Forecasting - Accuracy, Estimate the unit sales of Walmart retail goods.
[ link ]
- Kaggle: M5 Forecasting - Uncertainty, Estimate the uncertainty distribution of Walmart unit sales.
[ link ]
- Kaggle: Ubiquant Market Prediction - Make predictions against future market data.
[ link ]
- Kaggle: G-Research Crypto Forecasting.
[ link ]
- Type-II diabetes and Alzheimer’s disease.
[ slides (pdf) ]
[ slides (pptx) ]
[Paper Replication]:
- Shihao Gu, Bryan Kelly and Dacheng Xiu
"Empirical Asset Pricing via Machine Learning", Review of Financial Studies, Vol. 33, Issue 5, 2020, 2223-2273. Winner of the 2018 Swiss Finance Institute Outstanding Paper Award.
[ link ]
- Jingwen Jiang, Bryan Kelly and Dacheng Xiu
"(Re-)Imag(in)ing Price Trends", The Journal of Finance, 78: 3193-3249, 2023.
[ ssrn ] [ https://doi.org/10.1111/jofi.13268 ]
|
Y.Y. |
|