MATH 5470: Statistical Machine Learning

HKUST

MATH 5470: Statistical Machine Learning
Spring 2024

Course Information

Synopsis

This course covers several topics in statistical machine learning:

1. supervised learning (linear and nonlinear models, e.g. trees, support vector machines, deep neural networks),

2. unsupervised learning (dimensionality reduction, cluster trees, generative models, generative adversarial networks),

3. reinforcement learning (markov decision process, deep rl).

Prerequisite: Some preliminary course on (statistical) machine learning, applied statistics, and deep learning will be helpful.

Instructors:

Time and Place:

Mon 6:30-9:20pm, Rm 4579, Lift 27/28 (60) and Zoom from CANVAS, HKUST

Reference (参考教材)

An Introduction to Statistical Learning, with applications in R (ISLR). By James, Witten, Hastie, and Tibshirani

ISLR-python, By Jordi Warmenhoven.

ISLR-Python: Labs and Applied, by Matt Caudill.

Manning: Deep Learning with Python, by Francois Chollet [GitHub source in Python 3.6 and Keras 2.0.8]

MIT: Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

Tutorials: preparation for beginners

Python-Numpy Tutorials by Justin Johnson

scikit-learn Tutorials: An Introduction of Machine Learning in Python

Jupyter Notebook Tutorials

PyTorch Tutorials

Deep Learning: Do-it-yourself with PyTorch, A course at ENS

Tensorflow Tutorials

MXNet Tutorials

Theano Tutorials

The Elements of Statistical Learning (ESL). 2nd Ed. By Hastie, Tibshirani, and Friedman

statlearning-notebooks, by Sujit Pal, Python implementations of the R labs for the StatLearning: Statistical Learning online course from Stanford taught by Profs Trevor Hastie and Rob Tibshirani.

Homework and Projects:

TBA (To Be Announced)

Schedule

Date	Topic	Instructor	Scriber
05/02/2024, Mon	Lecture 01: A Historic Overview and Introduction to Supervised Learning. [ slides (pdf) ] [ slides (pdf) ] [ Seminar ] Title: Solving olympiad geometry without human demonstrations [ announcement ] Speaker: Dr. Trieu H. TRINH, New York University Time: Tuesday Feb 6, 2024, 2-3pm Abstract: Proving mathematical theorems at the olympiad level represents a notable milestone in human-level automated reasoning, owing to their reputed difficulty among the world’s best talents in pre-university mathematics. Current machine-learning approaches, however, are not applicable to most mathematical domains owing to the high cost of translating human proofs into machine-verifiable format. The problem is even worse for geometry because of its unique translation challenges, resulting in severe scarcity of training data. We propose AlphaGeometry, a theorem prover for Euclidean plane geometry that sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity. AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. On a test set of 30 latest olympiad-level problems, AlphaGeometry solves 25, outperforming the previous best method that only solves ten problems and approaching the performance of an average International Mathematical Olympiad (IMO) gold medallist. Notably, AlphaGeometry produces human-readable proofs, solves all geometry problems in the IMO 2000 and 2015 under human expert evaluation and discovers a generalized version of a translated IMO theorem in 2004. Bio: Trieu recently graduated from his PhD program at New York University in January 2024. Prior to NYU, he worked for 2 years at Google Brain. His research covers a wide range of topics: Self-supervised learning in images, long term dependencies in RNNs, Commonsense reasoning in LLMs, and most recently mathematical reasoning. [ Reference ]: Trieu H. Trinh, Yuhuai Wu, Quoc V. Le, He He and Thang Luong. Solving olympiad geometry without human demonstrations. Nature, 625:476-482, 17 January 2024. [ https://doi.org/10.1038/s41586-023-06747-5 ]	Y.Y.
19/02/2024, Mon	Today's lecture is cancelled and will be rescheduled to later this semester.	Y.Y.
26/02/2024, Mon	Lecture 02: Supervised Learning: linear regression and classification [ slides ] [ Reference ] To view .ipynb files below, you may try [ Jupyter NBViewer] Linear Regression Python Notebook [ MAFS6010_Regression.ipynb ] Linear Classification Python Notebook [ MAFS6010_Classification.ipynb ]	Y.Y.
04/03/2024, Mon	Lecture 03: Model Assessment and Selection: Subset, Ridge, Lasso, and PCR [ slides ] [Reference]: To view .ipynb files below, you may try [ Jupyter NBViewer] Python Notebook for Model Selection (Subset, Ridge, Lasso, and Principal Component Regression) [ Selection.ipynb ]	Y.Y.
11/03/2024, Mon	Lecture 04: Moving beyond Linearity [ slides ] [Reference]: To view .ipynb files below, you may try [ Jupyter NBViewer] Python Notebook for ISLR Chapter 7 Lab [Ch7_Lab.ipynb ]	Y.Y.
18/03/2024, Mon	Lecture 05: Decision Tree, Bagging, Random Forests and Boosting [ YY's slides ] [Reference]: To view .ipynb files below, you may try [ Jupyter NBViewer] Python Notebook for Decision Trees, Bagging, Random Forests and Boosting [ tree.ipynb ]	Y.Y.
25/03/2024, Mon	Lecture 06: Support Vector Machines [ YY's slides ] and Final Project Initialization [ project.pdf ] [Reference]: To view .ipynb files below, you may try [ Jupyter NBViewer] Python Notebook for Support Vector Machines [ svm.ipynb ] Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro. The Implicit Bias of Gradient Descent on Separable Data. [ arXiv:1710.10345 ]. ICLR 2018. Gradient descent on logistic regression leads to max margin. Matus Telgarsky. Margins, Shrinkage, and Boosting. [ arXiv:1303.4172 ]. ICML 2013. An older paper on gradient descent on exponential/logistic loss leads to max margin. [Reading Material]: Shihao Gu, Bryan Kelly and Dacheng Xiu "Empirical Asset Pricing via Machine Learning", Review of Financial Studies, Vol. 33, Issue 5, (2020), 2223-2273. Winner of the 2018 Swiss Finance Institute Outstanding Paper Award. [ link ] Jingwen Jiang, Bryan Kelly and Dacheng Xiu "(Re-)Imag(in)ing Price Trends", Chicago Booth Report, Aug 2021 [ link ] [ Reference ]: Kaggle: Home Credit Default Risk [ link ] Kaggle: M5 Forecasting - Accuracy, Estimate the unit sales of Walmart retail goods. [ link ] Kaggle: M5 Forecasting - Uncertainty, Estimate the uncertainty distribution of Walmart unit sales. [ link ] Kaggle: Ubiquant Market Prediction - Make predictions against future market data. [ link ] Kaggle: G-Research Crypto Forecasting. [ link ]	Y.Y.
08/04/2024, Mon	Lecture 07: An Introduction to Convolutional Neural Networks [ YY's slides ] [Reference]: To view .ipynb files below, you may try [ Jupyter NBViewer] LeNet5 for MNIST dataset in Pytorch Notebook [ LeNet5_mnist.ipynb ] LeNet5 for Cifar10 dataset in Pytorch Notebook [ LeNet5_cifar10.ipynb ] AlexNet for Cifar10 dataset in Pytorch Notebook [ AlexNet_cifar10.ipynb ] ResNet for Cifar10 dataset in Pytorch Notebook [ ResNet_cifar10.ipynb ] [ Topics on CNNs ]: To view .ipynb files below, you may try [ Jupyter NBViewer] Fine-tuning (transfer learning) of ResNet in Pytorch Notebook [ finetuning_resnet.ipynb ] Visualization of VGG16 in Pytorch Notebook [ vgg16-visualization.ipynb ] Class activation heatmap of VGG16 in Pytorch Notebook [ vgg16-heatmap.ipynb ] Neural Style of HKUST at Starry Night in Pytorch Notebook [ neural_style_starry-hkust.ipynb ] Adversarial examples of LeNet5 with MNIST [ LeNet5_mnist_fgsm.ipynb ]	Y.Y.
15/04/2024, Mon	Lecture 08: An Introduction to Recurrent Neural Networks (RNN), Long Short Term Memory (LSTM), Attention and Transformer [ YY's slides ] [Reference]: To view .ipynb files below, you may try [ Jupyter NBViewer] Character-level RNN, LSTM and GRU for Name Classification [ char_rnn_classification_tutorial.ipynb ] Generating Shakespeare's Sonnet: RNN, LSTM, Bidirectional LSTM [ rnn_lstm_shakespeare.ipynb ] [ rnn_lstm_biLSTM_shakespeare.ipynb ] [ shakespeare.txt ] Bidirectional RNN for MNIST in pytorch [ bidirection_lstm_mnist.ipynb ] [ Youtube ] Illustrated Transformer by Jay Alammar: [ link ] The Annotated Transformer Tutorial by Sasha Rush: [ link ] Jorgen Schmidhuber: [ Deep Learning: Our Miraculous Year 1990-1991 ] [ Critique of Honda Prize for Dr. Hinton ]	Y.Y.
22/04/2024, Mon	Lecture 09: Transformer, GPT, and BERT [ slides ] [Reference]: To view .ipynb files below, you may try [ Jupyter NBViewer] Pytorch Twitter Sentiment Analysis: RNN, LSTM, BiLSTM, Multihead Self-Attention. [ RNN (ipynb) ] [ LSTM (ipynb) ] [ BiLSTM (ipynb) ] [ BiLSTM with Multihead Attention (ipynb) ] [ BERT embedding with BiLSTM (ipynb) ] [ Sentiment140 dataset ] Pytorch Sentiment Analysis with IMDB data: RNN, (bi)-LSTM, CNN, Transformer, BERT, etc. [ GitHub ] Illustrated Transformer by Jay Alammar: [ link ] The Annotated Transformer Tutorial by Sasha Rush: [ link ] BERT generation of Shakespeare's sonnet: [ BERT_shakespeare_gen.ipynb ] BERT next sentence generation of Shakespeare's sonnet and Chinese poems: [ BERT_shakespeare_nextsen.ipynb ] Chinese BERT (Whole-Word-Masking): [ link ]	Y.Y.
25/04/2024, Thu	Guest Lecture: An Invitation to Information Geometry [ Title ] An Invitation to Information Geometry [ Speaker ] Prof. Jun ZHANG, University of Michigan and SIMIS. [ Time and Venue ] 3-5pm, CYT LTL (CMA Lecture Theater) [ Abstract ] Information Geometry is the differential geometric study of the manifold of probability models, and promises to be a unifying geometric framework for investigating statistical inference, information theory, machine learning, etc. Central to such manifolds are divergence functions (in place of distance) for measuring proximity of two points, for instance Kullback-Leibler divergence, Bregman divergence, etc. Such divergence functions are known to induce a beautiful geometric structure of the set of parametric probability models. This talk will use two examples to introduce some basic ingredients of this geometric framework: the univariate normal distributions (a case with continuous support) and the probability simplex (a case with discrete support). The fundamental duality e/m duality is explained in terms of two most popular parametric statistical families: the exponential and the mixture families. This introduction is intended for an audience with little background in differentiable manifold; instead it only assumes the knowledge of multi-variable calculus.	Y.Y.
26/04/2024, Fri	Mathematics Colloquium. [ Title ] Information Geometry: Geometric Science of Information [ Speaker ] Prof. Jun ZHANG, University of Michigan, Ann Arbor and SIMIS. [ Time and Venue ] 3-4pm, Lecture Theater F (Lifts 25/26), with tea-time discussion 4-5pm at magic square [ Abstract ] Information geometry investigates parametric families of statistical model by representing probability density functions over a given sample space as points of a differentiable manifold M. Treating parameters as a local coordinate chart, M is endowed with a Riemannian metric g given by the Fisher-information (the well-known Fisher-Rao metric). However, in place of the Riemannian distance, information geometry uses a non-negative but non-symmetric divergence function (also called contrast function) for measuring proximity of two points, for instance Kullback-Leibler divergence, f-divergence, etc. Such divergence functions not only recovers the Fisher-Rao metric, but also a pair of dual connections with respect to the metric (equivalently Amari-Censov tensor). This talk will use two examples to introduce some basic ingredients of this geometric framework: the probability simplex (a case with discrete support) and the univariate normal distributions (a case with continuous support). In the former case, the application to the popular data-analytic method Compositional Data Analysis (CDA) is explained in terms of duality between exponential and mixture families. In the latter case, the construction of statistical mirror is briefly explained as an application of the concept of dual connections. This talk assumes some basic concepts of differentiable manifold (such as parallel transport and affine connection). [ Bio ] Jun Zhang is a Professor at the Shanghai Institute of Mathematics and Interdisciplinary Sciences (SIMIS) and one of its co-founders. He is currently on leave from the University of Michigan, Ann Arbor, where he has worked since 1992 as an Assistant, Associate, and Full Professor in the Department of Psychology, with adjunct appointments in the Department of Mathematics, Department of Statistics, and Michigan Institute of Data Sciences. He received his PhD in Neuroscience from the University of California, Berkeley in 1991. An elected fellow of Association for Psychological Sciences (APS) since 2012 and Psychonomic Society since 2016, Professor Jun Zhang's scholarly contributions have been in the various fields of computation neuroscience, cognition and behavior modeling, machine learning, statistical science, complex systems, etc, and is well known in the field of mathematical psychology. In recent years, his research has focused on the interdisciplinary subject of Information Geometry.	Y.Y.
26/04/2024, Sat	Guest Lecture: Information Beyond Shannon [ Title ] Information Beyond Shannon [ Speaker ] Prof. Jun ZHANG, University of Michigan and SIMIS. [ Time and Venue ] 3-5pm, Rm 2504 (Lift 25/26) [ Abstract ] Shannon's theory for source and channel coding (and the duality between capacity and rate-distortion) has been the hallmark for information science. Shannon entropy, and its associated exponential family of probability measures resulting from maximum entropy (MaxEnt) inference and the Kullback-Leibler divergence measuring the difference of any two probability densities, have found wide applications in statistical inference, machine learning, optimization, etc. Past research in Information Geometry has tied together the above concepts into a geometric structure called Hessian geometry, which is dually flat with biorthogonal coordinates. Given the deep mathematical understanding of Hessian geometry and its elegant picture, it is natural to ask whether it can be generalized (deformed, technically) to more broad settings that corresponds to generalize entropies and cross entropies (e.g., that is Tsallis and Renyi). This question has now been answered positively by a series of recent work on deformation theory. My talk will explain this recent development of information geometry, including the rho-tau deformation (which unifies the so-called phi-model and U-model known to information geometers) and the lambda-deformation theory (which unified Tsallis and Renyi deformation known to information theorists). This talk is intended for an audience with background in information theory and theoretical physics. (Joint work with Jan Naudts in the former case and with TKL Wong in the latter case).	Y.Y.
29/04/2024, Mon	Lecture 10: An Introduction to Reinforcement Learning with Applications [ slides ] [ Reference ]: Google DeepMind's Deep Q-learning playing Atari Breakout: [ youtube ] To view .ipynb files below, you may try [ Jupyter NBViewer] Deep Q-Learning Pytorch Tutorial: [ link ] A Tutorial of Reinforcement Learning for Quantitative Trading: [ Tutorial ] [ Replicate ] FinRL: Deep Reinforcement Learning for Quantitative Finance [ GitHub ] Reinforcement Learning and Supervised Learning for Quantitative Finance: [ link ] Prof. Michael Kearns, University of Pennsyvania, Algorithmic Trading and Machine Learning, Simons Institute at Berkeley [ link ]	Y.Y.
06/05/2024, Mon	Final Presentation. [ Final Report Collection ] Description of Final Project: [ pdf ] GitHub Repository for reports of Final Project [ GitHub ] [ Paper Replication: Empirical Asset Pricing via Machine Learning ] CHEN Yuxuan. [ report (pdf) ] [ slides (pptx) ] [ source (py) ] CHEN Qixu, Siqi HE, Zhiqiu XIA and Meiying ZHANG. [ report (pdf) ] [ source (ipynb) ] CUI Daorong, LI Meng, HE Jiayi. [ report (pdf) ] HE Haolin, Lingchong LIU, Ruizhao HUANG. [ report (pdf) ] [ source (zip) ] HOU Zhen, Jianda MAO, Xiaolong WANG. [ report (pdf) ] [ slides (pdf) ] [ source (github) ] [ presentation (youtube) ] ZHANG Jiaming. [ report (pdf) ] [ source (zip) ] - WU Linshan. [ report (pdf) ] [ source (ipynb) ] [ presentation (youtube) ] [ Paper Replication: (Re-)Imag(in)ing Price Trends ] LI Kaican, ZHAO Zening, LYU Yunhong. [ report (pdf) ] [ source (zip) ] WANG Wentao, Yingyue HAN, Yuhang JIN. [ report (pdf) ] [ slides (pptx) ] [ source (github) ] - HUANG Zhanmiao, Xuanyu SHEN. [ report (pdf) ] [ poster (pdf) ] [ Kaggle ] DING Zezhen. G-Research Crypto Forecasting. [ report (pdf) ] [ slides (pptx) ] [ source (ipynb) ] [ presentation (youtube) ] *LIN Hangyu. G-Research Crypto Forecasting. [ report (pdf) ] [ source (github) ] [ presentation (bilibili) ] LI Shihao. G-Research Crypto Forecasting. [ report (pptx) ] [ slides (pptx) ] [ source (ipynb) ] [ presentation (youtube) ] WU Shuang, Yuhao YAN, Haowei YANG. G-Research Crypto Forecasting. [ report (pptx) ] [ slides (pptx) ] [ source (github) ] - Ricky CHAN Tsz Wai. G-Research Crypto Forecasting. [ report (pptx) ] [ slides (pdf) ] - HU Bo, WU Hongfan, QIU wenxi and WANG Zetao. G-Research Crypto Forecasting. [ report (pdf) ] [ slides (pptx) ] [ source (kaggle) ] GUPTA Anchal, Dinusara Sasindu GAMAGE NANAYAKKARA, Minji SEO, ZHAO Hang. M5 Forecasting: Accuracy. [ report (pdf) ] [ source (zip) ] [ presentation (youtube) ] [Reading Material]: Shihao Gu, Bryan Kelly and Dacheng Xiu "Empirical Asset Pricing via Machine Learning", Review of Financial Studies, Vol. 33, Issue 5, (2020), 2223-2273. Winner of the 2018 Swiss Finance Institute Outstanding Paper Award. [ link ] Jingwen Jiang, Bryan Kelly and Dacheng Xiu "(Re-)Imag(in)ing Price Trends", Chicago Booth Report, Aug 2021 [ link ] [ Reference ]: Kaggle: Home Credit Default Risk [ link ] Kaggle: M5 Forecasting - Accuracy, Estimate the unit sales of Walmart retail goods. [ link ] Kaggle: M5 Forecasting - Uncertainty, Estimate the uncertainty distribution of Walmart unit sales. [ link ] Kaggle: Ubiquant Market Prediction - Make predictions against future market data. [ link ] Kaggle: G-Research Crypto Forecasting. [ link ]	Y.Y.