Date | Topic | Instructor | Scribe
03/02/2025, Mon |
Lecture 01: A Historical Overview. [ slides (pdf) ]
|
Y.Y. |
|
07/02/2025, Fri |
Seminar.
[ Mathematics Colloquium ]
- Title: Theoretical Evaluation of Data Reconstruction Error and Induced Optimal Defenses [ announcement ] [ slides ]
- Speaker: Prof. Qi LEI, New York University
- Time: Friday Feb 7, 2025, 10:30am-noon
- Abstract: Data reconstruction attacks and defenses are crucial for understanding data leakage in machine learning and federated learning. However, previous research has largely focused on empirical observations of gradient inversion attacks, lacking a theoretical framework for quantitatively analyzing reconstruction errors based on model architecture and defense methods.
In this talk, we propose framing the problem as an inverse problem, enabling a theoretical and systematic evaluation of data reconstruction attacks. For various defense methods, we derive algorithmic upper bounds and matching information-theoretic lower bounds on reconstruction error for two-layer neural networks, accounting for feature and architecture dimensions as well as defense strength. We further propose two defense strategies, Optimal Gradient Noise and Optimal Gradient Pruning, that maximize reconstruction error while maintaining model performance. (A toy gradient-matching sketch follows the references below.)
- Bio:
Qi Lei is an assistant professor of Mathematics and Data Science at the Courant Institute of Mathematical Sciences and the Center for Data Science at NYU. Previously, she was an associate research scholar in the ECE department at Princeton University. She received her Ph.D. from the Oden Institute for Computational Engineering & Sciences at UT Austin. She visited the Institute for Advanced Study (IAS)/Princeton for the Theoretical Machine Learning Program, and before that was a research fellow at the Simons Institute for the Foundations of Deep Learning Program. Her research aims to develop mathematical groundings for trustworthy and (sample- and computationally) efficient machine learning algorithms. Qi has received several awards and recognitions, including Rising Stars in Machine Learning, in EECS, and in Statistics and Data Science, the Outstanding Dissertation Award, the Computing Innovation Fellowship, and the Simons-Berkeley Research Fellowship.
[ Relevant References ]:
- Zihan Wang, Jason D. Lee, Qi Lei. Reconstructing Training Data from Model Gradient, Provably [ link ]
- Sheng Liu*, Zihan Wang*, Yuxiao Chen, Qi Lei. Data Reconstruction Attacks and Defenses: A Systematic Evaluation. [ link ]
- Yuxiao Chen, Gamze Gürsoy, Qi Lei. Optimal Defenses Against Gradient Reconstruction Attacks. [ link ]
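[ Code Sketch ]:
- A minimal gradient-matching reconstruction sketch in PyTorch, illustrating the kind of attack the talk analyzes (not the speaker's exact method): given the gradient a client would share for one private sample, the attacker optimizes a dummy input until its gradient matches. The network shape, loss, optimizer settings, and the assumption that the label is known are all illustrative simplifications.

import torch
import torch.nn as nn

torch.manual_seed(0)
d, h, k = 16, 32, 4                       # input dim, hidden width, classes (assumed)
model = nn.Sequential(nn.Linear(d, h), nn.ReLU(), nn.Linear(h, k))
loss_fn = nn.CrossEntropyLoss()

# Victim's private sample and the gradient it would share.
x_true = torch.randn(1, d)
y_true = torch.tensor([2])
g_true = torch.autograd.grad(loss_fn(model(x_true), y_true), model.parameters())

# Attacker: optimize a dummy input so its gradient matches the shared one.
x_hat = torch.randn(1, d, requires_grad=True)
opt = torch.optim.Adam([x_hat], lr=0.1)
for step in range(500):
    opt.zero_grad()
    g_hat = torch.autograd.grad(loss_fn(model(x_hat), y_true),
                                model.parameters(), create_graph=True)
    diff = sum(((a - b) ** 2).sum() for a, b in zip(g_hat, g_true))
    diff.backward()
    opt.step()

print("reconstruction error:", (x_hat - x_true).norm().item())

Defenses such as Optimal Gradient Noise would perturb g_true before sharing, degrading this matching objective.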
|
Y.Y. |
|
10/02/2025, Mon |
Lecture 02: Supervised Learning: Linear Regression and Classification [ slides ]
|
Y.Y. |
|
17/02/2025, Mon |
Lecture 03: Model Assessment and Selection: Subset, Ridge, Lasso, and PCR [ slides ]
|
Y.Y. |
|
24/02/2025, Mon |
Lecture 04: Moving Beyond Linearity [ slides ]
[ Mathematics Colloquium ]
- Title: A new machine learning algorithm, complex systems and AI predictions [ announcement ] [ slides ]
- Speaker: Prof. Zhihong Xia, Great Bay University and Northwestern University
- Time: Monday Feb 24, 2025, 4-5pm, Rm 4621 (Lift 31/32)
- Abstract: We propose a novel machine learning algorithm inspired by complex analysis. Our algorithm has a better mathematical formulation and approximates general functions much more efficiently. It can be implemented in two self-learning neural networks: the CauchyNet and the XNet. The CauchyNet is very efficient for low-dimensional problems such as extrapolation, imputation, and numerical solutions of PDEs and ODEs. The XNet, on the other hand, works for high-dimensional problems such as image and voice recognition, transformers, and likely LLMs, often improving on current methods by several orders of magnitude.
In the context of modern AI, we also pose the following question: given data from a single observable g in a dynamical system, is it possible to recover the underlying system? For instance, with a large dataset of positional observations from an n-body system, can we predict its future motion without resorting to Newtonian mechanics? Surprisingly, the answer is yes for almost any typical observable. We introduce the principle of space-time swap: the absence of spatial information in a dynamical system can be compensated by leveraging temporal information. This principle is grounded in Takens' Embedding Theorem (building upon Whitney's embedding theorem). We believe this idea has broad potential for applications in the analysis and prediction of complex systems. (A toy delay-embedding sketch follows this entry.)
- Bio:
Zhihong Jeff Xia received his PhD from Northwestern University in 1988. He held a Benjamin Peirce Lectureship and Assistant Professorship at Harvard University, and a tenured faculty position at the Georgia Institute of Technology, before joining Northwestern University as a professor of mathematics in 1994. In 2000, Xia was appointed the Arthur and Gladys Pancoe Professor of Mathematics at Northwestern. He joined Great Bay University in 2024.
Xia's fields of research are dynamical systems, solar-system dynamics, and machine learning algorithms. He solved the century-old Painlevé conjecture in mathematics; discovered (jointly with Jian Li) that a large planet from outside the solar system once flew by our solar system a few hundred million years ago; and created an efficient machine learning algorithm.
Xia was named an Alfred P. Sloan Fellow in 1989. He was awarded the Blumenthal Award for the advancement of pure mathematics (1993) and the Monroe H. Martin Prize in applied mathematics (1995), and was an NSF National Young Investigator. He was invited to speak at the 1998 International Congress of Mathematicians. Xia was the founding chair of the department of mathematics at the Southern University of Science and Technology.
Xia is currently co-editor-in-chief of The Intellectual (《知识分子》) and one of the founding members of the science committee of the Future Science Prize.
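[ Code Sketch ]:
- A toy illustration of the space-time swap principle via Takens delay embedding: reconstruct a state from lagged values of a single observable and predict its next value with nearest neighbors. The observable, lags, and predictor below are illustrative choices, not the speaker's CauchyNet/XNet.

import numpy as np

t = np.linspace(0, 60, 3000)
g = np.sin(t) + 0.5 * np.sin(2.3 * t)        # observable from a hidden system

m, tau = 4, 10                               # embedding dimension and delay (assumed)
start = (m - 1) * tau
Z = np.stack([g[start - j * tau: len(g) - 1 - j * tau] for j in range(m)], axis=1)
y = g[start + 1:]                            # next value of the observable

# Fit on the past, predict the future with a 1-nearest-neighbor rule.
n_train = 2500
Z_tr, y_tr, Z_te, y_te = Z[:n_train], y[:n_train], Z[n_train:], y[n_train:]
nn_idx = np.argmin(((Z_te[:, None, :] - Z_tr[None, :, :]) ** 2).sum(-1), axis=1)
print("mean abs prediction error:", np.abs(y_tr[nn_idx] - y_te).mean())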
|
Y.Y. |
|
03/03/2025, Mon |
Lecture 05: Decision Tree, Bagging, Random Forests and Boosting [ YY's slides ]
|
Y.Y. |
|
10/03/2025, Mon |
Lecture 06: Support Vector Machines [ YY's slides ] and Mini-Project Initialization [ project1.pdf ]
[ Reference ]:
- To view .ipynb files below, you may try [ Jupyter NBViewer ]
- Python Notebook for Support Vector Machines
[ svm.ipynb ]
- Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro. The Implicit Bias of Gradient Descent on Separable Data.
[ arXiv:1710.10345 ]. ICLR 2018. Gradient descent on logistic regression converges to the max-margin solution; see the numerical sketch below.
- Matus Telgarsky. Margins, Shrinkage, and Boosting. [ arXiv:1303.4172 ]. ICML 2013. An earlier paper showing that gradient descent on the exponential/logistic loss leads to max margin.
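[ Code Sketch ]:
- A quick numerical check of the implicit-bias result cited above (data and step size are illustrative): on linearly separable data, gradient descent on the logistic loss drives w/||w|| toward the hard-margin SVM direction.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.3, (50, 2)), rng.normal([-2, -2], 0.3, (50, 2))])
y = np.array([1] * 50 + [-1] * 50)

w, lr = np.zeros(2), 0.1
for _ in range(100000):                        # margins emerge slowly (log t rate)
    u = y * (X @ w)
    grad = -(X * (y / (1 + np.exp(u)))[:, None]).mean(0)   # grad of mean logistic loss
    w -= lr * grad

w_svm = SVC(kernel="linear", C=1e6).fit(X, y).coef_.ravel()  # ~hard-margin SVM
cos = w @ w_svm / (np.linalg.norm(w) * np.linalg.norm(w_svm))
print("cosine(w_GD, w_SVM):", round(float(cos), 4))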
[ Mini-Project Reference ]:
- Kaggle: Home Credit Default Risk [ link ]
- Kaggle: M5 Forecasting - Accuracy. Estimate the unit sales of Walmart retail goods. [ link ]
|
Y.Y. |
|
17/03/2025, Mon |
Lecture 07: An Introduction to Convolutional Neural Networks [ YY's slides ]
|
Y.Y. |
|
24/03/2025, Mon |
Lecture 08: An Introduction to Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Attention and Transformers [ slides ]
|
Y.Y. |
|
31/03/2025, Mon |
Lecture 09: Seminar and Final Project Initialization [ project2.pdf ].
[ Seminar ]
- Title: Be aware of model capacity when talking about generalization in machine learning [ announcement ] [ slides ]
- Speaker: Prof. Fanghui LIU, University of Warwick
- Time: Monday March 31, 2025, 18:30
- Abstract: Machine learning (ML) generally operates in high dimensions, where performance is characterized by learning efficiency, both theoretically (statistical and computational efficiency) and empirically (practically efficient ML).
A fundamental question in ML theory and practice is how the test error (generalization) evolves with sample size and model capacity (e.g., model size), shaping key concepts such as the bias-variance trade-off, double descent, and scaling laws.
In this talk, I will discuss how the test error behaves when model capacity is measured by a metric more suitable than model size. Specifically, I will present a unified perspective on generalization by analyzing how norm-based capacity control reshapes our understanding of
these foundational concepts: there is no bias-variance trade-off; a phase transition exists from the under-parameterized to the over-parameterized regime, but double descent disappears; and the scaling law takes a multiplicative form under norm-based capacity.
Additionally, I will briefly discuss which norm is suitable for neural networks and what fundamental limits on learning efficiency such norm-based capacity imposes, from the perspective of function space. (A small numerical illustration follows the papers below.)
- Bio:
Dr. Fanghui Liu is currently an assistant professor at the University of Warwick, UK, and a member of the Centre for Discrete Mathematics and its Applications (DIMAP). His research interests include the foundations of machine learning as well as efficient machine learning algorithm design.
He received the AAAI'24 New Faculty Award, was named a Rising Star in AI (KAUST 2023), co-founded the fine-tuning workshop at NeurIPS'24, and has served as an area chair for ICLR and AISTATS. He has also delivered tutorials at ISIT'24, CVPR'23, and ICASSP'23.
Prior to his current position, he was a postdoctoral researcher at EPFL (2021-2023) and KU Leuven (2019-2021). He received his PhD from Shanghai Jiao Tong University in 2019, with several Excellent Doctoral Dissertation Awards.
[ Relevant Papers ]:
- Yichen Wang, Yudong Chen, Lorenzo Rosasco, Fanghui Liu. Re-examining Double Descent and Scaling Laws under Norm-based Capacity via Deterministic Equivalence. [ arXiv:2502.01585 ]
- Fanghui Liu, Leello Dadi, Volkan Cevher. Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks. [ arXiv:2404.18769 ]
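[ Code Sketch ]:
- A small experiment in the spirit of the talk (illustrative, not the papers' setting): track the test error and the l2 norm of the minimum-norm least-squares fit as the number of random ReLU features grows past the sample size. Parameter count rises monotonically, while the norm typically peaks at the interpolation threshold, which is the capacity lens the talk advocates.

import numpy as np

rng = np.random.default_rng(0)
n, d, n_test = 100, 20, 1000
w_star = rng.normal(size=d)
X, Xt = rng.normal(size=(n, d)), rng.normal(size=(n_test, d))
y = X @ w_star + 0.5 * rng.normal(size=n)
yt = Xt @ w_star

for p in [20, 50, 90, 100, 110, 200, 1000]:              # number of random features
    W = rng.normal(size=(d, p)) / np.sqrt(d)
    F, Ft = np.maximum(X @ W, 0), np.maximum(Xt @ W, 0)  # ReLU random features
    beta = np.linalg.pinv(F) @ y                         # minimum-norm least squares
    mse = np.mean((Ft @ beta - yt) ** 2)
    print(f"p={p:5d}  ||beta||={np.linalg.norm(beta):9.2f}  test MSE={mse:8.2f}")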
[ Seminar and Project Description ]
- Title: Digital Historical Forensics: A Computational Approach to Wartime Media Cultures [ slides ]
- Speaker: Dr. Lin DU, National University of Singapore
- Abstract: This study examines the longstanding need for, and challenge of, contextual analysis of historical images stored in digital visual archives, and the difficulty of retrieving contextual information from these archives.
Contextual analysis is essential for disciplines such as history and art history, as it situates artwork and historical sources within historical narratives, which in turn deepens understanding of the artistic or political
expression in cultural products. To address this challenge, a novel approach is proposed that uses computer vision to trace the circulation and dissemination of historical photographs in their original contexts. The method
first uses YOLOv7 to crop historical images from pictorial magazines, then trains machine learning models on the cropped printed images together with a large dataset of original historical photographs, and finally compares
image similarity between the datasets of printed images and original photographs. To ensure accurate image similarities between the two subsets, which differ markedly in image quality, an ensemble of three machine learning
models (Vision Transformer, EfficientNetV2, and Swin Transformer) was developed. Through this system, contexts in the circulation of historical photographs were discovered, and new insights into the editing strategies of
propaganda magazines in East Asia during WWII were uncovered. These outcomes offer supporting evidence for previous research in history and art history, and demonstrate the potential of computer vision for uncovering new
information from digital visual archives. The model achieves 77.8% top-15 retrieval accuracy on the evaluation dataset. Further projects addressing these challenges are outlined, accompanied by relevant datasets. (A minimal retrieval sketch follows the bio below.)
- Bio: Lin Du is currently a Postdoctoral Fellow and will join as an assistant professor in July, jointly appointed in the Departments of Japanese Studies and Chinese Studies at the National University of Singapore. She completed her PhD at the
Department of Asian Languages and Cultures at UCLA, where her dissertation, "Chinese Photojournalism 1937–1952: Materiality and the Institutionalization of Culture via a Computer Vision Approach," utilized advanced computer vision techniques to
explore wartime visual media culture. Lin holds an MA from the Regional Studies East Asia Program at Harvard University and a BA in Chinese Language and Literature from Peking University. Her pioneering work in machine learning has been published
in the ACM Journal on Computing and Cultural Heritage (JOCCH), and her contributions to humanities research are forthcoming in the Journal of Chinese Cinemas and Asia Pacific Perspectives.
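[ Code Sketch ]:
- A minimal retrieval sketch loosely following the described pipeline, with the YOLOv7 detection step omitted: embed cropped prints and archival photographs with an ensemble of backbones, average the cosine similarities, and return the top-15 matches. The torchvision backbones and the use of logits as embeddings are simplifying assumptions, not the speaker's exact system.

import torch
import torch.nn.functional as F
from torchvision import models

backbones = [
    models.vit_b_16(weights="IMAGENET1K_V1"),
    models.efficientnet_v2_s(weights="IMAGENET1K_V1"),
    models.swin_t(weights="IMAGENET1K_V1"),
]
for m in backbones:
    m.eval()

@torch.no_grad()
def embed(batch, model):
    # For simplicity, use L2-normalized logits as embeddings (an assumption);
    # a real system would pool penultimate-layer features instead.
    return F.normalize(model(batch), dim=1)

@torch.no_grad()
def top15(query, archive):
    # query: (1, 3, 224, 224); archive: (N, 3, 224, 224), preprocessed identically.
    sims = sum(embed(query, m) @ embed(archive, m).T for m in backbones) / len(backbones)
    return sims.topk(15, dim=1).indices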
[ Kaggle Contests ]:
- Kaggle: Home Credit Default Risk [ link ]
- Kaggle: M5 Forecasting - Accuracy. Estimate the unit sales of Walmart retail goods. [ link ]
- Kaggle: M5 Forecasting - Uncertainty. Estimate the uncertainty distribution of Walmart unit sales. [ link ]
[ Reference ]:
- Shihao Gu, Bryan Kelly and Dacheng Xiu
"Empirical Asset Pricing via Machine Learning", Review of Financial Studies, Vol. 33, Issue 5, 2020, 2223-2273. Winner of the 2018 Swiss Finance Institute Outstanding Paper Award.
[ link ]
- Jingwen Jiang, Bryan Kelly and Dacheng Xiu
"(Re-)Imag(in)ing Price Trends", The Journal of Finance, 78: 3193-3249, 2023.
[ ssrn ][ https://doi.org/10.1111/jofi.13268 ]
|
Y.Y. |
|
01/04/2025, Tue |
Seminar.
- Title: One-step full gradient can be sufficient for low-rank fine-tuning, provably and efficiently [ slides ]
- Speaker: Prof. Fanghui LIU, University of Warwick
- Time: Tuesday April 1, 2025, 11am-noon, Room 2463 (lift 25/26), HKUST
- Abstract: In this talk, I will discuss how to improve the performance of Low-Rank Adaptation (LoRA), guided by our theory. Our theoretical results show that LoRA aligns with a certain singular subspace of the one-step gradient of full fine-tuning.
Accordingly, alignment and generalization guarantees can be achieved directly by our theory-grounded spectral initialization strategy, for both linear and nonlinear models, and subsequent linear convergence can also be established.
Our analysis leads to the LoRA-One algorithm, a theoretically grounded algorithm that achieves significant empirical improvements over vanilla LoRA and its variants on several benchmarks when fine-tuning Llama 2.
Our theoretical analysis is of independent interest for understanding matrix sensing and deep learning theory. (A sketch of the spectral initialization idea follows the paper below.)
Joint work with Yuanhe Zhang and Yudong Chen.
[ Relevant Papers ]:
- Yuanhe Zhang, Fanghui Liu, Yudong Chen. One-step full gradient suffices for low-rank fine-tuning, provably and efficiently. [ arXiv:2502.01235 ]
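[ Code Sketch ]:
- A sketch of the spectral initialization idea, as I read the abstract (not the authors' code): take one full-batch gradient of the frozen weight, SVD it, and initialize the LoRA factors from the top-r singular directions, so that the initial update B @ A is a rank-r gradient step. The squared loss and the scale are assumptions.

import torch

def lora_one_init(W, X, Y, r, scale=1e-2):
    """W: frozen (d_out, d_in) weight; X: (n, d_in) inputs; Y: (n, d_out) targets."""
    W = W.clone().requires_grad_(True)
    loss = ((X @ W.T - Y) ** 2).mean()             # one step of full fine-tuning
    (G,) = torch.autograd.grad(loss, W)            # full-batch gradient
    U, S, Vh = torch.linalg.svd(G, full_matrices=False)
    B = -scale * U[:, :r] * S[:r].sqrt()           # (d_out, r)
    A = scale * S[:r].sqrt()[:, None] * Vh[:r, :]  # (r, d_in)
    return B, A                                    # B @ A = -scale**2 * U_r S_r V_r^T

d_out, d_in, n, r = 64, 128, 256, 4
W = torch.randn(d_out, d_in) / d_in ** 0.5
X, Y = torch.randn(n, d_in), torch.randn(n, d_out)
B, A = lora_one_init(W, X, Y, r)
print(B.shape, A.shape)                            # (64, 4) and (4, 128)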
|
Y.Y. |
|
07/04/2025, Mon |
Lecture 10: Transformer and Applications [ slides ]
[ Seminar ]
- Title: Advancements in Kernel Learning and Offline Reinforcement Learning through Generative Models [ slides I ] [ slides II ]
- Speaker: Prof. Wenjia Wang, HKUST-GZ
- Time: 8:00pm
- Abstract: In this talk, I will present two lines of my recent research.
In Part I, I will talk about random smoothing data augmentation, a form of regularization that prevents overfitting by injecting noise into the input data,
encouraging the model to learn more generalizable features. In this work, we present a framework for random smoothing regularization that can adaptively and
effectively learn a wide range of ground-truth functions in the classical Sobolev spaces. By using random smoothing regularization as a novel convolution-based smoothing kernel,
we attain optimal convergence rates using a kernel gradient descent algorithm, with either early stopping or weight decay. (A minimal augmentation sketch follows the papers below.)
In Part II, I will talk about our recent series of works on offline reinforcement learning (RL).
Unable to interact with the environment, offline RL methods face the challenge of estimating values at out-of-distribution (OOD) points.
Existing methods either constrain the policy to exclude OOD actions or make the Q-function pessimistic; however, these methods can be overly conservative or fail to identify OOD areas accurately.
I will discuss our recent advances in offline RL, focusing on the use of generative models such as GANs and diffusion models.
Our proposed methods are evaluated on the D4RL benchmarks and demonstrate significant improvements across numerous tasks, with theoretical results providing performance guarantees.
- Bio: Wenjia Wang is an assistant professor in the Data Science and Analysis Thrust at the Information Hub of the Hong Kong University of Science and Technology (Guangzhou).
He obtained his Ph.D. in the School of Industrial & Systems Engineering at Georgia Institute of Technology. Wenjia Wang's research interests include uncertainty quantification, computer experiments, machine learning, stochastic simulation, and nonparametric statistics.
[ Relevant Papers ]:
- Ding, L., Hu, T., Jiang, J., Li, D., Wang, W., & Yao, Y. (2024). Random smoothing regularization in kernel gradient descent learning. Journal of Machine Learning Research.
[ arXiv:2305.03531 ]
- Fang, L., Liu, R., Zhang, J., Wang, W., & Jing, B. Y. (2025). Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning.
The Thirteenth International Conference on Learning Representations (ICLR).
- Zhang, J., Fang, L., Shi, K., Wang, W., & Jing, B. Y. (2024). Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model.
Neural Information Processing Systems (NeurIPS), 2024.
- Zhang, J., Zhang, C., Wang, W., & Jing, B. Y. (2023). Constrained Policy Optimization with Explicit Behavior Density For Offline Reinforcement Learning.
Neural Information Processing Systems (NeurIPS), 2023.
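[ Code Sketch ]:
- A minimal sketch of random smoothing data augmentation as described in Part I (illustrative; the JMLR paper analyzes it as a convolution-based smoothing kernel with kernel gradient descent): perturb the inputs with fresh Gaussian noise at every step and train with weight decay.

import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.rand(256, 1)
y = torch.sin(4 * X) + 0.05 * torch.randn_like(X)     # noisy smooth ground truth

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.05, weight_decay=1e-4)
sigma = 0.1                                           # smoothing noise level (assumed)

for step in range(2000):
    opt.zero_grad()
    X_noisy = X + sigma * torch.randn_like(X)         # fresh smoothing noise each step
    loss = ((net(X_noisy) - y) ** 2).mean()
    loss.backward()
    opt.step()

print("train MSE on clean inputs:", ((net(X) - y) ** 2).mean().item())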
|
Y.Y. |
|
14/04/2025, Mon |
Seminars.
- Title: Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection. [ slides ] [ video ]
- Speaker: Prof. Song MEI, University of California at Berkeley.
- Time: 6:40pm
- Abstract:
Neural sequence models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) abilities, where they can perform new tasks when prompted with training and test examples, without any parameter update to the model.
This work first provides a comprehensive statistical theory for transformers to perform ICL. Concretely, we show that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression,
Lasso, learning generalized linear models, and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Using an efficient implementation of in-context gradient descent as the underlying mechanism,
our transformer constructions admit mild size bounds, and can be learned with polynomially many pretraining sequences.
Building on these "base" ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving in-context algorithm selection, akin to what a statistician can do in real life: a single transformer can adaptively select different base ICL algorithms,
or even perform qualitatively different tasks, on different input sequences, without any explicit prompting of the right algorithm or task. We establish this in theory by explicit constructions, and also observe the phenomenon experimentally.
In theory, we construct two general mechanisms for algorithm selection, with concrete examples: pre-ICL testing and post-ICL validation. As an example, we use the post-ICL validation mechanism to construct a transformer that performs nearly Bayes-optimal ICL on a challenging
task: noisy linear models with mixed noise levels. Experimentally, we demonstrate the strong in-context algorithm selection capabilities of standard transformer architectures. (A toy illustration of in-context gradient descent follows the paper below.)
- Bio: Song Mei is an Assistant Professor in the Department of Statistics and the Department of Electrical Engineering and Computer Sciences at UC Berkeley. In June 2020, he received his Ph.D. from Stanford, advised by Prof. Andrea Montanari.
Song's research is motivated by data science and AI, and lies at the intersection of statistics, machine learning, information theory, and computer science. His current research interests include language models and diffusion models, the theory of deep learning,
the theory of reinforcement learning, high-dimensional statistics, quantum algorithms, and uncertainty quantification. Song received a Sloan Research Fellowship in 2025 and an NSF CAREER award in 2024.
[ Relevant Papers ]:
- Yu Bai, Fan Chen, Huan Wang, Caiming Xiong, and Song Mei. Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection. NeurIPS, 2023 (Oral). [ arXiv:2306.04637 ]
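[ Code Sketch ]:
- A toy illustration of the mechanism the paper constructs: treating the prompt's (x, y) pairs as a training set, a transformer can emulate gradient descent on ridge regression in context. Here the emulated algorithm is run directly in numpy (the transformer itself is not simulated); dimensions and hyperparameters are illustrative.

import numpy as np

rng = np.random.default_rng(0)
d, n_ctx = 8, 32
w_star = rng.normal(size=d)
X = rng.normal(size=(n_ctx, d))                  # in-context examples
y = X @ w_star + 0.1 * rng.normal(size=n_ctx)
x_query = rng.normal(size=d)

lam, lr, w = 0.1, 0.05, np.zeros(d)
for _ in range(500):                             # in-context GD steps; the paper shows
    grad = X.T @ (X @ w - y) / n_ctx + lam * w   # such steps are implementable by layers
    w -= lr * grad

w_ridge = np.linalg.solve(X.T @ X / n_ctx + lam * np.eye(d), X.T @ y / n_ctx)
print("GD vs closed-form ridge prediction:", float(x_query @ w), float(x_query @ w_ridge))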
- Title: Introducing Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving. [ slides ]
- Speaker: Prof. Yong LIN, Princeton University.
- Time: 8:00pm
- Abstract:
In this talk, I will introduce Goedel-Prover (https://goedel-lm.github.io/), an open-source large language model (LLM) that achieves state-of-the-art (SOTA) performance in automated formal proof generation for mathematical problems.
The key challenge in this field is the scarcity of formalized math statements and proofs, which we tackle as follows. We train statement formalizers to translate natural language math problems from Numina into formal language (Lean 4).
We then iteratively build a large dataset of formal proofs by training a series of provers: each prover proves many statements that the previous ones could not, and these new proofs are added to the training set for the next prover.
The final prover outperforms all existing open-source models in whole-proof generation. On the miniF2F benchmark, it achieves a 57.6% success rate (Pass@32), exceeding the previous best open-source model by 7.6%.
On PutnamBench, Goedel-Prover successfully solves 7 problems (Pass@512), ranking first on the leaderboard.
Furthermore, it generates 29.7K formal proofs for Lean Workbook problems, nearly doubling the 15.7K produced by earlier works. (A schematic of this expert-iteration loop follows the paper below.)
- Bio: Yong Lin is a postdoctoral fellow at Princeton Language and Intelligence (PLI), collaborating with Chi Jin, Sanjeev Arora, and Danqi Chen. He completed his PhD in Tong Zhang's group at the Hong Kong University of Science and Technology (HKUST).
His research focuses on the trustworthiness and applications of machine learning, with particular emphasis on verifiable generation, LLM alignment, and out-of-distribution generalization. Currently, he leads the Goedel-Prover project at Princeton,
where he trains LLMs for automated theorem proving in Lean. Prior to his PhD, Yong worked for 4 years as a Senior Machine Learning Engineer at Alibaba, a leading tech company in China.
He has published over 30 papers in top-tier ML, CV, and NLP conferences and received the Outstanding Paper Award at NAACL 2024. He was also awarded the Apple AI/ML PhD Fellowship in 2023 and the Hong Kong PhD Fellowship in 2020.
[ Relevant Papers ]:
- Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, Chi Jin. Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving. [ arXiv:2502.07640 ]
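[ Code Sketch ]:
- A schematic of the expert-iteration loop described in the abstract. Every callable passed in (formalize, sample_proofs, lean_check, finetune) is a hypothetical placeholder supplied by the caller, not Goedel-Prover's actual API; the loop structure is the point.

def expert_iteration(nl_problems, prover, formalize, sample_proofs, lean_check,
                     finetune, rounds=3, k=32):
    statements = [formalize(p) for p in nl_problems]   # NL problems -> Lean 4 statements
    proved = {}                                        # statement -> first verified proof
    for _ in range(rounds):
        for s in statements:
            if s in proved:
                continue
            for proof in sample_proofs(prover, s, k):  # whole-proof sampling (Pass@k)
                if lean_check(s, proof):               # keep only Lean-verified proofs
                    proved[s] = proof
                    break
        prover = finetune(prover, proved)              # next prover trains on the new proofs
    return prover, proved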
|
Y.Y. |
|