| Date | Topic | Instructor | Scriber |
| 09/02/2025, Tue |
Lecture 01: A Historical Overview of AI and Statistical Machine Learning. [ slides (pdf) ]
|
Y.Y. |
|
| 09/09/2025, Tue |
Lecture 02: Supervised Learning: linear regression and classification [ slides ]
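[ Illustrative code (Python) ]:
A minimal sketch of the two settings in this lecture on synthetic data: ordinary least squares solved in closed form with numpy, and a scikit-learn logistic-regression classifier on two Gaussian clusters. All sizes, coefficients, and data below are made-up illustrative choices, not course material.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Linear regression: closed-form least squares on synthetic data
    n, d = 200, 3
    X = rng.normal(size=(n, d))
    beta_true = np.array([1.5, -2.0, 0.5])
    y = X @ beta_true + 0.1 * rng.normal(size=n)
    X_aug = np.hstack([np.ones((n, 1)), X])                # prepend an intercept column
    beta_hat = np.linalg.lstsq(X_aug, y, rcond=None)[0]    # least-squares solution
    print("OLS estimate (intercept first):", np.round(beta_hat, 3))

    # Classification: logistic regression on two Gaussian clusters
    X_cls = np.vstack([rng.normal(loc=-1.0, size=(100, 2)),
                       rng.normal(loc=+1.0, size=(100, 2))])
    y_cls = np.array([0] * 100 + [1] * 100)
    clf = LogisticRegression().fit(X_cls, y_cls)
    print("logistic regression training accuracy:", clf.score(X_cls, y_cls))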
|
Y.Y. |
|
| 09/11/2025, Thu |
Seminar.
- Title: PKU Quest: AI-Powered Math Education Practice at Peking University [ announcement ] [ slides ]
- Speaker: Leheng Chen and Zihao Liu, Peking University
- Time: Thursday Sep 11, 2025, 3:30pm
- Venue: Room 2612B (near Lift 31 & 32)
- Abstract:
The advent of Generative AI necessitates a paradigm shift in higher education, calling for new, diverse models of interaction between students, teachers, and AI. In response to this challenge, Peking University has developed PKU Quest, an AI-assisted platform designed to explore these new pedagogical frontiers. PKU Quest focuses on optimizing for the unique demands of mathematics education, and has developed the "Math Tutor," a tool specifically designed for math problem-solving support. Instead of providing direct answers, the Math Tutor engages students in a heuristic and exploratory dialogue, guiding them to develop independent thinking and problem-solving skills. This application has now been implemented across all foundational mathematics courses at Peking University.
This presentation will share our journey in developing PKU Quest, discussing the motivations, challenges, and practical outcomes of what we consider a first step in exploring the vast potential of AI in education.
- Bio:
Leheng Chen is a Ph.D. student at the Beijing International Center for Mathematical Research (BICMR), Peking University, advised by Professor Bin Dong. He has broad interests in the application of artificial intelligence. Previously, he explored research directions in AI for Science, such as thermodynamic modeling and foundation models for partial differential equations, with his work published in Physical Review E and at an ICLR Workshop. He has since shifted his research focus to the practical application of AI in Education, where he designed and developed "PKU Quest," an AI-assisted teaching and learning platform for Peking University.
Zihao Liu (Leo) is a Ph.D. student in Applied Mathematics and Artificial Intelligence at the School of Mathematical Sciences, Peking University. His interests span the application of AI to education and scientific understanding, with recent work focusing on improving the pedagogical effectiveness of AI-powered educational agents and building benchmark datasets for evaluating AI capabilities. As the founder and lead developer of PKU Quest and AKIS (AI Knowledge Intelligent Solution), he focuses on the practical deployment of AI-in-education systems and has helped design and develop "AIBOOKS," an intelligent digital-textbook platform, and "Math Tutor," a guided problem-solving assistant for students. He is deeply committed to advancing the integration of AI and education.
|
Y.Y. |
|
| 09/16/2025, Tue |
Lecture 03: Model Assessment and Selection: Subset, Ridge, Lasso, and PCR [ slides ]
[ Seminar ]
- Speaker: QRT guest speakers [ poster ]
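[ Illustrative code (Python) ]:
A small sketch of the model-selection theme of this lecture: choosing the ridge and lasso regularization strength by cross-validation on synthetic sparse data with scikit-learn. The data model and the alpha grid are arbitrary illustrative choices, not course material.

    import numpy as np
    from sklearn.linear_model import RidgeCV, LassoCV

    rng = np.random.default_rng(0)
    n, d = 100, 20
    X = rng.normal(size=(n, d))
    beta = np.zeros(d)
    beta[:3] = [3.0, -2.0, 1.5]          # sparse ground truth: only 3 active coefficients
    y = X @ beta + 0.5 * rng.normal(size=n)

    ridge = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(X, y)   # leave-one-out CV over the alpha grid
    lasso = LassoCV(cv=5, random_state=0).fit(X, y)            # 5-fold CV over lasso's own alpha path

    print("ridge alpha chosen by CV:", ridge.alpha_)
    print("lasso alpha chosen by CV:", lasso.alpha_)
    print("lasso nonzero coefficients:", int(np.sum(lasso.coef_ != 0)))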
|
Y.Y. |
|
| 09/23/2025, Tue |
Lecture 04: Project 1 [ pdf ], delivered via Zoom (through Canvas); the on-site class is cancelled due to Typhoon Signal No. 8.
[ Reference ]:
- Kaggle: Home Credit Default Risk [ link ]
- Kaggle: M5 Forecasting - Accuracy, Estimate the unit sales of Walmart retail goods.
[ link ]
|
Y.Y. |
|
| 09/27/2025, Sat |
Lecture 05: Decision Tree, Bagging, Random Forests and Boosting [ YY's slides ]
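[ Illustrative code (Python) ]:
A brief scikit-learn sketch comparing a bagged ensemble of trees (random forest) with boosting (gradient-boosted trees) on a synthetic classification task; the dataset and hyperparameters are illustrative only and are not taken from the lecture.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)       # bagging of deep trees
    gb = GradientBoostingClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)   # boosting of shallow trees

    print("random forest test accuracy:    ", rf.score(X_te, y_te))
    print("gradient boosting test accuracy:", gb.score(X_te, y_te))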
|
Y.Y. |
|
| 09/30/2025, Tue |
Lecture 06: Support Vector Machines [ YY's slides ]
[Reference]:
- To view .ipynb files below, you may try [ Jupyter NBViewer ]
- Python Notebook for Support Vector Machines
[ svm.ipynb ]
- Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro. The Implicit Bias of Gradient Descent on Separable Data.
[ arXiv:1710.10345 ]. ICLR 2018. Shows that gradient descent on logistic regression converges to the max-margin direction (see the sketch after this reference list).
- Matus Telgarsky. Margins, Shrinkage, and Boosting. [ arXiv:1303.4172 ]. ICML 2013. An earlier paper showing that gradient descent on the exponential/logistic loss leads to max margin.
- Yuan Yao, Lorenzo Rosasco and Andrea Caponnetto. On Early Stopping in Gradient Descent Learning. Constructive Approximation, 2007, 26 (2): 289-315.
[ link ]
- Jingfeng Wu, Peter L. Bartlett, Jason D. Lee, Sham M. Kakade, and Bin Yu. Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization. [ arXiv:2509.17251 ]
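[ Illustrative code (Python) ]:
To accompany the Soudry et al. reference above, here is a small numpy sketch (not from the paper) of the implicit-bias phenomenon: on linearly separable data, plain gradient descent on the logistic loss keeps growing the weight norm while the normalized direction stabilizes and its minimum margin on the training data increases. The data, step size, and iteration counts are made-up illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)

    # Two well-separated Gaussian clusters with labels in {-1, +1}
    X = np.vstack([rng.normal(loc=+3.0, size=(50, 2)),
                   rng.normal(loc=-3.0, size=(50, 2))])
    y = np.array([+1] * 50 + [-1] * 50)

    w = np.zeros(2)
    lr = 0.1
    for t in range(1, 20001):
        margins = y * (X @ w)
        # gradient of the average logistic loss: -mean_i sigma(-m_i) * y_i * x_i
        grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        w -= lr * grad
        if t in (100, 1000, 20000):
            w_dir = w / np.linalg.norm(w)
            print(f"step {t:6d}  ||w|| = {np.linalg.norm(w):6.2f}  "
                  f"min margin of w/||w|| = {np.min(y * (X @ w_dir)):.3f}")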
|
Y.Y. |
|
| 10/11/2025, Sat |
Lecture 06+: Advanced topics on early stopping regularization.
[ Seminar ]
- Title: A Statistical View on Implicit Regularization: Gradient Descent Dominates Ridge [ slides ]
- Speaker: Dr. Jingfeng WU, UC Berkeley
- Time: 10:30am, LTF
- Abstract: A key puzzle in deep learning is how simple gradient methods find generalizable solutions without explicit regularization. This talk discusses the implicit regularization of gradient descent (GD) through the lens of statistical dominance. Using linear regression as a clean proxy, we present three surprising findings.
First, GD dominates ridge regression: with comparable regularization, the excess risk of GD is always within a constant factor of ridge, but ridge can be polynomially worse even when tuned optimally. Second, GD is incomparable with online stochastic gradient descent (SGD). While it is known that for certain problems GD can be polynomially better than SGD, the reverse is also true: we construct problems, inspired by benign overfitting theory, where optimally stopped GD is polynomially worse. Finally, GD dominates SGD for a significant subclass of problems -- those with fast and continuously decaying covariance spectra -- which includes all problems satisfying the standard capacity condition.
This is joint work with Peter Bartlett, Sham Kakade, Jason Lee, and Bin Yu.
- Bio:
Jingfeng Wu is a postdoctoral fellow at the Simons Institute for the Theory of Computing at UC Berkeley. His research focuses on deep learning theory, optimization, and statistical learning. He earned his Ph.D. in Computer Science from Johns Hopkins University in 2023. Prior to that, he received a B.S. in Mathematics (2016) and an M.S. in Applied Mathematics (2019), both from Peking University. In 2023, he was recognized as a Rising Star in Data Science by the University of Chicago and UC San Diego.
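[ Illustrative code (Python) ]:
A toy numpy experiment (not from the talk) in the spirit of the abstract's first claim: on an overparameterized linear-regression instance, compare the best excess risk attained along the gradient-descent path (early stopping) with the best excess risk of ridge regression over a penalty grid. The data model, step size, and grids are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, sigma = 50, 100, 0.5
    beta_star = rng.normal(size=d) / np.sqrt(d)
    X = rng.normal(size=(n, d))
    y = X @ beta_star + sigma * rng.normal(size=n)

    def excess_risk(beta):
        # population excess risk E[(x^T (beta - beta_star))^2] for isotropic Gaussian x
        return float(np.sum((beta - beta_star) ** 2))

    # Gradient descent on 0.5 * ||X beta - y||^2, tracking the risk along the path
    lr = 1.0 / np.linalg.norm(X, 2) ** 2          # 1 / smoothness constant, for stability
    beta, gd_risks = np.zeros(d), []
    for t in range(2000):
        beta = beta - lr * (X.T @ (X @ beta - y))
        gd_risks.append(excess_risk(beta))

    # Ridge regression over a grid of penalties
    ridge_risks = [excess_risk(np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y))
                   for lam in np.logspace(-4, 4, 50)]

    print("best early-stopped GD excess risk:", min(gd_risks))
    print("best ridge excess risk:           ", min(ridge_risks))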
|
Y.Y. |
|
| 10/14/2025, Tue |
Lecture 07: An Introduction to Convolutional Neural Networks [ YY's slides ] and OpenReview Submission Instructions for Project 1 [ slides ]
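[ Illustrative code (Python) ]:
A minimal numpy sketch of the basic building block of CNNs: sliding a 3x3 filter over a grayscale image with stride 1 and no padding ("valid" cross-correlation). The image and filter values are made up for illustration.

    import numpy as np

    def conv2d_valid(image, kernel):
        # Slide `kernel` over `image` with stride 1 and no padding ("valid")
        H, W = image.shape
        kH, kW = kernel.shape
        out = np.zeros((H - kH + 1, W - kW + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
        return out

    image = np.arange(36, dtype=float).reshape(6, 6)
    edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)    # crude vertical-edge detector
    print(conv2d_valid(image, edge_filter).shape)      # -> (4, 4)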
|
Y.Y. |
|
| 10/28/2025, Tue |
Lecture 08: An Introduction to Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Attention and Transformer [ slides ]
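[ Illustrative code (Python) ]:
A short numpy sketch of scaled dot-product attention, the core operation inside the Transformer: softmax(Q K^T / sqrt(d_k)) V. The token count, dimension, and random inputs are illustrative only (single head, no masking, no learned projections).

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # softmax(Q K^T / sqrt(d_k)) V
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                    # (n_q, n_k) similarity scores
        scores -= scores.max(axis=-1, keepdims=True)       # subtract row max for numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
        return weights @ V

    rng = np.random.default_rng(0)
    n, d = 5, 8                         # 5 tokens, model dimension 8
    Q = rng.normal(size=(n, d))
    K = rng.normal(size=(n, d))
    V = rng.normal(size=(n, d))
    print(scaled_dot_product_attention(Q, K, V).shape)     # -> (5, 8)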
|
Y.Y. |
|
| 11/04/2025, Tue |
Lecture 09: Transformer and Applications [ slides ]
[ Seminar ]
- Title: Transformers As Statisticians: Provable In-Context Learning with In-Context Algorithm Selection. [ slides ] [ video ]
- Speaker: Prof. Song MEI, University of California at Berkeley.
- Abstract:
Neural sequence models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) abilities: they can perform new tasks when prompted with training and test examples, without any parameter update to the model.
This work first provides a comprehensive statistical theory for transformers performing ICL. Concretely, we show that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression, Lasso, learning generalized linear models, and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Using an efficient implementation of in-context gradient descent as the underlying mechanism, our transformer constructions admit mild size bounds and can be learned with polynomially many pretraining sequences.
Building on these "base" ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving in-context algorithm selection, akin to what a statistician can do in real life: a single transformer can adaptively select different base ICL algorithms, or even perform qualitatively different tasks, on different input sequences, without any explicit prompting of the right algorithm or task. We establish this in theory by explicit constructions and also observe the phenomenon experimentally. In theory, we construct two general mechanisms for algorithm selection, with concrete examples: pre-ICL testing and post-ICL validation. As an example, we use the post-ICL validation mechanism to construct a transformer that can perform nearly Bayes-optimal ICL on a challenging task: noisy linear models with mixed noise levels. Experimentally, we demonstrate the strong in-context algorithm selection capabilities of standard transformer architectures.
- Bio: Song Mei is an Assistant Professor in the Department of Statistics and the Department of Electrical Engineering and Computer Sciences at UC Berkeley. In June 2020, he received his Ph.D. from Stanford University, advised by Prof. Andrea Montanari. Song's research is motivated by data science and AI, and lies at the intersection of statistics, machine learning, information theory, and computer science. His current research interests include language models and diffusion models, theory of deep learning, theory of reinforcement learning, high-dimensional statistics, quantum algorithms, and uncertainty quantification. Song received a Sloan Research Fellowship in 2025 and an NSF CAREER Award in 2024.
- Reference: Yu Bai, Fan Chen, Huan Wang, Caiming Xiong, and Song Mei. Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection. NeurIPS, 2023 (Oral). [ arXiv:2306.04637 ]
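[ Illustrative code (Python) ]:
The sketch below is not the paper's transformer construction; it is a plain numpy analogue of the post-ICL validation idea described in the abstract: fit several base algorithms (here, ridge regression at several regularization levels, with lam = 0 being least squares) on part of the in-context examples, score them on a held-out slice of the prompt, and use the winner to predict at the query. The task distribution, split, and candidate grid are made-up illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_ctx = 5, 40
    noise = rng.choice([0.05, 1.0])                   # task drawn with a random noise level
    beta = rng.normal(size=d)
    X = rng.normal(size=(n_ctx, d))
    y = X @ beta + noise * rng.normal(size=n_ctx)
    x_query = rng.normal(size=d)

    # Split the in-context examples into a "fit" part and a "validation" part
    X_fit, y_fit = X[:30], y[:30]
    X_val, y_val = X[30:], y[30:]

    def ridge(lam):
        # ridge estimator on the "fit" portion of the prompt (lam = 0 gives least squares)
        return np.linalg.solve(X_fit.T @ X_fit + lam * np.eye(d), X_fit.T @ y_fit)

    candidates = {f"ridge(lam={lam})": ridge(lam) for lam in [0.0, 0.1, 1.0, 10.0]}

    # Post-ICL validation: score each candidate on the held-out in-context examples
    val_err = {name: np.mean((X_val @ b - y_val) ** 2) for name, b in candidates.items()}
    best = min(val_err, key=val_err.get)
    print("selected base algorithm:", best)
    print("query prediction:", x_query @ candidates[best])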
|
Y.Y. |
|
| 11/11/2025, Tue |
Lecture 10: An Introduction to Reinforcement Learning with Applications [ slides ] and Final Project Initialization [ project2.pdf ]
[ Reference ]:
- Google DeepMind's Deep Q-learning playing Atari Breakout:
[ youtube ]
- To view .ipynb files below, you may try [ Jupyter NBViewer ]
- Deep Q-Learning PyTorch Tutorial: [ link ] (a minimal tabular Q-learning sketch follows this reference list)
- A Tutorial of Reinforcement Learning for Quantitative Trading:
[ Tutorial ]
[ Replicate ]
- FinRL: Deep Reinforcement Learning for Quantitative Finance
[ GitHub ]
- Reinforcement Learning and Supervised Learning for Quantitative Finance: [ link ]
- Prof. Michael Kearns, University of Pennsylvania, Algorithmic Trading and Machine Learning, Simons Institute at Berkeley [ link ]
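[ Illustrative code (Python) ]:
This is not the Deep Q-Network from the tutorial above; it is a minimal tabular Q-learning sketch on a made-up 6-state chain, just to show the update rule Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). The environment, hyperparameters, and tie-breaking rule are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)

    # A tiny deterministic chain: states 0..5, actions 0 = left, 1 = right.
    # Reaching state 5 yields reward +1 and ends the episode.
    n_states, n_actions, goal = 6, 2, 5
    alpha, gamma, eps = 0.5, 0.9, 0.1

    Q = np.zeros((n_states, n_actions))
    for episode in range(200):
        s = 0
        while s != goal:
            if rng.random() < eps:
                a = int(rng.integers(n_actions))                             # explore
            else:
                a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))      # greedy, random tie-break
            s_next = max(s - 1, 0) if a == 0 else min(s + 1, goal)
            r = 1.0 if s_next == goal else 0.0
            target = r + gamma * np.max(Q[s_next]) * (s_next != goal)        # no bootstrap past the goal
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next

    print("greedy action per non-terminal state:", np.argmax(Q[:goal], axis=1))  # expect all 1s (move right)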
[ Kaggle Contests ]:
- Kaggle: Home Credit Default Risk [ link ]
- Kaggle: M5 Forecasting - Accuracy, Estimate the unit sales of Walmart retail goods.
[ link ]
- Kaggle: M5 Forecasting - Uncertainty, Estimate the uncertainty distribution of Walmart unit sales.
[ link ]
- Kaggle: Ubiquant Market Prediction - Make predictions against future market data.
[ link ]
- Kaggle: G-Research Crypto Forecasting.
[ link ]
- Type-II diabetes and Alzheimer’s disease.
[ slides (pdf) ]
[ slides (pptx) ]
[Paper Replication]:
- Shihao Gu, Bryan Kelly and Dacheng Xiu
"Empirical Asset Pricing via Machine Learning", Review of Financial Studies, Vol. 33, Issue 5, 2020, 2223-2273. Winner of the 2018 Swiss Finance Institute Outstanding Paper Award.
[ link ]
- Jingwen Jiang, Bryan Kelly and Dacheng Xiu
"(Re-)Imag(in)ing Price Trends", The Journal of Finance, 78: 3193-3249, 2023.
[ ssrn ] [ https://doi.org/10.1111/jofi.13268 ]
|
Y.Y. |
|