Date | Topic | Instructor | Scribe
03/02/2025, Mon |
Lecture 01: A Historical Overview. [ slides (pdf) ]
|
Y.Y. |
|
07/02/2025, Fri |
Seminar.
[ Mathematics Colloquium ]
- Title: Theoretical Evaluation of Data Reconstruction Error and Induced Optimal Defenses [ announcement ] [ slides ]
- Speaker: Prof. Qi LEI, New York University
- Time: Friday Feb 7, 2025, 10:30am-noon
- Abstract: Data reconstruction attacks and defenses are crucial for understanding data leakage in machine learning and federated learning. However, previous research has largely focused on empirical observations of gradient inversion attacks, lacking a theoretical framework for quantitatively analyzing reconstruction errors based on model architecture and defense methods.
In this talk, we propose framing the problem as an inverse problem, enabling a theoretical and systematic evaluation of data reconstruction attacks. For various defense methods, we derive algorithmic upper bounds and matching information-theoretic lower bounds on reconstruction error for two-layer neural networks, accounting for feature and architecture dimensions as well as defense strength. We further propose two defense strategies, Optimal Gradient Noise and Optimal Gradient Pruning, that maximize reconstruction error while maintaining model performance. (A toy gradient-matching sketch follows the references below.)
- Bio:
Qi Lei is an assistant professor of Mathematics and Data Science at the Courant Institute of Mathematical Sciences and the Center for Data Science at NYU. Previously, she was an associate research scholar in the ECE department at Princeton University. She received her Ph.D. from the Oden Institute for Computational Engineering & Sciences at UT Austin. She visited the Institute for Advanced Study (IAS)/Princeton for the Theoretical Machine Learning Program, and before that was a research fellow at the Simons Institute for the Foundations of Deep Learning Program. Her research aims to develop mathematical groundings for trustworthy and (sample- and computationally) efficient machine learning algorithms. Qi has received several awards and recognitions, including Rising Stars in Machine Learning, in EECS, and in Statistics and Data Science, the Outstanding Dissertation Award, the Computing Innovation Fellowship, and the Simons-Berkeley Research Fellowship.
[ Relevant References ]:
- Zihan Wang, Jason D. Lee, Qi Lei. Reconstructing Training Data from Model Gradient, Provably [ link ]
- Sheng Liu*, Zihan Wang*, Yuxiao Chen, Qi Lei. Data Reconstruction Attacks and Defenses: A Systematic Evaluation. [ link ]
- Yuxiao Chen, Gamze Gürsoy, Qi Lei. Optimal Defenses Against Gradient Reconstruction Attacks. [ link ]
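[ Code Sketch ]:
- A minimal gradient-matching reconstruction sketch in PyTorch, illustrating the kind of attack the talk analyzes (not the speaker's exact method): given the gradient a client would share for one private sample, the attacker optimizes a dummy input until its gradient matches. The network shape, loss, optimizer settings, and the assumption that the label is known are all illustrative simplifications.

import torch
import torch.nn as nn

torch.manual_seed(0)
d, h, k = 16, 32, 4                       # input dim, hidden width, classes (assumed)
model = nn.Sequential(nn.Linear(d, h), nn.ReLU(), nn.Linear(h, k))
loss_fn = nn.CrossEntropyLoss()

# Victim's private sample and the gradient it would share.
x_true = torch.randn(1, d)
y_true = torch.tensor([2])
g_true = torch.autograd.grad(loss_fn(model(x_true), y_true), model.parameters())

# Attacker: optimize a dummy input so its gradient matches the shared one.
x_hat = torch.randn(1, d, requires_grad=True)
opt = torch.optim.Adam([x_hat], lr=0.1)
for step in range(500):
    opt.zero_grad()
    g_hat = torch.autograd.grad(loss_fn(model(x_hat), y_true),
                                model.parameters(), create_graph=True)
    diff = sum(((a - b) ** 2).sum() for a, b in zip(g_hat, g_true))
    diff.backward()
    opt.step()

print("reconstruction error:", (x_hat - x_true).norm().item())

Defenses such as Optimal Gradient Noise would perturb g_true before sharing, degrading this matching objective.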
|
Y.Y. |
|
10/02/2025, Mon |
Lecture 02: Supervised Learning: Linear Regression and Classification [ slides ]
|
Y.Y. |
|
17/02/2025, Mon |
Lecture 03: Model Assessment and Selection: Subset, Ridge, Lasso, and PCR [ slides ]
|
Y.Y. |
|
24/02/2025, Mon |
Lecture 04: Moving Beyond Linearity [ slides ]
[ Mathematics Colloquium ]
- Title: A new machine learning algorithm, complex systems and AI predictions [ announcement ] [ slides ]
- Speaker: Prof. Zhihong Xia, Great Bay University and Northwestern University
- Time: Monday Feb 24, 2025, 4-5pm, Rm 4621 (Lift 31/32)
- Abstract: We propose a novel machine learning algorithm inspired by complex analysis. Our algorithm has a better mathematical formulation and approximates general functions much more efficiently. It can be implemented in two self-learning neural networks: the CauchyNet and the XNet. The CauchyNet is very efficient for low-dimensional problems such as extrapolation, imputation, and numerical solutions of PDEs and ODEs. The XNet, on the other hand, works for high-dimensional problems such as image and voice recognition, transformers, and likely LLMs, often improving on current methods by several orders of magnitude.
In the context of modern AI, we also pose the following question: given data from a single observable g in a dynamical system, is it possible to recover the underlying system? For instance, with a large dataset of positional observations from an n-body system, can we predict its future motion without resorting to Newtonian mechanics? Surprisingly, the answer is yes for almost any typical observable. We introduce the principle of space-time swap: the absence of spatial information in a dynamical system can be compensated by leveraging temporal information. This principle is grounded in Takens' Embedding Theorem (building upon Whitney's embedding theorem). We believe this idea has broad potential for applications in the analysis and prediction of complex systems. (A toy delay-embedding sketch follows this entry.)
- Bio:
Zhihong Jeff Xia received his PhD from Northwestern University in 1988. He held a Benjamin Peirce Lectureship and Assistant Professorship at Harvard University, and a tenured faculty position at the Georgia Institute of Technology, before joining Northwestern University as a professor of mathematics in 1994. In 2000, Xia was appointed the Arthur and Gladys Pancoe Professor of Mathematics at Northwestern. He joined Great Bay University in 2024.
Xia's fields of research are dynamical systems, solar-system dynamics, and machine learning algorithms. He solved the century-old Painlevé conjecture in mathematics; discovered (jointly with Jian Li) that a large planet from outside the solar system once flew by our solar system a few hundred million years ago; and created an efficient machine learning algorithm.
Xia was named an Alfred P. Sloan Fellow in 1989. He was awarded the Blumenthal Award for the advancement of pure mathematics (1993) and the Monroe H. Martin Prize in applied mathematics (1995), and was an NSF National Young Investigator. He was invited to speak at the 1998 International Congress of Mathematicians. Xia was the founding chair of the department of mathematics at the Southern University of Science and Technology.
Xia is currently co-editor-in-chief of The Intellectual (《知识分子》) and one of the founding members of the science committee of the Future Science Prize.
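[ Code Sketch ]:
- A toy illustration of the space-time swap principle via Takens delay embedding: reconstruct a state from lagged values of a single observable and predict its next value with nearest neighbors. The observable, lags, and predictor below are illustrative choices, not the speaker's CauchyNet/XNet.

import numpy as np

t = np.linspace(0, 60, 3000)
g = np.sin(t) + 0.5 * np.sin(2.3 * t)        # observable from a hidden system

m, tau = 4, 10                               # embedding dimension and delay (assumed)
start = (m - 1) * tau
Z = np.stack([g[start - j * tau: len(g) - 1 - j * tau] for j in range(m)], axis=1)
y = g[start + 1:]                            # next value of the observable

# Fit on the past, predict the future with a 1-nearest-neighbor rule.
n_train = 2500
Z_tr, y_tr, Z_te, y_te = Z[:n_train], y[:n_train], Z[n_train:], y[n_train:]
nn_idx = np.argmin(((Z_te[:, None, :] - Z_tr[None, :, :]) ** 2).sum(-1), axis=1)
print("mean abs prediction error:", np.abs(y_tr[nn_idx] - y_te).mean())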
|
Y.Y. |
|
03/03/2025, Mon |
Lecture 05: Decision Tree, Bagging, Random Forests and Boosting [ YY's slides ]
|
Y.Y. |
|
10/03/2025, Mon |
Lecture 06: Support Vector Machines [ YY's slides ] and Mini-Project Initialization [ project1.pdf ]
[ Reference ]:
- To view .ipynb files below, you may try [ Jupyter NBViewer ]
- Python Notebook for Support Vector Machines
[ svm.ipynb ]
- Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro. The Implicit Bias of Gradient Descent on Separable Data.
[ arXiv:1710.10345 ]. ICLR 2018. Gradient descent on logistic regression converges to the max-margin solution; see the numerical sketch below.
- Matus Telgarsky. Margins, Shrinkage, and Boosting. [ arXiv:1303.4172 ]. ICML 2013. An earlier paper showing that gradient descent on the exponential/logistic loss leads to max margin.
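[ Code Sketch ]:
- A quick numerical check of the implicit-bias result cited above (data and step size are illustrative): on linearly separable data, gradient descent on the logistic loss drives w/||w|| toward the hard-margin SVM direction.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.3, (50, 2)), rng.normal([-2, -2], 0.3, (50, 2))])
y = np.array([1] * 50 + [-1] * 50)

w, lr = np.zeros(2), 0.1
for _ in range(100000):                        # margins emerge slowly (log t rate)
    u = y * (X @ w)
    grad = -(X * (y / (1 + np.exp(u)))[:, None]).mean(0)   # grad of mean logistic loss
    w -= lr * grad

w_svm = SVC(kernel="linear", C=1e6).fit(X, y).coef_.ravel()  # ~hard-margin SVM
cos = w @ w_svm / (np.linalg.norm(w) * np.linalg.norm(w_svm))
print("cosine(w_GD, w_SVM):", round(float(cos), 4))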
[ Mini-Project Reference ]:
- Kaggle: Home Credit Default Risk [ link ]
- Kaggle: M5 Forecasting - Accuracy. Estimate the unit sales of Walmart retail goods. [ link ]
|
Y.Y. |
|
17/03/2025, Mon |
Lecture 07: An Introduction to Convolutional Neural Networks [ YY's slides ]
|
Y.Y. |
|
24/03/2025, Mon |
Lecture 08: An Introduction to Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Attention and Transformers [ slides ]
|
Y.Y. |
|
31/03/2025, Mon |
Lecture 09: Seminar and Final Project Initialization [ project2.pdf ].
[ Seminar ]
- Title: Be aware of model capacity when talking about generalization in machine learning [ announcement ] [ slides ]
- Speaker: Prof. Fanghui LIU, University of Warwick
- Time: Monday March 31, 2025, 18:30
- Abstract: Machine learning (ML) generally operates in high dimensions, where performance is characterized by learning efficiency, both theoretically (statistical and computational efficiency) and empirically (practically efficient ML).
A fundamental question in ML theory and practice is how the test error (generalization) evolves with sample size and model capacity (e.g., model size), shaping key concepts such as the bias-variance trade-off, double descent, and scaling laws.
In this talk, I will discuss how the test error behaves when model capacity is measured by a metric more suitable than model size. Specifically, I will present a unified perspective on generalization by analyzing how norm-based capacity control reshapes our understanding of
these foundational concepts: there is no bias-variance trade-off; a phase transition exists from the under-parameterized to the over-parameterized regime, but double descent disappears; and the scaling law takes a multiplicative form under norm-based capacity.
Additionally, I will briefly discuss which norm is suitable for neural networks and what fundamental limits on learning efficiency such norm-based capacity imposes, from the perspective of function space. (A small numerical illustration follows the papers below.)
- Bio:
Dr. Fanghui Liu is currently an assistant professor at the University of Warwick, UK, and a member of the Centre for Discrete Mathematics and its Applications (DIMAP). His research interests include the foundations of machine learning as well as efficient machine learning algorithm design.
He received the AAAI'24 New Faculty Award, was named a Rising Star in AI (KAUST 2023), co-founded the fine-tuning workshop at NeurIPS'24, and has served as an area chair for ICLR and AISTATS. He has also delivered tutorials at ISIT'24, CVPR'23, and ICASSP'23.
Prior to his current position, he was a postdoctoral researcher at EPFL (2021-2023) and KU Leuven (2019-2021). He received his PhD from Shanghai Jiao Tong University in 2019, with several Excellent Doctoral Dissertation Awards.
[ Relevant Papers ]:
- Yichen Wang, Yudong Chen, Lorenzo Rosasco, Fanghui Liu. Re-examining Double Descent and Scaling Laws under Norm-based Capacity via Deterministic Equivalence. [ arXiv:2502.01585 ]
- Fanghui Liu, Leello Dadi, Volkan Cevher. Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks. [ arXiv:2404.18769 ]
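[ Code Sketch ]:
- A small experiment in the spirit of the talk (illustrative, not the papers' setting): track the test error and the l2 norm of the minimum-norm least-squares fit as the number of random ReLU features grows past the sample size. Parameter count rises monotonically, while the norm typically peaks at the interpolation threshold, which is the capacity lens the talk advocates.

import numpy as np

rng = np.random.default_rng(0)
n, d, n_test = 100, 20, 1000
w_star = rng.normal(size=d)
X, Xt = rng.normal(size=(n, d)), rng.normal(size=(n_test, d))
y = X @ w_star + 0.5 * rng.normal(size=n)
yt = Xt @ w_star

for p in [20, 50, 90, 100, 110, 200, 1000]:              # number of random features
    W = rng.normal(size=(d, p)) / np.sqrt(d)
    F, Ft = np.maximum(X @ W, 0), np.maximum(Xt @ W, 0)  # ReLU random features
    beta = np.linalg.pinv(F) @ y                         # minimum-norm least squares
    mse = np.mean((Ft @ beta - yt) ** 2)
    print(f"p={p:5d}  ||beta||={np.linalg.norm(beta):9.2f}  test MSE={mse:8.2f}")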
[ Seminar and Project Description ]
- Title: Digital Historical Forensics: A Computational Approach to Wartime Media Cultures [ slides ]
- Speaker: Dr. Lin DU, National University of Singapore
- Abstract: This study examines the longstanding need for, and challenge of, contextual analysis of historical images stored in digital visual archives, and the difficulty of retrieving contextual information from these archives.
Contextual analysis is essential for disciplines such as history and art history, as it situates artwork and historical sources within historical narratives, which in turn deepens understanding of the artistic or political
expression in cultural products. To address this challenge, a novel approach is proposed that uses computer vision to trace the circulation and dissemination of historical photographs in their original contexts. The method
first uses YOLOv7 to crop historical images from pictorial magazines, then trains machine learning models on the cropped printed images together with a large dataset of original historical photographs, and finally compares
image similarity between the datasets of printed images and original photographs. To ensure accurate image similarities between the two subsets, which differ markedly in image quality, an ensemble of three machine learning
models (Vision Transformer, EfficientNetV2, and Swin Transformer) was developed. Through this system, contexts in the circulation of historical photographs were discovered, and new insights into the editing strategies of
propaganda magazines in East Asia during WWII were uncovered. These outcomes offer supporting evidence for previous research in history and art history, and demonstrate the potential of computer vision for uncovering new
information from digital visual archives. The model achieves 77.8% top-15 retrieval accuracy on the evaluation dataset. Further projects addressing these challenges are outlined, accompanied by relevant datasets. (A minimal retrieval sketch follows the bio below.)
- Bio: Lin Du is currently a Postdoctoral Fellow and will join as an assistant professor in July, jointly appointed in the Departments of Japanese Studies and Chinese Studies at the National University of Singapore. She completed her PhD at the
Department of Asian Languages and Cultures at UCLA, where her dissertation, "Chinese Photojournalism 1937–1952: Materiality and the Institutionalization of Culture via a Computer Vision Approach," utilized advanced computer vision techniques to
explore wartime visual media culture. Lin holds an MA from the Regional Studies East Asia Program at Harvard University and a BA in Chinese Language and Literature from Peking University. Her pioneering work in machine learning has been published
in the ACM Journal on Computing and Cultural Heritage (JOCCH), and her contributions to humanities research are forthcoming in the Journal of Chinese Cinemas and Asia Pacific Perspectives.
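[ Code Sketch ]:
- A minimal retrieval sketch loosely following the described pipeline, with the YOLOv7 detection step omitted: embed cropped prints and archival photographs with an ensemble of backbones, average the cosine similarities, and return the top-15 matches. The torchvision backbones and the use of logits as embeddings are simplifying assumptions, not the speaker's exact system.

import torch
import torch.nn.functional as F
from torchvision import models

backbones = [
    models.vit_b_16(weights="IMAGENET1K_V1"),
    models.efficientnet_v2_s(weights="IMAGENET1K_V1"),
    models.swin_t(weights="IMAGENET1K_V1"),
]
for m in backbones:
    m.eval()

@torch.no_grad()
def embed(batch, model):
    # For simplicity, use L2-normalized logits as embeddings (an assumption);
    # a real system would pool penultimate-layer features instead.
    return F.normalize(model(batch), dim=1)

@torch.no_grad()
def top15(query, archive):
    # query: (1, 3, 224, 224); archive: (N, 3, 224, 224), preprocessed identically.
    sims = sum(embed(query, m) @ embed(archive, m).T for m in backbones) / len(backbones)
    return sims.topk(15, dim=1).indices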
[ Kaggle Contests ]:
- Kaggle: Home Credit Default Risk [ link ]
- Kaggle: M5 Forecasting - Accuracy. Estimate the unit sales of Walmart retail goods. [ link ]
- Kaggle: M5 Forecasting - Uncertainty. Estimate the uncertainty distribution of Walmart unit sales. [ link ]
[ Reference ]:
- Shihao Gu, Bryan Kelly and Dacheng Xiu
"Empirical Asset Pricing via Machine Learning", Review of Financial Studies, Vol. 33, Issue 5, 2020, 2223-2273. Winner of the 2018 Swiss Finance Institute Outstanding Paper Award.
[ link ]
- Jingwen Jiang, Bryan Kelly and Dacheng Xiu
"(Re-)Imag(in)ing Price Trends", The Journal of Finance, 78: 3193-3249, 2023.
[ ssrn ][ https://doi.org/10.1111/jofi.13268 ]
|
Y.Y. |
|
01/04/2025, Tue |
Seminar.
- Title: One-step full gradient can be sufficient for low-rank fine-tuning, provably and efficiently [ slides ]
- Speaker: Prof. Fanghui LIU, University of Warwick
- Time: Tuesday April 1, 2025, 11am-noon, Room 2463 (lift 25/26), HKUST
- Abstract: In this talk, I will discuss how to improve the performance of Low-Rank Adaptation (LoRA), guided by our theory. Our theoretical results show that LoRA aligns with a certain singular subspace of the one-step gradient of full fine-tuning.
Accordingly, alignment and generalization guarantees can be achieved directly by our theory-grounded spectral initialization strategy, for both linear and nonlinear models, and subsequent linear convergence can also be established.
Our analysis leads to the LoRA-One algorithm, a theoretically grounded algorithm that achieves significant empirical improvements over vanilla LoRA and its variants on several benchmarks when fine-tuning Llama 2.
Our theoretical analysis is of independent interest for understanding matrix sensing and deep learning theory. (A sketch of the spectral initialization idea follows the paper below.)
Joint work with Yuanhe Zhang and Yudong Chen.
[ Relevant Papers ]:
- Yuanhe Zhang, Fanghui Liu, Yudong Chen. One-step full gradient suffices for low-rank fine-tuning, provably and efficiently. [ arXiv:2502.01235 ]
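[ Code Sketch ]:
- A sketch of the spectral initialization idea, as I read the abstract (not the authors' code): take one full-batch gradient of the frozen weight, SVD it, and initialize the LoRA factors from the top-r singular directions, so that the initial update B @ A is a rank-r gradient step. The squared loss and the scale are assumptions.

import torch

def lora_one_init(W, X, Y, r, scale=1e-2):
    """W: frozen (d_out, d_in) weight; X: (n, d_in) inputs; Y: (n, d_out) targets."""
    W = W.clone().requires_grad_(True)
    loss = ((X @ W.T - Y) ** 2).mean()             # one step of full fine-tuning
    (G,) = torch.autograd.grad(loss, W)            # full-batch gradient
    U, S, Vh = torch.linalg.svd(G, full_matrices=False)
    B = -scale * U[:, :r] * S[:r].sqrt()           # (d_out, r)
    A = scale * S[:r].sqrt()[:, None] * Vh[:r, :]  # (r, d_in)
    return B, A                                    # B @ A = -scale**2 * U_r S_r V_r^T

d_out, d_in, n, r = 64, 128, 256, 4
W = torch.randn(d_out, d_in) / d_in ** 0.5
X, Y = torch.randn(n, d_in), torch.randn(n, d_out)
B, A = lora_one_init(W, X, Y, r)
print(B.shape, A.shape)                            # (64, 4) and (4, 128)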
|
Y.Y. |
|
07/04/2025, Mon |
Lecture 10: Transformer and Applications [ slides ]
[ Seminar ]
- Title: Advancements in Kernel Learning and Offline Reinforcement Learning through Generative Models [ slides I ] [ slides II ]
- Speaker: Prof. Wenjia Wang, HKUST-GZ
- Time: 8:00pm
- Abstract: In this talk, I will present two lines of my recent research.
In Part I, I will talk about random smoothing data augmentation, a form of regularization that prevents overfitting by injecting noise into the input data,
encouraging the model to learn more generalizable features. In this work, we present a framework for random smoothing regularization that can adaptively and
effectively learn a wide range of ground-truth functions in the classical Sobolev spaces. By using random smoothing regularization as a novel convolution-based smoothing kernel,
we attain optimal convergence rates using a kernel gradient descent algorithm, with either early stopping or weight decay. (A minimal augmentation sketch follows the papers below.)
In Part II, I will talk about our recent series of works on offline reinforcement learning (RL).
Unable to interact with the environment, offline RL methods face the challenge of estimating values at out-of-distribution (OOD) points.
Existing methods either constrain the policy to exclude OOD actions or make the Q-function pessimistic; however, these methods can be overly conservative or fail to identify OOD areas accurately.
I will discuss our recent advances in offline RL, focusing on the use of generative models such as GANs and diffusion models.
Our proposed methods are evaluated on the D4RL benchmarks and demonstrate significant improvements across numerous tasks, with theoretical results providing performance guarantees.
- Bio: Wenjia Wang is an assistant professor in the Data Science and Analysis Thrust at the Information Hub of the Hong Kong University of Science and Technology (Guangzhou).
He obtained his Ph.D. in the School of Industrial & Systems Engineering at Georgia Institute of Technology. Wenjia Wang's research interests include uncertainty quantification, computer experiments, machine learning, stochastic simulation, and nonparametric statistics.
[ Relevant Papers ]:
- Ding, L., Hu, T., Jiang, J., Li, D., Wang, W., & Yao, Y. (2024). Random smoothing regularization in kernel gradient descent learning. Journal of Machine Learning Research.
[ arXiv:2305.03531 ]
- Fang, L., Liu, R., Zhang, J., Wang, W., & Jing, B. Y. (2025). Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning.
The Thirteenth International Conference on Learning Representations (ICLR).
- Zhang, J., Fang, L., Shi, K., Wang, W., & Jing, B. Y. (2024). Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model.
Neural Information Processing Systems (NeurIPS), 2024.
- Zhang, J., Zhang, C., Wang, W., & Jing, B. Y. (2023). Constrained Policy Optimization with Explicit Behavior Density For Offline Reinforcement Learning.
Neural Information Processing Systems (NeurIPS), 2023.
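[ Code Sketch ]:
- A minimal sketch of random smoothing data augmentation as described in Part I (illustrative; the JMLR paper analyzes it as a convolution-based smoothing kernel with kernel gradient descent): perturb the inputs with fresh Gaussian noise at every step and train with weight decay.

import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.rand(256, 1)
y = torch.sin(4 * X) + 0.05 * torch.randn_like(X)     # noisy smooth ground truth

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.05, weight_decay=1e-4)
sigma = 0.1                                           # smoothing noise level (assumed)

for step in range(2000):
    opt.zero_grad()
    X_noisy = X + sigma * torch.randn_like(X)         # fresh smoothing noise each step
    loss = ((net(X_noisy) - y) ** 2).mean()
    loss.backward()
    opt.step()

print("train MSE on clean inputs:", ((net(X) - y) ** 2).mean().item())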
|
Y.Y. |
|
14/04/2025, Mon |
Seminars.
- Title: Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection. [ slides ] [ video ]
- Speaker: Prof. Song MEI, University of California at Berkeley.
- Time: 6:40pm
- Abstract:
Neural sequence models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) abilities, where they can perform new tasks when prompted with training and test examples, without any parameter update to the model.
This work first provides a comprehensive statistical theory for transformers to perform ICL. Concretely, we show that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression,
Lasso, learning generalized linear models, and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Using an efficient implementation of in-context gradient descent as the underlying mechanism,
our transformer constructions admit mild size bounds, and can be learned with polynomially many pretraining sequences.
Building on these "base" ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving in-context algorithm selection, akin to what a statistician can do in real life: a single transformer can adaptively select different base ICL algorithms,
or even perform qualitatively different tasks, on different input sequences, without any explicit prompting of the right algorithm or task. We establish this in theory by explicit constructions, and also observe the phenomenon experimentally.
In theory, we construct two general mechanisms for algorithm selection, with concrete examples: pre-ICL testing and post-ICL validation. As an example, we use the post-ICL validation mechanism to construct a transformer that performs nearly Bayes-optimal ICL on a challenging
task: noisy linear models with mixed noise levels. Experimentally, we demonstrate the strong in-context algorithm selection capabilities of standard transformer architectures. (A toy illustration of in-context gradient descent follows the paper below.)
- Bio: Song Mei is an Assistant Professor in the Department of Statistics and the Department of Electrical Engineering and Computer Sciences at UC Berkeley. In June 2020, he received his Ph.D. from Stanford, advised by Prof. Andrea Montanari.
Song's research is motivated by data science and AI, and lies at the intersection of statistics, machine learning, information theory, and computer science. His current research interests include language models and diffusion models, the theory of deep learning,
the theory of reinforcement learning, high-dimensional statistics, quantum algorithms, and uncertainty quantification. Song received a Sloan Research Fellowship in 2025 and an NSF CAREER award in 2024.
[ Relevant Papers ]:
- Yu Bai, Fan Chen, Huan Wang, Caiming Xiong, and Song Mei. Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection. NeurIPS, 2023 (Oral). [ arXiv:2306.04637 ]
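[ Code Sketch ]:
- A toy illustration of the mechanism the paper constructs: treating the prompt's (x, y) pairs as a training set, a transformer can emulate gradient descent on ridge regression in context. Here the emulated algorithm is run directly in numpy (the transformer itself is not simulated); dimensions and hyperparameters are illustrative.

import numpy as np

rng = np.random.default_rng(0)
d, n_ctx = 8, 32
w_star = rng.normal(size=d)
X = rng.normal(size=(n_ctx, d))                  # in-context examples
y = X @ w_star + 0.1 * rng.normal(size=n_ctx)
x_query = rng.normal(size=d)

lam, lr, w = 0.1, 0.05, np.zeros(d)
for _ in range(500):                             # in-context GD steps; the paper shows
    grad = X.T @ (X @ w - y) / n_ctx + lam * w   # such steps are implementable by layers
    w -= lr * grad

w_ridge = np.linalg.solve(X.T @ X / n_ctx + lam * np.eye(d), X.T @ y / n_ctx)
print("GD vs closed-form ridge prediction:", float(x_query @ w), float(x_query @ w_ridge))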
- Title: Introducing Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving. [ slides ]
- Speaker: Prof. Yong LIN, Princeton University.
- Time: 8:00pm
- Abstract:
In this talk, I will introduce Goedel-Prover (https://goedel-lm.github.io/), an open-source large language model (LLM) that achieves state-of-the-art (SOTA) performance in automated formal proof generation for mathematical problems.
The key challenge in this field is the scarcity of formalized math statements and proofs, which we tackle as follows. We train statement formalizers to translate natural language math problems from Numina into formal language (Lean 4).
We then iteratively build a large dataset of formal proofs by training a series of provers: each prover proves many statements that the previous ones could not, and these new proofs are added to the training set for the next prover.
The final prover outperforms all existing open-source models in whole-proof generation. On the miniF2F benchmark, it achieves a 57.6% success rate (Pass@32), exceeding the previous best open-source model by 7.6%.
On PutnamBench, Goedel-Prover successfully solves 7 problems (Pass@512), ranking first on the leaderboard.
Furthermore, it generates 29.7K formal proofs for Lean Workbook problems, nearly doubling the 15.7K produced by earlier works. (A schematic of this expert-iteration loop follows the paper below.)
- Bio: Yong Lin is a postdoctoral fellow at Princeton Language and Intelligence (PLI), collaborating with Chi Jin, Sanjeev Arora, and Danqi Chen. He completed his PhD in Tong Zhang's group at the Hong Kong University of Science and Technology (HKUST).
His research focuses on the trustworthiness and applications of machine learning, with particular emphasis on verifiable generation, LLM alignment, and out-of-distribution generalization. Currently, he leads the Goedel-Prover project at Princeton,
where he trains LLMs for automated theorem proving in Lean. Prior to his PhD, Yong worked for 4 years as a Senior Machine Learning Engineer at Alibaba, a leading tech company in China.
He has published over 30 papers in top-tier ML, CV, and NLP conferences and received the Outstanding Paper Award at NAACL 2024. He was also awarded the Apple AI/ML PhD Fellowship in 2023 and the Hong Kong PhD Fellowship in 2020.
[ Relevant Papers ]:
- Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, Chi Jin. Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving. [ arXiv:2502.07640 ]
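[ Code Sketch ]:
- A schematic of the expert-iteration loop described in the abstract. Every callable passed in (formalize, sample_proofs, lean_check, finetune) is a hypothetical placeholder supplied by the caller, not Goedel-Prover's actual API; the loop structure is the point.

def expert_iteration(nl_problems, prover, formalize, sample_proofs, lean_check,
                     finetune, rounds=3, k=32):
    statements = [formalize(p) for p in nl_problems]   # NL problems -> Lean 4 statements
    proved = {}                                        # statement -> first verified proof
    for _ in range(rounds):
        for s in statements:
            if s in proved:
                continue
            for proof in sample_proofs(prover, s, k):  # whole-proof sampling (Pass@k)
                if lean_check(s, proof):               # keep only Lean-verified proofs
                    proved[s] = proof
                    break
        prover = finetune(prover, proved)              # next prover trains on the new proofs
    return prover, proved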
|
Y.Y. |
|