Date | Topic | Instructor | Scriber |
03/02/2025, Mon |
Lecture 01: A Historical Overview [ slides (pdf) ]
|
Y.Y. |
|
07/02/2025, Fri |
Seminar.
[ Mathematics Colloquium ]
- Title: Theoretical Evaluation of Data Reconstruction Error and Induced Optimal Defenses [ announcement ] [ slides ]
- Speaker: Prof. Qi LEI, New York University
- Time: Friday Feb 7, 2025, 10:30am-noon
- Abstract: Data reconstruction attacks and defenses are crucial for understanding data leakage in machine learning and federated learning. However, previous research has largely focused on empirical observations of gradient inversion attacks, lacking a theoretical framework for quantitatively analyzing reconstruction errors based on model architecture and defense methods.
In this talk, we propose framing the problem as an inverse problem, enabling a theoretical and systematic evaluation of data reconstruction attacks. For various defense methods, we derive algorithmic upper bounds and matching information-theoretic lower bounds on reconstruction error for two-layer neural networks, accounting for feature and architecture dimensions as well as defense strength. We further propose two defense strategies, Optimal Gradient Noise and Optimal Gradient Pruning, that maximize reconstruction error while maintaining model performance. (A toy gradient-inversion sketch follows the references below.)
- Bio:
Qi Lei is an assistant professor of Mathematics and Data Science at the Courant Institute of Mathematical Sciences and the Center for Data Science at NYU. Previously she was an associate research scholar at the ECE department of Princeton University. She received her Ph.D. from the Oden Institute for Computational Engineering & Sciences at UT Austin. She visited the Institute for Advanced Study (IAS)/Princeton for the Theoretical Machine Learning Program, and before that she was a research fellow at the Simons Institute for the Foundations of Deep Learning Program. Her research aims to develop mathematical groundings for trustworthy, sample-efficient, and computationally efficient machine learning algorithms. Qi has received several awards and recognitions, including Rising Stars in Machine Learning, in EECS, and in Statistics and Data Science, the Outstanding Dissertation Award, the Computing Innovation Fellowship, and the Simons-Berkeley Research Fellowship.
[ Relevant Reference ]:
- Zihan Wang, Jason D. Lee, Qi Lei. Reconstructing Training Data from Model Gradient, Provably. [ link ]
- Sheng Liu*, Zihan Wang*, Yuxiao Chen, Qi Lei. Data Reconstruction Attacks and Defenses: A Systematic Evaluation. [ link ]
- Yuxiao Chen, Gamze Gürsoy, Qi Lei. Optimal Defenses Against Gradient Reconstruction Attacks. [ link ]
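To make the attack setting concrete, here is a minimal gradient-inversion sketch (a toy example of ours, not the speaker's code; the two-layer network, the squared loss, and the assumption that the label is known are all ours):

import torch

torch.manual_seed(0)
d, h = 20, 10
model = torch.nn.Sequential(torch.nn.Linear(d, h), torch.nn.Tanh(),
                            torch.nn.Linear(h, 1))
loss_fn = torch.nn.MSELoss()

# The "leaked" information: the gradient induced by one private example.
x_true, y = torch.randn(1, d), torch.randn(1, 1)
true_grads = torch.autograd.grad(loss_fn(model(x_true), y), model.parameters())

# The attack: optimize a dummy input until its gradient matches the leaked one.
x_hat = torch.randn(1, d, requires_grad=True)
opt = torch.optim.Adam([x_hat], lr=0.05)
for _ in range(2000):
    opt.zero_grad()
    grads = torch.autograd.grad(loss_fn(model(x_hat), y), model.parameters(),
                                create_graph=True)  # keep graph: we differentiate through the gradients
    mismatch = sum(((g - t) ** 2).sum() for g, t in zip(grads, true_grads))
    mismatch.backward()
    opt.step()

print("reconstruction error:", (x_hat - x_true).norm().item())

Defenses like gradient noise or pruning perturb true_grads before release; the talk's bounds quantify how much reconstruction error such perturbations force.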
|
Y.Y. |
|
10/02/2025, Mon |
Lecture 02: Supervised Learning: Linear Regression and Classification [ slides ]
|
Y.Y. |
|
17/02/2025, Mon |
Lecture 03: Model Assessment and Selection: Subset, Ridge, Lasso, and PCR [ slides ]
|
Y.Y. |
|
24/02/2025, Mon |
Lecture 04: Moving Beyond Linearity [ slides ]
[ Mathematics Colloquium ]
- Title: A new machine learning algorithm, complex systems and AI predictions [ announcement ] [ slides ]
- Speaker: Prof. Zhihong Xia, Great Bay University and Northwestern University
- Time: Monday Feb 24, 2025, 4-5pm, Rm 4621 (Lift 31/32)
- Abstract: We propose a novel machine learning algorithm inspired by complex analysis. The algorithm admits a cleaner mathematical formulation and approximates functions far more efficiently. It can be implemented in two self-learning neural networks: the CauchyNet and the XNet. The CauchyNet is very efficient for low-dimensional problems such as extrapolation, imputation, and the numerical solution of PDEs and ODEs. The XNet, on the other hand, handles high-dimensional problems such as image and voice recognition, transformers, and likely LLMs, often improving on current methods by several orders of magnitude.
In the context of modern AI, we also pose the following question: given data from a single observable g in a dynamical system, is it possible to recover the underlying system? For instance, with a large dataset of positional observations from an n-body system, can we predict its future motion without resorting to Newtonian mechanics? Surprisingly, the answer is yes for almost any typical observable. We introduce the principle of space-time swap: the absence of spatial information in a dynamical system can be compensated by leveraging temporal information. This principle is grounded in Takens' Embedding Theorem (building upon Whitney's embedding theorem). We believe this idea has broad potential for applications in the analysis and prediction of complex systems. (A minimal delay-embedding sketch follows this entry.)
- Bio:
Zhihong Jeff Xia received his PhD from Northwestern University in 1988. He held a Benjamin Peirce Lectureship and Assistant Professorship at Harvard University and a tenured faculty position at the Georgia Institute of Technology before joining Northwestern University as a professor of mathematics in 1994. In 2000, Xia was appointed the Arthur and Gladys Pancoe Professor of Mathematics at Northwestern. He joined Great Bay University in 2024.
Xia's research fields are dynamical systems, solar system dynamics, and machine learning algorithms. He solved the century-old Painlevé conjecture in mathematics; discovered (jointly with Jian Li) that a large planet from outside the solar system once flew by our solar system a few hundred million years ago; and created an efficient machine learning algorithm.
Xia was named an Alfred P. Sloan Fellow in 1989. He received the Blumenthal Award for the Advancement of Research in Pure Mathematics (1993) and the Monroe H. Martin Prize in applied mathematics (1995), and was an NSF National Young Investigator. He was an invited speaker at the 1998 International Congress of Mathematicians, and was the founding chair of the Department of Mathematics at the Southern University of Science and Technology.
Xia is currently co-editor-in-chief of The Intellectual (《知识分子》) and one of the founding members of the science committee of the Future Science Prize.
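To illustrate the space-time swap idea, here is a minimal delay-embedding sketch (our toy example with the logistic map standing in for the observable; not material from the talk):

import numpy as np

# Scalar observable g(t) from a chaotic system (logistic map as a stand-in).
g = np.empty(2000)
g[0] = 0.4
for t in range(1999):
    g[t + 1] = 3.9 * g[t] * (1.0 - g[t])

def delay_embed(series, m, tau):
    # Delay vectors (g[t], g[t+tau], ..., g[t+(m-1)tau]), per Takens' theorem.
    n = len(series) - (m - 1) * tau
    return np.column_stack([series[i * tau : i * tau + n] for i in range(m)])

m, tau = 3, 1
X = delay_embed(g[:-1], m, tau)          # reconstructed states from time delays
y = g[(m - 1) * tau + 1 :]               # the value that follows each state
train, test = slice(0, 1500), slice(1500, len(y))

# Predict by analogy: find the nearest past state and report what followed it.
preds = np.array([y[train][np.argmin(np.linalg.norm(X[train] - v, axis=1))]
                  for v in X[test]])
print("mean |error|:", np.mean(np.abs(preds - y[test])))

No spatial model of the system is used anywhere: temporal delays alone reconstruct enough state to predict, which is exactly the compensation the principle asserts.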
|
Y.Y. |
|
03/03/2025, Mon |
Lecture 05: Decision Tree, Bagging, Random Forests and Boosting [ YY's slides ]
|
Y.Y. |
|
10/03/2025, Mon |
Lecture 06: Support Vector Machines [ YY's slides ] and Mini-Project Initialization [ project1.pdf ]
[ Reference ]:
- To view .ipynb files below, you may try [ Jupyter NBViewer ]
- Python Notebook for Support Vector Machines [ svm.ipynb ]
- Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro. The Implicit Bias of Gradient Descent on Separable Data. [ arXiv:1710.10345 ]. ICLR 2018. Gradient descent on logistic regression converges to the max-margin direction; a toy demonstration follows this reference list.
- Matus Telgarsky. Margins, Shrinkage, and Boosting. [ arXiv:1303.4172 ]. ICML 2013. An earlier paper showing that gradient descent on the exponential/logistic loss leads to max margin.
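To see the implicit-bias result from the papers above in action, a toy demonstration (our own, on a synthetic separable dataset): gradient descent on the logistic loss never reaches zero loss, yet the normalized iterate w/||w|| drifts toward the max-margin direction.

import numpy as np

rng = np.random.default_rng(0)
# Linearly separable two-class data in the plane.
X = np.vstack([rng.normal([2, 2], 0.3, (50, 2)),
               rng.normal([-2, -2], 0.3, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])

w = np.zeros(2)
for t in range(1, 20001):
    margins = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(np.clip(margins, -50, 50)))  # sigmoid(-margin), clipped for stability
    w += 0.1 * (X * (y * s)[:, None]).mean(axis=0)       # gradient descent on the logistic loss
    if t in (100, 1000, 20000):
        w_dir = w / np.linalg.norm(w)
        # The minimum normalized margin creeps upward toward the max-margin value.
        print(t, "min margin of w/||w||:", (y * (X @ w_dir)).min())

The norm of w grows without bound (the loss is only asymptotically zero), but the direction stabilizes; that direction is what the Soudry et al. result characterizes.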
[ Kaggle Contests ]:
- Kaggle: Home Credit Default Risk [ link ]
- Kaggle: M5 Forecasting - Accuracy: estimate the unit sales of Walmart retail goods. [ link ]
|
Y.Y. |
|
17/03/2025, Mon |
Lecture 07: An Introduction to Convolutional Neural Networks [ YY's slides ]
|
Y.Y. |
|
24/03/2025, Mon |
Lecture 08: An Introduction to Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Attention, and Transformers [ slides ]
|
Y.Y. |
|
31/03/2025, Mon |
Lecture 09: Seminar and Final Project Initialization [ project2.pdf ].
[ Seminar ]
- Title: Be aware of model capacity when talking about generalization in machine learning [ announcement ]
- Speaker: Prof. Fanghui LIU, University of Warwick
- Time: Monday March 31, 2025, 18:30
- Abstract: Machine learning (ML) generally operates in high dimensions, where performance is characterized by learning efficiency, both theoretically (statistical and computational efficiency) and empirically (practically efficient ML).
A fundamental question in ML theory and practice is how the test error (generalization) evolves with sample size and model capacity (e.g., model size), shaping key concepts such as the bias-variance trade-off, double descent, and scaling laws.
In this talk, I will discuss how the test error behaves when model capacity is measured by a more suitable metric than model size. Specifically, I will present a unified perspective on generalization by analyzing how norm-based capacity control reshapes our understanding of these foundational concepts: there is no bias-variance trade-off; a phase transition exists from the under-parameterized to the over-parameterized regime, but double descent does not appear; and the scaling law takes a multiplicative form under norm-based capacity.
Additionally, I will briefly discuss which norm is suitable for neural networks, and what fundamental limits such norm-based capacity imposes on learning efficiency from the perspective of function space. (A small random-features demonstration follows the papers below.)
- Bio:
Dr. Fanghui Liu is currently an assistant professor at the University of Warwick, UK, and a member of the Centre for Discrete Mathematics and its Applications (DIMAP). His research interests include the foundations of machine learning as well as efficient machine learning algorithm design.
He received the AAAI'24 New Faculty Award, was named a Rising Star in AI (KAUST 2023), co-founded the fine-tuning workshop at NeurIPS'24, and has served as an area chair for ICLR and AISTATS. He has also delivered tutorials at ISIT'24, CVPR'23, and ICASSP'23.
Prior to his current position, he was a postdoctoral researcher at EPFL (2021-2023) and KU Leuven (2019-2023). He received his PhD from Shanghai Jiao Tong University in 2019, earning several Excellent Doctoral Dissertation Awards.
[ Relevant Papers ]:
- Yichen Wang, Yudong Chen, Lorenzo Rosasco, Fanghui Liu. Re-examining Double Descent and Scaling Laws under Norm-based Capacity via Deterministic Equivalence. [ arXiv:2502.01585 ]
- Fanghui Liu, Leello Dadi, Volkan Cevher. Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks. [ arXiv:2404.18769 ]
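For a feel of what is at stake, a standard random-features experiment (our sketch, not the speaker's code): with capacity measured by parameter count, the test error peaks near the interpolation threshold p = n, and the norm of the fitted solution peaks there too, hinting at the norm-based account developed in the papers above.

import numpy as np

rng = np.random.default_rng(0)
n, d, n_test = 100, 20, 1000
w_star = rng.normal(size=d)
Xtr, Xte = rng.normal(size=(n, d)), rng.normal(size=(n_test, d))
ytr = Xtr @ w_star + 0.5 * rng.normal(size=n)   # noisy training labels
yte = Xte @ w_star

for p in (10, 50, 90, 100, 110, 200, 1000):     # random-feature count = "model size"
    V = rng.normal(size=(d, p)) / np.sqrt(d)    # fixed random first layer
    Ftr, Fte = np.tanh(Xtr @ V), np.tanh(Xte @ V)
    theta = np.linalg.pinv(Ftr) @ ytr           # minimum-norm least-squares fit
    print(p, "test MSE:", round(float(np.mean((Fte @ theta - yte) ** 2)), 3),
          "  ||theta||:", round(float(np.linalg.norm(theta)), 2))

Plotted against p this gives the usual double-descent curve; plotted against ||theta||, the picture reorganizes, which is the talk's point.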
[ Seminar and Project Description ]
- Title: Digital Historical Forensics: A Computational Approach to Wartime Media Cultures [ slides ]
- Speaker: Dr. Lin DU, National University of Singapore
- Abstract: This study examines the longstanding need for, and challenge of, contextual analysis of historical images stored in digital visual archives, and the accessibility of retrieving contextual information from these archives. Contextual analysis is essential for disciplines such as history and art history, as it situates artworks and historical sources within historical narratives, which in turn enhances understanding of the artistic or political expression in cultural products. To address this challenge, a novel approach is proposed that uses computer vision to trace the circulation and dissemination of historical photographs in their original contexts. The method first uses YOLOv7 to crop historical images from pictorial magazines, then trains machine learning models on the cropped printed images together with another large dataset of original historical photographs, and finally compares image similarity between the printed-image and original-photograph datasets. To ensure accurate image similarities between the two subsets, which have distinct image qualities, an ensemble of three machine learning models (Vision Transformer, EfficientNetV2, and Swin Transformer) was developed. Through this system, contexts in the circulation of historical photographs were discovered, and new insights into the editing strategies of propaganda magazines in East Asia during WWII were uncovered. These outcomes offer supporting evidence for previous research in history and art history, and demonstrate the potential of computer vision for uncovering new information from digital visual archives. The model achieves 77.8% top-15 retrieval accuracy on the evaluation dataset. Further projects addressing these challenges are outlined, accompanied by relevant datasets. (A toy sketch of the retrieval step follows the bio below.)
- Bio: Lin Du is currently a Postdoctoral Fellow and will join as an assistant professor in July, jointly appointed in the Departments of Japanese Studies and Chinese Studies at the National University of Singapore. She completed her PhD at the
Department of Asian Languages and Cultures at UCLA, where her dissertation, "Chinese Photojournalism 1937–1952: Materiality and the Institutionalization of Culture via a Computer Vision Approach," utilized advanced computer vision techniques to
explore wartime visual media culture. Lin holds an MA from the Regional Studies East Asia Program at Harvard University and a BA in Chinese Language and Literature from Peking University. Her pioneering work in machine learning has been published
in the ACM Journal on Computing and Cultural Heritage (JOCCH), and her contributions to humanities research are forthcoming in the Journal of Chinese Cinemas and Asia Pacific Perspectives.
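The ensemble-retrieval step can be sketched as follows (our reconstruction with off-the-shelf torchvision backbones; the actual system, its training on historical material, and the YOLOv7 cropping stage are not reproduced here):

import torch
import torch.nn.functional as F
from torchvision import models

def embedder(name):
    # Pretrained backbones with classification heads stripped, used as embedders.
    if name == "vit":
        m = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)
        m.heads = torch.nn.Identity()
    elif name == "effnet":
        m = models.efficientnet_v2_s(weights=models.EfficientNet_V2_S_Weights.DEFAULT)
        m.classifier = torch.nn.Identity()
    else:
        m = models.swin_t(weights=models.Swin_T_Weights.DEFAULT)
        m.head = torch.nn.Identity()
    return m.eval()

@torch.no_grad()
def ensemble_similarity(queries, corpus, nets):
    # Average the cosine-similarity matrices produced by the three backbones.
    sims = []
    for net in nets:
        q = F.normalize(net(queries), dim=1)
        c = F.normalize(net(corpus), dim=1)
        sims.append(q @ c.T)
    return torch.stack(sims).mean(0)

nets = [embedder(n) for n in ("vit", "effnet", "swin")]
queries = torch.randn(2, 3, 224, 224)    # stand-ins for cropped printed images
corpus = torch.randn(8, 3, 224, 224)     # stand-ins for original photographs
top5 = ensemble_similarity(queries, corpus, nets).topk(5, dim=1).indices
print(top5)

Averaging similarity scores across architecturally distinct embedders is one common way to smooth out the quality gap between printed reproductions and original photographs; the talk's 77.8% top-15 figure comes from its own trained models, not this sketch.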
[ Kaggle Contests ]:
- Kaggle: Home Credit Default Risk [ link ]
- Kaggle: M5 Forecasting - Accuracy: estimate the unit sales of Walmart retail goods. [ link ]
- Kaggle: M5 Forecasting - Uncertainty: estimate the uncertainty distribution of Walmart unit sales. [ link ]
[Reference]:
- Shihao Gu, Bryan Kelly, and Dacheng Xiu. "Empirical Asset Pricing via Machine Learning", Review of Financial Studies, Vol. 33, Issue 5, 2020, 2223-2273. Winner of the 2018 Swiss Finance Institute Outstanding Paper Award. [ link ]
- Jingwen Jiang, Bryan Kelly, and Dacheng Xiu. "(Re-)Imag(in)ing Price Trends", The Journal of Finance, 78: 3193-3249, 2023. [ ssrn ] [ https://doi.org/10.1111/jofi.13268 ]
|
Y.Y. |
|
01/04/2025, Tue |
Seminar
- Title: One-step full gradient can be sufficient for low-rank fine-tuning, provably and efficiently [ slides ]
- Speaker: Prof. Fanghui LIU, University of Warwick
- Time: Tuesday April 1, 2025, 11am-noon, Room 2463 (lift 25/26), HKUST
- Abstract: In this talk, I will discuss how to improve the performance of Low-Rank Adaptation (LoRA), guided by our theory. Our theoretical results show that LoRA aligns with a certain singular subspace of the one-step gradient of full fine-tuning.
Accordingly, alignment and generalization guarantees can be achieved directly by our theory-grounded spectral initialization strategy, for both linear and nonlinear models, and subsequent linear convergence can also be established.
Our analysis leads to LoRA-One, a theoretically grounded algorithm that achieves significant empirical improvements over vanilla LoRA and its variants on several benchmarks when fine-tuning Llama 2.
Our theoretical analysis is of independent interest for understanding matrix sensing and deep learning theory.
Joint work with Yuanhe Zhang and Yudong Chen. (A sketch of the spectral-initialization idea follows the paper below.)
[ Relevant Papers ]:
- Yuanhe Zhang, Fanghui Liu, Yudong Chen. One-step full gradient suffices for low-rank fine-tuning, provably and efficiently. [ arXiv:2502.01235 ]
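A hedged sketch of the spectral-initialization idea (our reading of the abstract on a toy linear model; see arXiv:2502.01235 for the actual LoRA-One algorithm and its guarantees). LoRA replaces a frozen weight W with W + B @ A; here the low-rank factors are initialized from the top singular subspace of one full-gradient step.

import torch

torch.manual_seed(0)
d_out, d_in, r = 64, 32, 4
W = torch.randn(d_out, d_in)                    # frozen pretrained weight
X, Y = torch.randn(100, d_in), torch.randn(100, d_out)

# One step of full fine-tuning: gradient of the mean squared loss w.r.t. W.
G = 2 * (X @ W.T - Y).T @ X / X.shape[0]

# Spectral initialization: top-r singular subspace of the (negative) gradient.
U, S, Vh = torch.linalg.svd(-G)
B = U[:, :r] * S[:r].sqrt()                     # (d_out, r) factor
A = S[:r].sqrt()[:, None] * Vh[:r]              # (r, d_in) factor

# B @ A is the best rank-r approximation of one full-gradient descent step,
# so LoRA training starts out aligned with the full fine-tuning update.
print("relative rank-r residual:", ((B @ A + G).norm() / G.norm()).item())

The contrast is with vanilla LoRA's initialization (B = 0 plus a random A), which starts with no alignment to the full fine-tuning direction at all.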
|
Y.Y. |
|