HKUST

CSIC5011-Math5473: Topological and Geometric Data Reduction and Visualization
Spring 2025


Course Information

Synopsis

This course is open to graduate and senior undergraduate students in applied mathematics, statistics, and engineering who are interested in learning from data. Students from other backgrounds, such as the life sciences, are also welcome, provided they have a certain level of mathematical maturity. The course covers a wide range of topics in geometric data reduction (principal component analysis, manifold learning, etc.) and topological data reduction (clustering, computational homology groups, etc.).
Prerequisites: linear and abstract algebra, basic probability and multivariate statistics, basic stochastic processes (Markov chains), and convex optimization; familiarity with Matlab, R, and/or Python.

Reference

[pdf download]

Topological Data Analysis for Genomics and Evolution: Topology in Biology. By Raul Rabadan and Andrew J. Blumberg [ Amazon ]

Instructor:

Yuan YAO

Time and Venue:

Wed 03:00PM - 05:50PM, Rm 4503, Lift 25-26 (64)

Homework and Projects:

Weekly homework (no grader, but I will read your submissions and give bonus credits), mini-projects, and a final major project. No final exam.
Email: datascience.hw (add "AT gmail DOT com" afterwards)

Schedule

Date | Topic | Instructor | Scribe
02/05/2025, Wed Lecture 01: Syllabus, Principal Component Analysis, and Multidimensional Scaling [ Class outline ] [ PCA-MDS slides ] (a short PCA/MDS code sketch follows this entry)
    [Homework 1]:
  • Homework 1 [pdf]. Just for fun, no grading; but I'll read your submissions and give you bonus credits.
Y.Y.
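
A quick companion to Lecture 01: the following is a minimal sketch (my own illustration in Python/NumPy, not part of the course materials) of the classical fact that PCA scores of centered data coincide, up to sign, with classical MDS coordinates computed from the pairwise Euclidean distance matrix, since double-centering the squared distances recovers the Gram matrix of the centered points.

  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 5))            # toy data: 100 samples, 5 features
  Xc = X - X.mean(axis=0)                  # center the columns

  # PCA via SVD of the centered data: component scores are U * S
  U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
  pca_scores = U[:, :2] * S[:2]            # top-2 principal component scores

  # Classical MDS: double-center the squared Euclidean distance matrix
  D2 = np.sum((Xc[:, None, :] - Xc[None, :, :]) ** 2, axis=-1)
  n = D2.shape[0]
  J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
  B = -0.5 * J @ D2 @ J                    # equals the Gram matrix Xc @ Xc.T
  evals, evecs = np.linalg.eigh(B)
  top = np.argsort(evals)[::-1][:2]        # indices of the two largest eigenvalues
  mds_coords = evecs[:, top] * np.sqrt(evals[top])

  # The two embeddings coincide up to a sign flip per coordinate
  print(np.allclose(np.abs(pca_scores), np.abs(mds_coords), atol=1e-6))

This equivalence is why classical MDS on Euclidean distances is often viewed as a dual form of PCA: it works from the n-by-n Gram matrix instead of the p-by-p covariance matrix.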
02/07/2025, Fri Seminar.
    [ Mathematics Colloquium ]
  • Title: Theoretical Evaluation of Data Reconstruction Error and Induced Optimal Defenses [ announcement ]
  • Speaker: Prof. Qi LEI, New York University
  • Time: Friday Feb 7, 2025, 10:30am-noon
  • Abstract: Data reconstruction attacks and defenses are crucial for understanding data leakage in machine learning and federated learning. However, previous research has largely focused on empirical observations of gradient inversion attacks, lacking a theoretical framework for quantitatively analyzing reconstruction errors based on model architecture and defense methods. In this talk, we propose framing the problem as an inverse problem, enabling a theoretical and systematic evaluation of data reconstruction attacks. For various defense methods, we derive the algorithmic upper bounds and matching information-theoretical lower bounds on reconstruction error for two-layer neural networks, accounting for feature and architecture dimensions as well as defense strength. We further propose two defense strategies — Optimal Gradient Noise and Optimal Gradient Pruning — that maximize reconstruction error while maintaining model performance.
  • Bio: Qi Lei is an assistant professor of Mathematics and Data Science at the Courant Institute of Mathematical Sciences and the Center for Data Science at NYU. Previously she was an associate research scholar at the ECE department of Princeton University. She received her Ph.D. from the Oden Institute for Computational Engineering & Sciences at UT Austin. She visited the Institute for Advanced Study (IAS)/Princeton for the Theoretical Machine Learning Program. Before that, she was a research fellow at the Simons Institute for the Foundations of Deep Learning Program. Her research aims to develop mathematical groundings for trustworthy and (sample- and computationally) efficient machine learning algorithms. Qi has received several awards and recognitions, including Rising Stars in Machine Learning, in EECS, and in Statistics and Data Science, the Outstanding Dissertation Award, the Computing Innovation Fellowship, and the Simons-Berkeley Research Fellowship.
    [ Relevant Reference ]:
  • Zihan Wang, Jason D. Lee, Qi Lei. Reconstructing Training Data from Model Gradient, Provably [ link ]
  • Sheng Liu*, Zihan Wang*, Yuxiao Chen, Qi Lei. Data Reconstruction Attacks and Defenses: A Systematic Evaluation. [ link ]
  • Yuxiao Chen, Gamze Gürsoy, Qi Lei. Optimal Defenses Against Gradient Reconstruction Attacks. [ link ]
Y.Y.
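
To make the seminar topic concrete, here is a toy illustration (mine, in Python/NumPy; not the speaker's method, which concerns two-layer networks and optimal defenses) of why shared gradients leak training data: for a fully connected layer z = W x + b, the chain rule gives dL/dW = (dL/dz) x^T and dL/db = dL/dz, so dividing any row of dL/dW by the matching nonzero entry of dL/db recovers the input x exactly.

  import numpy as np

  rng = np.random.default_rng(1)

  # One fully connected layer z = W @ x + b with squared loss against target y
  d_in, d_out = 8, 3
  W = rng.normal(size=(d_out, d_in))
  b = rng.normal(size=d_out)
  x = rng.normal(size=d_in)                # the "private" training input
  y = rng.normal(size=d_out)

  z = W @ x + b
  dL_dz = z - y                            # gradient of 0.5 * ||z - y||^2 w.r.t. z
  dL_dW = np.outer(dL_dz, x)               # chain rule: dL/dW = (dL/dz) x^T
  dL_db = dL_dz                            # chain rule: dL/db = dL/dz

  # Reconstruction "attack": divide a row of dL/dW by the matching bias gradient
  i = int(np.argmax(np.abs(dL_db)))        # pick a row with nonzero bias gradient
  x_rec = dL_dW[i] / dL_db[i]

  print(np.allclose(x, x_rec))             # True: the input is recovered exactly

Defenses such as the gradient noise and gradient pruning analyzed in the talk perturb these shared gradients; the talk's upper and lower bounds quantify how much reconstruction error such perturbations can force while keeping the model usable.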
02/08/2025, Sat Lecture 02: Horn's Parallel Analysis and Random Matrix Theory for PCA, Sufficient Dimensionality Reduction and Supervised PCA (Chap 2: 3-5) [ slides ] [ Sufficient Dimensionality Reduction and Supervised PCA ] (a short parallel-analysis code sketch follows this entry)
    [Time and Venue]:
  • 3:00PM - 5:50PM, G009A, CYT Bldg (80)
    [Homework 2]:
  • Homework 2 [pdf]. Just for fun, no grading; but I'll read your submissions and give you bonus credits.
Y.Y. | LI, Zhen
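
A companion to Lecture 02: the following is a minimal sketch of Horn's parallel analysis (my own illustration in Python/NumPy, not the lecture code). The idea is to keep the k-th principal component only while its sample eigenvalue exceeds a high quantile of the k-th eigenvalue under a null distribution obtained by independently permuting each column of the data, which destroys cross-feature correlations while preserving the marginals.

  import numpy as np

  def parallel_analysis(X, n_perm=200, quantile=0.95, seed=0):
      # Horn's parallel analysis: count the leading components whose sample
      # eigenvalue exceeds the `quantile` of the permutation-null eigenvalue.
      rng = np.random.default_rng(seed)
      Xc = X - X.mean(axis=0)
      n, p = Xc.shape
      obs = np.linalg.eigvalsh(Xc.T @ Xc / (n - 1))[::-1]   # descending order

      null = np.empty((n_perm, p))
      for t in range(n_perm):
          # Permute each column independently: kills cross-feature correlation
          Xp = np.column_stack([rng.permutation(Xc[:, j]) for j in range(p)])
          null[t] = np.linalg.eigvalsh(Xp.T @ Xp / (n - 1))[::-1]
      thresh = np.quantile(null, quantile, axis=0)

      k = 0
      while k < p and obs[k] > thresh[k]:                   # stop at first failure
          k += 1
      return k

  # Toy data: 3 latent factors plus noise in 10 dimensions
  rng = np.random.default_rng(0)
  X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 10))
  X = X + 0.3 * rng.normal(size=(500, 10))
  print(parallel_analysis(X))              # typically reports 3 components

Compared with eyeballing a scree plot, the permutation null adapts the retention threshold to the sample size and the marginal distributions of the features.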

Datasets (to be updated)


by YAO, Yuan.