CSIC5011-Math5473: Topological and Geometric Data Reduction and Visualization

HKUST

CSIC5011-Math5473: Topological and Geometric Data Reduction and Visualization
Spring 2023

Course Information

Synopsis (摘要)

This course is open to graduates and senior undergraduates in applied mathematics, statistics, and engineering, who are interested in learning from data. Students with other backgrounds such as life sciences are also welcome, provided you have certain maturity of mathematics. It will cover wide topics in geometric (principal component analysis and manifold learning, etc.) and topological data reduction (clustering and computational homology group, etc.).
Prerequisite: linear and abstract algebra, basic probability and multivariate statistics, basic stochastic process (Markov chains), convex optimization; familiarity with Matlab, R, and/or Python, etc.

Reference (参考教材)

[pdf download]

Topological Data Analysis for Genomics and Evolution: Topology in Biology. By Raul Rabadan and Andrew J. Blumberg [ Amazon ]

Instructors:

Yuan YAO

Time and Place:

Monday 15:00am-17:50pm, Rm 4503, Lift 25-26

Homework and Projects:

Weekly homeworks (no grader, but I'll read your submissions and give bonus credits), mini-projects, and a final major project. No final exam.
Email: datascience.hw (add "AT gmail DOT com" afterwards)

Schedule (时间表)

Date	Topic	Instructor	Scriber
02/06/2023, Mon	Lecture 01: Syllabus, Principal Component Analysis, and Multidimensional Scaling [ Class outline ] [ PCA-MDS slides ] [Reference]: To view .jpynb files below, you may try [ Jupyter NBViewer] PCA in iPython Notebook [ pca.ipynb ] [ pca.py ] MDS in Python [ scikit-learn MDS] PCA with Logistic regression for digit classification: [ pca_logistic.ipynb ] [ pca_logistic.py ] [Homework 1]: Homework 1 [pdf]. Just for fun, no grading; but I'll read your submissions and give your bonus credits.	Y.Y.
02/13/2023, Mon	Lecture 02: Horn's Parallel Analysis and Random Matrix Theory for PCA, Sufficient Dimensionality Reduction and Supervised PCA (Chap 2: 3-5) [ slides ] [ Sufficient Dimensionality Reduction and Supervised PCA ] [Reference]: Horn's Parallel Analysis in R: [ paran.R ] Parallel Analysis in Matlab: [ papca.m ] Parallel Analysis in Python by LI, Zhen: [ paPCA_curve.py ] [ paPCA_image.py ] Marcenko-Pastur Law of Wishart matrices in Matlab: [ mp.m ] S&P500 dataset in class: [ snp500.Rda ] [ snp452-data.mat ] [ snp500.txt ] [Johnstone06] High dimensional statistical inference and random matrices, ICM2006. [KN08 for multi-rank signal] S. Kritchman and B. Nadler, Determining the number of components in a factor model from limited noisy data, Chemometrics and Intelligent Laboratory Systems 94(1):19-32, 2008. [ NB10: multi-rank subspace ] R. R. Nadakuditi and F. Benaych-Georges, The breakdown point of signal sub- space estimation, IEEE Sensor Array and Multichannel Signal Processing Workshop (2010), Jerusalem, Israel, 2010, pp. 177-180, doi: 10.1109/SAM.2010.5606726. Florent Benaych-Georges and Raj Rao Nadakuditi (2009) The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. [Parallel Analysis: Horn (1965) original paper] [Parallel Analysis: Buja-Eyuboglu (1992) with random permutation] [Raul Rabadan (2018)]: applications of RMT in single cell data analysis [Homework 2]: Homework 2 [pdf]. Just for fun, no grading; but I'll read your submissions and give your bonus credits	Y.Y.	LI, Zhen
02/20/2023, Mon	Lecture 03: High Dimensional Sample Mean: Inadmissibility of MLE and James-Stein Estimators (Chap 2: 1-2) [ slides ] [Reference]: Comparing Maximum Likelihood Estimator and James-Stein Estimator in R: [ JSE.R ] James-Stein estimator via multi-task ridge regression: Yaqi Duan, Kaizheng Wang (2022) Adaptive and Robust Multi-task Learning, [ arXiv:2202.05250 ] [Homework 3]: Homework 3 [pdf]. Just for fun, no grading; but I'll read your submissions and give your bonus credits	Y.Y.
02/27/2023, Mon	Lecture 04: Random Projections, Johnson-Lindenstrauss Lemma, and Applications in Compressed Sensing etc. (Chap 3) [ slides ] [Reference]: Joseph Salmon's lecture on Johnson-Lindenstrauss Theory [ JLlemma.pdf ] Random Projections in Scikit-learn: [ link ] Dennis Amelunxen, Martin Lotz, Michael B. McCoy, Joel A. Tropp. Living on the edge: Phase transitions in convex programs with random data. [ arXiv:1303.6672 ] [Homework 4]: Homework 4 [pdf]. Just for fun, no grading; but I'll read your submissions and give your bonus credits	Y.Y.
03/06/2023, Mon	Lecture 05: Robust PCA, Sparse PCA, and Graph Realization (MDS) with Uncertainty -- SDP relaxations [ slides ] [Reference]: You need Matlab CVX optimization toolbox to run the following demo codes. Robust PCA demo: [ testRPCA.m ] Robust PCA via ADMM by Stephen Boyd et al.: [ link ] Robust PCA via ADMM in Python [ weblink ] Sparse PCA demo: [ testSPCA.m ] Sparse PCA in Python [ sklearn ] Sensor Network Localization in Matlab: an old package collected by Kirill Konovalov from previous CSIC5011 (2017) [ SNLSDP-0 ] Sensor Network Localization with Facial Reduction in Matlab: [ SNLSDPclique ] Teng ZHANG's Tyler's M-estimator [ Matlab: tyler_m_estimator.m ] Duembgen, Nordhausen and Schuhmacher (2016): R package for M-scatter estimates [ R package fastM ] [Homework 5]: Homework 5 [pdf]. Just for fun, no grading; but I'll read and give bonus credits if you submitted.	Y.Y.
03/13/2023, Mon	Lecture 06: Introduction to Manifold Learning: ISOMAP and LLE (Modified LLE, Hessian LLE, and LTSA) [ slides ] [Reference]: [ISOMAP]: Tenenbaum's website on science paper with datasets; [LLE]: Roweis' website on science paper; Donoho, D. & Grimes, C. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci U S A. 100:5591 (2003). [doi: 10.1073/pnas.1031596100] Zhang, Z. & Wang, J. MLLE: Modified Locally Linear Embedding Using Multiple Weights. [ NIPS 2006 ] Zhang, Z. & Zha, H. (2005) Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM Journal on Scientific Computing. 26 (1): 313-338. [doi:10.1137/s1064827502419154] [Python]: plot_mani_digits.ipynb : demo of digits in class plot_compare_methods.ipynb : demo of Swiss roll in class scikit-learn manifold module [Matlab]: IsomapR1 : isomap codes by Tennenbaum, de Silva (isomapII.m with sparsity, fast mex with dijkstra.cpp and fibheap.h lle.m : lle with k-nearest neighbors kcenter.m : k-center algorithm to find 'landmarks' in a metric space	Y.Y.
03/20/2023, Mon	Lecture 07: Manifold Learning II: Laplacian Eigenmap, Diffusion Map, and Stochastic Neighbor Embedding [ slides ] and [ project1.pdf ] [Reference]: Mikhail Belkin, Partha Niyogi. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering, Advances in Neural Information Processing Systems (NIPS) 14, 2001, p. 586-691, MIT Press [nips link] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. PNAS 102 (21):7426-7431, 2005 [doi: 10.1073/pnas.0500334102] Nadler, Boaz; Stephane Lafon; Ronald R. Coifman; Ioannis G. Kevrekidis (2005). "Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators" (PDF) in Advances in Neural Information Processing Systems (NIPS) 18, 2005. Coifman, R.R.; S. Lafon. (2006). "Diffusion maps". Applied and Computational Harmonic Analysis. 21: 5鈥�30. 10.1016/j.acha.2006.04.006. Stochastic Neighbor Embedding [ .pdf ] Visualizing Data using t-SNE [ .pdf ] A paper that relates SNE to Laplacian Eigenmaps [ .pdf ] A helpful website: How to use t-SNE effectively? [ link ] [Matlab] Matlab code to compare manifold learning algorithms [ mani.m ] : PCA, MDS, ISOMAP, LLE, Hessian LLE, LTSA, Laplacian, Diffusion (no SNE!) [Python]: plot_compare_methods.ipynb : demo in class plot_mani_digits.ipynb : demo of digits in class scikit-learn manifold LLE : PCA/MDS, ISOMAP, LLE/MLLE, Hessian, LTSA, Laplacian (Spectral), t-SNE (no Diffusion map) Laurens van der Maaten's website for t-SNE codes [Homework 6]: Homework 6 [pdf]. Just for fun, no grading. [Project 1 Report Repository] Description of Project 1: [ project1.pdf ] Peer Review Guide of Project 1. Peer review reports of Project 1. Rebuttal Guide of Project 1. GitHub Repository for reports of Project 1 [ GitHub ] 01. ZHONG Guangzheng. A distance function for ordinal regression dimensional reduction. [ report (pdf) ] [ source (github) ] 02. ZHOU Qiqi. Image Inpainting with PCA. [ poster (pdf) ] [ source ] [ rebuttal (txt) ] 03. YAN, Ningyu; YAN, Bokai; LAI, Yanming. The statistic explanation of the US crime data. [ report (pdf) ] [ source (Rmd) ] [ rebuttal (docx) ] 04. LIU Chen and LIN Hangyu. Understanding US Crime Data by Feature Analysis. [ report (pdf) ] [ source ] 05. XU Yingxue; Jiaxin ZHUANG; Fengtao ZHOU. Visualization and Dimensionality Reduction Techniques for US Crime Data. [ poster (pdf) ] [ source ] 06. CUI Yiran. Explore and Play with SNPs Data for Fun. [ report (pdf) ] [ source (zip) ] 07. SHEN, Lue. Human Migration History: A Single Nucleotide Polymorphisms Perspective. [ report ] [ source (github) ] [ rebuttal (md) ] 08. MA Ruochen; Jihong TANG; Yuyan RUAN; Zhi HUANG. Ancestry Prediction via Dimensionality Reduction Techniques on SNPs Data. [ poster (pdf) ] [ source ] [ rebuttal (pdf) ] 09. HUANG, Zhanmiao; Wencan XIA; Yuanhui LUO. Explore Geographical Variation of Human Gene with SNPs Data. [ poster (pdf) ] [ source ] [ rebuttal (txt)] 10. ZHANG Fa and XIA Ruizhe. Application of Shrinkage in Portfolio Optimization. [ report (pdf) ] [ source (matlab) ] 11. HAO, Yifan; Yueying Hu; Yakun Li; Yonglin Liu. Visualization and Forecasting on Finance Data. [ report (pdf) ] [ source (github) ] 12. DUNDA, Gerry Windiarto Mohamad. Statistical Analysis on Authors and Word Trend of NIPS Papers from 1987 to 2017. [ report (pdf) ] [ source (zip) ] 13. LI Haobo, CHEN Zixin, TENG fei, SHENG Rui. NIPS Conference Papers 1987-2015 Data Set. [ poster (pdf) ] [ source (github) ] [ rebuttal (pdf)] 14. CAI Bibi; QIU Zhenyu; WANG Zhiwei. Topic Modeling For NIPS Words. [ report (pdf) ] [ source (ipynb) ] 15. MENG, Xiao; Ziyu ZHONG; Shunpeng YANG; Tianshu JIANG. Four variants of Principal Component Analysis on Video Background/Foreground Separation. [ poster (pdf) ] [ source (github) ] 16. Chris HC Nguyen and James M Shihua. Robust PCA for Moving Object Detection in Video. [ poster (pdf) ] [ source (github) ] 17. LI Qichao and HUANG Haohan. Exploration of PCA family for handwritten digit classification. [ report (pdf) ] [ source (ipynb) ] 18. JIA Guangnan. EXPLORING THE EFFECTIVENESS OF PCA ON HANDWRITTEN DIGIT DATASET. [ report (pdf) ] [ source ]	Y.Y.
03/27/2023, Mon	Lecture 08: Random Walk on Graphs and Spectral Graph Theory: Perron-Frobenius (PageRank), Fiedler (Algebraic Connectivity), Cheeger Inequality (Spectral bi-partition), Lumpability (Spectral Clustering) and Transition Path Theory (Semi-supervised Learning) [ slides ] [Reference]: Amy N. Langville and Carl D. Meyer's book: Google's PageRank and Beyond Jim Demmel's courseweb at UC Berkeley for Fiedler Theory and Graph Bipartition: [ link ] T. Buehler, M. Hein. Spectral Clustering based on the graph p-Laplacian. Proceedings of the 26th International Conference on Machine Learning (ICML 2009), 81-88. James R. Lee, Shayan Oveis Gharan, Luca Trevisan. Multi-way spectral partitioning and higher-order Cheeger inequalities. Proceeding STOC'12 Proceedings of the forty-fourth annual ACM symposium on Theory of computing, Pages 1117-1130. arXiv:1111.1055. Weinan E, Jianfeng Lu, and Yuan Yao. The Landscape of Complex Networks: Critical Nodes and A Hierarchical Decomposition. Methods and Applications of Analysis, special issue in honor of Professor Stanley Osher on his 70th birthday, 20(4):383-404, 2013. [Homework 7]: Homework 7 [pdf]. Just for fun, no grading.	Y.Y.
04/03/2023, Mon	Lecture 09: Introduction to Topological Data Analysis. [ slides ] [Reference]: Topological Methods for Exploring Low-density States in Biomolecular Folding Pathways. Yuan Yao, Jian Sun, Xuhui Huang, Gregory Bowman, Gurjeet Singh, Michael Lesnick, Vijay Pande, Leonidas Guibas and Gunnar Carlsson. J. Chem. Phys. 130, 144115 (2009). [pdf][Online Publication][SimTK Link: Data and Mapper Matlab Codes] [Selected by Virtual Journal of Biological Physics Research, 04/15/2009]. Structural insight into RNA hairpin folding intermediates. Bowman, Gregory R., Xuhui Huang, Yuan Yao, Jian Sun, Gunnar Carlsson, Leonidas Guibas and Vijay Pande. Journal of American Chemistry Society, 2008, 130 (30): 9676-9678. [link] Single-cell topological RNA-seq analysis reveals insights into cellular differentiation and development. Abbas H Rizvi, Pablo G Camara, Elena K Kandror, Thomas J Roberts, Ira Schieren, Tom Maniatis & Raul Rabadan. Nature Biotechnology. 2017 May. doi:10.1038/nbt.3854 Spatiotemporal genomic architecture informs precision oncology in glioblastoma. Lee JK, Wang J, Sa JK, Ladewig E, Lee HO, Lee IH, Kang HJ, Rosenbloom DS, Camara PG, Liu Z, van Nieuwenhuizen P, Jung SW, Choi SW, Kim J, Chen A, Kim KT, Shin S, Seo YJ, Oh JM, Shin YJ, Park CK, Kong DS, Seol HJ, Blumberg A, Lee JI, Iavarone A, Park WY, Rabadan R, Nam DH. Nat Genet. 2017 Apr. doi: 10.1038/ng.3806. A Python Implementation of Mapper [ sakmapper ] in single cell data analysis. Single Cell TDA [ scTDA ] with [ tutorial in html ] A Java package for persistent homology and barcodes: Javaplex Tutorial. Persistent Homology Analysis of Biomolecular Data Guo-Wei Wei. SIAM News 2017 Topological Data Analysis Generates High-Resolution, Genome-wide Maps of Human Recombination. Pablo G. Camara, Daniel I.S. Rosenbloom, Kevin J. Emmett, Arnold J. Levine, Raul Rabadan. Cell Systems. 2016 June. doi: 10.1016/j.cels.2016.05.008. Topology of viral evolution. Chan JM, Carlsson G, Rabadan R. Proc Natl Acad Sci USA 2013 Oct 29. doi: 10.1073/pnas.1313480110. Robert Ghrist's monograph on applied Topology Elementary Applied Topology [Homework 8]: Homework 8 [pdf]. Just for fun, no grading.	Y.Y.
04/16/2023, Mon	Lecture 10: Hodge Theory and Applications: Social Choice, Crowdsourced Ranking, and Game Theory [ slides ] [ Reference ]: Statistical Ranking and Combinatorial Hodge Theory. Xiaoye Jiang, Lek-Heng Lim, Yuan Yao and Yinyu Ye. Mathematical Programming, Volume 127, Number 1, Pages 203-244, 2011. [pdf][ arxiv.org/abs/0811.1067][ Matlab Codes] Flows and Decompositions of Games: Harmonic and Potential Games Ozan Candogan, Ishai Menache, Asuman Ozdaglar, and Pablo A. Parrilo Mathematics of Operations Research, 36(3): 474 - 503, 2011 [arXiv.org/abs/1005.2405][ doi:10.1287/moor.1110.0500 ] HodgeRank on Random Graphs for Subjective Video Quality Assessment. Qianqian Xu, Qingming Huang, Tingting Jiang, Bowei Yan, Weisi Lin, and Yuan Yao. IEEE Transactions on Multimedia, 14(3):844-857, 2012 [pdf][ Matlab codes in zip ] Robust Evaluation for Quality of Experience in Crowdsourcing. Qianqian Xu, Jiechao Xiong, Qingming Huang, and Yuan Yao ACM Multimedia 2013. [pdf] Online HodgeRank on Random Graphs for Crowdsourceable QoE Evaluation. Qianqian Xu, Jiechao Xiong, Qingming Huang, and Yuan Yao IEEE Transactions on Multimedia, 16(2):373-386, Feb. 2014. [pdf] Analysis of Crowdsourced Sampling Strategies for HodgeRank with Sparse Random Graphs Braxton Osting, Jiechao Xiong, Qianqian Xu, and Yuan Yao Applied and Computational Harmonic Analysis, 41 (2): 540-560, 2016 [ arXiv:1503.00164 ] [ ACHA online ] [Matlab codes to reproduce our results] False Discovery Rate Control and Statistical Quality Assessment of Annotators in Crowdsourced Ranking Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Yuan Yao Proceedings of The 33rd International Conference on Machine Learning (ICML), New York, June 19-24, 2016. [ arXiv:1605.05860 ] [ pdf ] [ supplementary ] Parsimonious Mixed-Effects HodgeRank for Crowdsourced Preference Aggregation Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Yuan Yao ACM Multimedia Conference (ACMMM), Amsterdam, Netherlands, October 15-19, 2016. [ arXiv:1607.03401 ] [ pdf ] HodgeRank with Information Maximization for Crowdsourced Pairwise Ranking Aggregation Qianqian Xu, Jiechao Xiong, Xi Chen, Qingming Huang, Yuan Yao The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, Louisiana, USA, February 27, 2018. [ arXiv:1711.05957 ] [ Matlab Source Codes ] From Social to Individuals: a Parsimonious Path of Multi-level Models for Crowdsourced Preference Aggregation Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Qingming Huang, Yuan Yao IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(4):844-856, 2019. Extended from MM'16 in [ arXiv:1607.03401 ]. [ arXiv:1804.11177 ] [ doi: 10.1109/TPAMI.2018.2817205 ][ GitHub source] Evaluating Visual Properties via Robust HodgeRank Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Qingming Huang and Yuan Yao International Journal of Computer Vision, 129: 1732-1753, 2021. [ arXiv:1408.3467 ] [ DOI: 10.1007/s11263-021-01438-y ] Professor Don Saari: [ UCI homepage ] [ Book Info: Disposing Dictators, Demstifying Voting Paradoxes ] [ Amazon link ] Quantum algorithms for topological and geometric analysis of data Lloyd, Seth and Garnerone, Silvano and Zanardi, Paolo Nature Communications, 7(1): 10138, 2016. [ DOI: 10.1038/ncomms10138 ] [Homework 9]: Homework 9 [pdf]. Just for fun, no grading. [Peer Review] Peer Review Guide of Project 1.	Y.Y.
04/24/2023, Mon	Lecture 11: Seminars [ link ] [ Title ]: Optimal Transport: Theory and Applications [ slides ][ link ] [ Speaker ]: Nian SI, University of Chicago [ Abstract ]: Optimal transport has gained increasing attention in recent years due to the modeling power and computational tractability. In this talk, I will first study the duality of optimal transport for discrete probability measures and extend to continuous probability measures. Then, I will talk about the optimization to solve optimal transport problems via the Sinkhorn methods. I will also study the statistical properties of optimal transport: the curse of dimensionality. I will present two ideas to beat the curse of dimensionality: projection and smoothing. Finally, I will discuss two applications: Wasserstein GANs and distributionally robust optimization. [ Bio ]: Nian Si recently joined the University of Chicago, Booth School of Business as a postdoctoral principal researcher and he will join HKUST IEDA as an assistant professor in 2024. He finished PhD in Operations Research in the Department of Management Science and Engineering (MS&E) at Stanford University. He received a B.A. degree in Economics and a B.S. degree in Mathematics from Peking University in 2017. His research lies at the interface of applied probability, simulation, and machine learning and he is also interested in real-world problems arising from online platforms. [Title]: Robust Statistical Learning and Generative Adversarial Networks [ slides ] [ Abstract ]: Robust learning under Huber's contamination model has become an important topic in statistics and theoretical computer science. Statistically optimal procedures such as Tukey's median and other estimators based on depth functions are impractical because of their computational intractability. In this talk, we present an intriguing connection between f-GANs and various depth functions through the lens of f-Learning. Similar to the derivation of f-GANs, we show that these depth functions that lead to statistically optimal robust estimators can all be viewed as variational lower bounds of the total variation distance in the framework of f-Learning. This connection opens the door of computing robust estimators using tools developed for training GANs. In particular, we show in both theory and experiments that some appropriate structures of discriminator networks with hidden layers in GANs lead to statistically optimal robust location estimators for both Gaussian distribution and general elliptical distributions where first moment may not exist. Some applications are discussed on robust denoising of Cryo-EM images. [Reference]: Chao Gao, Jiyi Liu, Yuan Yao, Weizhi Zhu, Robust Estimate and Generative Adversarial Networks, ICLR 2019. [arXiv:1810.02030] Chao Gao, Yuan Yao, Weizhi Zhu, Generative Adversarial Nets for Robust Scatter Estimation: A Proper Scoring Rule Perspective. Journal of Machine Learning Research, 21(160):1-48, 2020. [ arXiv:1903.01944 ] Hanlin Gu, Yin Xian, Ilona Christy Unarta, Yuan Yao, Generative Adversarial Networks for Robust Cryo-EM Image Denoising. Handbook of Mathematical Models and Algorithms in Computer Vision and Imaging, by Ke Chen (Editor), Carola-Bibiane Sch\"{o}nlieb (Editor), Xue-Cheng Tai (Editor), Laurent Younes (Editor), Springer, 2022. [ arXiv:2008.07307 ] [ DOI:10.1007/978-3-030-03009-4_126-1 ] [Python]: Robust_GAN.ipynb : Jupyter Notebook for demonstration Robust-GAN-Center : robust center (mean) estimate via GANs Robust-GAN-Scatter : robust scatter (covariance) estimate via GANs Rubust Denoising GAN in Cryo-EM [ Title ]: A Tutorial on Single-Cell Topological Data Analysis. [ slides ] [ Speaker ]: Dr. MU, Quanhua, HKUST [ Abstract ] In this tutorial, we first give an introduction to single cell RNA sequencing. Then we introduced scTDA, a python toolbox for topological data analysis (Mapper) in single cell data. A case study is illustrated using human embryo development at single-cell resolution.	Y.Y.
05/08/2023, Monday	Lecture 12: Seminar and Final Project. [ project2.pdf ] [ Title ]: Mathematical Optimization in Machine Learning/Decision Making [ slides ] [ Speaker ]: Prof. YE Yinyu, K.T. Li Chair Professor of Engineering, Stanford University [ Venue ]: Kaisa Group Lecture Theater (IAS LT), HKUST IAS [ Abstract ]: The speaker will present a few recent mathematical optimization real world case studies in Data Science and Machine-Learning. He will show how newly developed optimization models and numerical algorithms/solvers can be effectively used to achieve learning and decision-making efficiency and optimality in many fields, including Policy Training in Reinforcement Learning, Wireless Ad-Hoc Sensor Network Localization/Tracking, Service Covering and Partitioning, Energy Management and Optimal PEV Charging/Discharging, Dynamic Unit-Commitment (to balance demand and supply) for Power-Grid, and High-Speed Railway Scheduling. [ Bio ] Prof. YE Yinyu is currently the K.T. Li Professor of Engineering at Department of Management Science and Engineering and Institute of Computational and Mathematical Engineering, Stanford University. He received the BS degree in System Engineering from the Huazhong University of Science and Technology, China, and the MS and PhD. degrees in Engineering-Economic Systems and Operations Research from Stanford University. Prof. Ye's current research interests include Continuous and Discrete Optimization, Data Science and Applications, Computational Algorithm Design and Analyses, Algorithmic Game/Market Equilibrium; and he was one of the pioneers of Interior-Point Methods, Conic Linear Programming, Distributionally Robust Optimization, Online Linear Programming, Algorithm Analyses for Reinforcement Learning and Markov Decision Process, and etc. Prof. Ye has received several academic awards including: the inaugural 2006 Farkas Prize on Optimization, the 2009 IBM Faculty Award, the 2009 John van Neumann Theory Prize for fundamental sustained contributions to theory in Operations Research and the Management Sciences, the inaugural 2012 ISMP Tseng Lectureship Prize for outstanding contribution to continuous optimization (every three years), the winner of the 2014 SIAM Optimization Prize awarded (every three years), the 2015 SPS Signal Processing Magazine Best Paper Award, etc.. He has supervised numerous doctoral students at Stanford who received various prizes such as INFORMS Nicholson Prize, Student Paper Competition, the INFORMS Computing Society Prize, the INFORMS Optimization Prize for Young Researchers. According to Google Scholar, his publications have been cited 52,000 times. [Final Project Report Repository] Description of Final Project: [ project2.pdf ] GitHub Repository for reports of Final Project [ GitHub ] 1. DUNDA, Gerry Windiarto Mohamad. Clustering Analysis on Les Mis閞ables Social Network. [ report (pdf) ] [ slides (pptx) ] [ source (ipynb) ] [ video ] 2. Qichao LI, Haohan HUANG, Qiqi ZHOU, Jin WU. Exploration of Manifold Learning methods for face dataset [ report (pdf) ] [ source (ipynb) ] [ video ] 3. Yingxue XU, Jiaxin ZHUANG, Fengtao ZHOU. Order the faces by Manifold Learning for Face dataset. [ report (pdf) ] [ slides (pptx) ] [ source (py) ] [ video ] 4. YANG, Shunpeng. Sequential Face Ordering With Manifold Learning Algorithms [ report (pdf) ] [ poster (pdf) ] [ slides (pptx) ] [ source (ipynb) ] 5. ZHONG, Guangzheng. Dimensional reduction methods for face dataset. [ report (pdf) ] [ slides (pdf) ] [ source (github) ] [ video ] 6. LI Haobo, CHEN Zixin, SHENG Rui, TENG Fei. . MATH 5473 Project 2: NIPS accepted papers. [ report (pdf) ] [ slides (pptx) ] [ Interactive Visualization link ] [ source (github) ] [ video ] 7. HUANG Zhanmiao, LUO Yuanhui, XIA Wencan. The Application of PageRank and HITS in Chinese and UK University Ranking [ report (pdf) ] [ source (github) ] [ video ] 8. Hangyu LIN and Chen LIU. HodgeRank-based Analysis of Crowdsourced Data [ report (pdf) ] [ slides (pptx) ] [ source (github) ] [ video ] 9. ZHONG, Ziyu. Robust Outlier Detection with GAN [ report (pdf) ] [ source (ipynb) ] [ video ] 10. CAI Bibi, QIU Zhenyu, and WANG Zhiwei. Recommender System Based on Matrix Factorization with Incorporation of Movie Genre Information [ report (pdf) ] [ slides (pptx) ] [ source (github) ] [ video (youtube)] 11. CHAU, Yu Hei. Investigating semi-supervised methods for enhancing text transformers with dimension reduction techniques [ report (pdf) ] [ slides ] [ source (github) ] [ video ] 12. CUI, Yiran. Clustering (ab-)normal pressure signals of pipelines [ report (pdf) ] [ slides (pptx) ] [ source (google drive) ] [ video (10m) ] [ video (30m) ] 13. Tianshu JIANG, Xiao MENG. Topological Data Analysis on 3-Dimensional Patella Shape as New Biomarker for Knee Arthroplasty Type Classification. [ report (pdf) ] [ source (github) ] [ video ] 14. Yanming LAI, Bokai YAN, Ningyu YAN. The statistic explanation of the human prefrontal cortex development data [ report (pdf) ] [ slides (pdf) ] [ source (ipynb) ] [ video ] 15. Ruochen MA, Jihong TANG, Yuyan RUAN, Zhi HUANG. Developmental Trajectory Analysis of Single Cells in the Human Embryonic Prefrontal Cortex [ report (pdf) ] [ slides (pdf) ] [ source (github) ] [ video ] 16. Lue SHEN, Lei SUN, Zhenhang SHANG. Topological Methods for Visualization and Analysis of Human Prefrontal Cortex Development Data [ report (pdf) ] [ slides (pdf) ] [ source (github) ] [ video ] 17. Li Yakun, Liu Yonglin, Hao Yifan, Hu Yueying. EXPLORE FRED-QD: A QUARTERLY DATABASE FOR MACROECONOMIC RESEARCH [ report (pdf) ] [ source (github) ] [ video ] 18. Chris HC Nguyen, James M. Shihua. Latent Space Topology of Several Machine Learning Algorithms. [ report (pdf) ] [ slides (pdf) ] [ video ] 19. Ruizhe XIA, Fa ZHANG. Node Clustering with Traditional and Neural Network Approaches [ report (pdf) ] [ slides (pdf) ] [ source (github) ] [ video ]	Y.Y.

Datasets (to-be-updated)

[Animal Sleep Data] Animal species sleeping hours vs. other features

[Anzhen Heart Data] Heart Operation Effect Prediction, provided by Dr. Jinwen Wang, Anzhen Hospital

[Beer Data] 877 beers dataset, provided by Mr. Richard Sun, Shanghai

[Crime Data] Crime rates in 59 US cities during 1970-1992

[Real-Time-Bidding Algorithm Competition Data] Contest Website

[红楼梦人物事件矩阵] a 376-by-475 matrix (374-by-475 updated by WAN, Mengting) for character-event appearance in A Dream of Red Mansion (Xueqin Cao) [ Dataset in Github ] [374 Characters dream.RData (for R load)] [dream.Rd (for R manual)] [HongLouMeng374.txt] [HongLouMeng376.csv] [.mat] [readme.m]

[西游记] characters-scene occurance matrices for 100 chapters [ Dataset in GitHub ] [data in RData] [data in matlab (302-by-408 matrix)]

chap001-005	chap006-009	chap010-013	chap014-017	chap018-021	chap022-025
chap026-029	chap030-033	chap034-037	chap038-041	chap042-045	chap046-049
chap050-053	chap054-057	chap058-061	chap062-065	chap066-069	chap070-073
chap074-077	chap078-081	chap082-085	chap086-088	chap089-091	chap092-094
chap095-097	chap098-100	All in TXT	readData.m

[Keywords Pricing] Keywords and profit index in paid search advertising, by Hansheng Wang (Guanghua, PKU). [sample file] [readme.txt] [data in csv]

[Radon Data] Radon measurements of 12,687 houses in US

[Wells Data] Switch unsafe wells for arsenic pollution in Bangladesh

to-be-done...

by YAO, Yuan.