CSIC 5011: Topological and Geometric Data Reduction and Visualization

HKUST

CSIC 5011: Topological and Geometric Data Reduction and Visualization
Spring 2021

Course Information

Synopsis (摘要)

This course is open to graduates and senior undergraduates in applied mathematics, statistics, and engineering, who are interested in learning from data. Students with other backgrounds such as life sciences are also welcome, provided you have certain maturity of mathematics. It will cover wide topics in geometric (principal component analysis and manifold learning, etc.) and topological data reduction (clustering and computational homology group, etc.).
Prerequisite: linear and abstract algebra, basic probability and multivariate statistics, basic stochastic process (Markov chains), convex optimization; familiarity with Matlab, R, and/or Python, etc.

Reference (参考教材)

[pdf download]

Topological Data Analysis for Genomics and Evolution: Topology in Biology. By Raul Rabadan and Andrew J. Blumberg [ Amazon ]

Instructors:

Yuan YAO

Time and Place:

Friday 10:30am-1:20pm, Zoom [ link ]

Homework and Projects:

Weekly homeworks (no grader, but I'll read your submissions and give bonus credits), mini-projects, and a final major project. No final exam.
Email: datascience.hw (add "AT gmail DOT com" afterwards)

Schedule (时间表)

Date	Topic	Instructor	Scriber
02/05/2021, Fri	Lecture 01: Syllabus, Principal Component Analysis, and Multidimensional Scaling [ syllabus ] [ PCA-MDS in keynote slides ] [Reference]: To view .jpynb files below, you may try [ Jupyter NBViewer] PCA in iPython Notebook [ pca.ipynb ] [ pca.py ] MDS in Python [ scikit-learn MDS] PCA with Logistic regression for digit classification: [ pca_logistic.ipynb ] [ pca_logistic.py ] [Homework 1]: Homework 1 [pdf]. Just for fun, no grading; but I'll read your submissions and give your bonus credits.	Y.Y.
02/19/2021, Fri	Lecture 02: Horn's Parallel Analysis and Random Matrix Theory for PCA (Chap 3: 3) [ slides ] [Reference]: Horn's Parallel Analysis in R: [ paran.R ] Parallel Analysis in Matlab: [ papca.m ] Parallel Analysis in Python by LI, Zhen: [ paPCA_curve.py ] [ paPCA_image.py ] Marcenko-Pastur Law of Wishart matrices in Matlab: [ mp.m ] S&P500 dataset in class: [ snp500.Rda ] [ snp452-data.mat ] [ snp500.txt ] [Johnstone06] High dimensional statistical inference and random matrices, ICM2006. Florent Benaych-Georges and Raj Rao Nadakuditi (2009) The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. [Parallel Analysis: Horn (1965) original paper] [Parallel Analysis: Buja-Eyuboglu (1992) with random permutation] [Raul Rabadan (2018)]: applications of RMT in single cell data analysis [Homework 2]: Homework 2 [pdf]. Just for fun, no grading; but I'll read your submissions and give your bonus credits	Y.Y.	LI, Zhen
02/26/2021, Fri	Lecture 03: High Dimensional Sample Mean: Inadmissibility of MLE and James-Stein Estimators (Chap 3: 1-2) [ slides ] [Reference]: Comparing Maximum Likelihood Estimator and James-Stein Estimator in R: [ JSE.R ] [Homework 3]: Homework 3 [pdf]. Just for fun, no grading; but I'll read your submissions and give your bonus credits	Y.Y.
03/05/2021, Fri	Lecture 04: Random Projections, Johnson-Lindenstrauss Lemma, and Applications in Compressed Sensing etc. (Chap 2) [ Lecture04.pdf ] [Reference]: Joseph Salmon's lecture on Johnson-Lindenstrauss Theory [ JLlemma.pdf ] Random Projections in Scikit-learn: [ link ] Dennis Amelunxen, Martin Lotz, Michael B. McCoy, Joel A. Tropp. Living on the edge: Phase transitions in convex programs with random data. [ arXiv:1303.6672 ] [Homework 4]: Homework 4 [pdf]. Just for fun, no grading; but I'll read your submissions and give your bonus credits	Y.Y.
03/12/2021, Fri	Lecture 05: Robust PCA, Sparse PCA, and Graph Realization (MDS) with Uncertainty -- SDP relaxations [ slides ] [Reference]: You need Matlab CVX optimization toolbox to run the following demo codes. Robust PCA demo: [ testRPCA.m ] Robust PCA via ADMM by Stephen Boyd et al.: [ link ] Robust PCA via ADMM in Python [ weblink ] Sparse PCA demo: [ testSPCA.m ] Sparse PCA in Python [ sklearn ] Sensor Network Localization in Matlab: an old package collected by Kirill Konovalov from previous CSIC5011 (2017) [ SNLSDP-0 ] Sensor Network Localization with Facial Reduction in Matlab: [ SNLSDPclique ] Teng ZHANG's Tyler's M-estimator [ Matlab: tyler_m_estimator.m ] Duembgen, Nordhausen and Schuhmacher (2016): R package for M-scatter estimates [ R package fastM ] [Homework 5]: Homework 5 [pdf]. Just for fun, no grading; but I'll read and give bonus credits if you submitted.	Y.Y.
03/19/2021, Friday	Lecture 06: Seminars and Mini-Project 1 [ project1.pdf ] [ Title ]: Accelerated Outlier Detection in Low-Rank and Structured Data: Robust PCA and Extensions [ slides ] [ Speaker ]: HanQin CAI, University of California at Los Angeles [ Abstract ]: We study robust PCA for the fully observed setting, which is about separating a low rank matrix L and a sparse matrix S from their sum D=L+S. In this talk, a new algorithm, dubbed accelerated alternating projections, is introduced for robust PCA which significantly improves the computational efficiency of the existing non-convex algorithms. Exact recovery guarantee has been established which shows linear convergence of the proposed algorithm. Empirical performance evaluations confirm the advantage of our algorithm over other state-of-the-art algorithms for robust PCA. Furthermore, we extend our method to the low-rank Hankel matrix, with its application to the spectrally sparse signals. [ Bio ]: HanQin Cai is an Assistant Adjunct Professor in the Department of Math at UCLA. He earned his Ph.D. degree from the University of Iowa in 2018, under guidance of Professor Jian-Feng Cai and Prof. Weiyu Xu. His research interests lie at image processing, data analysis, optimization, and machine learning. Currently, he is focusing the projects of outlier detection and zero-order optimization. [ Title ]: Online robust matrix factorization for dependent data streams [ slides ] [ Speaker ]: Hanbaek Lyu, University of California at Los Angeles [ Abstract ]: Online Robust Matrix Factorization (ORMF) algorithms seek to learn a reduced number of latent features as well as outliers from streaming data sets. It is important to understand stability of online algorithms for dependent data streams since these are often generated by Markov chain Monte Carlo (MCMC) algorithms, but rigorous convergence analysis of most online algorithms were limited to independently obtained data samples. In this talk, we propose an algorithm for ORMF and prove its almost sure convergence to the set of critical points of the expected loss function, even when the data matrices are functions of some underlying Markov chain satisfying a mild mixing condition. We illustrate our results through dictionary learning and outlier detection problem for images and networks. [ Bio ]: Hanbaek Lyu is a Hedrick Assistant Professor in the Department of Math at UCLA. He earned his Ph.D. degree from the Ohio State University in 2018, under guidance of Professor David Sivakoff. His research interests lie at probability, combinatorics, complex systems, and machine learning. Recently, he is focusing the projects on online optimization algorithms and dictionary learning problems on networks. [Project 1 Report Repository] GitHub Repository for reports of Project 1 [ GitHub ] 01. ZHA, Mengyue and HUANG, Chutian. NIPS Paper Explained. [ poster (pdf) ] [ report (pdf) ] [ Review (txt) ] 02. MA, Chiyu and WU, Yuqia. Character Analysis of Dream of the Red Chamber: A Dimension Reduction Perspective. [ report (pdf) ] [ report revision (pdf) ] [ source (zip) ] [ review (txt) ] [ rebuttal (pdf) ] 03. MARADESA ADELEKE and OGEDENGBE IKEOLUWA IREOLUWA. PRINCIPAL COMPONENT ANALYSIS OF ANIMAL SLEEPING AND HEART DISEASE DATASET. [ report (pdf) ] [ report (docx) ] [ source (.py) ] [ Review (txt) ] 04. CHEN Yingshu, Wing Hei SUM, Ho Pan IP and Zipeng WU. Story Exploration of Mammal Data. [ report (pdf) ] [ source (github) ] [ review (txt) ] [ rebuttal (txt) ] 05. DU Qing and HE Jiahao. Portfolio Comparison with PCA. [ report ] [ Review (txt) ] 06. DONG Hanze, Li Donghao, and WU Jiamin. Robust PCA on video surveillance. [ poster (pdf) ] [ source (zip) ] [ review (txt) ] [ rebuttal (txt) ] 07. Amartansh Dubey, Samruddhi Deshmukh and Nilesh Kumar Jha. Tuning Regularization Parameter for Solving ill-posed Inverse Problems using Subspace Methods. [ poster (pptx) ] [ source (zip) ] [ review (txt) ] [ rebuttal (txt) ] 08. FAN Junyi and XIAN Zhuozhi. Explore SNPs Data. [ poster (pdf) ] [ source (ipynb) ] [ review (txt) ] [ rebuttal (pdf) ] 09. YU Changlong and FANG Tianqing. Identifying authors using word/phrase embedding of NIPS paper dataset. [ poster (pptx) ] [ source (py) ] [ review (txt) ] [ rebuttal (txt) ] 10. Jack Chun Kit KOT, Sarah Suet Ling CHOW, Bernice bingxin HUANG. Visualization and Dimensionality Reduction of US Crime Data. [ poster (pdf) ] [ source (zip) ] [ Review (txt) ] 11. Fu Xiaowen and Li Jiayi. Character Relationship Analysis in Journey to the West by Principle Analysis and Multidimensional Scaling. [ report (pdf) ] [ source (zip) ] [ Review (txt) ] 12. Wayne Chi Wai NG. Principal component analysis (PCA) on SNPs datasets with different single nucleotide polymorphism (SNPs) assignments. [ report (pdf) ] [ source (zip) ] [ Review (txt) ] 13. NG Yui Hong. MNIST Hand Written Digit Data Visualization. [ poster (pptx) ] [ source (py) ] [ review (txt) ] [ rebuttal (txt) ] 14. Zhiyuan YU, Xueyang QUAN, Haoyi CHENG. Robust PCA and Its Video Applications. [ poster (pdf) ] [ Review (txt) ] 15. Zhihan ZHU, Dong SONG, Jiabao LI and Zongchao MO. Tracking Human Migration History via PCA on SNPs data. [ poster (pdf) ] [ source (zip) ] [ Review (txt) ]	Y.Y.
03/26/2021, Friday	Lecture 07: Manifold Learning I: ISOMAP and LLE (with Modified LLE, LTSA) [ slides ] [Reference]: [ISOMAP]: Tenenbaum's website on science paper with datasets; [LLE]: Roweis' website on science paper; Zhang, Z. & Wang, J. MLLE: Modified Locally Linear Embedding Using Multiple Weights. [ NIPS 2006 ] Zhang, Z. & Zha, H. (2005) Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM Journal on Scientific Computing. 26 (1): 313-338. [doi:10.1137/s1064827502419154] [Python]: plot_mani_digits.ipynb : demo of digits in class plot_compare_methods.ipynb : demo of Swiss roll in class scikit-learn manifold module [Matlab]: IsomapR1 : isomap codes by Tennenbaum, de Silva (isomapII.m with sparsity, fast mex with dijkstra.cpp and fibheap.h lle.m : lle with k-nearest neighbors kcenter.m : k-center algorithm to find 'landmarks' in a metric space	Y.Y.
04/09/2021, Fri	Lecture 08: Manifold Learning II: Hessian LLE, Laplacian Eigenmap, Diffusion Map, and Stochastic Neighbor Embedding [ slides ] [Reference]: Mikhail Belkin & Partha Niyogi, Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering, Advances in Neural Information Processing Systems (NIPS) 14, 2001, p. 586-691, MIT Press [nips link] Donoho, D. & Grimes, C. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci U S A. 100:5591 (2003). [doi: 10.1073/pnas.1031596100] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. PNAS 102 (21):7426-7431, 2005 [doi: 10.1073/pnas.0500334102] Nadler, Boaz; Stéphane Lafon; Ronald R. Coifman; Ioannis G. Kevrekidis (2005). "Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker–Planck Operators" (PDF) in Advances in Neural Information Processing Systems (NIPS) 18, 2005. Coifman, R.R.; S. Lafon. (2006). "Diffusion maps". Applied and Computational Harmonic Analysis. 21: 5–30. 10.1016/j.acha.2006.04.006. Stochastic Neighbor Embedding [ .pdf ] Visualizing Data using t-SNE [ .pdf ] A paper that relates SNE to Laplacian Eigenmaps [ .pdf ] A helpful website: How to use t-SNE effectively? [ link ] [Matlab] Matlab code to compare manifold learning algorithms [ mani.m ] : PCA, MDS, ISOMAP, LLE, Hessian LLE, LTSA, Laplacian, Diffusion (no SNE!) [Python]: plot_compare_methods.ipynb : demo in class plot_mani_digits.ipynb : demo of digits in class scikit-learn manifold LLE : PCA/MDS, ISOMAP, LLE/MLLE, Hessian, LTSA, Laplacian (Spectral), t-SNE (no Diffusion map) Laurens van der Maaten's website for t-SNE codes [Peer Review] Review Guide of Project 1. [Homework 6]: Homework 6 [pdf]. Just for fun, no grading.	Y.Y.
04/16/2021, Fri	Lecture 09: Random Walk on Graphs and Spectral Graph Theory: Perron-Frobenius (PageRank), Fiedler (Algebraic Connectivity), Cheeger Inequality (Spectral bi-partition), Lumpability (Spectral Clustering) and Transition Path Theory (Semi-supervised Learning) [ slides ] [Reference]: Amy N. Langville and Carl D. Meyer's book: Google's PageRank and Beyond Jim Demmel's courseweb at UC Berkeley for Fiedler Theory and Graph Bipartition: [ link ] T. Buehler, M. Hein. Spectral Clustering based on the graph p-Laplacian. Proceedings of the 26th International Conference on Machine Learning (ICML 2009), 81-88. James R. Lee, Shayan Oveis Gharan, Luca Trevisan. Multi-way spectral partitioning and higher-order Cheeger inequalities. Proceeding STOC'12 Proceedings of the forty-fourth annual ACM symposium on Theory of computing, Pages 1117-1130. arXiv:1111.1055. Weinan E, Jianfeng Lu, and Yuan Yao. The Landscape of Complex Networks: Critical Nodes and A Hierarchical Decomposition. Methods and Applications of Analysis, special issue in honor of Professor Stanley Osher on his 70th birthday, 20(4):383-404, 2013. [Author Rebuttal] Rebuttal Guide of Project 1. [Homework 7]: Homework 7 [pdf]. Just for fun, no grading.	Y.Y.
04/23/2021, Fri	Lecture 10: Introduction to Topological Data Analysis. [ slides ] [Reference]: Topological Methods for Exploring Low-density States in Biomolecular Folding Pathways. Yuan Yao, Jian Sun, Xuhui Huang, Gregory Bowman, Gurjeet Singh, Michael Lesnick, Vijay Pande, Leonidas Guibas and Gunnar Carlsson. J. Chem. Phys. 130, 144115 (2009). [pdf][Online Publication][SimTK Link: Data and Mapper Matlab Codes] [Selected by Virtual Journal of Biological Physics Research, 04/15/2009]. Structural insight into RNA hairpin folding intermediates. Bowman, Gregory R., Xuhui Huang, Yuan Yao, Jian Sun, Gunnar Carlsson, Leonidas Guibas and Vijay Pande. Journal of American Chemistry Society, 2008, 130 (30): 9676-9678. [link] Single-cell topological RNA-seq analysis reveals insights into cellular differentiation and development. Abbas H Rizvi, Pablo G Camara, Elena K Kandror, Thomas J Roberts, Ira Schieren, Tom Maniatis & Raul Rabadan. Nature Biotechnology. 2017 May. doi:10.1038/nbt.3854 Spatiotemporal genomic architecture informs precision oncology in glioblastoma. Lee JK, Wang J, Sa JK, Ladewig E, Lee HO, Lee IH, Kang HJ, Rosenbloom DS, Camara PG, Liu Z, van Nieuwenhuizen P, Jung SW, Choi SW, Kim J, Chen A, Kim KT, Shin S, Seo YJ, Oh JM, Shin YJ, Park CK, Kong DS, Seol HJ, Blumberg A, Lee JI, Iavarone A, Park WY, Rabadan R, Nam DH. Nat Genet. 2017 Apr. doi: 10.1038/ng.3806. A Python Implementation of Mapper [ sakmapper ] in single cell data analysis. Single Cell TDA [ scTDA ] with [ tutorial in html ] A Java package for persistent homology and barcodes: Javaplex Tutorial. Persistent Homology Analysis of Biomolecular Data Guo-Wei Wei. SIAM News 2017 Topological Data Analysis Generates High-Resolution, Genome-wide Maps of Human Recombination. Pablo G. Camara, Daniel I.S. Rosenbloom, Kevin J. Emmett, Arnold J. Levine, Raul Rabadan. Cell Systems. 2016 June. doi: 10.1016/j.cels.2016.05.008. Topology of viral evolution. Chan JM, Carlsson G, Rabadan R. Proc Natl Acad Sci USA 2013 Oct 29. doi: 10.1073/pnas.1313480110. Robert Ghrist's monograph on applied Topology Elementary Applied Topology [Homework 8]: Homework 8 [pdf]. Just for fun, no grading.	Y.Y.
04/30/2021, Fri	Lecture 11: Hodge Theory and Applications: Social Choice, Crowdsourced Ranking, and Game Theory [ slides ] [ Reference ]: Statistical Ranking and Combinatorial Hodge Theory. Xiaoye Jiang, Lek-Heng Lim, Yuan Yao and Yinyu Ye. Mathematical Programming, Volume 127, Number 1, Pages 203-244, 2011. [pdf][ arxiv.org/abs/0811.1067][ Matlab Codes] Flows and Decompositions of Games: Harmonic and Potential Games Ozan Candogan, Ishai Menache, Asuman Ozdaglar, and Pablo A. Parrilo Mathematics of Operations Research, 36(3): 474 - 503, 2011 [arXiv.org/abs/1005.2405][ doi:10.1287/moor.1110.0500 ] HodgeRank on Random Graphs for Subjective Video Quality Assessment. Qianqian Xu, Qingming Huang, Tingting Jiang, Bowei Yan, Weisi Lin, and Yuan Yao. IEEE Transactions on Multimedia, 14(3):844-857, 2012 [pdf][ Matlab codes in zip ] Robust Evaluation for Quality of Experience in Crowdsourcing. Qianqian Xu, Jiechao Xiong, Qingming Huang, and Yuan Yao ACM Multimedia 2013. [pdf] Online HodgeRank on Random Graphs for Crowdsourceable QoE Evaluation. Qianqian Xu, Jiechao Xiong, Qingming Huang, and Yuan Yao IEEE Transactions on Multimedia, 16(2):373-386, Feb. 2014. [pdf] Analysis of Crowdsourced Sampling Strategies for HodgeRank with Sparse Random Graphs Braxton Osting, Jiechao Xiong, Qianqian Xu, and Yuan Yao Applied and Computational Harmonic Analysis, 41 (2): 540-560, 2016 [ arXiv:1503.00164 ] [ ACHA online ] [Matlab codes to reproduce our results] False Discovery Rate Control and Statistical Quality Assessment of Annotators in Crowdsourced Ranking Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Yuan Yao Proceedings of The 33rd International Conference on Machine Learning (ICML), New York, June 19-24, 2016. [ arXiv:1605.05860 ] [ pdf ] [ supplementary ] Parsimonious Mixed-Effects HodgeRank for Crowdsourced Preference Aggregation Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Yuan Yao ACM Multimedia Conference (ACMMM), Amsterdam, Netherlands, October 15-19, 2016. [ arXiv:1607.03401 ] [ pdf ] HodgeRank with Information Maximization for Crowdsourced Pairwise Ranking Aggregation Qianqian Xu, Jiechao Xiong, Xi Chen, Qingming Huang, Yuan Yao The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, Louisiana, USA, February 2–7, 2018. [ arXiv:1711.05957 ] [ Matlab Source Codes ] From Social to Individuals: a Parsimonious Path of Multi-level Models for Crowdsourced Preference Aggregation Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Qingming Huang, Yuan Yao IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(4):844-856, 2019. Extended from MM'16 in [ arXiv:1607.03401 ]. [ arXiv:1804.11177 ] [ doi: 10.1109/TPAMI.2018.2817205 ][ GitHub source] Professor Don Saari: [ UCI homepage ] [ Book Info: Disposing Dictators, Demstifying Voting Paradoxes ] [ Amazon link ] [Homework 9]: Homework 9 [pdf]. Just for fun, no grading.	Y.Y.
05/07/2021, Fri	Lecture 12: Seminar and Final Project. [ project2.pdf ] [ Title ]: Clustering via Uncoupled REgression (CURE) [ slides ] [ Speaker ]: Prof. WANG, Kaizheng, Columbia University [ Abstract ] In this talk, we first consider a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions and aims for a classifier to estimate the labels. Many popular methods including PCA and k-means require individual components of the mixture to be somewhat spherical, and perform poorly when they are stretched. To overcome this issue, we propose a non-convex program seeking for an affine transform to turn the data into a one-dimensional point cloud concentrating around -1 and 1, after which clustering becomes easy. Our theoretical contributions are two-fold: (1) we show that the non-convex loss function exhibits desirable landscape properties as long as the sample size exceeds some constant multiple of the dimension, and (2) we leverage this to prove that an efficient first-order algorithm achieves near-optimal statistical precision even without good initialization. We also propose a general methodology for multi-class clustering tasks with flexible choices of feature transforms and loss objectives. Paper: [ arXiv:2003.09960 ] [ Bio ] Kaizheng Wang got his PhD Degree in Operations Research and Financial Engineering at Princeton University and joined Columbia University as Assistant Professor in the fall of 2020. His research interests lie at the intersection of statistics, machine learning and optimization, with special focus on development and analysis of efficient algorithms for unsupervised learning. [ Title ]: A Tutorial on Single-Cell Topological Data Analysis. [ slides ] [ Speaker ]: Dr. MU, Quanhua, HKUST [ Abstract ] In this tutorial, we first give an introduction to single cell RNA sequencing. Then we introduced scTDA, a python toolbox for topological data analysis (Mapper) in single cell data. A case study is illustrated using human embryo development at single-cell resolution. [Project 2 Report Repository] GitHub Repository for reports of Project 2 [ GitHub ] 01. HUANG, Chutian and ZHA, Mengyue. PageRank and Primary Eigenvectors. [ poster (pdf) ] [ source (ipynb) ] [ video (bilibili) ] 02. MA, Chiyu and WU, Yuqia. Ranking of Network Data and its Application to Chinese Mainland University Data. [ report (pdf) ] [ slides (pdf) ] [ source (zip) ] [ video (bilibili) ] 03. MARADESA ADELEKE and OGEDENGBE IKEOLUWA IREOLUWA. PRINCIPAL COMPONENT ANALYSIS AND QSAR MODELING OF SELECTED LIGAND AGAINST PROSTATE CANCER PROTEIN TARGET. [ report (pdf) ] [ slides (pptx) ] [ source (.py) ] [ video (hkust) ] 04. CHEN Yingshu, Wing Hei SUM, Ho Pan IP and Zipeng WU. Imaging’s Potential to Assist in COVID-19 Crisis. [ report (pdf) ] [ slides (pdf) ] [ source (github) ] [ video (youtube) ] 05. DU Qing and HE Jiahao. Analysis of Karate Club Network. [ report (pdf) ] [ source (ipynb) ] [ video (zoom) ] 06. DONG Hanze, Li Donghao, and WU Jiamin. Visualization Comparison on Fashion MNIST. [ poster (pdf) ] [ source (zip) ] [ slides (pptx) ] [ video (youtube) ] 07. Amartansh Dubey, Samruddhi Deshmukh and Nilesh Kumar Jha. Utilizing Random Matrix Theory and Subspace Methods for Solving Ill-posed Inverse Problems. [ report (pdf) ] [ slides (pptx) ] [ source (zip) ] [ video (youtube) ] 08. FAN Junyi and XIAN Zhuozhi. Fashion-MNIST Classification based on Manifold Learning. [ poster (pdf) ] [ source (ipynb) ] [ slides (pdf) ] [ video (zoom) ] 09. YU Changlong and FANG Tianqing. Predicting Antiviral Drugs over COVID-19 Knowledge Graph. [ report (pdf) ] [ slides (pptx) ] [ source (zip) ] [ video (youtube) ] 10. Jack Chun Kit KOT, Sarah Suet Ling CHOW, Bernice bingxin HUANG. Mouse Kidney Glomerulus Classification. [ slides (pptx) ] [ report (pdf) ] [ source (link) ] [ video (youtube) ] 11. Fu Xiaowen and Li Jiayi. Manifold Learning on MNIST Dataset. [ report (pdf) ] [ video (zoom) ] 12. Wayne Chi Wai NG. Principal component analysis (PCA) on single cell gene expression (scRNAseq) Human Prefrontal Cortex Development Data. [ report (pdf) ] [ source (R in pdf) ] [ slides (pptx) ] [ video (youtube) ] 13. NG Yui Hong. Manifold Learning Comparison for Face Ordering. [ poster (pdf) ] [ source (ipynb) ] [ slides (pdf) ] [ video (youtube) ] 14. Haoyi CHENG, Xueyang QUAN, Zhiyuan YU . DIMENSIONALITY REDUCTION OF FACE ORDER PROBLEM USING NONLINEAR EMBEDDING METHODS. [ report (pdf) ] [ slides (pdf) ] [ source (github) ] [ video (bilibili) ] 15. Zhihan ZHU, Dong SONG, Jiabao LI and Zongchao MO. Uncover Key Genes Regulating Human Embryonic Prefrontal Cortex Development by Single Cell Sequencing. [ poster (pdf) ] [ source (zip) ] [ slides (pdf) ] [ video (youtube) ]	Y.Y.

Datasets (to-be-updated)

[Animal Sleep Data] Animal species sleeping hours vs. other features

[Anzhen Heart Data] Heart Operation Effect Prediction, provided by Dr. Jinwen Wang, Anzhen Hospital

[Beer Data] 877 beers dataset, provided by Mr. Richard Sun, Shanghai

[Crime Data] Crime rates in 59 US cities during 1970-1992

[Real-Time-Bidding Algorithm Competition Data] Contest Website

[红楼梦人物事件矩阵] a 376-by-475 matrix (374-by-475 updated by WAN, Mengting) for character-event appearance in A Dream of Red Mansion (Xueqin Cao) [ Dataset in Github ] [374 Characters dream.RData (for R load)] [dream.Rd (for R manual)] [HongLouMeng374.txt] [HongLouMeng376.csv] [.mat] [readme.m]

[西游记] characters-scene occurance matrices for 100 chapters [ Dataset in GitHub ] [data in RData] [data in matlab (302-by-408 matrix)]

chap001-005	chap006-009	chap010-013	chap014-017	chap018-021	chap022-025
chap026-029	chap030-033	chap034-037	chap038-041	chap042-045	chap046-049
chap050-053	chap054-057	chap058-061	chap062-065	chap066-069	chap070-073
chap074-077	chap078-081	chap082-085	chap086-088	chap089-091	chap092-094
chap095-097	chap098-100	All in TXT	readData.m

[Keywords Pricing] Keywords and profit index in paid search advertising, by Hansheng Wang (Guanghua, PKU). [sample file] [readme.txt] [data in csv]

[Radon Data] Radon measurements of 12,687 houses in US

[Wells Data] Switch unsafe wells for arsenic pollution in Bangladesh

to-be-done...

by YAO, Yuan.