CSIC5011-Math5473: Topological and Geometric Data Reduction and Visualization

HKUST

CSIC5011-Math5473: Topological and Geometric Data Reduction and Visualization
Spring 2024

Course Information

Synopsis (摘要)

This course is open to graduates and senior undergraduates in applied mathematics, statistics, and engineering, who are interested in learning from data. Students with other backgrounds such as life sciences are also welcome, provided you have certain maturity of mathematics. It will cover wide topics in geometric (principal component analysis and manifold learning, etc.) and topological data reduction (clustering and computational homology group, etc.).
Prerequisite: linear and abstract algebra, basic probability and multivariate statistics, basic stochastic process (Markov chains), convex optimization; familiarity with Matlab, R, and/or Python, etc.

Reference (参考教材)

[pdf download]

Topological Data Analysis for Genomics and Evolution: Topology in Biology. By Raul Rabadan and Andrew J. Blumberg [ Amazon ]

Instructors:

Yuan YAO

Time and Place:

Wednesday 15:00am-17:50pm, Rm 1032, LSK Bldg

Homework and Projects:

Weekly homeworks (no grader, but I'll read your submissions and give bonus credits), mini-projects, and a final major project. No final exam.
Email: datascience.hw (add "AT gmail DOT com" afterwards)

Schedule (时间表)

Date	Topic	Instructor	Scriber
01/31/2024, Wed	Lecture 01: Syllabus, Principal Component Analysis, and Multidimensional Scaling [ Class outline ] [ PCA-MDS slides ] [Reference]: To view .jpynb files below, you may try [ Jupyter NBViewer] PCA in iPython Notebook [ pca.ipynb ] [ pca.py ] MDS in Python [ scikit-learn MDS] PCA with Logistic regression for digit classification: [ pca_logistic.ipynb ] [ pca_logistic.py ] [Homework 1]: Homework 1 [pdf]. Just for fun, no grading; but I'll read your submissions and give your bonus credits.	Y.Y.
02/07/2024, Wed	Lecture 02: Horn's Parallel Analysis and Random Matrix Theory for PCA, Sufficient Dimensionality Reduction and Supervised PCA (Chap 2: 3-5) [ slides ] [ Sufficient Dimensionality Reduction and Supervised PCA ] [Reference]: Horn's Parallel Analysis in R: [ paran.R ] Parallel Analysis in Matlab: [ papca.m ] Parallel Analysis in Python by LI, Zhen: [ paPCA_curve.py ] [ paPCA_image.py ] Marcenko-Pastur Law of Wishart matrices in Matlab: [ mp.m ] S&P500 dataset in class: [ snp500.Rda ] [ snp452-data.mat ] [ snp500.txt ] [Johnstone06] High dimensional statistical inference and random matrices, ICM2006. [KN08 for multi-rank signal] S. Kritchman and B. Nadler, Determining the number of components in a factor model from limited noisy data, Chemometrics and Intelligent Laboratory Systems 94(1):19-32, 2008. [ NB10: multi-rank subspace ] R. R. Nadakuditi and F. Benaych-Georges, The breakdown point of signal sub- space estimation, IEEE Sensor Array and Multichannel Signal Processing Workshop (2010), Jerusalem, Israel, 2010, pp. 177-180, doi: 10.1109/SAM.2010.5606726. Florent Benaych-Georges and Raj Rao Nadakuditi (2009) The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. [Parallel Analysis: Horn (1965) original paper] [Parallel Analysis: Buja-Eyuboglu (1992) with random permutation] [Raul Rabadan (2018)]: applications of RMT in single cell data analysis [Homework 2]: Homework 2 [pdf]. Just for fun, no grading; but I'll read your submissions and give your bonus credits	Y.Y.	LI, Zhen
02/14/2024, Wed	Rescheduled to later class. Happy Lunar New Year!
02/21/2024, Wed	Rescheduled to later for COVID'19.
02/28/2024, Wed	Lecture 03: High Dimensional Sample Mean: Inadmissibility of MLE and James-Stein Estimators (Chap 2: 1-2) [ slides ] [Reference]: Comparing Maximum Likelihood Estimator and James-Stein Estimator in R: [ JSE.R ] James-Stein estimator via multi-task ridge regression: Yaqi Duan, Kaizheng Wang (2022) Adaptive and Robust Multi-task Learning, [ arXiv:2202.05250 ] [Homework 3]: Homework 3 [pdf]. Just for fun, no grading; but I'll read your submissions and give your bonus credits	Y.Y.
03/06/2024, Wed	Lecture 04: Random Projections, Johnson-Lindenstrauss Lemma, and Applications in Compressed Sensing etc. (Chap 3) [ slides ] [Reference]: Joseph Salmon's lecture on Johnson-Lindenstrauss Theory [ JLlemma.pdf ] Random Projections in Scikit-learn: [ link ] Dennis Amelunxen, Martin Lotz, Michael B. McCoy, Joel A. Tropp. Living on the edge: Phase transitions in convex programs with random data. [ arXiv:1303.6672 ] [Homework 4]: Homework 4 [pdf]. Just for fun, no grading; but I'll read your submissions and give your bonus credits	Y.Y.
03/13/2024, Wed	Lecture 05: Robust PCA, Sparse PCA, and Graph Realization (MDS) with Uncertainty -- SDP relaxations [ slides ] [Reference]: You need Matlab CVX optimization toolbox to run the following demo codes. Robust PCA demo: [ testRPCA.m ] Robust PCA via ADMM by Stephen Boyd et al.: [ link ] Robust PCA via ADMM in Python [ weblink ] Sparse PCA demo: [ testSPCA.m ] Sparse PCA in Python [ sklearn ] Sensor Network Localization in Matlab: an old package collected by Kirill Konovalov from previous CSIC5011 (2017) [ SNLSDP-0 ] Sensor Network Localization with Facial Reduction in Matlab: [ SNLSDPclique ] Teng ZHANG's Tyler's M-estimator [ Matlab: tyler_m_estimator.m ] Duembgen, Nordhausen and Schuhmacher (2016): R package for M-scatter estimates [ R package fastM ] [Homework 5]: Homework 5 [pdf]. Just for fun, no grading; but I'll read and give bonus credits if you submitted.	Y.Y.
03/20/2024, Wed	Lecture 06: Introduction to Manifold Learning: ISOMAP and LLE (Modified LLE, Hessian LLE, and LTSA) [ slides ] [Reference]: [ISOMAP]: Tenenbaum's website on science paper with datasets; [LLE]: Roweis' website on science paper; Donoho, D. & Grimes, C. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci U S A. 100:5591 (2003). [doi: 10.1073/pnas.1031596100] Zhang, Z. & Wang, J. MLLE: Modified Locally Linear Embedding Using Multiple Weights. [ NIPS 2006 ] Zhang, Z. & Zha, H. (2005) Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM Journal on Scientific Computing. 26 (1): 313-338. [doi:10.1137/s1064827502419154] [Python]: plot_mani_digits.ipynb : demo of digits in class plot_compare_methods.ipynb : demo of Swiss roll in class scikit-learn manifold module [Matlab]: IsomapR1 : isomap codes by Tennenbaum, de Silva (isomapII.m with sparsity, fast mex with dijkstra.cpp and fibheap.h lle.m : lle with k-nearest neighbors kcenter.m : k-center algorithm to find 'landmarks' in a metric space	Y.Y.
03/27/2024, Wed	Lecture 07: Manifold Learning II: Laplacian Eigenmap, Diffusion Map, and Stochastic Neighbor Embedding [ slides ] and Final Project Kickoff [ project.pdf ] [Reference]: Mikhail Belkin, Partha Niyogi. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering, Advances in Neural Information Processing Systems (NIPS) 14, 2001, p. 586-691, MIT Press [nips link] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. PNAS 102 (21):7426-7431, 2005 [doi: 10.1073/pnas.0500334102] Nadler, Boaz; Stephane Lafon; Ronald R. Coifman; Ioannis G. Kevrekidis (2005). "Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators" (PDF) in Advances in Neural Information Processing Systems (NIPS) 18, 2005. Coifman, R.R.; S. Lafon. (2006). "Diffusion maps". Applied and Computational Harmonic Analysis. 21: 5-30. 10.1016/j.acha.2006.04.006. Stochastic Neighbor Embedding [ .pdf ] Visualizing Data using t-SNE [ .pdf ] A paper that relates SNE to Laplacian Eigenmaps [ .pdf ] A helpful website: How to use t-SNE effectively? [ link ] [Matlab] Matlab code to compare manifold learning algorithms [ mani.m ] : PCA, MDS, ISOMAP, LLE, Hessian LLE, LTSA, Laplacian, Diffusion (no SNE!) [Python]: plot_compare_methods.ipynb : demo in class plot_mani_digits.ipynb : demo of digits in class scikit-learn manifold LLE : PCA/MDS, ISOMAP, LLE/MLLE, Hessian, LTSA, Laplacian (Spectral), t-SNE (no Diffusion map) Laurens van der Maaten's website for t-SNE codes [Homework 6]: Homework 6 [pdf]. Just for fun, no grading.	Y.Y.
04/10/2024, Mon	Lecture 08: Random Walk on Graphs and Spectral Graph Theory: Perron-Frobenius (PageRank), Fiedler (Algebraic Connectivity), Cheeger Inequality (Normalized Cut), Lumpability (Spectral Clustering) and Transition Path Theory (Semi-supervised Learning) [ slides ] [Reference]: Amy N. Langville and Carl D. Meyer's book: Google's PageRank and Beyond Jim Demmel's courseweb at UC Berkeley for Fiedler Theory and Graph Bipartition: [ link ] T. Buehler, M. Hein. Spectral Clustering based on the graph p-Laplacian. Proceedings of the 26th International Conference on Machine Learning (ICML 2009), 81-88. James R. Lee, Shayan Oveis Gharan, Luca Trevisan. Multi-way spectral partitioning and higher-order Cheeger inequalities. Proceeding STOC'12 Proceedings of the forty-fourth annual ACM symposium on Theory of computing, Pages 1117-1130. arXiv:1111.1055. Weinan E, Jianfeng Lu, and Yuan Yao. The Landscape of Complex Networks: Critical Nodes and A Hierarchical Decomposition. Methods and Applications of Analysis, special issue in honor of Professor Stanley Osher on his 70th birthday, 20(4):383-404, 2013. [Homework 7]: Homework 7 [pdf]. Just for fun, no grading.	Y.Y.
04/17/2024, Wed	Lecture 09: Introduction to Topological Data Analysis. [ slides ] [Reference]: Topological Methods for Exploring Low-density States in Biomolecular Folding Pathways. Yuan Yao, Jian Sun, Xuhui Huang, Gregory Bowman, Gurjeet Singh, Michael Lesnick, Vijay Pande, Leonidas Guibas and Gunnar Carlsson. J. Chem. Phys. 130, 144115 (2009). [pdf][Online Publication][SimTK Link: Data and Mapper Matlab Codes] [Selected by Virtual Journal of Biological Physics Research, 04/15/2009]. Structural insight into RNA hairpin folding intermediates. Bowman, Gregory R., Xuhui Huang, Yuan Yao, Jian Sun, Gunnar Carlsson, Leonidas Guibas and Vijay Pande. Journal of American Chemistry Society, 2008, 130 (30): 9676-9678. [link] Single-cell topological RNA-seq analysis reveals insights into cellular differentiation and development. Abbas H Rizvi, Pablo G Camara, Elena K Kandror, Thomas J Roberts, Ira Schieren, Tom Maniatis & Raul Rabadan. Nature Biotechnology. 2017 May. doi:10.1038/nbt.3854 Spatiotemporal genomic architecture informs precision oncology in glioblastoma. Lee JK, Wang J, Sa JK, Ladewig E, Lee HO, Lee IH, Kang HJ, Rosenbloom DS, Camara PG, Liu Z, van Nieuwenhuizen P, Jung SW, Choi SW, Kim J, Chen A, Kim KT, Shin S, Seo YJ, Oh JM, Shin YJ, Park CK, Kong DS, Seol HJ, Blumberg A, Lee JI, Iavarone A, Park WY, Rabadan R, Nam DH. Nat Genet. 2017 Apr. doi: 10.1038/ng.3806. A Python Implementation of Mapper [ sakmapper ] in single cell data analysis. Single Cell TDA [ scTDA ] with [ tutorial in html ] A Java package for persistent homology and barcodes: Javaplex Tutorial. Persistent Homology Analysis of Biomolecular Data Guo-Wei Wei. SIAM News 2017 Topological Data Analysis Generates High-Resolution, Genome-wide Maps of Human Recombination. Pablo G. Camara, Daniel I.S. Rosenbloom, Kevin J. Emmett, Arnold J. Levine, Raul Rabadan. Cell Systems. 2016 June. doi: 10.1016/j.cels.2016.05.008. Topology of viral evolution. Chan JM, Carlsson G, Rabadan R. Proc Natl Acad Sci USA 2013 Oct 29. doi: 10.1073/pnas.1313480110. Robert Ghrist's monograph on applied Topology Elementary Applied Topology [Homework 8]: Homework 8 [pdf]. Just for fun, no grading.	Y.Y.
04/23/2024, Tue	Seminars: HKUST Red Bird Visiting Scholars Lecture Series [ Title ]: From Descriptive to Predictive: Universality and Language Models in Cancer Genomics [ Speaker ]: Raul RABADAN, Columbia University [ Time ]: April 23, 2024 (Tue) / 4:00 - 5:30 pm [ Venue ]: Kaisa Group Lecture Theater (IAS LT), Lo Ka Chung Building, Lee Shau Kee Campus, HKUST [ Abstract ]: Biology has traditionally been a descriptive science. Starting from the notion of universality in physics and mathematics, in this talk the speaker will explore a couple of approaches to transcend the descriptive nature of biology research, with a special focus on cancer genomics. Cancers are seeded by mutations, both inherited and acquired during a lifetime. While the speaker?s research group and many other groups have extensively studied the role of protein mutations, the functional impact of most mutations found in cancer remains unknown. Foundation models have demonstrated significant potential for integrating vast amounts of data and exerting considerable impact across various scientific domains. Initially, the speaker will introduce the Evolutionary and Structure (ES) score, which builds upon protein language models and AlphaFold. The ES score amalgamates evolutionary data with protein structure prediction to prioritize functional regions in proteins. This approach is empirically validated within the context of relapsed pediatric leukemias. Simultaneously, they present GET, a foundation model trained on chromatin accessibility data spanning 235 human cell types. GET excels at predicting gene expression in previously unseen cell types and identifies both universal and cell-specific transcription factor interaction networks. Noteworthy applications encompass the discovery of distant regulatory regions in fetal erythroblasts and the elucidation of the regulatory impact of germline coding mutations in lymphoma-associated transcription factors such as PAX5. Collectively, these computational methodologies exemplify how foundation models can facilitate the study of biological data, yielding functionally relevant insights. [ Bio ]: Prof. Raul RABADAN is a Gerald and Janet Carrus Professor in the Departments of Systems Biology, Biomedical Informatics and Surgery at Columbia University. He is currently the Director of the Program for Mathematical Genomics at Columbia University and previously the Director of the NCI Center for Topology of Cancer Evolution and Heterogeneity at Columbia University (2015-2021). From 2001 to 2003, he was a Fellow at the Theoretical Physics Division at CERN, the European Organization for Nuclear Research, in Geneva, Switzerland. In 2003 he joined the Physics Group of the School of Natural Sciences at the Institute for Advanced Study in Princeton. In 2005, he became a Martin A. and Helen Chooljian Member at The Simons Center for Systems Biology at the Institute for Advanced Study in Princeton. Prof. Rabadan?s current interest focuses on uncovering patterns of evolution in biological systems through the lens of genomics. His recent interests include the development of mathematical approaches to uncover the evolution of cancer and infectious diseases, including topological data analysis and Random Matrix Theory, among others. Prof. Rabadan has been named one of Popular Science's Brilliant 10 (2010), a Stewart Trust Fellow (2013), and he received the Harold and Golden Lamport Award at Columbia University (2014) and the Diz Pintado award (2018). He received the 2021 Outstanding Investigator Award by the National Cancer Institute. He is also a member of the Cancer Convergence Team by Stand Up to Cancer.	Y.Y.
04/24/2024, Wed	Lecture 10: Hodge Theory and Applications: Social Choice, Crowdsourced Ranking, and Game Theory [ slides ] [ Reference ]: Statistical Ranking and Combinatorial Hodge Theory. Xiaoye Jiang, Lek-Heng Lim, Yuan Yao and Yinyu Ye. Mathematical Programming, Volume 127, Number 1, Pages 203-244, 2011. [pdf][ arxiv.org/abs/0811.1067][ Matlab Codes] Flows and Decompositions of Games: Harmonic and Potential Games Ozan Candogan, Ishai Menache, Asuman Ozdaglar, and Pablo A. Parrilo Mathematics of Operations Research, 36(3): 474 - 503, 2011 [arXiv.org/abs/1005.2405][ doi:10.1287/moor.1110.0500 ] HodgeRank on Random Graphs for Subjective Video Quality Assessment. Qianqian Xu, Qingming Huang, Tingting Jiang, Bowei Yan, Weisi Lin, and Yuan Yao. IEEE Transactions on Multimedia, 14(3):844-857, 2012 [pdf][ Matlab codes in zip ] Robust Evaluation for Quality of Experience in Crowdsourcing. Qianqian Xu, Jiechao Xiong, Qingming Huang, and Yuan Yao ACM Multimedia 2013. [pdf] Online HodgeRank on Random Graphs for Crowdsourceable QoE Evaluation. Qianqian Xu, Jiechao Xiong, Qingming Huang, and Yuan Yao IEEE Transactions on Multimedia, 16(2):373-386, Feb. 2014. [pdf] Analysis of Crowdsourced Sampling Strategies for HodgeRank with Sparse Random Graphs Braxton Osting, Jiechao Xiong, Qianqian Xu, and Yuan Yao Applied and Computational Harmonic Analysis, 41 (2): 540-560, 2016 [ arXiv:1503.00164 ] [ ACHA online ] [Matlab codes to reproduce our results] False Discovery Rate Control and Statistical Quality Assessment of Annotators in Crowdsourced Ranking Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Yuan Yao Proceedings of The 33rd International Conference on Machine Learning (ICML), New York, June 19-24, 2016. [ arXiv:1605.05860 ] [ pdf ] [ supplementary ] Parsimonious Mixed-Effects HodgeRank for Crowdsourced Preference Aggregation Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Yuan Yao ACM Multimedia Conference (ACMMM), Amsterdam, Netherlands, October 15-19, 2016. [ arXiv:1607.03401 ] [ pdf ] HodgeRank with Information Maximization for Crowdsourced Pairwise Ranking Aggregation Qianqian Xu, Jiechao Xiong, Xi Chen, Qingming Huang, Yuan Yao The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, Louisiana, USA, February 27, 2018. [ arXiv:1711.05957 ] [ Matlab Source Codes ] From Social to Individuals: a Parsimonious Path of Multi-level Models for Crowdsourced Preference Aggregation Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Qingming Huang, Yuan Yao IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(4):844-856, 2019. Extended from MM'16 in [ arXiv:1607.03401 ]. [ arXiv:1804.11177 ] [ doi: 10.1109/TPAMI.2018.2817205 ][ GitHub source] Evaluating Visual Properties via Robust HodgeRank Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Qingming Huang and Yuan Yao International Journal of Computer Vision, 129: 1732-1753, 2021. [ arXiv:1408.3467 ] [ DOI: 10.1007/s11263-021-01438-y ] Professor Don Saari: [ UCI homepage ] [ Book Info: Disposing Dictators, Demstifying Voting Paradoxes ] [ Amazon link ] Quantum algorithms for topological and geometric analysis of data Lloyd, Seth and Garnerone, Silvano and Zanardi, Paolo Nature Communications, 7(1): 10138, 2016. [ DOI: 10.1038/ncomms10138 ] [Homework 9]: Homework 9 [pdf]. Just for fun, no grading.	Y.Y.
25/04/2024, Thu	Guest Lecture: An Invitation to Information Geometry [ Title ] An Invitation to Information Geometry [ Speaker ] Prof. Jun ZHANG, University of Michigan and SIMIS. [ Time and Venue ] 3-5pm, CYT LTL (CMA Lecture Theater) [ Abstract ] Information Geometry is the differential geometric study of the manifold of probability models, and promises to be a unifying geometric framework for investigating statistical inference, information theory, machine learning, etc. Central to such manifolds are divergence functions (in place of distance) for measuring proximity of two points, for instance Kullback-Leibler divergence, Bregman divergence, etc. Such divergence functions are known to induce a beautiful geometric structure of the set of parametric probability models. This talk will use two examples to introduce some basic ingredients of this geometric framework: the univariate normal distributions (a case with continuous support) and the probability simplex (a case with discrete support). The fundamental duality e/m duality is explained in terms of two most popular parametric statistical families: the exponential and the mixture families. This introduction is intended for an audience with little background in differentiable manifold; instead it only assumes the knowledge of multi-variable calculus.	Y.Y.
26/04/2024, Fri	Mathematics Colloquium. [ Title ] Information Geometry: Geometric Science of Information [ Speaker ] Prof. Jun ZHANG, University of Michigan, Ann Arbor and SIMIS. [ Time and Venue ] 3-4pm, Lecture Theater F (Lifts 25/26), with tea-time discussion 4-5pm at magic square [ Abstract ] Information geometry investigates parametric families of statistical model by representing probability density functions over a given sample space as points of a differentiable manifold M. Treating parameters as a local coordinate chart, M is endowed with a Riemannian metric g given by the Fisher-information (the well-known Fisher-Rao metric). However, in place of the Riemannian distance, information geometry uses a non-negative but non-symmetric divergence function (also called contrast function) for measuring proximity of two points, for instance Kullback-Leibler divergence, f-divergence, etc. Such divergence functions not only recovers the Fisher-Rao metric, but also a pair of dual connections with respect to the metric (equivalently Amari-Censov tensor). This talk will use two examples to introduce some basic ingredients of this geometric framework: the probability simplex (a case with discrete support) and the univariate normal distributions (a case with continuous support). In the former case, the application to the popular data-analytic method Compositional Data Analysis (CDA) is explained in terms of duality between exponential and mixture families. In the latter case, the construction of statistical mirror is briefly explained as an application of the concept of dual connections. This talk assumes some basic concepts of differentiable manifold (such as parallel transport and affine connection). [ Bio ] Jun Zhang is a Professor at the Shanghai Institute of Mathematics and Interdisciplinary Sciences (SIMIS) and one of its co-founders. He is currently on leave from the University of Michigan, Ann Arbor, where he has worked since 1992 as an Assistant, Associate, and Full Professor in the Department of Psychology, with adjunct appointments in the Department of Mathematics, Department of Statistics, and Michigan Institute of Data Sciences. He received his PhD in Neuroscience from the University of California, Berkeley in 1991. An elected fellow of Association for Psychological Sciences (APS) since 2012 and Psychonomic Society since 2016, Professor Jun Zhang's scholarly contributions have been in the various fields of computation neuroscience, cognition and behavior modeling, machine learning, statistical science, complex systems, etc, and is well known in the field of mathematical psychology. In recent years, his research has focused on the interdisciplinary subject of Information Geometry.	Y.Y.
27/04/2024, Sat	Guest Lecture: Information Beyond Shannon [ Title ] Information Beyond Shannon [ Speaker ] Prof. Jun ZHANG, University of Michigan and SIMIS. [ Time and Venue ] 3-5pm, Rm 2504 (Lift 25/26) [ Abstract ] Shannon's theory for source and channel coding (and the duality between capacity and rate-distortion) has been the hallmark for information science. Shannon entropy, and its associated exponential family of probability measures resulting from maximum entropy (MaxEnt) inference and the Kullback-Leibler divergence measuring the difference of any two probability densities, have found wide applications in statistical inference, machine learning, optimization, etc. Past research in Information Geometry has tied together the above concepts into a geometric structure called Hessian geometry, which is dually flat with biorthogonal coordinates. Given the deep mathematical understanding of Hessian geometry and its elegant picture, it is natural to ask whether it can be generalized (deformed, technically) to more broad settings that corresponds to generalize entropies and cross entropies (e.g., that is Tsallis and Renyi). This question has now been answered positively by a series of recent work on deformation theory. My talk will explain this recent development of information geometry, including the rho-tau deformation (which unifies the so-called phi-model and U-model known to information geometers) and the lambda-deformation theory (which unified Tsallis and Renyi deformation known to information theorists). This talk is intended for an audience with background in information theory and theoretical physics. (Joint work with Jan Naudts in the former case and with TKL Wong in the latter case).	Y.Y.
08/05/2024, Wed	Final Project Presentation. [Final Project Report Repository] Description of Final Project: [ project.pdf ] GitHub Repository for reports of Final Project [ GitHub ] CHAU Waiming. Dimension Reduction Techniques for MNIST Clustering. [ report (pdf) ] [ source (matlab) ] LIU Yunqin. Manifold Learning and Classification methods for MNIST dataset. [ report (pdf) ] [ source (.zip) ] Inam Ulhaqazad. MNIST data reduction and performance analysis. [ report (pdf) ] [ source (ipynb) ] CHEN Yuxuan. Application and Analysis of HodgeRank on Crowdsourced Datasets. [ report (pdf) ] [ slides (pptx) ] [ source (py) ] ECHAVARRIA GALINDO Martin. Topological Exploration of Human Prefrontal Cortex Development Single-Cell RNA-seq Data. [ report (pdf) ] LEE Jooran, WANG Minghao. Topological analysis of the Human Prefrontal cortex. [ report (pdf) ] [ slides (pdf) ] [ source (ipynb, R) ] LEE Young Kyu, KIM Jaehyeok, MA Jiabo and JIN Cheng. Exploration and Analysis of the Human Prefrontal Cortex Data. [ report (pdf) ] [ source (ipynb) ] * HAMDAOUI Najoua. Condition assessment of pressurized tubular structure using wave data. [ report (pdf) ] HU, Mingyun. Network node ranking methods: A case study with Chinese University Weblink. [ report (pdf) ] [ source (ipynb) ] Anastasiia KAZOVSKAIA, Songjie XIE and Meiying ZHANG. Exploring Time-Series Data: Visualization, Clustering, and Topological Analysis with an SNP500 Case Study. [ report (pdf) ] LI, Ang. Compressed Sensing for Structure Health Monitoring Data: Case in Tunnel Monitoring. [ report (pdf) ] [ slides (pptx) ] [ source (matlab) ] LI Mingchen, Shihong ZHANG, Zinan LIN, Lige ZHAO. Applying manifold learning for automatic sorting of facial datasets. [ report (pdf) ] [ slides (pptx) ] [ source (ipynb) ] [ presentation (bilibili) ] ZHANG Chen, BHATTACHARYA Saheli, and PENG Han. Ordering face dataset using mainfold learning. [ report (pdf) ] [ source (github) ] - XIONG, Kewei. Exploring Genetic Variations Among Samples by Dimensional Reduction. [ report (pdf) ] [ source (R) ]	Y.Y.

Datasets (to-be-updated)

[Animal Sleep Data] Animal species sleeping hours vs. other features

[Anzhen Heart Data] Heart Operation Effect Prediction, provided by Dr. Jinwen Wang, Anzhen Hospital

[Beer Data] 877 beers dataset, provided by Mr. Richard Sun, Shanghai

[Crime Data] Crime rates in 59 US cities during 1970-1992

[Real-Time-Bidding Algorithm Competition Data] Contest Website

[红楼梦人物事件矩阵] a 376-by-475 matrix (374-by-475 updated by WAN, Mengting) for character-event appearance in A Dream of Red Mansion (Xueqin Cao) [ Dataset in Github ] [374 Characters dream.RData (for R load)] [dream.Rd (for R manual)] [HongLouMeng374.txt] [HongLouMeng376.csv] [.mat] [readme.m]

[西游记] characters-scene occurance matrices for 100 chapters [ Dataset in GitHub ] [data in RData] [data in matlab (302-by-408 matrix)]

chap001-005	chap006-009	chap010-013	chap014-017	chap018-021	chap022-025
chap026-029	chap030-033	chap034-037	chap038-041	chap042-045	chap046-049
chap050-053	chap054-057	chap058-061	chap062-065	chap066-069	chap070-073
chap074-077	chap078-081	chap082-085	chap086-088	chap089-091	chap092-094
chap095-097	chap098-100	All in TXT	readData.m

[Keywords Pricing] Keywords and profit index in paid search advertising, by Hansheng Wang (Guanghua, PKU). [sample file] [readme.txt] [data in csv]

[Radon Data] Radon measurements of 12,687 houses in US

[Wells Data] Switch unsafe wells for arsenic pollution in Bangladesh

to-be-done...

by YAO, Yuan.