Summary High-dimensional data analysis starts with projecting the data to low dimensions to visualize and understand the underlying data structure. Several methods have been developed for dimensionality reduction, but they are limited to cross-sectional datasets. The recently proposed Aligned-UMAP, an extension of the uniform manifold approximation and projection (UMAP) algorithm, can visualize high-dimensional longitudinal datasets. We demonstrated its utility for researchers to identify exciting patterns and trajectories within enormous datasets in biological sciences. We found that the algorithm parameters also play a crucial role and must be tuned carefully to utilize the algorithm’s potential fully. We also discussed key points to remember and directions for future extensions of Aligned-UMAP. Further, we made our code open source to enhance the reproducibility and applicability of our work. We believe our benchmarking study becomes more important as more and more high-dimensional longitudinal data in biomedical research become available. Graphical abstract Highlights • We explore the utility of Aligned-UMAP in longitudinal biomedical datasets • This overview offers insights on optimal uses for the technique • We make recommendations for best practices when using Aligned-UMAP The bigger picture Longitudinal multi-dimensional biological datasets are ubiquitous and highly abundant. These datasets are essential to understanding disease progression, identifying subtypes, and discovering drugs. Discovering meaningful patterns or disease pathophysiologies in these datasets is challenging due to their high dimensionality, making it difficult to visualize hidden patterns. In this work, we applied Aligned-UMAP on a broad spectrum of clinical, imaging, proteomics, and single-cell datasets. Aligned-UMAP reveals time-dependent hidden patterns when color coded with the metadata. Altogether, based on its ease of use and our evaluation of its performance on different modalities, we anticipate that Aligned-UMAP will be a valuable tool for the biomedical community. High-dimensional longitudinal data are prevalent yet understudied in biological literature. Discovering meaningful patterns from these datasets is an important task. Though few methods are available for visualizing high-dimensional longitudinal data, they are not studied extensively in real-world biological datasets. A recently developed nonlinear dimensionality reduction technique, Aligned-UMAP, analyzes sequential data. Here, we give an overview of applications of Aligned-UMAP on various biomedical datasets.
【저자키워드】 proteomics, machine learning, Genomics, Alzheimer's disease, unsupervised learning, Parkinson's Disease, Longitudinal data, Time-series, Clinical data, IPSC,