Abstract
This paper introduces p-ClustVal, a novel data transformation technique inspired by p-adic number theory that significantly enhances cluster discernibility in genomics data, specifically single-cell RNA sequencing (scRNASeq). By leveraging the p-adic valuation, p-ClustVal integrates with and augments widely used clustering algorithms and dimension reduction techniques, amplifying their effectiveness in discovering meaningful structure in data. The transformation uses a data-centric heuristic to determine its parameters without relying on ground-truth labels, which makes it easier to apply in unsupervised settings. p-ClustVal reduces overlap between clusters by employing alternate metric spaces inspired by the p-adic valuation, a significant shift from conventional methods. Our comprehensive evaluation, spanning 30 experiments and over 1400 observations, shows that p-ClustVal improves performance in 91% of cases and boosts the performance of both classical and state-of-the-art (SOTA) methods. This work contributes to data analytics and genomics by introducing a unique data transformation approach, enhancing downstream clustering algorithms, and providing empirical evidence of p-ClustVal’s efficacy. The study concludes with insights into the limitations of p-ClustVal and future research directions.
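For readers unfamiliar with the term, the sketch below illustrates the standard number-theoretic definition of the p-adic valuation and the absolute value derived from it. It is not the p-ClustVal transformation, whose details are not given in this abstract; the function names `p_adic_valuation` and `p_adic_abs` are hypothetical and chosen only for illustration.

```python
def p_adic_valuation(n: int, p: int) -> int:
    """Return v_p(n): the largest k such that p**k divides n, for n != 0."""
    if p < 2:
        raise ValueError("p must be an integer >= 2 (a prime in the classical definition)")
    if n == 0:
        raise ValueError("v_p(0) is conventionally +infinity")
    n = abs(n)
    k = 0
    # Strip factors of p one at a time, counting how many were removed.
    while n % p == 0:
        n //= p
        k += 1
    return k


def p_adic_abs(n: int, p: int) -> float:
    """Return the p-adic absolute value |n|_p = p**(-v_p(n)), with |0|_p = 0."""
    if n == 0:
        return 0.0
    return float(p) ** (-p_adic_valuation(n, p))


if __name__ == "__main__":
    # 48 = 2**4 * 3, so v_2(48) = 4 and v_3(48) = 1.
    print(p_adic_valuation(48, 2), p_adic_valuation(48, 3))
    # Two integers are "close" p-adically when their difference is highly divisible by p.
    print(p_adic_abs(50 - 2, 2))
```

Under this absolute value, 50 and 2 are closer to each other 2-adically than 3 and 2 are, because their difference is divisible by a higher power of 2. This is the general sense in which a p-adic metric differs from the Euclidean one.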
| Original language | English |
|---|---|
| Pages (from-to) | 4051-4066 |
| Number of pages | 16 |
| Journal | International Journal of Data Science and Analytics |
| Volume | 20 |
| Issue number | 4 |
| Early online date | Jan 2025 |
| DOIs | |
| Publication status | Published - Oct 2025 |
Keywords
- Clustering high-dimensional data
- Data-centric AI
- Single-cell RNA sequencing
- Unsupervised learning
- p-Adic numbers
Fingerprint
Dive into the research topics of 'p-clustval: a novel p-adic approach for enhanced clustering of high-dimensional single-cell RNASeq data'. Together they form a unique fingerprint.