We explore the application of a Deep Autoencoding Gaussian Mixture Model for identifying aberrant genes in RNA sequencing data, aiming to advance the understanding of Mendelian disease through genomics. By implementing energy-based anomaly detection and clustering techniques, we evaluate the model's performance across various configurations, including different latent-space dimensions and the integration of different features. Our findings suggest that model configuration has a significant influence on the detection of anomalies.
We evaluated two detection approaches. The first, energy-based detection, identifies anomalies through extreme energy values; the known anomalous gene MGST1 was detected within the highest energy percentiles (95%). The second, clustering-based detection, showed an enhanced capacity for isolating anomalous genes. Specifically, a trial with a more refined model configuration grouped three known anomalies within a much smaller subset (5%) of the data, suggesting a robust framework for anomaly detection in genomics.
These observations highlight the potential of Deep Autoencoding Gaussian Mixture Models for genomic anomaly detection, while also pointing to the complexities of model configuration and the need for further research to optimize detection accuracy. Our work contributes to the application of machine learning techniques in genetic analysis and is a step towards using computational methods to unravel the complexities of genetic diseases.
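As an illustration of the energy-based approach described above, the following is a minimal sketch rather than the thesis implementation. It computes the usual DAGMM sample energy (the negative log-likelihood of a latent point under a Gaussian mixture) and flags points above a chosen percentile. The function names, the synthetic latent data, the single-component mixture, and the reading of "95%" as a 95th-percentile cutoff are all assumptions made for the example.

```python
import numpy as np


def sample_energy(z, phi, mu, cov):
    """Negative log-likelihood of each row of z under a Gaussian mixture.

    z: (n, d) latent points; phi: (k,) mixture weights;
    mu: (k, d) component means; cov: (k, d, d) component covariances.
    """
    log_terms = []
    for k in range(len(phi)):
        diff = z - mu[k]                                # (n, d)
        solved = np.linalg.solve(cov[k], diff.T).T      # Sigma_k^{-1} (z - mu_k)
        maha = np.sum(diff * solved, axis=1)            # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(2.0 * np.pi * cov[k])
        log_terms.append(np.log(phi[k]) - 0.5 * maha - 0.5 * logdet)
    log_terms = np.stack(log_terms, axis=1)             # (n, k)
    # Log-sum-exp over components for numerical stability.
    m = log_terms.max(axis=1, keepdims=True)
    log_lik = m[:, 0] + np.log(np.exp(log_terms - m).sum(axis=1))
    return -log_lik


def flag_high_energy(energy, percentile=95.0):
    """Return a boolean mask of points whose energy exceeds the percentile cutoff."""
    threshold = np.percentile(energy, percentile)
    return energy > threshold, threshold


# Synthetic stand-in for latent gene representations (not real RNA-seq data).
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))
z[0] = [8.0, 8.0]                                       # one planted outlier

# Hypothetical one-component mixture; a trained model would supply these.
phi = np.array([1.0])
mu = np.zeros((1, 2))
cov = np.eye(2)[None, :, :]

energy = sample_energy(z, phi, mu, cov)
flags, threshold = flag_high_energy(energy, percentile=95.0)
print("flagged indices:", np.flatnonzero(flags))
```

In a real pipeline the mixture parameters would come from the trained model's estimation network, and the percentile cutoff is a tunable choice rather than a fixed property of the method.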
| Date of Award | 2024 |
|---|---|
| Original language | American English |
| Awarding Institution | HBKU College of Science and Engineering |
ADAPTIVE APPROACHES TO ANOMALY DETECTION IN RNA-SEQ DATA USING DEEP AUTOENCODING GAUSSIAN MIXTURE MODELS
Amin, R. (Author). 2024
Student thesis: Master's Dissertation