RNA sequencing (RNA-Seq) gene expression (GE) count data has recently begun to be used to help diagnose rare genetic diseases, complementing whole genome sequencing (WGS), which cannot capture the tissue-level information that is key to detecting tissue-specific genetic disorders. Because RNA-Seq GE data is costly to produce and contains significant statistical noise and distortion, researchers have applied machine learning (ML) techniques to help analyze it. Existing ML solutions require a large number of samples, are designed in an ad hoc manner, or do not provide proper statistical significance testing.
We present several contributions to this field: (1) OutPyR, a Bayesian model for identifying abnormal RNA-Seq gene expression counts, particularly in datasets with a small number of samples, based on recent advances in estimating the dispersion parameter of the negative binomial distribution (NBD); (2) OutPyRX, a model that extends OutPyR with newly derived parameters and a new outlier score while improving computational complexity and performance; (3) DEzy, a novel non-parametric model that normalizes the data through kernel density estimation using the fast Fourier transform and automatic bandwidth selection, giving it significantly better time complexity than OutPyRX while approaching its performance; (4) SViDdly, a novel approach to confounder control in RNA-Seq GE count data using singular value decomposition; and (5) AEZy and (6) AEZier, two novel approaches to confounder control in RNA-Seq GE count data using two different autoencoder (AE) models.
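To make the idea of a negative binomial outlier test concrete, the following is a minimal sketch, not the OutPyR model itself. It assumes the mean and dispersion of a gene's counts are already known, whereas the thesis estimates them in a Bayesian manner; the function names, the example counts, and the 0.01 threshold are all hypothetical choices made for illustration.

```python
import math


def nb_log_pmf(k, mean, dispersion):
    """Log-probability of count k under a negative binomial distribution
    parameterized by its mean and dispersion (size) parameter r."""
    p = dispersion / (dispersion + mean)  # success probability
    return (
        math.lgamma(k + dispersion)
        - math.lgamma(k + 1)
        - math.lgamma(dispersion)
        + dispersion * math.log(p)
        + k * math.log1p(-p)
    )


def nb_cdf(k, mean, dispersion):
    """P(X <= k), computed by summing the probability mass function."""
    if k < 0:
        return 0.0
    total = sum(math.exp(nb_log_pmf(i, mean, dispersion)) for i in range(k + 1))
    return min(1.0, total)


def two_sided_p_value(k, mean, dispersion):
    """Two-sided tail probability of observing a count as extreme as k."""
    lower = nb_cdf(k, mean, dispersion)            # P(X <= k)
    upper = 1.0 - nb_cdf(k - 1, mean, dispersion)  # P(X >= k)
    return min(1.0, 2.0 * min(lower, upper))


# Hypothetical counts for one gene across six samples (illustrative only).
counts = [98, 105, 110, 95, 102, 400]
assumed_mean, assumed_dispersion = 100.0, 20.0  # assumed known, not estimated

# Flag counts whose tail probability falls below an illustrative threshold.
flagged = [k for k in counts
           if two_sided_p_value(k, assumed_mean, assumed_dispersion) < 0.01]
```

In practice the parameters would be inferred from the data, and the p-values would typically be corrected for multiple testing across genes and samples.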
| Date of Award | 2021 |
|---|---|
| Original language | American English |
| Awarding Institution | HBKU College of Science and Engineering |
- Bayesian modeling
- machine learning
- outlier detection
- RNA-Seq
Advancing RNA-Seq Gene Expression Outlier Detection to Improve Aberrant Gene Discovery
Salkovic, E. (Author). 2021
Student thesis: Doctoral Dissertation