A Novel Bayesian Outlier Score Based on the Negative Binomial Distribution for Detecting Aberrantly Expressed Genes in RNA-Seq Gene Expression Count Data

Edin Salkovic*, Halima Bensmail

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

The Negative Binomial distribution (NBD) is used for modeling many types of count data, including gene expression counts obtained by RNA sequencing technologies (RNA-Seq). Recent research has shown that finding outliers in this type of data helps in identifying rare genetic disorders in humans. Existing Bayesian approaches to detecting outliers in data following the NBD are either computationally inefficient or too general, and hence do not leverage the NBD's specificities in an optimal way. We present a novel Bayesian outlier score for data following the NBD, relying on recent advances in the inference of its dispersion parameter through a special method of Gibbs sampling. The novel Bayesian model on which our score is based, OutPyRX (Outlier detection in Python for RNA-Seq, eXtended version), improves on the model of its predecessor OutPyR by introducing novel parameters that are derived from OutPyR's. These novel parameters make the novel outlier score converge more than 6 times faster than OutPyR's, while having a negligible computational impact on the Gibbs sampling procedure. We show that, in terms of area under the precision-recall curve (AUC) values, the novel score outperforms existing scores on 21 out of 24 datasets that we derived from 4 real datasets by injecting artificial outliers. However, OutPyRX does not perform confounder control, which is required for some datasets containing biological outliers. The model is general and can be applied to other similar count data. The code for our model is available at https://github.com/esalkovic/outpyrx.
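
The evaluation protocol the abstract describes (inject artificial outliers into NBD count data, score every count, summarize by area under the precision-recall curve) can be illustrated with a small sketch. This is not the OutPyRX score: no Gibbs sampling or Bayesian inference is performed. A plain two-sided NBD tail probability with known parameters stands in for the score. All function names, the dispersion `r`, the mean, the sample size, and the size of the injected shift are assumptions made for illustration only.

```python
import math
import random


def nb_pmf(k, r, p):
    """PMF of NB(r, p): probability of k failures before the r-th success."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def sample_nb(r, p, rng):
    """Draw one NB(r, p) count as a sum of r geometric failure counts (integer r)."""
    total = 0
    for _ in range(r):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        total += int(math.log(u) / math.log(1.0 - p))
    return total


def tail_scores(counts, r, p):
    """Score each count by -log of its two-sided NB tail probability."""
    cdf, acc = [], 0.0
    for k in range(max(counts) + 1):
        acc += nb_pmf(k, r, p)
        cdf.append(min(acc, 1.0))
    scores = []
    for k in counts:
        lower = cdf[k]                                # P(X <= k)
        upper = 1.0 - (cdf[k - 1] if k > 0 else 0.0)  # P(X >= k)
        pval = min(1.0, 2.0 * min(lower, upper))
        scores.append(-math.log(max(pval, 1e-300)))
    return scores


def average_precision(scores, labels):
    """Area under the precision-recall curve, computed as average precision."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


rng = random.Random(0)
r, mean = 5, 50.0        # hypothetical dispersion and mean, assumed known
p = r / (r + mean)       # success probability giving that mean
counts = [sample_nb(r, p, rng) for _ in range(500)]
labels = [False] * len(counts)

# Inject 10 artificial over-expression outliers, 8 standard deviations above the mean.
sd = math.sqrt(mean + mean ** 2 / r)
for i in rng.sample(range(len(counts)), 10):
    counts[i] = round(mean + 8 * sd)
    labels[i] = True

ap = average_precision(tail_scores(counts, r, p), labels)
print(f"average precision: {ap:.3f}")
```

In real RNA-Seq data the mean and dispersion are unknown and differ per gene, which is the part the paper's Bayesian model addresses; the sketch fixes them only to keep the scoring and evaluation steps visible.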

Original language: English
Article number: 9437176
Pages (from-to): 75789-75800
Number of pages: 12
Journal: IEEE Access
Volume: 9
Publication status: Published - 2021

Keywords

  • Bayesian model
  • Bayesian outlier score
  • Gibbs sampling
  • Mendelian disorder
  • RNA-Seq
  • gene expression
  • negative binomial distribution
  • outlier detection
  • rare disease
