ClinVar is a very large dataset in which genetic variants are categorized by degree of pathogenicity. t-distributed stochastic neighbor embedding (t-SNE) is one of the most powerful methods for clustering such data, given its high dimensionality and large number of samples. Alongside principal component analysis, it is one of the most popular dimensionality-reduction methods for visualization. However, because it computes pairwise similarities between all high-dimensional data points, the technique is inherently slow. To shorten t-SNE running time and improve visualization quality, running it on a GPU was the best option.
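The pairwise cost mentioned above can be illustrated with a minimal sketch of the input-affinity step of t-SNE. It is not the thesis implementation: the function name `pairwise_affinities`, the toy `points`, and the single fixed `sigma` are assumptions made for illustration (real t-SNE tunes a bandwidth per point from a perplexity target). The nested loop visits every ordered pair, so the work grows quadratically with the number of samples.

```python
import math


def pairwise_affinities(points, sigma=1.0):
    """Return row-normalised Gaussian affinities p[i][j] for every ordered pair.

    Simplification (assumption): one fixed sigma for all points, whereas
    t-SNE normally searches for a per-point sigma that matches a perplexity.
    """
    n = len(points)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a point has no affinity to itself
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            p[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
        row_sum = sum(p[i])
        if row_sum > 0.0:
            p[i] = [v / row_sum for v in p[i]]  # each row becomes a distribution
    return p


# Hypothetical toy data: two nearby points and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
P = pairwise_affinities(points)
```

With n samples the loop evaluates n(n-1) distances. Each evaluation is independent of the others, which is the kind of workload that maps well onto GPU parallelism.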
In this thesis, we used a graphics processing unit (GPU) to apply the t-SNE algorithm to the ClinVar data, speeding up the computation while maintaining good classification performance. We applied several procedures to achieve both high classification performance and reduced computing time. Several classifiers were used, and their SHapley Additive exPlanations (SHAP) scores were fed to t-SNE to visualize each classifier's performance on the data. Comparing the results showed that the pathogenicity grouping improved dramatically after each phase.
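As a rough illustration of what "SHAP scores fed to t-SNE" could look like, the sketch below computes exact Shapley attributions for a toy scoring function and stacks them into a matrix with one row per variant. This is an assumption-laden example, not the thesis pipeline: the abstract does not name the classifiers, the explainer, or the library used, and `toy_score`, `baseline`, and `variants` are hypothetical. Exact enumeration over feature subsets is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` keep their real value; the rest use the baseline.
        z = [x[k] if k in subset else baseline[k] for k in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def toy_score(z):
    # Hypothetical stand-in for a trained classifier's output score.
    return 2.0 * z[0] + z[1] * z[2]


baseline = [0.0, 0.0, 0.0]                      # hypothetical reference input
variants = [[1.0, 2.0, 3.0], [0.0, 1.0, 1.0]]   # hypothetical feature rows

# One row of per-feature attributions per variant.
shap_matrix = [shapley_values(toy_score, v, baseline) for v in variants]
```

Under these assumptions, `shap_matrix` would take the place of the raw feature matrix as the input to t-SNE, so that variants are grouped by how the classifier explains them rather than by their raw feature values.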
| Date of Award | 2021 |
|---|---|
| Original language | American English |
| Awarding Institution | HBKU College of Science and Engineering |
SPEEDING UP T-SNE WITH GPU TO DISSECT GENETIC VARIANTS PATHOGENICITY CLASSIFICATION USING SEVERAL MACHINE LEARNING ALGORITHMS
Abujazar, M. (Author). 2021
Student thesis: Master's Dissertation