Centroid Initialization Method for t-Distributed Stochastic Neighbour Embedding (t-SNE)

Sara Husam Nassar, Samir Belhaouari, Mebarka Allaoui, Mohammed Lamine Kherfi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

This paper tackles the challenge of initializing points in t-Distributed Stochastic Neighbor Embedding (t-SNE), a widely used non-linear dimensionality reduction (DR) method. Standard initialization methods, such as random initialization and Principal Component Analysis (PCA), often lead to suboptimal embeddings, particularly with complex or high-dimensional data. To address these limitations, two novel techniques are proposed: centroid t-SNE (ct-SNE) and weighted centroid t-SNE (wct-SNE). Both methods use data clustering for initialization. ct-SNE over-clusters the high-dimensional data and applies t-SNE to the resulting centroids, which improves the initialization and yields more accurate and flexible low-dimensional embeddings that in turn enhance K-Nearest Neighbors (KNN) classification accuracy. wct-SNE extends this approach by introducing a weighted Kullback-Leibler (WKL) divergence that accounts for cluster sizes, assigning greater importance to larger clusters. This modification promotes faster convergence and reduces initialization variability. Comprehensive evaluations against a range of DR techniques, including t-SNE, PCA, NMF, Laplacian Eigenmaps, and UMAP, demonstrate the superior visualization quality, accuracy, and computational efficiency of ct-SNE and wct-SNE.
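The centroid-based initialization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes k-means for the over-clustering step and scikit-learn's `TSNE` for both stages, and the function name `centroid_init_tsne`, the cluster count, the centroid-stage perplexity, and the jitter scale are all hypothetical choices made for illustration. The weighted KL divergence of wct-SNE is not covered here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE


def centroid_init_tsne(X, n_clusters=20, jitter=1e-2, random_state=0):
    """Illustrative centroid-initialized t-SNE (assumed details, not the paper's code)."""
    # Step 1: over-cluster the high-dimensional data (k-means is an assumption).
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X)

    # Step 2: embed only the centroids with t-SNE. Perplexity must be smaller
    # than the number of centroids, so it is scaled down here (a heuristic).
    centroid_perplexity = min(30.0, (n_clusters - 1) / 3.0)
    centroid_emb = TSNE(
        n_components=2,
        perplexity=centroid_perplexity,
        init="pca",
        random_state=random_state,
    ).fit_transform(km.cluster_centers_)

    # Step 3: start every point at its centroid's 2-D position, plus a small
    # random jitter so points of one cluster do not coincide exactly.
    # Whether and how to rescale this layout is left open in this sketch.
    rng = np.random.default_rng(random_state)
    init = centroid_emb[km.labels_] + jitter * rng.standard_normal((X.shape[0], 2))

    # Step 4: run t-SNE on the full data from that initial layout.
    return TSNE(n_components=2, init=init, random_state=random_state).fit_transform(X)
```

Calling `centroid_init_tsne(X)` on an array `X` of shape `(n_samples, n_features)` returns an `(n_samples, 2)` embedding. The full-data stage uses scikit-learn's default perplexity of 30, so `X` needs more than 30 samples.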

Original language: English
Title of host publication: MLNLP 2024 - 2024 International Conference on Machine Learning and Natural Language Processing
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350354973
Publication status: Published - 20 Dec 2024
Event: 7th International Conference on Machine Learning and Natural Language Processing, MLNLP 2024 - Chengdu, China
Duration: 18 Oct 2024 to 20 Oct 2024

Publication series

Name: MLNLP 2024 - 2024 International Conference on Machine Learning and Natural Language Processing

Conference

Conference: 7th International Conference on Machine Learning and Natural Language Processing, MLNLP 2024
Country/Territory: China
City: Chengdu
Period: 18/10/24 to 20/10/24

Keywords

  • centroid of data
  • dimensionality reduction
  • Initialization of t-SNE
  • optimization
  • visualization
