
Centroid Initialization Method for t-Distributed Stochastic Neighbour Embedding (t-SNE)

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

This paper tackles the challenge of initializing points in t-Distributed Stochastic Neighbor Embedding (t-SNE), a widely used non-linear dimensionality reduction (DR) method. Standard initialization methods, such as randomization and Principal Component Analysis (PCA), often lead to suboptimal embeddings, particularly with complex or high-dimensional data. To address these limitations, two novel techniques are proposed: centroid t-SNE (ct-SNE) and weighted centroid t-SNE (wct-SNE). Both methods leverage data clustering for initialization. By over-clustering high-dimensional data and applying t-SNE to the centroids, ct-SNE improves the initialization process, resulting in more accurate and flexible low-dimensional embeddings, which enhance K-Nearest Neighbors (KNN) classification accuracy. wct-SNE extends this approach by introducing a weighted Kullback-Leibler (WKL) divergence that accounts for cluster sizes, assigning greater importance to larger clusters. This modification promotes faster convergence and reduces initialization variability. Comprehensive evaluations against a range of DR techniques, including t-SNE, PCA, NMF, Laplacian Eigenmaps, and UMAP, demonstrate the superior visualization quality, accuracy, and computational efficiency of ct-SNE and wct-SNE.
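
The centroid-based initialization described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes scikit-learn's `KMeans` for the over-clustering step and scikit-learn's `TSNE` for both embedding steps. The function name `centroid_init_tsne`, the `n_clusters` and `jitter` parameters, and the rescaling of the initial layout to a standard deviation of 1e-4 are all assumptions made for this example. The weighted KL divergence of wct-SNE is not covered.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE


def centroid_init_tsne(X, n_clusters=50, jitter=0.1, random_state=0):
    """Sketch of a centroid-initialised t-SNE (hypothetical helper, not the paper's code)."""
    rng = np.random.default_rng(random_state)
    n_samples = X.shape[0]

    # Step 1: over-cluster the high-dimensional data (more clusters than classes).
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X)

    # Step 2: run t-SNE on the centroids only. Perplexity must stay below
    # the number of centroids, so it is capped here.
    centroid_perplexity = min(30.0, (n_clusters - 1) / 3.0)
    centroid_emb = TSNE(
        n_components=2,
        perplexity=centroid_perplexity,
        init="pca",
        random_state=random_state,
    ).fit_transform(km.cluster_centers_)

    # Step 3: start every point at the 2-D position of its own centroid.
    init = centroid_emb[km.labels_].astype(np.float64)

    # Assumption: rescale the layout to a small spread (std 1e-4), a common
    # convention for t-SNE initialisations, then add a little noise so that
    # points sharing a centroid do not start at exactly the same location.
    init = init / np.std(init[:, 0]) * 1e-4
    init += jitter * 1e-4 * rng.standard_normal(init.shape)

    # Step 4: run t-SNE on all points from the centroid-based starting layout.
    perplexity = min(30.0, (n_samples - 1) / 3.0)
    emb = TSNE(
        n_components=2,
        perplexity=perplexity,
        init=init,
        random_state=random_state,
    ).fit_transform(X)
    return emb, km.labels_
```

A call such as `emb, labels = centroid_init_tsne(X, n_clusters=50)` returns a 2-D embedding with one row per input sample, together with the cluster label used to place each point initially. The number of clusters is a tuning choice in this sketch; the abstract only states that the data are over-clustered.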

Original language: English
Title of host publication: 2024 7th International Conference on Machine Learning and Natural Language Processing, MLNLP 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 10
ISBN (Electronic): 9798350354973
ISBN (Print): 979-8-3503-5498-0
DOIs
Publication status: Published - 20 Dec 2024
Event: 7th International Conference on Machine Learning and Natural Language Processing, MLNLP 2024 - Chengdu, China
Duration: 18 Oct 2024 to 20 Oct 2024

Publication series

Name: MLNLP 2024 - 2024 International Conference on Machine Learning and Natural Language Processing

Conference

Conference: 7th International Conference on Machine Learning and Natural Language Processing, MLNLP 2024
Country/Territory: China
City: Chengdu
Period: 18/10/24 to 20/10/24

Keywords

  • Centroid of data
  • Dimensionality reduction
  • Initialization of t-SNE
  • Optimization
  • Visualization
