TY - GEN
T1 - Centroid Initialization Method for t-Distributed Stochastic Neighbour Embedding (t-SNE)
AU - Nassar, Sara Husam
AU - Belhaouari, Samir
AU - Allaoui, Mebarka
AU - Kherfi, Mohammed Lamine
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024/12/20
Y1 - 2024/12/20
N2 - This paper tackles the challenge of initializing points in t-Distributed Stochastic Neighbor Embedding (t-SNE), a widely used non-linear dimensionality reduction (DR) method. Standard initialization methods, such as randomization and Principal Component Analysis (PCA), often lead to suboptimal embeddings, particularly with complex or high-dimensional data. To address these limitations, two novel techniques are proposed: centroid t-SNE (ct-SNE) and weighted centroid t-SNE (wct-SNE). Both methods leverage data clustering for initialization. By over-clustering high-dimensional data and applying t-SNE to the centroids, ct-SNE improves the initialization process, resulting in more accurate and flexible low-dimensional embeddings, which enhance K-Nearest Neighbors (KNN) classification accuracy. wct-SNE extends this approach by introducing a weighted Kullback-Leibler (WKL) divergence that accounts for cluster sizes, assigning greater importance to larger clusters. This modification promotes faster convergence and reduces initialization variability. Comprehensive evaluations against a range of DR techniques, including t-SNE, PCA, NMF, Laplacian Eigenmaps, and UMAP, demonstrate the superior visualization quality, accuracy, and computational efficiency of ct-SNE and wct-SNE.
KW - centroid of data
KW - dimensionality reduction
KW - Initialization of t-SNE
KW - optimization
KW - visualization
UR - https://www.scopus.com/pages/publications/85215569873
U2 - 10.1109/MLNLP63328.2024.10800182
DO - 10.1109/MLNLP63328.2024.10800182
M3 - Conference contribution
AN - SCOPUS:85215569873
T3 - MLNLP 2024 - 2024 International Conference on Machine Learning and Natural Language Processing
BT - MLNLP 2024 - 2024 International Conference on Machine Learning and Natural Language Processing
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th International Conference on Machine Learning and Natural Language Processing, MLNLP 2024
Y2 - 18 October 2024 through 20 October 2024
ER -