TY - JOUR
T1 - ClassiMap
T2 - A New Dimension Reduction Technique for Exploratory Data Analysis of Labeled Data
AU - Lespinats, Sylvain
AU - Aupetit, Michaël
AU - Meyer-Baese, Anke
N1 - Publisher Copyright:
© 2015 World Scientific Publishing Company.
PY - 2015/9/14
Y1 - 2015/9/14
N2 - Multidimensional scaling techniques are unsupervised Dimension Reduction (DR) techniques which use multidimensional data pairwise similarities to represent data into a plane enabling their visual exploratory analysis. Considering labeled data, the DR techniques face two objectives with potentially different priorities: one is to account for the data points' similarities, the other for the data classes' structures. Unsupervised DR techniques attempt to preserve original data similarities, but they do not consider their class label hence they can map originally separated classes as overlapping ones. Conversely, the state-of-the-art so-called supervised DR techniques naturally handle labeled data, but they do so in a predictive modeling framework where they attempt to separate the classes in order to improve a classification accuracy measure in the low-dimensional space, hence they can map as separated even originally overlapping classes. We propose ClassiMap, a DR technique which optimizes a new objective function enabling Exploratory Data Analysis (EDA) of labeled data. Mapping distortions known as tears and false neighborhoods cannot be avoided in general due to the reduction of the data dimension. ClassiMap intends primarily to preserve data similarities but tends to distribute preferentially unavoidable tears among the different-label data and unavoidable false neighbors among the same-label data. Standard quality measures to evaluate the quality of unsupervised mappings cannot tell about the preservation of within-class or between-class structures, while classification accuracy used to evaluate supervised mappings is only relevant to the framework of predictive modeling. We propose two measures better suited to the evaluation of DR of labeled data in an EDA framework. We use these two label-aware indices and four other standard unsupervised indices to compare ClassiMap to other state-of-the-art supervised and unsupervised DR techniques on synthetic and real datasets. ClassiMap appears to provide a better tradeoff between pairwise similarities and class structure preservation according to these new measures.
AB - Multidimensional scaling techniques are unsupervised Dimension Reduction (DR) techniques which use multidimensional data pairwise similarities to represent data into a plane enabling their visual exploratory analysis. Considering labeled data, the DR techniques face two objectives with potentially different priorities: one is to account for the data points' similarities, the other for the data classes' structures. Unsupervised DR techniques attempt to preserve original data similarities, but they do not consider their class label hence they can map originally separated classes as overlapping ones. Conversely, the state-of-the-art so-called supervised DR techniques naturally handle labeled data, but they do so in a predictive modeling framework where they attempt to separate the classes in order to improve a classification accuracy measure in the low-dimensional space, hence they can map as separated even originally overlapping classes. We propose ClassiMap, a DR technique which optimizes a new objective function enabling Exploratory Data Analysis (EDA) of labeled data. Mapping distortions known as tears and false neighborhoods cannot be avoided in general due to the reduction of the data dimension. ClassiMap intends primarily to preserve data similarities but tends to distribute preferentially unavoidable tears among the different-label data and unavoidable false neighbors among the same-label data. Standard quality measures to evaluate the quality of unsupervised mappings cannot tell about the preservation of within-class or between-class structures, while classification accuracy used to evaluate supervised mappings is only relevant to the framework of predictive modeling. We propose two measures better suited to the evaluation of DR of labeled data in an EDA framework. We use these two label-aware indices and four other standard unsupervised indices to compare ClassiMap to other state-of-the-art supervised and unsupervised DR techniques on synthetic and real datasets. ClassiMap appears to provide a better tradeoff between pairwise similarities and class structure preservation according to these new measures.
KW - Multidimensional scaling
KW - dimensionality reduction
KW - distance preservation
KW - exploratory data analysis
KW - labeled data
KW - mapping evaluation
UR - https://www.scopus.com/pages/publications/84939258526
U2 - 10.1142/S0218001415510088
DO - 10.1142/S0218001415510088
M3 - Article
AN - SCOPUS:84939258526
SN - 0218-0014
VL - 29
JO - International Journal of Pattern Recognition and Artificial Intelligence
JF - International Journal of Pattern Recognition and Artificial Intelligence
IS - 6
M1 - 1551008
ER -