Abstract
In this paper, a novel non-parametric clustering algorithm which is based on the concept of divide-and-merge is proposed. The proposed algorithm is based on two primary phases, after data cleaning: (i) the Division phase and (ii) the Merging phase. In the initial phase of division, the data is divided into an optimized number of small sub-clusters utilizing all the dimensions of the data. In the second phase of merging, the small sub-clusters obtained as a result of division are merged according to an advanced statistical metric to form the actual clusters in the data. The proposed algorithm has the following mer-its: (i) ability to discover both convex and non-convex shaped clusters, (ii) ability to discover clusters different in densities, (iii) ability to detect and remove outliers/noise in the data (iv) easily tunable or fixed hyperparameters (v) and its usability for high dimensional data. The proposed algorithm is exten-sively tested on 20 benchmark datasets including both, the synthetic and the real datasets and is found better/competing to the existing state-of-the-art parametric and non-parametric clustering algorithms. (c) 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license ( http://creativecommons.org/licenses/by-nc-nd/4.0/ )
| Original language | English |
|---|---|
| Article number | 108305 |
| Number of pages | 18 |
| Journal | Pattern Recognition |
| Volume | 122 |
| DOIs | |
| Publication status | Published - Feb 2022 |
Keywords
- Clustering
- Data projection
- Joint probability density estimation
- Non-parametric techniques