Geometric-k-means: a bound free approach to fast and eco-friendly k-means

  • Parichit Sharma*
  • Marcin Malec
  • Hasan Kurban
  • Oguzhan Kulekci
  • Mehmet Dalkilic

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This paper introduces Geometric-k-means (Gk-means for short), a novel approach that significantly improves the efficiency and energy economy of the widely used k-means algorithm, which, despite being introduced over five decades ago, remains a cornerstone of machine learning applications. The essence of Gk-means lies in its use of geometric principles, specifically scalar projection, to accelerate the algorithm without sacrificing solution quality. This geometric strategy focuses computation on the data points most likely to influence cluster updates, which we call high expressive (HE) data. In contrast, low expressive (LE) data, which does not affect the clustering outcome, is bypassed, leading to considerable reductions in computational overhead. Experiments spanning synthetic, real-world, and high-dimensional datasets demonstrate that Gk-means significantly outperforms traditional and state-of-the-art (SOTA) k-means variants in runtime and number of distance computations (DC). Moreover, Gk-means exhibits better resource efficiency, as evidenced by its reduced energy footprint, positioning it as a more sustainable alternative. The software code and data for our algorithm are available at https://github.com/parichit/Geometric-k-means.
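
The scalar projection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' HE/LE criterion, which the abstract does not specify; it only demonstrates the standard geometric fact that a point is closer to another centroid than to its own exactly when its scalar projection onto the line joining the two centroids exceeds half the distance between them. The function names `scalar_projection` and `crosses_midpoint` are hypothetical and chosen for illustration.

```python
import math


def scalar_projection(x, origin, target):
    """Signed length of (x - origin) along the direction (target - origin).

    Assumes origin and target are distinct points (non-zero direction).
    """
    direction = [t - o for t, o in zip(target, origin)]
    offset = [xi - o for xi, o in zip(x, origin)]
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(a * b for a, b in zip(offset, direction)) / norm


def crosses_midpoint(x, own_centroid, other_centroid):
    """True when x lies on the other centroid's side of the perpendicular
    bisector, i.e. x is strictly closer to other_centroid than to own_centroid."""
    half_gap = math.dist(own_centroid, other_centroid) / 2.0
    return scalar_projection(x, own_centroid, other_centroid) > half_gap


# Illustrative data only: two centroids on the x-axis and two sample points.
own, other = (0.0, 0.0), (4.0, 0.0)
near_own = crosses_midpoint((1.0, 3.0), own, other)    # projection 1.0, half gap 2.0
near_other = crosses_midpoint((3.0, 3.0), own, other)  # projection 3.0, half gap 2.0
```

In this sketch, a single dot product replaces two full distance computations when deciding which side of the midpoint a point falls on, which is the kind of saving a projection-based test can offer.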
Original language: English
Article number: 30
Number of pages: 33
Journal: Machine Learning
Volume: 115
Issue number: 2
Publication status: Published - 27 Jan 2026

Keywords

  • AI & sustainability
  • Big data
  • Data-centric AI
  • Fast k-means
  • Single-cell RNASeq
