Article

Next-Generation K-Means Clustering: Mojo-Driven Performance for Big Data

Touhidul Seyam (Department of Computer Science and Engineering, Begum Gul Chemonara Trust University Bangladesh, Chattogram, Bangladesh)
Md. Hossain (Faculty of Intelligent Systems and Computer Technologies, Samarkand State University, Samarkand, Uzbekistan)
Rajib Ghose
Mekhriddin Nurmamatov (Faculty of Artificial Intelligence and Information Systems, Samarkand State University, Samarkand, Uzbekistan)
Nazarov Fayzullo (Faculty of Artificial Intelligence and Information Systems, Samarkand State University, Samarkand, Uzbekistan)
Zarin Hadika
Abhijit Pathak

International Journal of Intelligent Information Systems (journal, 2025, English)


Abstract

K-means clustering, a fundamental unsupervised machine learning technique, is widely used in anomaly detection, image recognition, and customer segmentation. Traditional Python implementations, especially those using NumPy, face performance challenges with large, high-dimensional datasets because of Python’s interpreted nature and dynamic typing. This paper introduces an approach that uses the Mojo programming language, designed for AI development, to significantly improve the performance of k-means clustering. Mojo combines Python’s usability with the performance of systems programming languages by offering features such as vectorization, parallelization, and strong typing. We compare a NumPy-based Python implementation with an optimized Mojo implementation, detailing the translation process and the optimization techniques used, including Mojo’s support for Single Instruction, Multiple Data (SIMD) operations, explicit memory management, and efficient data structures. These features significantly accelerate the distance calculations that are crucial to the k-means algorithm. Benchmarks on synthetic datasets with varying sample sizes, feature counts, and cluster numbers demonstrate that the Mojo implementation consistently outperforms both the standard Python implementation and the highly optimized scikit-learn k-means, achieving speedups of 6x to 250x. These results highlight Mojo’s potential as a tool for high-performance data analysis, particularly for computationally demanding algorithms such as k-means clustering, and contribute to high-performance computing in machine learning. This research sets the stage for further exploration of Mojo’s applicability to other algorithms and of hardware-specific optimizations for modern computing architectures.
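For orientation, the kind of NumPy-based baseline the abstract refers to can be sketched as follows. This is a minimal illustration of standard Lloyd-style k-means, not the authors' code: the function name `kmeans`, its parameters (`n_iters`, `tol`, `seed`), and the random-sample initialization are all assumptions made for the example. The broadcasted distance computation in the middle of the loop is the step the abstract identifies as the main target for SIMD acceleration.

```python
import numpy as np


def kmeans(X, k, n_iters=100, tol=1e-6, seed=0):
    """Hypothetical NumPy k-means sketch (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialize centroids from k distinct random samples.
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
    labels = np.zeros(X.shape[0], dtype=np.int64)

    for _ in range(n_iters):
        # Squared Euclidean distance from every sample to every centroid,
        # shape (n_samples, k). This is the computational hot spot.
        diffs = X[:, None, :] - centroids[None, :, :]
        dists = np.einsum("nkd,nkd->nk", diffs, diffs)

        # Assignment step: nearest centroid for each sample.
        labels = dists.argmin(axis=1)

        # Update step: each centroid moves to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)

        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break

    return centroids, labels


# Usage on two well-separated synthetic blobs (illustrative data only).
demo_rng = np.random.default_rng(42)
X_demo = np.vstack([
    demo_rng.normal(0.0, 0.5, size=(50, 2)),
    demo_rng.normal(10.0, 0.5, size=(50, 2)),
])
demo_centroids, demo_labels = kmeans(X_demo, k=2)
```

The broadcast builds an intermediate array of shape (n_samples, k, n_features), which is simple to read but memory-hungry on large inputs; that cost is one reason a typed, explicitly vectorized implementation can plausibly be faster.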


Topics

Anomaly Detection Techniques and Applications; Advanced Clustering Algorithms Research; Data Stream Mining Techniques

Identifiers

DOI: 10.11648/j.ijiis.20251401.12

Citations and references

0 citations, 16 references
