Acta Informatica Pragensia 2023, 12(2), 379-399 | DOI: 10.18267/j.aip.223

ck-means and fck-means: Two Deterministic Initialization Procedures for k-means Algorithm Using a Modified Crowding Distance

Abdesslem Layeb
LISIA Laboratory, NTIC Faculty, University of Constantine 2, Constantine, Algeria

This paper presents two novel deterministic initialization procedures for k-means clustering based on a modified crowding distance. The procedures, named ck-means and fck-means, use more crowded points as initial centroids. Experimental studies on multiple datasets demonstrate that the proposed approach outperforms k-means and k-means++ in terms of clustering accuracy. The effectiveness of ck-means and fck-means is attributed to their ability to select better initial centroids based on the modified crowding distance. Overall, the proposed approach provides a promising alternative for improving k-means clustering.

Keywords: Clustering; k-means; k-means++; Initialization; Crowding distance; Heuristics.
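To make the seeding idea in the abstract concrete, the sketch below is one plausible reading of a crowding-distance-based deterministic initializer, not the paper's exact ck-means/fck-means procedure. It computes an NSGA-II-style crowding distance per point (Deb et al., 2002) and then picks the k most crowded points (smallest crowding values) as initial centroids; the `min_sep` separation filter is a hypothetical guard added here so the seeds do not all come from one dense region.

```python
import numpy as np

def crowding_distance(X):
    """NSGA-II-style crowding distance per point, summed over features.
    Smaller values mean the point lies in a denser (more crowded) region."""
    n, d = X.shape
    dist = np.zeros(n)
    for j in range(d):
        order = np.argsort(X[:, j])
        col = X[order, j]
        span = col[-1] - col[0] or 1.0   # avoid division by zero on constant features
        gaps = np.zeros(n)
        gaps[1:-1] = (col[2:] - col[:-2]) / span  # gap between each point's neighbours
        gaps[0] = gaps[-1] = np.inf      # boundary points are treated as least crowded
        dist[order] += gaps
    return dist

def crowding_init(X, k, min_sep=None):
    """Deterministically pick the k most crowded points as initial centroids,
    optionally skipping candidates closer than min_sep to an already chosen seed.
    (Illustrative sketch; the paper's modified crowding distance may differ.)"""
    cd = crowding_distance(X)
    centroids = []
    for i in np.argsort(cd):             # most crowded (smallest distance) first
        x = X[i]
        if min_sep is None or all(np.linalg.norm(x - c) >= min_sep for c in centroids):
            centroids.append(x)
        if len(centroids) == k:
            break
    return np.array(centroids)
```

Because the procedure is deterministic, repeated runs on the same data yield the same seeds, unlike the random sampling in standard k-means or k-means++ initialization.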

Received: May 23, 2023; Revised: August 7, 2023; Accepted: September 3, 2023; Prepublished online: September 11, 2023; Published: October 10, 2023

Layeb, A. (2023). ck-means and fck-means: Two Deterministic Initialization Procedures for k-means Algorithm Using a Modified Crowding Distance. Acta Informatica Pragensia, 12(2), 379-399. https://doi.org/10.18267/j.aip.223

References

  1. Arthur, D., & Vassilvitskii, S. (2007). k-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms (pp. 1027-1035). ACM.
  2. Celebi, M. E., Kingravi, H. A., & Vela, P. A. (2013). A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Systems With Applications, 40(1), 200-210. https://doi.org/10.1016/j.eswa.2012.07.021
  3. Chowdhury, K., Chaudhuri, D., & Pal, A. K. (2021). An entropy-based initialization method of K-means clustering on the optimal number of clusters. Neural Computing and Applications, 33(12), 6965-6982. https://doi.org/10.1007/s00521-020-05471-9
  4. Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 182-197. https://doi.org/10.1109/4235.996017
  5. Faridi, H., Srinivasagopalan, S., & Verma, R. (2018). Performance evaluation of features and clustering algorithms for malware. In 2018 IEEE International Conference on Data Mining Workshops (ICDMW) (pp. 13-22). IEEE. https://doi.org/10.1109/ICDMW.2018.00010
  6. Fränti, P., & Sieranoja, S. (2019). How much can k-means be improved by using better initialization and repeats? Pattern Recognition, 93, 95-112. https://doi.org/10.1016/j.patcog.2019.04.014
  7. Hamerly, G., & Elkan, C. (2003). Learning the k in k-means. In Advances in neural information processing systems (pp. 281-288). NeurIPS Proceedings. https://proceedings.neurips.cc/paper_files/paper/2003/file/234833147b97bb6aed53a8f4f1c7a7d8-Paper.pdf
  8. Hastie, T., Tibshirani, R., & Friedman, J. (2009). Unsupervised learning. In The Elements of Statistical Learning (pp. 485-585). Springer. https://doi.org/10.1007/978-0-387-84858-7_14
  9. Ikotun, A. M., Ezugwu, A. E., Abualigah, L., Abuhaija, B., & Jia, H. (2023). K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Information Sciences, 622, 178-210. https://doi.org/10.1016/j.ins.2022.11.139
  10. Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data. Prentice-Hall, Inc.
  11. Kumar, A., & Kumar, S. (2017). Density based initialization method for K-Means clustering algorithm. International Journal of Intelligent Systems and Applications, 9(10), 40-48. https://doi.org/10.5815/ijisa.2017.10.05
  12. Liu, H., Chen, F., Wu, Y., Xu, K., & Tian, D. (2015). Improved K-means Algorithm with the Pretreatment of PCA Dimension Reduction. International Journal of Hybrid Information Technology, 8(6), 195-204. https://doi.org/10.14257/ijhit.2015.8.6.19
  13. Maulik, U., & Bandyopadhyay, S. (2002). Performance evaluation of some clustering algorithms and validity indices. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12), 1650-1654. https://doi.org/10.1109/tpami.2002.1114856
  14. Tzortzis, G., & Likas, A. (2014). The MinMax k-Means clustering algorithm. Pattern Recognition, 47(7), 2505-2516. https://doi.org/10.1016/j.patcog.2014.01.015
  15. Xie, J., Girshick, R., & Farhadi, A. (2016). Unsupervised deep embedding for clustering analysis. In Proceedings of The 33rd International Conference on Machine Learning (pp. 478-487). PMLR. https://proceedings.mlr.press/v48/xieb16.html
  16. Xu, R., & Wunsch, D. C. (2005). Survey of Clustering Algorithms. IEEE Transactions on Neural Networks, 16(3), 645-678. https://doi.org/10.1109/tnn.2005.845141
  17. Yang, B., Fu, X., Sidiropoulos, N. D., & Hong, M. (2017). Towards k-means-friendly spaces: Simultaneous deep learning and clustering. In Proceedings of the 34th International Conference on Machine Learning (pp. 3861-3870). ACM.
  18. Yu, S., Chu, S. W., Wang, C., Chan, Y., & Chang, T. (2018). Two improved k-means algorithms. Applied Soft Computing, 68, 747-755. https://doi.org/10.1016/j.asoc.2017.08.032

This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.