Similarity Ranking-Based Instance Selection for Enhancing k-NN Classification Performances

doi:10.18267/j.aip.310

Acta Informatica Pragensia X:X | DOI: 10.18267/j.aip.31085

Abdul Muqtasid bin Rushdi¹, Mohammad bin Hossin¹, Suhaila binti Saee¹, Norita binti Md Norwawi²

¹ Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
² Department of Information Security and Assurance, Faculty of Science and Technology, Universiti Sains Islam Malaysia, Nilai, Negeri Sembilan, Malaysia

Background: The k-nearest neighbours (k-NN) algorithm is a well-established classifier in machine learning. However, its classification performance drops and its computational cost rises on large or redundant datasets. Furthermore, existing instance selection (IS) approaches often scale poorly and are sensitive to parameter settings.

Objective: This study aims to design a simple and efficient IS algorithm that reduces both dataset size and computational demands while preserving or improving k-NN classification accuracy.

Methods: We propose Euclidean ranking-based instance selection (ERbIS), a novel IS approach that ranks samples by their Euclidean distance from a single anchor point. Two anchor points are introduced: the first data point (FD) and the mean of each column (MEC). Both ERbIS variants (ERbIS-FD and ERbIS-MEC) are evaluated on 21 KEEL datasets and benchmarked against existing and state-of-the-art methods: the condensed nearest neighbour rule (CNN), the edited nearest neighbour rule (ENN), the adaptive threshold-based instance selection algorithm (ATISA1), the decremental reduction optimization procedure (DROP3) and ranking-based instance selection (RIS1). The evaluation focuses on reduction speed, reduction rate and k-NN classification accuracy.
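The abstract states the core idea of ERbIS (ranking training instances by Euclidean distance from one anchor point) but not how instances are then chosen from the ranked list. The sketch below therefore covers only the ranking step, under stated assumptions: FD is taken to be the first row of the training set, MEC is taken to be the column-wise mean of the numeric features, and the function names `anchor_fd`, `anchor_mec` and `rank_by_anchor` are hypothetical. It is an illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def anchor_fd(X: Sequence[Sequence[float]]) -> List[float]:
    """Assumed FD anchor: the first instance (row) of the training set."""
    return list(X[0])


def anchor_mec(X: Sequence[Sequence[float]]) -> List[float]:
    """Assumed MEC anchor: the mean of each feature column."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def rank_by_anchor(X: Sequence[Sequence[float]], anchor: Sequence[float]) -> List[int]:
    """Return instance indices sorted by Euclidean distance to the anchor, nearest first."""
    # One distance per instance: each row is compared with the anchor only.
    dists = [math.dist(row, anchor) for row in X]
    # sorted() is stable, so equal distances keep their original order.
    return sorted(range(len(X)), key=lambda i: dists[i])
```

Ranking the toy set `X = [[0, 0], [3, 4], [1, 1], [6, 8]]` against the FD anchor `[0, 0]` gives the order `[0, 2, 1, 3]`, while the MEC anchor `[2.5, 3.25]` gives `[1, 2, 0, 3]`, so the choice of anchor changes the ranking. Because each instance is compared only with the anchor, this step needs n distance computations plus one sort, rather than the pairwise distances that neighbour-based methods such as ENN typically compute.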

Results: The ERbIS models reduce dataset size by an average of 35% to 40% without loss of accuracy relative to the original k-NN classifier and the state-of-the-art IS models. Both ERbIS models are also more computationally efficient in the reduction process than ENN and CNN. Notably, ERbIS-MEC, which uses the mean of each column as the anchor point, achieves the highest generalisation accuracy among all compared models.
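Two of the evaluation criteria, reduction rate and k-NN accuracy on the reduced set, can be illustrated with a short sketch. It assumes the reduction rate is the fraction of training instances removed, 1 - |S|/|T| for a selected subset S of the training set T, and that the classifier is plain majority-vote k-NN with Euclidean distance; the value of k, the helper names and the toy data are illustrative assumptions, not the paper's experimental setup. For example, a 38% reduction of a 1,000-instance training set leaves 620 instances for k-NN to search.

```python
import math
from collections import Counter
from typing import List, Sequence


def reduction_rate(n_original: int, n_selected: int) -> float:
    """Assumed definition: share of training instances removed by selection."""
    return 1.0 - n_selected / n_original


def knn_predict(X_train: Sequence[Sequence[float]], y_train: Sequence[str],
                x: Sequence[float], k: int = 1) -> str:
    """Majority-vote k-NN prediction with Euclidean distance."""
    # Indices of the k training instances closest to x.
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    votes = Counter(y_train[i] for i in nearest)
    # most_common(1) returns [(label, count)]; equal counts keep first-seen order.
    return votes.most_common(1)[0][0]


def knn_accuracy(X_train: Sequence[Sequence[float]], y_train: Sequence[str],
                 X_test: Sequence[Sequence[float]], y_test: List[str], k: int = 1) -> float:
    """Fraction of test instances whose predicted label matches the true label."""
    correct = sum(knn_predict(X_train, y_train, x, k) == t for x, t in zip(X_test, y_test))
    return correct / len(y_test)
```

Keeping, say, indices `[0, 2]` of a four-instance training set gives a reduction rate of 0.5; passing only those rows and labels to `knn_accuracy` then measures how much accuracy the reduced set retains.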

Conclusion: ERbIS offers an efficient and scalable approach to instance selection for k-NN classification, achieving substantial data reduction and improved predictive accuracy with minimal parameter tuning. The approach shows strong potential for large datasets and could be further improved by investigating alternative distance metrics or hybrid instance selection strategies.

Keywords: Instance selection; k-nearest neighbours; Data reduction; Euclidean distance; Data classification.

Received: October 23, 2025; Revised: March 6, 2026; Accepted: March 6, 2026; Prepublished online: June 13, 2026

This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.