Acta Informatica Pragensia 2025, 14(1), 88-111 | DOI: 10.18267/j.aip.2544582
Induced Partitioning for Incremental Feature Selection via Rough Set Theory and Long-tail Position Grey Wolf Optimizer
- Department of Computer Science, College of Computing, Khon Kaen University, Khon Kaen, Thailand
Background: Feature selection methods play a crucial role in handling challenges such as imbalanced classes, noisy data and high dimensionality. However, existing techniques, including swarm intelligence and set-theoretic approaches, often struggle with high-dimensional datasets because they repeatedly reassess the selected features, which increases processing time and computational cost.
Objective: This study aims to develop an enhanced incremental feature selection method that minimizes dependency on the initial dataset while improving computational efficiency. Specifically, the approach focuses on dynamic sampling and adaptive optimization to address the challenges in high-dimensional data environments.
Methods: We implement a dynamic sampling approach based on rough set theory, integrating the Long-Tail Position Grey Wolf Optimizer. This method incrementally adjusts to new data samples without relying on the original dataset for feature selection, reducing variance in partitioned datasets. The performance is evaluated on benchmark datasets, comparing the proposed method to existing techniques.
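To make the Methods description more concrete, the following is a minimal, hypothetical sketch of how a rough-set dependency measure can serve as the fitness function of a Grey Wolf Optimizer for feature selection. The function names (`dependency_degree`, `fitness`, `gwo_feature_selection`), the Cauchy-based heavy-tailed perturbation standing in for the "long-tail position" update, and all parameter values are illustrative assumptions rather than the authors' implementation; the incremental partitioning step is omitted.

```python
# Illustrative sketch only; not the paper's actual algorithm.
import numpy as np

def dependency_degree(X, y, subset):
    """Rough-set degree of dependency gamma(subset -> y): the fraction of
    objects whose equivalence class under the selected (discrete) features
    maps to a single decision value. Assumes nominal/discretized attributes."""
    if not subset:
        return 0.0
    classes = {}
    for i, key in enumerate(map(tuple, X[:, subset])):
        classes.setdefault(key, []).append(i)
    positive = sum(len(idx) for idx in classes.values()
                   if len({y[i] for i in idx}) == 1)
    return positive / len(y)

def fitness(X, y, mask, alpha=0.99):
    """Reward dependency on the decision attribute, penalise subset size."""
    subset = list(np.flatnonzero(mask))
    gamma = dependency_degree(X, y, subset)
    return alpha * gamma + (1 - alpha) * (1 - len(subset) / X.shape[1])

def gwo_feature_selection(X, y, n_wolves=10, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pos = rng.random((n_wolves, d))               # continuous wolf positions
    masks = pos > 0.5
    scores = np.array([fitness(X, y, m) for m in masks])
    for t in range(n_iter):
        a = 2 - 2 * t / n_iter                    # standard GWO control parameter
        alpha_w, beta_w, delta_w = pos[np.argsort(-scores)[:3]]
        for i in range(n_wolves):
            new = np.zeros(d)
            for leader in (alpha_w, beta_w, delta_w):
                r1, r2 = rng.random(d), rng.random(d)
                A, C = 2 * a * r1 - a, 2 * r2
                new += leader - A * np.abs(C * leader - pos[i])
            pos[i] = new / 3
            # Hypothetical heavy-tailed ("long-tail") jump for extra exploration.
            pos[i] += rng.standard_cauchy(d) * 0.01
        masks = 1 / (1 + np.exp(-pos)) > 0.5       # sigmoid binarisation
        scores = np.array([fitness(X, y, m) for m in masks])
    return np.flatnonzero(masks[np.argmax(scores)])
```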
Results: Experimental evaluations demonstrate that the proposed method outperforms existing techniques in terms of F1 score, precision, recall and computation time. The incremental adjustment and reduced dependence on the initial data improve the overall accuracy and efficiency of feature selection in high-dimensional contexts.
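For orientation only, a small example of how precision, recall and F1 are typically computed for a classifier trained on a selected feature subset, using scikit-learn; the dataset, classifier and placeholder `selected` indices are illustrative assumptions and do not reproduce the paper's benchmark protocol.

```python
# Illustrative evaluation sketch; not the paper's experimental pipeline.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import precision_score, recall_score, f1_score

X, y = load_breast_cancer(return_X_y=True)
selected = [0, 3, 7, 21]   # placeholder: indices returned by the feature-selection step
X_tr, X_te, y_tr, y_te = train_test_split(
    X[:, selected], y, test_size=0.3, random_state=0, stratify=y)
pred = KNeighborsClassifier().fit(X_tr, y_tr).predict(X_te)
print("precision", precision_score(y_te, pred),
      "recall", recall_score(y_te, pred),
      "F1", f1_score(y_te, pred))
```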
Conclusion: This study offers a significant advancement in feature selection methods for high-dimensional datasets. By addressing computational demands and improving accuracy, the proposed approach contributes to data science and machine learning, paving the way for more efficient and reliable feature selection processes in complex data environments. Future work may focus on extending this method to new optimization frameworks and enhancing its adaptability.
Keywords: Optimizer; Rough set theory; Feature selection; Incremental; Data partitioning.
Received: August 11, 2024; Revised: November 25, 2024; Accepted: November 26, 2024; Prepublished online: December 16, 2024; Published: January 31, 2025
References
- Abdel-Basset, M., El-Shahat, D., El-henawy, I., de Albuquerque, V. H. C., & Mirjalili, S. (2020). A new fusion of grey wolf optimizer algorithm with a two-phase mutation for feature selection. Expert Systems with Applications, 139, 112824. https://doi.org/10.1016/j.eswa.2019.112824
- Al Afghani Edsa, S., & Sunat, K. (2023). Hybridization of Modified Grey Wolf Optimizer and Dragonfly for Feature Selection. In Data Science and Artificial Intelligence, DSAI 2023, (pp. 35-42). Springer. https://doi.org/10.1007/978-981-99-7969-1_3
- Almotairi, K. H. (2023). MiRNA subset selection for microarray data classification using grey wolf optimizer and evolutionary population dynamics. Neural Computing and Applications, 35(25), 18737-18761. https://doi.org/10.1007/s00521-023-08701-y
- Altay, O., & Varol Altay, E. (2023). A novel hybrid multilayer perceptron neural network with improved grey wolf optimizer. Neural Computing and Applications, 35(1), 529-556. https://doi.org/10.1007/s00521-022-07775-4
- Arora, V., & Agarwal, P. (2024). An Empirical Study of Nature-Inspired Algorithms for Feature Selection in Medical Applications. Annals of Data Science, (in press). https://doi.org/10.1007/s40745-024-00571-y
- Asniar, Maulidevi, N. U., & Surendro, K. (2022). SMOTE-LOF for noise identification in imbalanced data classification. Journal of King Saud University - Computer and Information Sciences, 34(6), 3413-3423. https://doi.org/10.1016/j.jksuci.2021.01.014
- Dash, M., & Liu, H. (2000). Feature Selection for Clustering. In T. Terano, H. Liu, & A. L. P. Chen (Eds.), Knowledge Discovery and Data Mining. Current Issues and New Applications (pp. 110-121). Springer. https://doi.org/10.1007/3-540-45571-X_13
- Dehghan, Z., & Mansoori, E. G. (2018). A new feature subset selection using bottom-up clustering. Pattern Analysis and Applications, 21(1), 57-66. https://doi.org/10.1007/s10044-016-0565-8
- Dhargupta, S., Ghosh, M., Mirjalili, S., & Sarkar, R. (2020). Selective Opposition based Grey Wolf Optimization. Expert Systems with Applications, 151, 113389. https://doi.org/10.1016/j.eswa.2020.113389
- Gilal, A. R., Abro, A., Hassan, G., Jaafar, J., & Rehman, F. (2019). A Rough-Fuzzy Model for Early Breast Cancer Detection. Journal of Medical Imaging and Health Informatics, 9(4), 688-696. https://doi.org/10.1166/jmihi.2019.2664
- Gu, S., Cheng, R., & Jin, Y. (2018). Feature selection for high-dimensional classification using a competitive swarm optimizer. Soft Computing, 22(3), 811-822. https://doi.org/10.1007/s00500-016-2385-6
- Hancer, E., Xue, B., & Zhang, M. (2020). A survey on feature selection approaches for clustering. Artificial Intelligence Review, 53(6), 4519-4545. https://doi.org/10.1007/s10462-019-09800-w
- Hashem, M. H., Abdullah, H. S., & Ghathwan, K. I. (2023). Grey Wolf Optimization Algorithm: A Survey. Iraqi Journal of Science, 64(11), 5964-5984. https://doi.org/10.24996/ijs.2023.64.11.40
- Jain, A., Nagar, S., Singh, P. K., & Dhar, J. (2023). A hybrid learning-based genetic and grey-wolf optimizer for global optimization. Soft Computing, 27(8), 4713-4759. https://doi.org/10.1007/s00500-022-07604-9
- Jia, H., Li, J., Song, W., Peng, X., Lang, C., & Li, Y. (2019). Spotted Hyena Optimization Algorithm with Simulated Annealing for Feature Selection. IEEE Access, 7, 71943-71962. https://doi.org/10.1109/ACCESS.2019.2919991
- Khan, S. H., Hayat, M., Bennamoun, M., Sohel, F. A., & Togneri, R. (2017). Cost-Sensitive learning of deep feature representations from imbalanced data. IEEE Transactions on Neural Networks and Learning Systems, 29(8), 3573-3587. https://doi.org/10.1109/tnnls.2017.2732482
- Kwakye, B. D., Li, Y., Mohamed, H. H., Baidoo, E., & Asenso, T. Q. (2024). Particle guided metaheuristic algorithm for global optimization and feature selection problems. Expert Systems with Applications, 248, 123362. https://doi.org/10.1016/j.eswa.2024.123362
- Li, F., Zhang, Z., & Jin, C. (2016). Feature selection with partition differentiation entropy for large-scale data sets. Information Sciences, 329, 690-700. https://doi.org/10.1016/j.ins.2015.10.002
- Li, J., Lei, H., Alavi, A. H., & Wang, G. G. (2020). Elephant herding optimization: Variants, hybrids, and applications. Mathematics, 8(9), 1415. https://doi.org/10.3390/MATH8091415
- Li, K., Li, S., Huang, Z., Zhang, M., & Xu, Z. (2022). Grey Wolf Optimization algorithm based on Cauchy-Gaussian mutation and improved search strategy. Scientific Reports, 12(1), 18961. https://doi.org/10.1038/s41598-022-23713-9
- Li, Y., Wu, X., & Wang, X. (2023). Incremental reduction methods based on granular ball neighborhood rough sets and attribute grouping. International Journal of Approximate Reasoning, 160, 108974. https://doi.org/10.1016/j.ijar.2023.108974
- Ma, T., Lu, S., & Jiang, C. (2024). A membership-based resampling and cleaning algorithm for multi-class imbalanced overlapping data. Expert Systems with Applications, 240, 122565. https://doi.org/10.1016/j.eswa.2023.122565
- Meng, Z., & Shi, Z. (2016). On quick attribute reduction in decision-theoretic rough set models. Information Sciences, 330, 226-244. https://doi.org/10.1016/j.ins.2015.09.057
- Mirjalili, S., Mirjalili, S. M., & Lewis, A. (2014). Grey Wolf Optimizer. Advances in Engineering Software, 69, 46-61. https://doi.org/10.1016/j.advengsoft.2013.12.007
- Pan, H., Chen, S., & Xiong, H. (2023). A high-dimensional feature selection method based on modified Gray Wolf Optimization. Applied Soft Computing, 135, 110031. https://doi.org/10.1016/j.asoc.2023.110031
- Pham, T. H., & Raahemi, B. (2023). Bio-Inspired Feature Selection Algorithms With Their Applications: A Systematic Literature Review. IEEE Access, 11, 43733-43758. https://doi.org/10.1109/ACCESS.2023.3272556
- Pichai, S., Sunat, K., & Chiewchanwattana, S. (2020). An asymmetric chaotic competitive swarm optimization algorithm for feature selection in high-dimensional data. Symmetry, 12(11), 1-13. https://doi.org/10.3390/sym12111782
- Premalatha, M., Jayasudha, M., Čep, R., Priyadarshini, J., Kalita, K., & Chatterjee, P. (2024). A comparative evaluation of nature-inspired algorithms for feature selection problems. Heliyon, 10(1), e23571. https://doi.org/10.1016/j.heliyon.2023.e23571
- Raza, M. S., & Qamar, U. (2018). Feature selection using rough set-based direct dependency calculation by avoiding the positive region. International Journal of Approximate Reasoning, 92, 175-197. https://doi.org/10.1016/j.ijar.2017.10.012
- Roth, V., & Lange, T. (2003). Feature Selection in Clustering Problems. In S. Thrun, L. Saul, & B. Schölkopf (Eds.), Advances in Neural Information Processing Systems 2003, (pp. 473-480). NeurIPS. https://proceedings.neurips.cc/paper_files/paper/2003/file/bb03e43ffe34eeb242a2ee4a4f125e56-Paper.pdf
- Sharma, M., & Kaur, P. (2021). A Comprehensive Analysis of Nature-Inspired Meta-Heuristic Techniques for Feature Selection Problem. Archives of Computational Methods in Engineering, 28(3), 1103-1127. https://doi.org/10.1007/s11831-020-09412-6
- Shikoun, N. H., Al-Eraqi, A. S., & Fathi, I. S. (2024). BINCOA: An efficient binary crayfish optimization algorithm for feature selection. IEEE Access, 12, 28621-28635. https://doi.org/10.1109/access.2024.3366495
- Tran, B., Xue, B., & Zhang, M. (2019). Variable-Length Particle Swarm Optimization for Feature Selection on High-Dimensional Classification. IEEE Transactions on Evolutionary Computation, 23(3), 473-487. https://doi.org/10.1109/TEVC.2018.2869405
- Wang, C., Huang, Y., Ding, W., & Cao, Z. (2021). Attribute reduction with fuzzy rough self-information measures. Information Sciences, 549, 68-86. https://doi.org/10.1016/j.ins.2020.11.021
- Wang, Y., Wang, T., Dong, S., & Yao, C. (2020). An Improved Grey-Wolf Optimization Algorithm Based on Circle Map. Journal of Physics: Conference Series, 1682(1), 012020. https://doi.org/10.1088/1742-6596/1682/1/012020
- Xu, H., Cao, Q., Fu, H., & Chen, H. (2019). Applying an Improved Elephant Herding Optimization Algorithm with Spark-based Parallelization to Feature Selection for Intrusion Detection. International Journal of Performability Engineering, 15(6), 1600-1610. https://doi.org/10.23940/ijpe.19.06.p11.16001610
- Yang, Y., Chen, D., Wang, H., & Wang, X. (2018). Incremental Perspective for Feature Selection Based on Fuzzy Rough Sets. IEEE Transactions on Fuzzy Systems, 26(3), 1257-1273. https://doi.org/10.1109/TFUZZ.2017.2718492
- Yang, Y., Chen, D., Zhang, X., Ji, Z., & Zhang, Y. (2022). Incremental feature selection by sample selection and feature-based accelerator. Applied Soft Computing, 121, 108800. https://doi.org/10.1016/j.asoc.2022.108800
- Yang, Y., Song, S., Chen, D., & Zhang, X. (2020). Discernible neighborhood counting based incremental feature selection for heterogeneous data. International Journal of Machine Learning and Cybernetics, 11(5), 1115-1127. https://doi.org/10.1007/s13042-019-00997-4
- Zhang, H., Chen, J., Zhang, Q., Chen, Z., Ding, X., & Yao, J. (2023). Grey Wolf Optimization Algorithm Based on Follow-Controlled Learning Strategy. IEEE Access, 11, 101852-101872. https://doi.org/10.1109/ACCESS.2023.3314514
- Zhang, H., Chen, Q., Xue, B., Banzhaf, W., & Zhang, M. (2024). A geometric semantic macro-crossover operator for evolutionary feature construction in regression. Genetic Programming and Evolvable Machines, 25(1), Article no. 2. https://doi.org/10.1007/s10710-023-09465-z
- Zhao, P., Zhang, Y., Ma, Y., Zhao, X., & Fan, X. (2023). Discriminatively embedded fuzzy K-Means clustering with feature selection strategy. Applied Intelligence, 53(16), 18959-18970. https://doi.org/10.1007/s10489-022-04376-5
This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.