Acta Informatica Pragensia X:X | DOI: 10.18267/j.aip.29531

Modular Local Classification via Cluster-Guided Feature Selection in Tabular Data

Leila Boussaad
Department of Management, Faculty of Economics, University of Batna 1, Batna, Algeria

Background: Many real-world tabular datasets are heterogeneous, with distinct regions of the feature space exhibiting different feature–label relationships. Conventional global classifiers often miss these local patterns, reducing both predictive accuracy and interpretability.

Objective: This study aims to design a modular classification framework that combines local specialization with global consistency to enhance predictive performance and interpretability in heterogeneous tabular data.

Methods: The author proposes cluster-guided local feature selection with top-2 voting and fallback (CGLFS+), which integrates unsupervised clustering, cluster-specific feature selection and lightweight per-cluster models. Final predictions combine the decisions of the two most relevant local models (top-2 voting) with a global fallback classifier for robustness. The framework was evaluated on five diverse benchmark datasets using repeated stratified cross-validation.
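To make the pipeline concrete, the sketch below is a minimal illustration of the idea described above, not the author's exact CGLFS+ implementation. It assumes k-means for the clustering step, mutual information for per-cluster feature ranking, and logistic regression for both the local and fallback models; the class name and the parameters n_clusters, top_k and min_cluster_size are illustrative choices.

    import numpy as np
    from sklearn.base import BaseEstimator, ClassifierMixin
    from sklearn.cluster import KMeans
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    class ClusterGuidedLocalClassifier(BaseEstimator, ClassifierMixin):
        """Cluster the training data, select features and fit a light model per
        cluster, then predict by top-2 local voting with a global fallback."""

        def __init__(self, n_clusters=4, top_k=5, min_cluster_size=10, random_state=0):
            self.n_clusters = n_clusters
            self.top_k = top_k
            self.min_cluster_size = min_cluster_size
            self.random_state = random_state

        def fit(self, X, y):
            X, y = np.asarray(X, dtype=float), np.asarray(y)
            self.scaler_ = StandardScaler().fit(X)
            Xs = self.scaler_.transform(X)
            self.kmeans_ = KMeans(n_clusters=self.n_clusters, n_init=10,
                                  random_state=self.random_state).fit(Xs)
            # Global fallback classifier trained on all features.
            self.global_ = LogisticRegression(max_iter=1000).fit(Xs, y)
            self.local_ = {}  # cluster id -> (selected feature indices, local model)
            for c in range(self.n_clusters):
                mask = self.kmeans_.labels_ == c
                # Degenerate clusters (tiny or single-class) rely on the fallback only.
                if mask.sum() < self.min_cluster_size or np.unique(y[mask]).size < 2:
                    continue
                mi = mutual_info_classif(Xs[mask], y[mask], random_state=self.random_state)
                feats = np.argsort(mi)[::-1][: self.top_k]          # cluster-specific features
                model = LogisticRegression(max_iter=1000).fit(Xs[mask][:, feats], y[mask])
                self.local_[c] = (feats, model)
            return self

        def predict(self, X):
            Xs = self.scaler_.transform(np.asarray(X, dtype=float))
            top2 = np.argsort(self.kmeans_.transform(Xs), axis=1)[:, :2]  # two nearest centroids
            preds = []
            for x, clusters in zip(Xs, top2):
                votes = [self.local_[c][1].predict(x[self.local_[c][0]][None, :])[0]
                         for c in clusters if c in self.local_]
                if len(votes) == 2 and votes[0] == votes[1]:
                    preds.append(votes[0])                            # local models agree
                else:
                    preds.append(self.global_.predict(x[None, :])[0])  # global fallback
            return np.asarray(preds)

Under these assumptions the class follows the scikit-learn estimator conventions, so it can be scored with RepeatedStratifiedKFold and cross_val_score (e.g., scoring="f1_macro"), mirroring the repeated stratified cross-validation protocol described above.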

Results: CGLFS+ achieved consistent gains in accuracy and macro F1 over strong baselines, with statistically significant improvements and competitive inference times.

Conclusion: CGLFS+ successfully balances local adaptation and global consistency, providing a scalable and interpretable approach well suited to heterogeneous domains such as healthcare, chemistry and finance.

Keywords: Local models; Feature selection; Clustering; Modular classification; Tabular data; Interpretable machine learning.

Received: August 9, 2025; Revised: October 9, 2025; Accepted: October 24, 2025; Prepublished online: December 29, 2025 


References

  1. Aguilar-Ruiz, J. S. (2024). Class-specific feature selection for classification explainability. arXiv:2411.01204. https://doi.org/10.48550/arXiv.2411.01204
  2. Alangari, N., Menai, M. E. B., Mathkour, H., & Almosallam, I. (2023). Intrinsically interpretable Gaussian mixture model. Information, 14(3), 164. https://doi.org/10.3390/info14030164
  3. Cai, W., Jiang, J., Wang, F., Tang, J., Kim, S., & Huang, J. (2025). A survey on mixture of experts in large language models. IEEE Transactions on Knowledge and Data Engineering, 37(7), 3896-3915. https://doi.org/10.1109/TKDE.2025.3554028
  4. Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4), 547-553. https://doi.org/10.1016/j.dss.2009.05.016
  5. Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1-30.
  6. Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10), 78-87. https://doi.org/10.1145/2347736.2347755
  7. Dua, D., & Graff, C. (2019). UCI machine learning repository. https://archive.ics.uci.edu/ml/index.php
  8. Fedus, W., Zoph, B., & Shazeer, N. (2022). Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23, 1-39.
  9. Fernández-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research, 15(1), 3133-3181.
  10. Friedberg, R., Tibshirani, J., Athey, S., & Wager, S. (2020). Local linear forests. Journal of Computational and Graphical Statistics, 30(2), 503-517. https://doi.org/10.1080/10618600.2020.1831930
  11. Grinsztajn, L., Oyallon, E., & Varoquaux, G. (2022). Why do tree-based models still outperform deep learning on typical tabular data? In 36th Conference on Neural Information Processing Systems (pp. 507-520). NIPS.
  12. Hancer, E., Xue, B., & Zhang, M. (2020). A survey on feature selection approaches for clustering. Artificial Intelligence Review, 53(6), 4519-4545. https://doi.org/10.1007/s10462-019-09800-w
  13. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer. https://doi.org/10.1007/978-0-387-84858-7
  14. Hu, L., Jiang, M., Dong, J., Liu, X., & He, Z. (2024). Interpretable clustering: A survey. arXiv:2409.00743. https://doi.org/10.48550/arXiv.2409.00743
  15. Ismail, A. A., Arik, S. Ö., Yoon, J., Taly, A., Feizi, S., & Pfister, T. (2022). Interpretable mixture of experts. arXiv:2206.02107. https://doi.org/10.48550/arXiv.2206.02107
  16. Kalangi, P. K., Rachuri, G., Saleem, D., Chandana, P., Goud, B. P., & Kumar, S. V. (2025). A hybrid approach to accurate breast cancer prediction integrating explainable AI and machine learning. In 2025 5th International Conference on Intelligent Technologies (CONIT) (pp. 1-7). IEEE. https://doi.org/10.1109/CONIT65521.2025.11167733
  17. Kaur, I., & Ahmad, T. (2024). A cluster-based ensemble approach for congenital heart disease prediction. Computer Methods and Programs in Biomedicine, 243, 107922. https://doi.org/10.1016/j.cmpb.2023.107922
  18. Kheradpisheh, S. R., Sharifizadeh, F., Nowzari-Dalini, A., Ganjtabesh, M., & Ebrahimpour, R. (2014). Mixture of feature specified experts. Information Fusion, 20, 242-251. https://doi.org/10.1016/j.inffus.2014.02.006
  19. Kim, B., Shah, J. A., & Doshi-Velez, F. (2015). Mind the gap: A generative approach to interpretable feature selection and extraction. In 28th Conference on Neural Information Processing Systems (pp. 1-9). NIPS.
  20. Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (II) (pp. 1137-1145). IJCAI.
  21. Kuang, Y. C., & Ooi, M. (2024). Performance characterization of clusterwise linear regression algorithms. Wiley Interdisciplinary Reviews: Computational Statistics, 16(5), e70004. https://doi.org/10.1002/wics.70004
  22. Law, M. H., Figueiredo, M. A., & Jain, A. K. (2004). Simultaneous feature selection and clustering using mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9), 1154-1166. https://doi.org/10.1109/TPAMI.2004.71
  23. Lepikhin, D., Lee, H., Xu, Y., Chen, D., Firat, O., Huang, Y., ... & Chen, Z. (2020). GShard: Scaling giant models with conditional computation and automatic sharding. arXiv:2006.16668. https://doi.org/10.48550/arXiv.2006.16668
  24. Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., ... & Lee, S. I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 56-67. https://doi.org/10.1038/s42256-019-0138-9
  25. Luque, A., Mazzoleni, M., Zamora-Polo, F., Ferramosca, A., Lama, J. R., & Previdi, F. (2023). Determining the importance of physicochemical properties in the perceived quality of wines. IEEE Access, 11, 115430-115449. https://doi.org/10.1109/access.2023.3325676
  26. Ma, X. A., & Lu, K. (2024). Class-specific feature selection using neighborhood mutual information with relevance-redundancy weight. Knowledge-Based Systems, 300, 112212. https://doi.org/10.1016/j.knosys.2024.112212
  27. MacQueen, J. (1965). Some methods for classification and analysis of multivariate observations. In Proc. of Berkeley Symposium on Mathematical Statistics & Probability (pp. 281-297). University of California Press.
  28. McElfresh, D., Khandagale, S., Valverde, J., Prasad C, V., Ramakrishnan, G., Goldblum, M., & White, C. (2023). When do neural nets outperform boosted trees on tabular data? In Proceedings of the 37th International Conference on Neural Information Processing Systems (pp. 76336-76369). NIPS.
  29. McInnes, L., Healy, J., & Astels, S. (2017). hdbscan: Hierarchical density based clustering. The Journal of Open Source Software, 2(11), 205. https://doi.org/10.21105/joss.00205
  30. Molnar, C. (2020). Interpretable machine learning. https://christophm.github.io/interpretable-ml-book/
  31. Oyamada, M., & Nakadai, S. (2017). Relational mixture of experts: Explainable demographics prediction with behavioral data. In 2017 IEEE International Conference on Data Mining (ICDM) (pp. 357-366). IEEE. https://doi.org/10.1109/ICDM.2017.45
  32. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
  33. Peng, H., Long, F., & Ding, C. (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226-1238. https://doi.org/10.1109/TPAMI.2005.159
  34. Peralta, B., & Soto, A. (2014). Embedded local feature selection within mixture of experts. Information Sciences, 269, 176-187. https://doi.org/10.1016/j.ins.2014.01.008
  35. Peterson, L. E. (2009). K-nearest neighbor. Scholarpedia, 4(2), 1883. https://doi.org/10.4249/scholarpedia.1883
  36. Ross, B. C. (2014). Mutual information between discrete and continuous data sets. PLoS ONE, 9(2), e87357. https://doi.org/10.1371/journal.pone.0087357
  37. Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206-215. https://doi.org/10.1038/s42256-019-0048-x
  38. Salehi, A. R., & Khedmati, M. (2024). A cluster-based SMOTE both-sampling (CSBBoost) ensemble algorithm for classifying imbalanced data. Scientific Reports, 14(1), 5152. https://doi.org/10.1038/s41598-024-55598-1
  39. Scikit-learn developers. (2025). Digits dataset - scikit-learn documentation. https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html
  40. Shi, Y., Zeng, H., Gong, X., Cai, L., Xiang, W., Lin, Q., Zheng, H., & Zhu, J. (2025). Consensus guided multi-view unsupervised feature selection with hybrid regularization. Applied Sciences, 15(12), 6884. https://doi.org/10.3390/app15126884
  41. Shwartz-Ziv, R., & Armon, A. (2022). Tabular data: Deep learning is not all you need. Information Fusion, 81, 84-90. https://doi.org/10.1016/j.inffus.2021.11.011
  42. Yeganejou, M., & Dick, S. (2019). Improved deep fuzzy clustering for accurate and interpretable classifiers. In 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (pp. 1-7). IEEE. https://doi.org/10.1109/FUZZ-IEEE.2019.8858809
  43. Yilmaz Eroglu, D., & Guleryuz, E. (2025). Enhanced three-stage cluster-then-classify method (ETSCCM). Metals, 15(3), 318. https://doi.org/10.3390/met15030318
  44. Yuksel, S. E., Wilson, J. N., & Gader, P. D. (2012). Twenty years of mixture of experts. IEEE Transactions on Neural Networks and Learning Systems, 23(8), 1177-1193. https://doi.org/10.1109/TNNLS.2012.2200299

This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.