Acta Informatica Pragensia X:X | DOI: 10.18267/j.aip.306192
Corr-SHAP: Correlation-Aware Sampling for Faithful SHAP Value Estimation
- 1 Laboratory of Advanced Technologies for Medicine and Signals, National Engineering School of Sfax, University of Sfax, Tunisia
- 2 National Engineering School of Gabès, University of Gabès, Tunisia
- 3 Digital Research Center of Sfax, Technopole of Sfax, Tunisia
- 4 Department of Mathematics, Faculty of Sciences of Gabès, University of Gabès, Tunisia
Background: SHapley Additive exPlanations (SHAP) methods are widely used to interpret machine learning models, yet most implementations assume feature independence. This assumption rarely holds in practice, especially when features are correlated, leading to biased and unstable attributions.
Objective: We introduce Corr-SHAP, a correlation-aware SHAP approach that produces more faithful and stable feature attributions by explicitly modeling feature dependencies. Our aim is to enhance the accuracy, robustness, and scalability of SHAP explanations for models trained on correlated data.
Methods: Corr-SHAP models feature correlations via a multivariate Gaussian approximation with a Ledoit–Wolf covariance estimator. We design a correlation-aware sampling distribution that penalizes redundant coalitions, improving computational efficiency in higher dimensions. To correct the induced bias, we employ a Self-Normalized Importance Sampling estimator, which re-weights samples by the ratio of the true Shapley kernel to the sampling probability. Our analysis establishes high-probability error bounds in terms of the Effective Sample Size, extending convergence guarantees to correlated feature spaces.
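The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the redundancy penalty in the proposal distribution is a hypothetical stand-in for Corr-SHAP's correlation-aware sampler, while the Ledoit–Wolf step, the self-normalized weights and the Effective Sample Size diagnostic follow their standard definitions.

```python
import numpy as np
from math import comb
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)

# Toy correlated data: 5 features, two of them strongly correlated.
X = rng.normal(size=(500, 5))
X[:, 1] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=500)

# Step 1: shrinkage covariance estimate (well-conditioned even when n is small).
Sigma = LedoitWolf().fit(X).covariance_

def shapley_kernel(n, s):
    """Kernel SHAP weight for a coalition of size s out of n features."""
    if s == 0 or s == n:
        return 1e6  # conventional large weight for the fixed endpoints
    return (n - 1) / (comb(n, s) * s * (n - s))

n = X.shape[1]
coalitions = [tuple(rng.choice([0, 1], size=n)) for _ in range(200)]

def proposal_weight(z):
    """Hypothetical correlation-aware proposal: Shapley kernel divided by a
    redundancy penalty for coalitions of mutually correlated features."""
    s = int(sum(z))
    idx = [i for i, b in enumerate(z) if b]
    redundancy = np.abs(np.corrcoef(X[:, idx].T)).mean() if s > 1 else 1.0
    return shapley_kernel(n, s) / redundancy

# Step 2: self-normalized importance weights w_i = pi(z_i) / q(z_i),
# where pi is the true Shapley kernel and q the biased proposal.
q = np.array([proposal_weight(z) for z in coalitions])
pi = np.array([shapley_kernel(n, int(sum(z))) for z in coalitions])
w = pi / q
w_norm = w / w.sum()

# Step 3: Effective Sample Size diagnostic, ESS = 1 / sum(w_norm^2).
ess = 1.0 / np.sum(w_norm ** 2)
print(f"ESS = {ess:.1f} of {len(coalitions)} samples")
```

An ESS close to the number of drawn coalitions indicates the proposal is well matched to the Shapley kernel; a collapsing ESS signals that the self-normalized estimate rests on only a handful of effective samples.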
Results: Across synthetic and real-world datasets, Corr-SHAP achieves Shapley value estimates that closely align with Kernel SHAP, while exhibiting substantially lower variance and more stable feature rankings. In correlated clusters, Corr-SHAP systematically down-weights redundant features, improving ranking fidelity without introducing bias. To further support scalability, we demonstrate that combining Corr-SHAP with Leverage-SHAP reduces variance in higher-dimensional settings.
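The ranking-stability claim above can be quantified with a simple protocol: run the estimator repeatedly and compute the mean pairwise Spearman correlation of the resulting feature rankings. The sketch below uses synthetic attributions with additive sampling noise as a stand-in for repeated Kernel SHAP versus variance-reduced runs; the noise scales are illustrative assumptions, not measured values.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)

# Ground-truth attributions for 5 features (illustrative values).
phi_true = np.array([0.8, 0.5, 0.3, 0.1, 0.05])

def repeated_rank_stability(noise_scale, runs=50):
    """Mean pairwise Spearman correlation of feature rankings over runs,
    where each run is the true attribution vector plus sampling noise."""
    estimates = phi_true + rng.normal(scale=noise_scale,
                                      size=(runs, phi_true.size))
    rhos = []
    for i in range(runs):
        for j in range(i + 1, runs):
            rho, _ = spearmanr(estimates[i], estimates[j])
            rhos.append(rho)
    return float(np.mean(rhos))

stab_high_var = repeated_rank_stability(0.3)   # noisy, plain-sampling regime
stab_low_var = repeated_rank_stability(0.05)   # variance-reduced regime
print(stab_high_var, stab_low_var)
```

A lower-variance estimator yields a higher mean Spearman correlation, i.e. more stable rankings across repeated explanations of the same instance.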
Conclusion: Corr-SHAP provides a statistically grounded and computationally efficient framework for SHAP value estimation under feature correlation. By integrating correlation modeling, bias correction, and variance reduction, it scales beyond small toy problems and delivers explanations that are both accurate and reliable, making it a valuable tool for practitioners analyzing complex real-world datasets.
Keywords: Explainable artificial intelligence; XAI; SHapley Additive exPlanations; Feature correlation; Model interpretability; Importance sampling; Variance reduction.
Received: September 29, 2025; Revised: February 2, 2026; Accepted: February 5, 2026; Prepublished online: March 27, 2026
References
- Aas, K., Jullum, M., & Løland, A. (2021). Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. Artificial Intelligence, 298, 103502. https://doi.org/10.1016/j.artint.2021.103502
- Ali, A. A., Galal, G. R., & Hassan, H. S. (2025). Diabetes prediction on PIMA Indian Dataset using machine learning techniques. International Journal of Environmental Sciences, 529-550. https://doi.org/10.64252/3a8wqx36
- Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R., & Herrera, F. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82-115. https://doi.org/10.1016/j.inffus.2019.12.012
- Bachmann, S. (2025). Efficient XAI: A low-cost data reduction approach to SHAP interpretability. Journal of Artificial Intelligence Research, 83(2), 1-21. https://doi.org/10.1613/jair.1.18325
- Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32. https://doi.org/10.1023/A:1010933404324
- Burkart, N., & Huber, M. F. (2021). A survey on the explainability of supervised machine learning. Journal of Artificial Intelligence Research, 70, 245-317. https://doi.org/10.1613/jair.1.12228
- Charaabi, H., Sayari, A., Hamdi, R. E., Njah, M., & Slima, M. B. (2024). An XAI-Infused Multiclass MRI Brain Tumor Classification using Deep Transfert Learning (DTL). In 2024 10th International Conference on Control, Decision and Information Technologies (CoDIT), (pp. 1044-1049). IEEE. https://doi.org/10.1109/CoDIT62066.2024.10708599
- Dua, D., & Graff, C. (2019). UCI Machine Learning Repository: Pima Indians Diabetes Dataset. University of California, Irvine. https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database
- Huang, X., & Marques-Silva, J. (2024). On the failings of Shapley values for explainability. International Journal of Approximate Reasoning, 171, 109112. https://doi.org/10.1016/j.ijar.2023.109112
- Janosi, A., Steinbrunn, W., Pfisterer, M., & Detrano, R. (1989). Heart Disease - Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C52P4X
- Kirbaş, İ., & Çifci, A. (2025). Leveraging SHAP for Interpretable Diabetes Prediction: A Study of Machine Learning Models on the Pima Indians Diabetes Dataset. Balkan Journal of Electrical and Computer Engineering, 13(2), 128-139. https://doi.org/10.17694/bajece.1577929
- Khan, A., Ali, A., Khan, J., Ullah, F., & Faheem, M. (2025). Exploring consistent feature selection for software fault prediction: an XAI-Based Model-Agnostic approach. IEEE Access, 13, 75493-75524. https://doi.org/10.1109/access.2025.3558913
- Leyh, N. (2026). Automated machine learning in action: Performance evaluation for predictive analytics tasks. Acta Informatica Pragensia, 15(1), 72-89. https://doi.org/10.18267/j.aip.288
- Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., & Lee, S. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 56-67. https://doi.org/10.1038/s42256-019-0138-9
- Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. In NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, (pp. 4768-4777). NeurIPS.
- Mariappan, R. (2025). Extensive review of literature on explainable AI (XAI) in healthcare applications. Recent Advances in Computer Science and Communications, 18(1), e200324228159. https://doi.org/10.2174/0126662558296699240314055348
- Muhammad, D., & Bendechache, M. (2024). Unveiling the black box: A systematic review of Explainable Artificial Intelligence in medical image analysis. Computational and Structural Biotechnology Journal, 24, 542-560. https://doi.org/10.1016/j.csbj.2024.08.005
- Musco, C., & Witter, R. T. (2025). Provably accurate Shapley value estimation via leverage score sampling. In Proceedings of the 13th International Conference on Learning Representations, (pp. 91936-91963). ICLR.
- Nikhil, S. S. (2024). Accurate Prediction of Heart Disease Using Machine Learning: A Case Study on the Cleveland Dataset. International Journal of Innovative Science and Research Technology, 9(7), 1042-1049. https://doi.org/10.38124/ijisrt/IJISRT24JUL1400
- Ras, G., Xie, N., van Gerven, M., & Doran, D. (2022). Explainable deep learning: A field guide for the uninitiated. Journal of Artificial Intelligence Research, 73, 329-396. https://doi.org/10.1613/jair.1.13200
- Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (pp. 1135-1144). ACM. https://doi.org/10.1145/2939672.2939778
- Shrestha, D. (2024). Comparative Analysis of Machine Learning Algorithms for Heart Disease Prediction Using the Cleveland Heart Disease Dataset. Preprints.org. https://doi.org/10.20944/preprints202407.1333.v1
- Shrikumar, A., Greenside, P., & Kundaje, A. (2017). Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning, (pp. 3145-3154). MLR. https://proceedings.mlr.press/v70/shrikumar17a.html
- Štrumbelj, E., & Kononenko, I. (2010). An efficient explanation of individual classifications using game theory. Journal of Machine Learning Research, 11, 1-18.
- Štrumbelj, E., & Kononenko, I. (2014). Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41(3), 647-665. https://doi.org/10.1007/s10115-013-0679-x
- Sujon, K. M., Hassan, R. B., Towshi, Z. T., Othman, M. A., Samad, M. A., & Choi, K. (2024). When to use standardization and normalization: Empirical evidence from machine learning models and XAI. IEEE Access, 12, 135300-135314. https://doi.org/10.1109/access.2024.3462434
This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.
