Automated Machine Learning in Action: Performance Evaluation for Predictive Analytics Tasks

doi:10.18267/j.aip.288

Acta Informatica Pragensia X:X | DOI: 10.18267/j.aip.288

Nicolas Leyh: TUM School of Management, Technical University of Munich, Munich, Germany

Background: As organizations increasingly seek data-driven insights, the demand for machine learning (ML) expertise outpaces the current workforce supply. Automated Machine Learning (AutoML) frameworks help close this gap by streamlining the ML pipeline, making advanced modeling accessible to non-specialists.

Objective: This study evaluates the performance of four open-source AutoML frameworks—Auto-Keras, Auto-Sklearn, H2O, and TPOT—in predictive analytics, focusing on both binary and multiclass classification. The goal is to identify performance strengths and limitations under varying dataset conditions and propose improvements for framework optimization.

Methods: A quantitative experimental research design was employed. Twenty-two publicly available datasets were selected from established benchmarking sources, covering diverse predictive analytics challenges. Framework performance was assessed across twelve data segments, defined by characteristics such as sample size, feature count, and categorical feature proportion. Evaluation metrics were AUC for binary classification tasks and accuracy and F1 for multiclass classification tasks, with standardized runtime constraints applied to ensure comparability.
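For readers unfamiliar with the metrics named above, the following is a minimal, dependency-free sketch of how AUC, accuracy, and F1 can be computed. It is an illustration only, not the study's evaluation code: the function names are hypothetical, and macro-averaging of F1 across classes is an assumption, since the abstract does not state which averaging was used.

```python
def binary_auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one.
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted members contributes 0.0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Toy data, made up purely for illustration.
    print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(accuracy([0, 1, 2, 2], [0, 2, 2, 2]))
    print(macro_f1([0, 1, 2, 2], [0, 2, 2, 2]))
```

AUC is computed from predicted scores and does not depend on a decision threshold, whereas accuracy and F1 are computed from hard class predictions. With imbalanced classes, macro-averaged F1 weights every class equally, so it can diverge noticeably from accuracy.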

Results: The findings show that H2O delivered strong results across diverse datasets, particularly for binary classification. However, no single framework achieved superior performance across all data segments. Auto-Sklearn performed well in multiclass classification, especially with higher feature counts, while Auto-Keras and TPOT demonstrated variable outcomes depending on dataset complexity. Performance declined notably in scenarios with high categorical proportions, severe class imbalance, or extensive missing values.

Conclusion: This study demonstrates that AutoML frameworks can substantially support predictive analytics but exhibit distinct strengths and limitations under specific data conditions. While H2O proved most robust overall, targeted refinements such as enhancing feature selection in Auto-Keras and improving categorical variable handling in Auto-Sklearn could further optimize performance. The findings provide actionable insights for both practitioners selecting frameworks and developers enhancing AutoML design, highlighting the need for ongoing innovation to ensure adaptability to complex predictive analytics tasks.

Received: June 19, 2025; Revised: August 23, 2025; Accepted: August 25, 2025; Prepublished online: August 31, 2025

This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.
