Acta Informatica Pragensia 2019, 8(1), 18-37 | DOI: 10.18267/j.aip.123

Data Mining from the Banking Sector

Anna Biceková, Ľudmila Pusztová
Department of Cybernetics and Artificial Intelligence, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Letná 9, 042 00 Košice, Slovak Republic

This paper addresses corporate bankruptcy and describes ways in which this undesirable state can be prevented. Today these mainly include modern approaches from knowledge discovery and data mining, which can help companies in many respects. To apply data mining methods in practice with the goal of predicting a company's future state, we used data on the financial indicators of Polish companies. We used algorithms suited to bankruptcy prediction, namely decision trees, which yield easily interpretable results. In some experiments we also applied attribute selection methods, LASSO, or PCA. The work follows the CRISP-DM methodology, which describes the key steps required in various analytical tasks. The paper also includes an analysis of the current state of the art, presenting other authors' solutions to the problem. After evaluating all models, we concluded that the C5.0 algorithm can predict whether a company will or will not go bankrupt with 97.07 % accuracy, and that the use of attribute selection methods was not necessary.

Keywords: Bankruptcy prediction, data mining, CRISP-DM methodology, decision trees

Data Mining from the Banking Sector's Data

This paper deals with the prediction of company bankruptcies and describes how this undesirable state can be prevented. Currently, the main means of prevention are modern approaches from the area of data mining, which can help companies in many ways. To apply data mining methods to predicting the future state of a company, we used the financial indicators of Polish companies. In the analyses, we used algorithms suitable for bankruptcy prediction, namely decision trees, which provide a simple interpretation of results. In some experiments, we also used attribute selection methods, LASSO, or PCA. The workflow follows the CRISP-DM methodology, which describes the important steps needed for different analytical tasks. The article also includes an analysis of the current state of the art, which presents solutions to this problem proposed by other authors. After evaluating all models, we concluded that the C5.0 algorithm predicts a company's bankruptcy or non-bankruptcy with 97.07 % accuracy, and that attribute selection methods were not needed to reach this result.

Keywords: Bankruptcy prediction, Data mining, CRISP-DM methodology, Decision trees
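
To illustrate why decision trees are easy to interpret, the following minimal sketch picks a single threshold on one financial ratio by maximizing entropy-based information gain, the kind of criterion used by the tree family to which C5.0 belongs. It is not the authors' implementation and does not reproduce C5.0 itself; the function names `entropy` and `best_threshold`, the ratio values, and the labels are all hypothetical, and the reported accuracy is measured on the same toy data used to choose the split.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split `value <= threshold`."""
    base = entropy(labels)
    best = (None, 0.0)
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        # Weighted entropy of the two child nodes after the split.
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - remainder
        if gain > best[1]:
            best = (t, gain)
    return best


# Hypothetical toy data: one made-up financial ratio per company, 1 = bankrupt.
ratio = [-0.30, -0.10, 0.02, 0.05, 0.12, 0.20, 0.35, 0.40]
bankrupt = [1, 1, 1, 0, 0, 0, 0, 0]

threshold, gain = best_threshold(ratio, bankrupt)
# The resulting one-node "tree" reads as a single rule: ratio <= threshold -> bankrupt.
predictions = [1 if x <= threshold else 0 for x in ratio]
accuracy = sum(p == y for p, y in zip(predictions, bankrupt)) / len(bankrupt)
print(f"rule: ratio <= {threshold:.3f} -> bankrupt; gain = {gain:.3f} bits; "
      f"training accuracy = {accuracy:.2%}")
```

A full tree repeats this selection recursively over all attributes. Real data would also call for a held-out test set, since accuracy measured on the training data overstates predictive performance.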

Received: May 20, 2019; Accepted: June 28, 2019; Prepublished online: July 2, 2019; Published: July 10, 2019

Biceková, A., & Pusztová, Ľ. (2019). Data Mining from the Banking Sector's Data. Acta Informatica Pragensia, 8(1), 18-37. doi: 10.18267/j.aip.123


This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.