Acta Informatica Pragensia 2022, 11(3), 293-308 | DOI: 10.18267/j.aip.193

A Novel Automatic Relational Database Normalization Method

Emre Akadal 1, Mehmet Hakan Satman 2
1 Department of Management Information Systems, Faculty of Economics, Istanbul University, Beyazıt Kampüsü, 34116 Fatih/İstanbul, Turkey
2 Department of Econometrics, Faculty of Economics, Istanbul University, Beyazıt Kampüsü, 34116 Fatih/İstanbul, Turkey

The growing diversity of data and the difficulty of database design make it practically impossible to design a single database schema for all datasets encountered. In this paper, we introduce a fully automatic, genetic algorithm-based relational database normalization method that reveals the right database schema from a raw dataset without the need for any prior knowledge. To measure the performance of the algorithm, we perform a simulation study using 250 datasets produced from 50 well-known databases. A total of 2500 simulations are carried out: ten runs for each of five denormalized variations of every database design, each containing different synthetic content. The results of the simulation study show that the proposed algorithm discovers exactly 72% of the unknown database schemas. The performance can be improved by fine-tuning the optimization parameters. The results also show that the devised algorithm can be applied to many datasets to reveal the structure of a database when only a raw dataset is available.
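To make the idea of recovering structure from a raw dataset concrete, the following is a minimal sketch of one common building block of normalization: testing whether one column functionally determines another in a flat table. It is not the genetic algorithm proposed in the paper, whose details are not given in this abstract. The function names `holds` and `single_column_dependencies`, and the sample rows, are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def holds(rows, lhs, rhs):
    """Return True if column `lhs` functionally determines column `rhs`.

    The dependency holds when every value of `lhs` is always paired
    with the same value of `rhs` across all rows.
    """
    seen = {}
    for row in rows:
        key, value = row[lhs], row[rhs]
        # setdefault stores the first value seen for this key and
        # returns the stored value on later encounters.
        if seen.setdefault(key, value) != value:
            return False
    return True


def single_column_dependencies(rows):
    """List all ordered column pairs (a, b) where a determines b."""
    columns = list(rows[0].keys())
    return [(a, b) for a, b in permutations(columns, 2) if holds(rows, a, b)]


# Hypothetical denormalized table (illustrative data only).
rows = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Ada"},
    {"order_id": 2, "customer_id": 10, "customer_name": "Ada"},
    {"order_id": 3, "customer_id": 11, "customer_name": "Bob"},
]

deps = single_column_dependencies(rows)
for lhs, rhs in deps:
    print(f"{lhs} -> {rhs}")
```

In this toy table, `customer_id -> customer_name` holds, which hints that customer attributes could be moved to a separate table. Dependencies observed in a finite sample may also hold by coincidence (here `customer_name -> customer_id` also passes), which is one reason schema discovery from raw data is hard and why a search-based approach such as the one described above may be useful.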

Keywords: Relational databases; Automatic normalization; Genetic algorithms; Optimization; Decision support.

Received: July 26, 2022; Revised: August 25, 2022; Accepted: September 8, 2022; Prepublished online: September 9, 2022; Published: December 26, 2022

Akadal, E., & Satman, M. H. (2022). A Novel Automatic Relational Database Normalization Method. Acta Informatica Pragensia, 11(3), 293-308. doi: 10.18267/j.aip.193

This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.