Acta Informatica Pragensia 2022, 11(3), 293-308 | DOI: 10.18267/j.aip.1934191
A Novel Automatic Relational Database Normalization Method
- 1 Department of Management Information Systems, Faculty of Economics, Istanbul University, Beyazıt Kampüsü, 34116 Fatih/İstanbul, Turkey
- 2 Department of Econometrics, Faculty of Economics, Istanbul University, Beyazıt Kampüsü, 34116 Fatih/İstanbul, Turkey
The increase in data diversity and the difficulty of database design make it practically impossible to design a unique database schema for all datasets encountered. In this paper, we introduce a fully automatic genetic algorithm-based relational database normalization method that reveals the right database schema from a raw dataset without the need for any prior knowledge. To measure the performance of the algorithm, we perform a simulation study using 250 datasets derived from 50 well-known databases. A total of 2500 simulations are carried out: ten runs for each of five denormalized variations of every database design, with each variation containing different synthetic content. The results of the simulation study show that the proposed algorithm discovers exactly 72% of the unknown database schemas. The performance can be improved by fine-tuning the optimization parameters. The results also show that the devised algorithm can be applied to many datasets to reveal the structure of a database when only a raw dataset is available.
Keywords: Relational databases; Automatic normalization; Genetic algorithms; Optimization; Decision support.
Received: July 26, 2022; Revised: August 25, 2022; Accepted: September 8, 2022; Prepublished online: September 9, 2022; Published: December 26, 2022
This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.