Acta Informatica Pragensia 2021, 10(1), 38-60 | DOI: 10.18267/j.aip.1453087
Three Metric-Based Method for Data Compatibility Calculation
- Faculty of Informatics and Statistics, Prague University of Economics and Business, W. Churchill Sq. 1938/4, 130 67 Prague 3, Czech Republic
This article analyzes ways of calculating characteristics of data and of the most common data structure types so that they can be compared with one another or over time. To this end, it studies the key aspects of the relational database, XML, JSON and RDF structure types. These data structure types are compared with several isolated approaches to measuring data quality and other data characteristics. The article has two goals: the calculation method itself and a storage structure for the calculated values. It presents a method for characterizing data and data structure types based on three metrics: the amount of structuredness, the amount of hierarchicallity and the amount of information. This triad of metrics allows data sets (objects) to be compared with one another, for example to evaluate the complexity of transforming data from one data object to another, and with the data structure types mentioned above. Based on the vector of the three metrics, a method for calculating the compatibility between data and a data structure type is proposed. This method can help select the most compatible data format for existing data. The calculated metric values can also reveal non-optimal storage design and classify data transformations. The method was evaluated in a case study, which showed its usability on a demonstration data set. It can be used in data modelling to help select an optimal data structure type, to design a data transformation process and to optimize existing data stores.
Keywords: Data metrics, Amount of information, Metadata, Relational database, XML, JSON, RDF, Ontology, Transformation, Structuredness, Hierarchicallity, Normalization, Visualization.
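To make the idea of comparing three-metric vectors concrete, the following is a minimal sketch and not the article's calculation. The abstract does not state the formulas, so everything here is an assumption made for illustration: the names MetricVector, compatibility and most_compatible are hypothetical, each metric is assumed to be scaled to the range 0 to 1, and compatibility is assumed to be one minus the normalized Euclidean distance between two vectors.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class MetricVector:
    """The three metrics named in the abstract.

    Assumption: each value is scaled to [0, 1]; the article's real scale
    is not given in the abstract.
    """
    structuredness: float
    hierarchicallity: float
    information: float


def compatibility(data: MetricVector, structure_type: MetricVector) -> float:
    """Illustrative score in [0, 1]; 1.0 means identical vectors.

    Assumption: a distance-based score stands in for the article's
    compatibility calculation, which the abstract does not specify.
    """
    distance = sqrt(
        (data.structuredness - structure_type.structuredness) ** 2
        + (data.hierarchicallity - structure_type.hierarchicallity) ** 2
        + (data.information - structure_type.information) ** 2
    )
    # sqrt(3) is the largest distance between two points of the unit cube.
    return 1.0 - distance / sqrt(3.0)


def most_compatible(data: MetricVector, candidates: dict[str, MetricVector]) -> str:
    """Return the name of the candidate structure type with the highest score."""
    return max(candidates, key=lambda name: compatibility(data, candidates[name]))


if __name__ == "__main__":
    # Placeholder numbers for illustration only; they are not measured
    # values and are not taken from the article.
    candidates = {
        "type_a": MetricVector(0.9, 0.1, 0.5),
        "type_b": MetricVector(0.2, 0.8, 0.5),
    }
    data_set = MetricVector(0.25, 0.75, 0.5)
    print(most_compatible(data_set, candidates))
```

Under these assumptions, choosing a storage format reduces to picking the candidate whose vector lies closest to the vector measured for the existing data. A different distance or weighting would change the ranking, so the sketch shows only the shape of the comparison.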
Received: March 6, 2021; Revised: May 2, 2021; Accepted: May 2, 2021; Prepublished online: May 2, 2021; Published: June 30, 2021
This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.