Acta Informatica Pragensia 2020, 9(2), 170-183 | DOI: 10.18267/j.aip.1394309
The Process of Unit Price Extraction from Public Sector Contracts
- Department of Information Technologies, Faculty of Informatics and Statistics, Prague University of Economics and Business, W. Churchill Sq. 1938/4, 130 67 Prague, Czech Republic
Czech government institutions commissioned research on extracting usual unit prices from public IT contracts to aid the sizing of future public tenders. The goal of the project is to obtain millions of contracts from the public register, convert them to full text, extract unit prices from the text and publish a pricelist of IT industry man-day prices. This paper designs the process and method of price extraction, demonstrates and evaluates the results over five iterations of extraction, and discusses the experience gained from two years of running the project. The process is designed as a set of repeatable workflows with specified activity and role descriptions. The method combines automated and manual actions. Due to the variability in format and content of the documents involved and the low tolerance for mistakes, the scope for automated extraction of unit prices from full-text contracts is limited, and a human workforce for validation is crucial.
Keywords: Usual price, Contracting, Full text analytics, Information technology, Business process
Received: August 24, 2020; Revised: October 1, 2020; Accepted: October 1, 2020; Prepublished online: October 1, 2020; Published: December 31, 2020
This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.