Model for the extraction, transformation and load process in data warehouses. An application with environmental data
Abstract
Managing a data warehouse requires a procedure that ensures the accuracy, completeness, and centralization of data drawn from several sources of information, which makes specialized Extraction, Transformation, and Loading (ETL) applications necessary. Existing applications are difficult to parameterize, lack correction filters that adapt to the characteristics of the data, and can be costly to implement. This article presents a generic model that applies the ETL stages and monitors the process, keeping a historical record of the errors filtered and calculating indicators of processing quality. The model was validated in a case study with environmental data, with satisfactory results. Future work will validate the model in other domains, including new data types and structures.
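To make the idea of a monitored ETL process concrete, the following is a minimal sketch and not the model described in the article. It assumes a hypothetical input of comma-separated `station,timestamp,temperature` lines, a hypothetical range filter (`low`, `high`), and a simple indicator defined here as the share of extracted records that pass the filters. All names (`EtlRun`, `run_etl`, `quality`) are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class EtlRun:
    """Holds the loaded rows and a log of every record a filter rejected."""
    loaded: list = field(default_factory=list)
    error_log: list = field(default_factory=list)

    def quality(self) -> float:
        """Assumed indicator: fraction of extracted records that were loaded."""
        total = len(self.loaded) + len(self.error_log)
        return len(self.loaded) / total if total else 0.0


def extract(raw_lines):
    """Extraction: split each raw line into trimmed fields."""
    for line in raw_lines:
        yield [part.strip() for part in line.split(",")]


def transform(fields, low=-50.0, high=60.0):
    """Transformation: parse the reading and apply a range filter.

    Returns (record, None) on success or (None, reason) on rejection.
    The bounds are hypothetical, not taken from the article.
    """
    if len(fields) != 3:
        return None, "wrong field count"
    station, timestamp, value = fields
    try:
        temperature = float(value)
    except ValueError:
        return None, "non-numeric value"
    if not low <= temperature <= high:
        return None, "out of range"
    record = {"station": station, "timestamp": timestamp,
              "temperature_c": temperature}
    return record, None


def run_etl(raw_lines):
    """Run the three stages and keep the rejected records for later review."""
    run = EtlRun()
    for fields in extract(raw_lines):
        record, reason = transform(fields)
        if record is None:
            run.error_log.append({"fields": fields, "reason": reason})
        else:
            run.loaded.append(record)  # Load: stand-in for a warehouse insert
    return run


# Hypothetical sample data, invented for illustration only.
sample = [
    "ST01,2015-03-01T10:00,21.5",
    "ST01,2015-03-01T10:05,n/a",
    "ST02,2015-03-01T10:00,999",
    "ST02,2015-03-01T10:05,19.0",
]
run = run_etl(sample)
print(run.quality(), [e["reason"] for e in run.error_log])
```

Keeping rejected records in `error_log` rather than discarding them is what allows an indicator to be computed afterwards. A real deployment would persist that log and would likely track indicators per source and per filter.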