We recently completed a project helping a client build a data mart for regulatory reporting. It touched over thirty data sources and needed a lot of enrichments to prepare the data for reporting. The requirements were quite complex. The client expected a CIM-aligned data architecture, data quality for the elements in the mart, data lineage, a control framework for tracking the delivery of reports and their interdependencies and, finally, the key deliverable: automation of the reports.
We started with an Assessment and Planning phase to get a feel for the scope of work, which also gave the client a glimpse of our capabilities.
The analysis phase of this project was time-consuming because of the diverse data sources. During those days I kept thinking that this was a data wrangling project rather than an ETL project.
The key characteristics of a data wrangling project are that it has different types of users, different sets of data, and different use cases for those data sets. Let me explain without diving into specific examples: we were touching pretty much all enterprise data, i.e. over thirty source systems, and each of them had a different use case, different users and, of course, different data. The goal, though, was a single one: providing the metrics the regulator needs on specific categories that it defines for the industry, so that it can measure all players equally and benchmark the service for society.
The key work in this project was to understand, clean, and organize data into an appropriate format, which tells me that it is more of a data wrangling solution than ETL.
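To make "understand, clean, and organize" a little more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the source names, the field names, the categories and the numbers are assumptions and not the client's actual data. It only shows the shape of the three steps when two sources describe the same thing in different ways.

```python
from collections import defaultdict

# Hypothetical raw records from two source systems with different conventions.
RAW_RECORDS = [
    {"source": "system_a", "Category": " Reliability ", "amount": "1,200.50"},
    {"source": "system_a", "Category": "reliability", "amount": "300"},
    {"source": "system_b", "category": "SAFETY", "value": "75.25"},
    {"source": "system_b", "category": "safety", "value": None},
]


def profile(records):
    """Understand: count how often each (lowercased) field name appears."""
    counts = defaultdict(int)
    for record in records:
        for key in record:
            counts[key.lower()] += 1
    return dict(counts)


def clean(record):
    """Clean: unify field names, normalise categories, parse numbers."""
    lowered = {key.lower(): value for key, value in record.items()}
    raw_amount = lowered.get("amount", lowered.get("value"))
    if raw_amount is None:
        # Incomplete record; a real pipeline would log or quarantine it.
        return None
    return {
        "source": lowered["source"],
        "category": lowered["category"].strip().lower(),
        "amount": float(str(raw_amount).replace(",", "")),
    }


def organize(records):
    """Organize: total the cleaned amounts per category."""
    totals = defaultdict(float)
    for record in records:
        totals[record["category"]] += record["amount"]
    return dict(totals)


field_counts = profile(RAW_RECORDS)
cleaned = [row for row in map(clean, RAW_RECORDS) if row is not None]
totals = organize(cleaned)
print(field_counts)
print(totals)
```

In a real project each of these steps is far larger and is driven by the people who know each source, but the pattern of profiling first, then normalising, then shaping the data towards the metric stays the same.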
ETL processes are generally designed to handle well-structured data, often originating from a variety of operational systems or databases, and may not handle complex raw sources that require substantial extraction and derivation, e.g. asset data. Data wrangling, on the other hand, can be designed and architected to handle diverse, complex data at any scale.
The core concept of data wrangling is that the people who know the data best should be the ones exploring and preparing it, i.e. business analysts, line-of-business users, and their managers are the intended users of the data.