Talk about the pros and cons of data transformation at each level (Source, Middle Marts, and Power Query)

Commonly, a dashboard contains a variety of data sources. There must be some preliminary cleaning and transforming work before building visualizations. Typically, the reporting structure includes three layers: sources, middle mart, and applications( reporting, visualizations).

Big companies with detailed cooperation among teams

In big companies, there is more detailed team collaboration. You may have a team take care of the database, and your data is all from the database. In this case, you can consider the following structure: send the request to the DB team delivering queries and get the data. You can build your offline files based on the output data and put all the files in SharePoint or other online shared spaces with your team. And here comes the question: when should we clean and transform data? I recommend cleaning and changing data at the PBI level by leveraging the power query tool. Why? Because we leave SharePoint or share spaces as the source, we can control for versions and backup (although there must be backups at the DB level). Keep the files in shared space in the original formats that can help us communicate consistently without too many customized data formats and transforming. We do the cleaning at the BI level, which helps us provide more personalized visualizations and retain automotive pipelines.

Startups or small organizations as a multitasking role

Suppose you work in a multitasking function to handle all tasks from the DB level to the presenting layer. In that case, I recommend cleaning and transforming data at the DB level while building your queries as reusable modules to fit any possible combinations.

The reason is simple. You can keep your BI layer more efficient and reduce the computing resources taken while moving all the personalized reforming to the database. You can build procedures in DB and generate temp tables for specific visualizations. In this case, you don't need to write a power query again at the reporting level; all processes are automatic, too.

Generally speaking, there are two focuses for our data cleaning and transforming design. One is how easily we can maintain our regular tasks (including team collaboration and managing codes). The other is the automotive processes. Ultimately, we are not building customized solutions for multiple requests. We aim to reuse our modules as much as possible and reduce human workload.

--

--

Enthusiastic about enabling commercial excellence

An analyst who is familiar with the APAC market and stays with 10-year experience in data analytics, project management, and go to market strategies.