Introduction to Wrangling Data Flows in Azure Data Factory

Hello! It’s been a while since I’ve done a video on Azure Data Factory. To get back into the flow of blogging on ADF, I will be starting with Data Flows, specifically Wrangling Data Flows.

The video can be seen here:

What are Wrangling Data Flows in Azure Data Factory?

Wrangling Data Flows are a method of easily cleaning and transforming data at scale. Huh?

Wrangling Data Flows use the M query language and the UI experience provided by the Power Query Editor in Power BI Desktop. Including this technology in Azure Data Factory is a brilliant move by Microsoft. Just think of the hundreds of millions of people who are currently transforming and cleaning their data in Excel or Power BI Desktop. Now they can take their self-service ETL (extract, transform, and load) skills to the enterprise level with ADF.

What makes it scalable? Power Query Editor at Scale.

Wrangling Data Flows allow the developer to use the graphical user interface to do all the hard work with little to no code. In the background, though, all of your UI steps are being converted to the M language. At runtime, Azure Data Factory will take that M code, convert it to Spark, and then run your data flow against big data clusters. This means that as your data volumes grow, you should experience consistent performance!
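
To make the "UI steps become M" idea a little more concrete, here is a minimal sketch of the kind of M script a few clicks in the editor might generate. Everything in it is an assumption for illustration: the inline sample table, the column names (OrderId, Region, Amount) and the step names are all made up, and in a real Wrangling Data Flow the source would be an ADF dataset rather than a hand-typed table. Whether each step is supported during the preview should be checked against the documentation.

```powerquery
let
    // Hypothetical inline sample data standing in for an ADF dataset
    Source = #table(
        type table [OrderId = number, Region = text, Amount = number],
        {
            {1, "East", 120.50},
            {2, "West", 0},
            {3, "East", 75.25}
        }
    ),
    // UI step "Filter rows": keep only orders with a positive amount
    FilteredRows = Table.SelectRows(Source, each [Amount] > 0),
    // UI step "Rename column": give the amount a clearer name
    RenamedColumns = Table.RenameColumns(FilteredRows, {{"Amount", "SalesAmount"}}),
    // UI step "Choose columns": keep only the columns we need downstream
    SelectedColumns = Table.SelectColumns(RenamedColumns, {"Region", "SalesAmount"})
in
    SelectedColumns
```

Each named step corresponds to one action in the editor, and each step builds on the one before it. That chain of steps is what gets translated to Spark at runtime.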

Are there any limitations with Wrangling Data Flows?

Yes… quite a few, actually. Wrangling Data Flows are still in preview at the time of this blog and the related YouTube video, and there are currently quite a few operations that just aren’t supported. The most obvious of these are promoting header rows and pivoting data. I hope that these features will be available once the product reaches GA.

https://docs.microsoft.com/en-us/azure/data-factory/wrangling-data-flow-functions#known-unsupported-functions

As always, thank you for reading my blog and watching my YouTube videos! Have a great day!!
