Introduction to Wrangling Data Flows in Azure Data Factory

Hello! It’s been a while since I’ve done a video on Azure Data Factory. To get back into the flow of blogging on ADF, I will be starting with Data Flows, specifically Wrangling Data Flows.

The video can be seen here:

What are Wrangling Data Flows in Azure Data Factory?

Wrangling Data Flows are a way to easily clean and transform data at scale. Huh?

Wrangling Data Flows use the M query language and the UI experience provided by the Power Query Editor in Power BI Desktop. Including this technology in Azure Data Factory is a brilliant move by Microsoft. Just think of the hundreds of millions of people who are currently transforming and cleaning their data in Excel or Power BI Desktop. Now they can take their self-service ETL (extract, transform, and load) skills to the enterprise level with ADF.

What makes it scalable? Power Query Editor at Scale.

Wrangling Data Flows allow the developer to use the graphical user interface to do all the hard work with little to no code. In the background, though, all of your UI steps are converted to the M language. At runtime, Azure Data Factory takes that M code, converts it to Spark, and then runs your data flow against big data clusters. This means that as your data volumes grow, you should experience consistent performance!
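To make the "UI steps become code" idea a bit more concrete, here is a minimal sketch in Python. It is not M and it is not what ADF generates. It only illustrates the general pattern of recording each click as a named step and replaying the steps in order. The step names, the run_steps helper, and the sample rows are all made up for illustration.

```python
from typing import Callable

Row = dict[str, object]
Step = tuple[str, Callable[[list[Row]], list[Row]]]

# Hypothetical recorded steps: each UI action becomes a named transformation.
steps: list[Step] = [
    ("Removed blank rows", lambda rows: [r for r in rows if r["city"]]),
    ("Uppercased city", lambda rows: [{**r, "city": str(r["city"]).upper()} for r in rows]),
]

def run_steps(rows: list[Row], steps: list[Step]) -> list[Row]:
    """Apply every recorded step in order, feeding each result into the next step."""
    for _name, transform in steps:
        rows = transform(rows)
    return rows

sample = [{"city": "tampa"}, {"city": ""}, {"city": "miami"}]
result = run_steps(sample, steps)
```

Because the steps are just data, an engine is free to translate them into something else before running them, which is the role Spark plays in the description above.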

Are there any limitations with Wrangling Data Flows?

Yes… quite a few, actually. Wrangling Data Flows are still in preview at the time of this blog post and the related YouTube video, and a number of operations just aren’t supported yet. The most obvious of those are promoting header rows and pivoting data. I hope these features will be available once the product reaches GA.

https://docs.microsoft.com/en-us/azure/data-factory/wrangling-data-flow-functions#known-unsupported-functions

As always, thank you for reading my blog and watching my YouTube videos! Have a great day!!

Other Azure Data Factory resources!


Azure Data Factory–Rule Based Mapping and This($$) Function

Hello! This is the eighth video in a series of videos that will be posted on Azure Data Factory! Feel free to follow this series and other videos I post on YouTube! Remember to like, subscribe, and encourage me to keep posting new videos!

Schema flexibility and late schema binding really separate Azure Data Factory from its on-prem rival, SQL Server Integration Services (SSIS). This video focuses on leveraging flexible schemas and on how rules can be defined to map changing column names to the sink.

Rule Based Mapping

Rule-based mapping in ADF allows you to define rules that map columns coming into a data flow to a specific column. For example, you can map a column that has ‘date’ anywhere in its name to a column named ‘Order_Date’. This ability to define rule-based mappings allows for very flexible and reusable data flows. In the video below, I walk through and explain how to set this up inside of a Select transform. Enjoy!
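If it helps to see the ‘date’ example outside of the ADF UI, here is a minimal sketch of the idea in Python. This is not ADF expression syntax. The apply_rule_based_mapping function, the sample column names, and the choice to match case-insensitively are all assumptions made for illustration.

```python
from typing import Callable

# A rule is a (condition on the incoming column name, output column name) pair.
Rule = tuple[Callable[[str], bool], str]

def apply_rule_based_mapping(columns: list[str], rules: list[Rule]) -> dict[str, str]:
    """Return {incoming_name: output_name} for every column that matches a rule.

    The first matching rule wins. Columns that match no rule are left out
    of the mapping in this sketch.
    """
    mapping: dict[str, str] = {}
    for name in columns:
        for matches, output_name in rules:
            if matches(name):
                mapping[name] = output_name
                break
    return mapping

# Assumption: 'date' is matched anywhere in the name, ignoring case.
rules: list[Rule] = [(lambda name: "date" in name.lower(), "Order_Date")]

mapping = apply_rule_based_mapping(["OrderDate_2020", "Amount"], rules)
```

The point is that the rule never mentions the exact incoming column name, so the same data flow keeps working when the source column is renamed from one load to the next.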

This ($$) Function in a Derived Transform and a Select Transform

The this ($$) function simply returns either the name of the column or the value of the column, depending on where it is used. In this video I show two use cases: one in a Select transform and one in a Derived transform.
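The name-versus-value distinction can be sketched in Python as well. Again, this is an illustration and not ADF syntax. The select_rename and derive_values functions, the ends_with_name condition, and the sample row are hypothetical. In each function, the lambda's argument plays the role that $$ plays in the description above.

```python
from typing import Callable

Row = dict[str, object]

def select_rename(row: Row, matches: Callable[[str], bool],
                  name_expr: Callable[[str], str]) -> Row:
    """Select-style use: the placeholder stands for the matched column's NAME."""
    return {(name_expr(name) if matches(name) else name): value
            for name, value in row.items()}

def derive_values(row: Row, matches: Callable[[str], bool],
                  value_expr: Callable[[object], object]) -> Row:
    """Derived-style use: the placeholder stands for the matched column's VALUE."""
    return {name: (value_expr(value) if matches(name) else value)
            for name, value in row.items()}

row: Row = {"first_name": " Ada ", "age": 36}
ends_with_name = lambda name: name.endswith("_name")

# Placeholder is the name: prefix every matched column name.
renamed = select_rename(row, ends_with_name, lambda name: "customer_" + name)

# Placeholder is the value: trim every matched column's value.
trimmed = derive_values(row, ends_with_name, lambda value: str(value).strip())
```

Both calls use the same matching condition. Only the meaning of the placeholder changes.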

Video Below:

If you like what you see and want more structured, end-to-end training, then check out the training offerings from Pragmatic Works! https://pragmaticworks.com/training