Hey all! Just in time for the holidays, I’m uploading another Azure Data Factory video to YouTube. In this video we look specifically at how to use parameters in Azure Data Factory to make your datasets and pipelines dynamic and reusable! In addition to parameters and expressions, we also take a look at the Lookup, ForEach, and Execute Pipeline activities. This video is very informative and touches on a lot of different pieces. I hope you enjoy it!
Parent / Child Design Pattern in ADF
The Parent / Child design pattern has been a popular design pattern for ETL processes for many, many years. It gives you compartmentalization (if that’s a word) of your code, making it more reusable, and it lets you easily scale up and down by increasing or decreasing the number of workers that run in parallel.
In this scenario the Parent pipeline determines what work needs to be done and then passes that work to the worker pipeline. In my video I show how we use the Lookup activity in Azure Data Factory to get the list of items to process, and how that list is then handed off one item at a time.
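If it helps to see the shape of the pattern outside of ADF, here is a rough Python sketch of the same idea. It is only an analogy, not how Data Factory runs anything: `lookup_work_items`, `worker_pipeline`, and `parent_pipeline` are hypothetical names, and the schema and table names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup_work_items():
    """Hypothetical stand-in for the Lookup activity.

    In ADF this list would come from a query against Azure SQL Database;
    the rows below are invented for illustration.
    """
    return [
        {"SchemaName": "dbo", "TableName": "Customer"},
        {"SchemaName": "dbo", "TableName": "Orders"},
        {"SchemaName": "sales", "TableName": "Invoice"},
    ]


def worker_pipeline(schema_name, table_name):
    """Child: processes exactly one table, driven only by its parameters."""
    return f"processed [{schema_name}].[{table_name}]"


def parent_pipeline(max_parallel=2):
    """Parent: decides what work exists, then fans it out to the worker."""
    items = lookup_work_items()
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [
            pool.submit(worker_pipeline, item["SchemaName"], item["TableName"])
            for item in items
        ]
        # Collect results in the same order the work items were listed.
        return [future.result() for future in futures]


results = parent_pipeline(max_parallel=2)
for line in results:
    print(line)
```

Changing `max_parallel` is the scale up / scale down knob here, and the worker never needs to know where the list of work came from.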
Azure Data Factory Lookup and ForEach activities
This scenario retrieves the work from an Azure SQL Database, so I use the Lookup activity to retrieve it. Keep in mind, however, that you could use many different activities for this purpose. For example, I could use the Get Metadata activity to get the list of files in a folder that need to be processed. See my video on the Get Metadata activity here:
The ForEach activity is used to iterate over the list / array that was returned by the Lookup activity. It performs a set of activities, defined by you, for each item in that list or array.
In this example, we are simply executing the worker pipeline, passing in the schema and table name of the current item, which is the table we want to process!
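For reference, here is a minimal sketch of what a Lookup activity definition along these lines might look like, built as a Python dict and printed as JSON. The activity name, the dataset name `ControlDatabase`, and the query against `etl.TablesToProcess` are all assumptions for illustration and are not taken from the video. The important setting is `firstRowOnly`: when it is false, the Lookup returns every row as an array rather than just the first row.

```python
import json

# Sketch of a Lookup activity definition. All names and the query text are
# hypothetical; compare against the JSON your own factory generates.
lookup_activity = {
    "name": "LookupTablesToProcess",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": (
                "SELECT SchemaName, TableName FROM etl.TablesToProcess"
            ),
        },
        "dataset": {
            "referenceName": "ControlDatabase",
            "type": "DatasetReference",
        },
        # False returns all rows as an array instead of only the first row.
        "firstRowOnly": False,
    },
}

lookup_json = json.dumps(lookup_activity, indent=2)
print(lookup_json)
```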
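The wiring described above can be sketched roughly like this, again as a Python dict dumped to JSON. Treat it as an illustration under assumptions: the activity and pipeline names, the `SchemaName` / `TableName` column names, and the `batchCount` of 4 are all hypothetical. The expressions `@activity('...').output.value` and `@item()` are the pieces that carry the Lookup result into the loop and the current row into the worker.

```python
import json


def expression(value):
    """Wrap a string as an ADF dynamic-content expression object."""
    return {"value": value, "type": "Expression"}


# Execute Pipeline activity: calls the worker and passes the current item.
execute_worker = {
    "name": "ExecuteWorkerPipeline",  # hypothetical name
    "type": "ExecutePipeline",
    "typeProperties": {
        "pipeline": {
            "referenceName": "WorkerPipeline",  # hypothetical name
            "type": "PipelineReference",
        },
        "waitOnCompletion": True,
        "parameters": {
            "SchemaName": expression("@item().SchemaName"),
            "TableName": expression("@item().TableName"),
        },
    },
}

# ForEach activity: iterates over the array returned by the Lookup.
for_each = {
    "name": "ForEachTable",  # hypothetical name
    "type": "ForEach",
    "dependsOn": [
        {
            "activity": "LookupTablesToProcess",  # hypothetical name
            "dependencyConditions": ["Succeeded"],
        }
    ],
    "typeProperties": {
        "items": expression("@activity('LookupTablesToProcess').output.value"),
        "isSequential": False,  # run iterations in parallel
        "batchCount": 4,        # assumed cap on parallel iterations
        "activities": [execute_worker],
    },
}

for_each_json = json.dumps(for_each, indent=2)
print(for_each_json)
```

Setting `isSequential` to true, or lowering `batchCount`, is how you would dial the parallelism of the workers back down.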
Execute Pipeline activity in ADF
The child pipeline, or worker pipeline, is where all the magic happens. In development, we make the datasets dynamic by parameterizing their connection so that it can be changed at run time based on the parameters passed in from the parent to the child. In the following screenshot you can see that the dataset connects to an Azure SQL Database and that the schema and table name are determined by parameters at run time!
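In JSON terms, a parameterized dataset like the one described here might look roughly like the sketch below. The dataset name `DynamicSqlTable`, the linked service name `AzureSqlDb`, and the parameter names are assumptions for illustration. The dataset declares its own parameters and reads them with `@dataset()`, and the worker pipeline fills them in from its pipeline parameters when it references the dataset.

```python
import json


def expression(value):
    """Wrap a string as an ADF dynamic-content expression object."""
    return {"value": value, "type": "Expression"}


# Parameterized Azure SQL dataset. All names here are hypothetical.
dataset = {
    "name": "DynamicSqlTable",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "AzureSqlDb",
            "type": "LinkedServiceReference",
        },
        "parameters": {
            "SchemaName": {"type": "String"},
            "TableName": {"type": "String"},
        },
        "typeProperties": {
            # Resolved at run time from the dataset's own parameters.
            "schema": expression("@dataset().SchemaName"),
            "table": expression("@dataset().TableName"),
        },
    },
}

# How an activity in the worker pipeline might reference the dataset,
# forwarding the values it received from the parent pipeline.
dataset_reference = {
    "referenceName": dataset["name"],
    "type": "DatasetReference",
    "parameters": {
        "SchemaName": expression("@pipeline().parameters.SchemaName"),
        "TableName": expression("@pipeline().parameters.TableName"),
    },
}

dataset_json = json.dumps(dataset, indent=2)
print(dataset_json)
print(json.dumps(dataset_reference, indent=2))
```

The nice part of this design is that one dataset covers every table: nothing about a specific schema or table is baked into it.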
For specifics on the setup, orchestration, and execution of this design pattern, watch the video. As always, thanks for reading my blog and have a great week!