The series continues! This is the sixth blog post in this series on Azure Data Factory. If you have missed any of the previous posts, you can catch up using the links below:
- Check out part one here: Azure Data Factory – Get Metadata Activity
- Check out part two here: Azure Data Factory – Stored Procedure Activity
- Check out part three here: Azure Data Factory – Lookup Activity
- Check out part four here: Azure Data Factory – If Condition Activity
- Check out part five here: Azure Data Factory – Copy Data Activity
What is the Filter activity in Azure Data Factory?
The Filter activity applies a filter expression to an input array. Understanding that definition will help simplify how and where to use this activity.
Let me set up the scenario for you. In this example, I want to use Azure Data Factory to loop over a list of files that are stored in Azure Blob Storage. I am going to use the Get Metadata activity to return a list of all the files in my Azure Blob Storage container. However, I don’t want to process every file in that location. Below is the list of files currently in my storage account. Notice the file named “inputEmp_tq.txt”: I want to remove this file from the list of files returned.
Get Metadata Activity – Get List of Files to process
I reviewed the Get Metadata activity in the very first blog post in this Azure Data Factory series, so I won’t bore you with those details again. Two settings matter for this example:
- The dataset is pointing to a folder location, not a specific file. This is important.
- I selected “Child Items” from the field list. This returns the name of every file in that directory location.
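To make the shape of that result concrete, here is a minimal Python sketch of what the Child Items output looks like and how reading `.output.childItems` corresponds to a simple lookup. This is only an illustration: apart from inputEmp_tq.txt, the file names are hypothetical, and the dictionary stands in for the activity output rather than calling Azure Data Factory.

```python
# Stand-in for the Get Metadata activity output when "Child Items" is selected.
# Each child item carries a name and a type.
# Assumption: only inputEmp_tq.txt comes from this post; the other names are made up.
get_metadata_output = {
    "childItems": [
        {"name": "FactInternetSales_2019.txt", "type": "File"},
        {"name": "FactInternetSales_2020.txt", "type": "File"},
        {"name": "inputEmp_tq.txt", "type": "File"},
    ]
}

# Rough equivalent of reading .output.childItems in a pipeline expression.
child_items = get_metadata_output["childItems"]

# Each element is an object, so the file name lives under the "name" key.
file_names = [item["name"] for item in child_items]
print(file_names)
```

Because every element is an object rather than a plain string, any later condition has to look at the item's name property and not at the item itself.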
Filter Activity – Remove unwanted files from an input array
The first step is to add the Filter activity to the pipeline and connect it to the successful output of the Get Metadata activity:
Now it’s time to set up the Filter activity. It requires two settings during configuration.
- Items – Input array on which filter should be applied.
- Condition – Condition to be used for filtering the input array.
Items will be the output of the Get Metadata activity, and I will build the Condition using the built-in expression language. Only items for which the condition evaluates to true are returned in the final array!
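The behaviour described above can be sketched in a few lines of Python. This is a rough model, not the service's implementation: the function name `filter_activity` is hypothetical, and the output keys (`ItemsCount`, `FilteredItemsCount`, `Value`) reflect my understanding of what the Filter activity reports, so check them against a real pipeline run.

```python
from typing import Any, Callable, Dict, List


def filter_activity(items: List[Any], condition: Callable[[Any], bool]) -> Dict[str, Any]:
    """Keep only the items for which condition(item) is true.

    Hypothetical model of the Filter activity: 'items' plays the role of the
    Items setting and 'condition' plays the role of the Condition setting.
    """
    kept = [item for item in items if condition(item)]
    return {
        "ItemsCount": len(items),         # size of the input array (assumed key name)
        "FilteredItemsCount": len(kept),  # size of the filtered array (assumed key name)
        "Value": kept,                    # the filtered array itself (assumed key name)
    }


# Example: keep the even numbers from a small input array.
result = filter_activity([1, 2, 3, 4, 5, 6], lambda n: n % 2 == 0)
print(result)
```

The order of the input array is preserved, and items that fail the condition are simply left out rather than raising an error.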
Filter Activity Configuration
First, I will configure the Items property. This is simply the child items output of the Get Metadata activity, so I will use the following expression:
@activity('meta_GetListOfFiles').output.childItems
Next, I will set up a condition that removes any files that don’t match the naming pattern I want. In this scenario I am simply looking for file names that start with FactInternetSales_, and any file that doesn’t match this pattern will be removed from the final array. You can see the actual formula in the previous screenshot, but I want to quickly show you how I found the function used for this example.
First, I am going to click in the Condition box and then click “Add dynamic content”.
If you have followed any of the posts in this series, then you are familiar with the new window that opens up. The function that I use in this example is a String Function called startswith. The following screenshot shows where I found this function:
Here is the final expression:
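For readers following along without the screenshot: based on the steps above, the condition would presumably be along the lines of `@startswith(item().name, 'FactInternetSales_')`, where `item()` refers to the current element of the Items array. Treat that exact text as an assumption rather than a copy of the original expression. The Python sketch below mimics the same check against a hypothetical file list (only inputEmp_tq.txt is taken from this post).

```python
def condition(item: dict) -> bool:
    # Approximates startswith(item().name, 'FactInternetSales_').
    # Note: Python's str.startswith is case-sensitive; the Data Factory
    # function's case handling may differ, so this is only an approximation.
    return item["name"].startswith("FactInternetSales_")


# Hypothetical child items; only inputEmp_tq.txt comes from the post.
child_items = [
    {"name": "FactInternetSales_2019.txt", "type": "File"},
    {"name": "FactInternetSales_2020.txt", "type": "File"},
    {"name": "inputEmp_tq.txt", "type": "File"},
]

# Keep only the items whose condition evaluates to true.
filtered = [item for item in child_items if condition(item)]
filtered_names = [item["name"] for item in filtered]
print(filtered_names)
```

With this input, inputEmp_tq.txt is dropped and the two FactInternetSales_ files remain, which matches the goal set out at the start of the post.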
As always, thanks for reading my blog!