The series continues! This is the sixth blog post in this series on Azure Data Factory. If you have missed any of the previous posts, you can catch up using the links below:
- Check out part one here: Azure Data Factory – Get Metadata Activity
- Check out part two here: Azure Data Factory – Stored Procedure Activity
- Check out part three here: Azure Data Factory – Lookup Activity
- Check out part four here: Azure Data Factory – If Condition Activity
- Check out part five here: Azure Data Factory – Copy Data Activity
What is the Filter activity in Azure Data Factory?
The Filter activity applies a filter expression to an input array. Understanding that definition will help simplify how and where to use this activity.
Let me set up the scenario for you. In this example, I want to use Azure Data Factory to loop over a list of files stored in Azure Blob Storage. I am going to use the Get Metadata activity to return a list of all the files in my Azure Blob Storage container. However, I don’t want to process every file in that directory location. Below I have posted the list of files currently in my storage account. Notice the file name “inputEmp_tq.txt”: I want to remove this file from the list of files returned.
Get Metadata Activity – Get List of Files to process
I reviewed the Get Metadata activity in the very first blog post in this Azure Data Factory series, so I won’t bore you with those details again. Two settings matter for this example:
- The dataset is pointing to a folder location, not a specific file. This is important.
- I selected “Child Items” from the Field list property. This returns the names of all the files in that directory location.
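For readers who prefer the pipeline's code view, the two settings above could look roughly like the following. This is a minimal sketch, not the exact definition from my pipeline: the dataset name ds_BlobFolder is a hypothetical placeholder for a dataset that points at the folder, and the activity name is the one used later in this post.

```json
{
    "name": "meta_GetListOfFiles",
    "description": "Sketch only: ds_BlobFolder is a hypothetical dataset that points at a folder, not a file.",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "ds_BlobFolder",
            "type": "DatasetReference"
        },
        "fieldList": [
            "childItems"
        ]
    }
}
```

The fieldList entry childItems is what corresponds to picking “Child Items” in the UI.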
Filter Activity – Remove unwanted files from an input array
The first step is to add the filter activity to the pipeline and connect the activity to the successful output of the metadata activity:
Now it’s time to set up the Filter activity. The filter activity requires two items during configuration.
- Items – Input array on which filter should be applied.
- Condition – Condition to be used for filtering the input array.
Items will be the output of our Get Metadata activity, and I will build the Condition using the built-in expression language. Only items for which the condition evaluates to true are returned in the final array!
Filter Activity Configuration
First, I will configure the Items property. This is simply the childItems output of the metadata activity, so I will use the following expression:
@activity('meta_GetListOfFiles').output.childItems
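To make the input concrete, here is roughly what that expression resolves to at run time. Treat it as an illustration under assumptions: only inputEmp_tq.txt is a file name from this post, and the two FactInternetSales_ names are made up for the example.

```json
[
    { "name": "FactInternetSales_2019.txt", "type": "File" },
    { "name": "FactInternetSales_2020.txt", "type": "File" },
    { "name": "inputEmp_tq.txt", "type": "File" }
]
```

Each element carries a name and a type, which is why a condition can test the file name through item().name.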
Next, I will set up a condition that removes any files that don’t match the naming pattern I want. In this scenario, I am simply looking for file names that start with FactInternetSales_; any files that don’t match this pattern will be removed from the final array. You can see the actual formula in the previous screenshot, but I want to quickly show you how I found the function used for this example.
First, I am going to click my mouse cursor in the Condition box and then I will click on “Add dynamic content”.
If you have followed any of the blogs in this series, then you are familiar with the new window that opens up. The function that I use in this example is a String Function called startswith. The following screenshot shows where I found this function:
Here is the final expression:
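In case the screenshot does not come through, the sketch below shows how the whole Filter activity could look in the pipeline's code view. The condition is reconstructed from the description above (startswith applied to the file name with the FactInternetSales_ prefix), and the activity name filter_FactInternetSalesFiles is a hypothetical one chosen for illustration.

```json
{
    "name": "filter_FactInternetSalesFiles",
    "description": "Sketch only: the activity name is hypothetical and the condition is reconstructed from the text.",
    "type": "Filter",
    "dependsOn": [
        {
            "activity": "meta_GetListOfFiles",
            "dependencyConditions": [
                "Succeeded"
            ]
        }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('meta_GetListOfFiles').output.childItems",
            "type": "Expression"
        },
        "condition": {
            "value": "@startswith(item().name, 'FactInternetSales_')",
            "type": "Expression"
        }
    }
}
```

Inside the condition, item() refers to the current element of the Items array, so item().name is the file name being tested. With a condition like this, inputEmp_tq.txt evaluates to false and is dropped from the result.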
As always, thanks for reading my blog!
Thanks very much. You make me save so much time
Great to hear, thanks for reading my blog!
Thanks! I need to use a Filter Activity and could hardly find any info on it at all (Including MS). This Blog really helped me. Mike
Always great to hear that these blogs are providing value! Thanks Mike.
Hey Mitchell,
I am trying to use a Filter and the following screenshot shows my condition, but I am getting an error. Can you see anything I am doing wrong?
Items
@activity('LookUp Company IDs and Connection String').output.value
condition
@equals(activity(item().CompanyID, variables('CompanyID')))
And I am getting this error:
Position 18 Only a single string literal argument is allowed inside the class clause
thanks…I have tried everything.
Mike
I haven’t seen that specific error message, so I’m not sure. Shoot me an email at mpearson@pragmaticworks.com. Do you need activity( there? After looking at the error again, it looks like it doesn’t like the fact that you are using a variable value rather than a string literal. Does it work if you replace the variable with a hard-coded string literal value?
Hello, I was using your example for Metadata, taking the output of childitems in a For Each activity to process an array of file names. Now I want to filter down the childitems, so I was able to use this Filter activity example to accomplish that.
How do I take the output from the Filter activity (similar to childitems from Metadata activity) and use it in the ForEach, to iterate through the filtered list? Hope this makes sense.
Hey Joe, I have a YouTube video that shows you how to find the output of the Filter activity. It’s only a 13-minute video, and the Filter activity output is covered around 7-8 minutes in: https://www.youtube.com/watch?v=yIyyw1e1bPM&t=12s
Thanks a lot, your post helped me to figure out how to filter folders/files with dates.
@or(startswith(item().name,formatDateTime(adddays(utcNow(),-1),'yy-MM-dd')),startswith(item().name,formatDateTime(adddays(utcNow(),-2),'yy-MM-dd')))
Kindly Regards.