Mitchellsql

Skip to content
  • Home
  • About
Search

Get Metadata activity

Azure Data Factory–Filter Activity

August 6, 2018 / Mitchell Pearson / 9 Comments

image

The series continues! This is the sixth blog post in this series on Azure Data Factory, if you have missed any or all of the previous blog posts you can catch up using the provided links here:

  • Check out part one here: Azure Data Factory – Get Metadata Activity
  • Check out part two here: Azure Data Factory – Stored Procedure Activity
  • Check out part three here: Azure Data Factory – Lookup Activity
  • Check out part four here: Azure Data Factory – If Condition Activity
  • Check out part five here: Azure Data Factory – Copy Data Activity

What is the Filter activity in Azure Data Factory?

The Filter activity applies a filter expression to an input array. Understanding that definition will help simplify how and where to use this activity.

Let me set up the scenario for you. In this example, I want to use Azure Data Factory to loop over a list of files that are stored in Azure Blob Storage. I am going to use the Metadata activity to return a list of all the files from my Azure Blob Storage container. Unfortunately, I don’t want to process all the files in the directory location. Below I have posted the list of files currently in my storage account, notice the file name “inputEmp_tq.txt”. I want to remove this file from the list of files returned.

image

Get Metadata Activity – Get List of Files to process

I reviewed the metadata activity in the very first blog post in this Azure Data Factory series, therefore, I won’t bore you with those details again.

  1. The dataset is pointing to a folder location, not a specific file. This is important.
  2. I selected “Child Items” from the field list properties, this will return the name of all files in that directory location.

image

Filter Activity – Remove unwanted files from an input array

The first step is to add the filter activity to the pipeline and connect the activity to the successful output of the metadata activity:

image

Now it’s time to set up the Filter activity. The filter activity requires two items during configuration.

  1. Items – Input array on which filter should be applied.
  2. Condition – Condition to be used for filtering the input array.

The items will be the output of our metadata activity and the Condition I will build using the built in expression language. Only items that evaluate to true will be returned in the final array!

Filter Activity Configuration

First, I will configure the Items property. This is simply the output parameter from the metadata activity and so I will use the following code:

@activity(‘meta_GetListOfFiles’).output.childItems

image

Next, I will set up a condition that will remove any files that don’t match the naming pattern I want. In this scenario I am simply looking for file names that start with FactInternetSales_ and any files that don’t match this criteria will be removed from the final array. You can see the actual formula in the previous screenshot but I want to quickly show you how I found the function used for this example.

First, I am going to click my mouse cursor in the Condition box and then I will click on “Add dynamic content”.

image

If you have followed any of the blogs in this series then you are familiar with this new window that opens up. The function that I use in this example is a String Function called startswith. See the following screenshot on where I found this function:

image

Here is the final expression:

image

As always, thanks for reading my blog!

Advertisement

Azure Data Factory – If Condition activity

July 2, 2018 / Mitchell Pearson / 11 Comments

In part three of my Azure Data Factory series I showed you how the lookup activity could be used to return the output results from a stored procedure. In part one you learned how to use the get metadata activity to return the last modified date of a file. In this blog, we are going to use the if condition activity to compare the output of those two activities. If the last modified date of the file is greater than the last execution date (last time the file was processed) then the copy activity will be executed. In other words, the copy activity only runs if new data has been loaded into the file, currently located on Azure Blob Storage, since the last time that file was processed.

Check out the following links if you would like to review the previous blogs in this series:

  • Check out part one here: Azure Data Factory – Get Metadata Activity
  • Check out part two here: Azure Data Factory – Stored Procedure Activity
  • Check out part three here: Azure Data Factory – Lookup Activity

image_thumb7

Setup and configuration of the If Condition activity

For this blog, I will be picking up from the pipeline in the previous blog post. Therefore, this pipeline already has the get metadata activity and lookup activity and I am going to add the if condition activity and then configure it accordingly to read the output parameters from two previous activities.

  1. Expand the category Iteration & Conditionals in the activities pane.
  2. Select the if condition activity.
  3. Drag the if condition activity from the activities pane and drop it into the pipeline.

2_If_Condition_Pipeline

The next step is to configure the if condition activity to only execute after the lookup and get metadata activities complete successfully. This can be accomplished by using the built in constraints. The default constraint is set to success. This can be changed by simply selecting the constraint and then right clicking on it. There are currently four options available for the constraints:

  1. Successful – default behavior
  2. Failure
  3. Completed
  4. Skipped

To configure the constraints between activities is quite simple. Take a look at the animated gif below:

2_Connect_Activities

Now it’s time to set up the configuration of the if condition activity:

  1. With the if condition activity selected, navigate to the properties pane and rename the activity:
    • Name: Check if file is new

Adding a parameterized expression in Azure Data Factory

Now it is time to configure the settings tab for the if condition activity. The settings tab requires an expression that evaluates to either true or false. In this example, the expression needs to compare the output parameters from each of the previous tasks to determine if the file is new since the last load time. The “Add dynamic content” menu will help with building the expression for you, however, it will not give you the full path to the output parameter. In the previous blog post, I showed how you could identify the exact output parameter names after the debug phase. Let’s take another look at the output results:

3_Output_Results_Previous blog post

Let’s jump right in and build the parameterized expression:

  1. Select the settings tab from the properties window
  2. Click in the expression box
  3. Click the hyperlink that appears below the expression box “Add dynamic content”.

4_add_dynamic_content

System Variables and Functions in Azure Data Factory

In the Add Dynamic Content window there are some built in system variables and functions that can be utilized when building an expression. In this blog post I am going to use the built in function greaterOrEquals. This function will allow us to compare the two dates from the output parameters of our previous activities (Last Modified date and Last Execution date).

  1. Expand the functions category
  2. Next click to expand logical functions.
  3. Finally, click on the function greaterOrEquals, this function will now appear in the expression box.

5_greaterOrEquals_function

Now it’s time to finish building out the expression. The built in function, greaterOrEquals, expects two parameters separated by a comma. This is where I will insert the output parameters from the lookup and get metadata activities. As previously mentioned, the dynamic content window will help in referencing those outputs. In the animated gif below, you can see that I use the activity outputs to begin the parameter reference. Then I manually complete the expression by adding the specific parameter names. Remember, that we obtained these names by looking at the output of each activity after debugging the pipeline.

6_Parameterized_Expression

Adding activities to the If Condition activity

The final step is to add activities to if condition activity. You have two options under the activities tab. These options are If True activities and If False activities. In other words, what activities would you like to perform if the expression evaluates to true? Alternatively, what activities would you like to perform if the expression evaluates to false? For the sake of simplicity I will add a wait task to each activity condition. In the next blog post in this series I will replace the wait activity with the copy activity.

  1. Click on the Activities tab found in the properties window.
  2. Click the box “Add If True Activity”
  3. This will open a pipeline that is scoped only to the if condition activity.

7_Add_True_Activity

  1. Add the Wait activity to the new pipeline.
  2. I named the activity wait_TRUE to help during debug and validation.

Also, pay special attention to the breadcrumb link across the top of this pipeline. This makes navigation easy and also helps identify what scope you are currently developing in. As mentioned previously, the wait activity that was added is scoped to the if condition activity and only if the expression evaluates to true.

 8_Wait_activity_IfTrue

I want to add a final activity before debugging the pipeline. I want to add a wait activity to the if condition if the expression evaluates to false. To do this I have to navigate back to the if condition activity and select If False Activities under the activities property.

  1. I will use the breadcrumb link to navigate back to the main pipeline.
  2. Then select the Activities tab for the the If Condition activity.
  3. Finally, click the box for “Add If False Activity”.
  4. Add a wait activity to this pipeline and then name it wait_FALSE

9_Breadcrumb

With everything set up it’s now time to debug the pipeline. Since the last modified date of the file is 06/06/2018 and the last execution date is 06/13/2018, therefore, I would the wait activity defined within the If Condition-If False pipeline should be executed. As you can see from the results below, the wait_False activity was executed.

10_Debug_Results

Thanks for checking out my blog!

Recent Posts

  • Working with Averages in DAX using AVERAGEX, SUMMARIZE and DATESINPERIOD
  • Managed Virtual Networks and Private Endpoints in Azure Synapse and Azure Data Factory
  • Dynamically changing title names in Power BI
  • Working with Parameters and Expressions in Azure Data Factory
  • Introduction to Wrangling Data Flows in Azure Data Factory

Recent Comments

VINICIUS AUGUSTUS PA… on Azure Data Factory–Filte…
Mitchell Pearson on Advanced TSQL Takeover
lordi on Advanced TSQL Takeover
Mitchell Pearson on The transaction log for databa…
You cant see me on The transaction log for databa…

Archives

  • January 2021
  • December 2020
  • September 2020
  • May 2020
  • April 2020
  • November 2018
  • October 2018
  • September 2018
  • August 2018
  • July 2018
  • June 2018
  • May 2018
  • August 2017
  • July 2017
  • June 2017
  • April 2017
  • August 2016
  • May 2016
  • April 2016
  • February 2016
  • July 2015
  • June 2015
  • April 2015
  • February 2015
  • January 2015
  • November 2014
  • September 2014
  • August 2014
  • June 2014
  • May 2014
  • February 2014
  • January 2014

Categories

  • Azure Data Factory
  • Azure Logic Apps
  • BIxPress
  • DAX
  • DBA
  • MDX
  • Performance Tuning
  • Power BI
  • PowerApps
  • Problem, Design, Solution
  • Professional / Organization
  • Scripting
  • Speaking
  • SQL Tips
  • SSAS – Analysis Services
  • SSIS
  • T-SQL
  • Task Factory
  • Tips and Tricks
  • Working with R

Meta

  • Register
  • Log in
  • Entries feed
  • Comments feed
  • WordPress.com

Top Posts & Pages

  • Dynamically changing title names in Power BI
  • ALL vs ALLSELECTED in DAX and Power BI Desktop
  • How to add Data Labels to maps in Power BI
  • SQL Tips! How to edit the query for Edit TOP 200 Rows in Management Studio (SSMS)
  • Managed Virtual Networks and Private Endpoints in Azure Synapse and Azure Data Factory
Blog at WordPress.com.
Privacy & Cookies: This site uses cookies. By continuing to use this website, you agree to their use.
To find out more, including how to control cookies, see here: Cookie Policy
  • Follow Following
    • Mitchellsql
    • Join 92 other followers
    • Already have a WordPress.com account? Log in now.
    • Mitchellsql
    • Customize
    • Follow Following
    • Sign up
    • Log in
    • Report this content
    • View site in Reader
    • Manage subscriptions
    • Collapse this bar