Mitchellsql


Azure Data Factory

Azure Data Factory–Metadata Activity (Part 1)

April 5, 2020April 18, 2020 / Mitchell Pearson / 8 Comments

Hello! This is the first video in a new series on Azure Data Factory! The series will be primarily in video format and can be found on YouTube. Check it out there, and if you like it, subscribe and encourage me to keep posting new videos!

Metadata Activity in ADF v2

  • Metadata Activity of a file
  • Metadata Activity of a folder
  • Metadata Activity from an Azure SQL Server Table
  • Debugging and output parameters

Video Below:

If you like what you see and want more structured, end-to-end training, then check out the training offerings from Pragmatic Works! https://pragmaticworks.com/training


Azure Data Factory–Filter Activity

August 6, 2018 / Mitchell Pearson / 9 Comments


The series continues! This is the sixth blog post in this series on Azure Data Factory. If you have missed any or all of the previous blog posts, you can catch up using the links provided here:

  • Check out part one here: Azure Data Factory – Get Metadata Activity
  • Check out part two here: Azure Data Factory – Stored Procedure Activity
  • Check out part three here: Azure Data Factory – Lookup Activity
  • Check out part four here: Azure Data Factory – If Condition Activity
  • Check out part five here: Azure Data Factory – Copy Data Activity

What is the Filter activity in Azure Data Factory?

The Filter activity applies a filter expression to an input array. Understanding that definition will help simplify how and where to use this activity.

Let me set up the scenario for you. In this example, I want to use Azure Data Factory to loop over a list of files that are stored in Azure Blob Storage. I am going to use the Metadata activity to return a list of all the files from my Azure Blob Storage container. Unfortunately, I don't want to process all the files in the directory location. Below I have posted the list of files currently in my storage account; notice the file named "inputEmp_tq.txt". I want to remove this file from the list of files returned.

[Image: list of files in the storage account, including inputEmp_tq.txt]

Get Metadata Activity – Get List of Files to process

I reviewed the metadata activity in the very first blog post in this Azure Data Factory series, so I won't bore you with those details again.

  1. The dataset is pointing to a folder location, not a specific file. This is important.
  2. I selected "Child Items" from the field list properties; this will return the names of all files in that directory location.
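With "Child Items" selected, the activity output contains a childItems array. A rough sketch of the shape, with illustrative file names, looks like this:

```json
{
    "childItems": [
        { "name": "FactInternetSales_1.txt", "type": "File" },
        { "name": "inputEmp_tq.txt", "type": "File" }
    ]
}
```

Each element carries a name and a type, which is what the filter condition will test against later.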


Filter Activity – Remove unwanted files from an input array

The first step is to add the filter activity to the pipeline and connect the activity to the successful output of the metadata activity:


Now it’s time to set up the Filter activity. The filter activity requires two items during configuration.

  1. Items – Input array on which filter should be applied.
  2. Condition – Condition to be used for filtering the input array.

The items will be the output of our metadata activity, and the Condition I will build using the built-in expression language. Only items that evaluate to true will be returned in the final array!

Filter Activity Configuration

First, I will configure the Items property. This is simply the output parameter from the metadata activity and so I will use the following code:

@activity('meta_GetListOfFiles').output.childItems

[Image: Filter activity configuration showing the Items and Condition properties]

Next, I will set up a condition that will remove any files that don't match the naming pattern I want. In this scenario I am simply looking for file names that start with FactInternetSales_, and any files that don't match this criterion will be removed from the final array. You can see the actual formula in the previous screenshot, but I want to quickly show you how I found the function used for this example.

First, I am going to click my mouse cursor in the Condition box and then I will click on “Add dynamic content”.


If you have followed any of the blogs in this series, then you are familiar with this new window that opens up. The function that I use in this example is a String Function called startswith. See the following screenshot for where I found this function:

[Image: the startswith function under String Functions]

Here is the final expression:

[Image: the final filter expression]
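Written out, the two Filter properties together look roughly like the following; the activity name meta_GetListOfFiles comes from my pipeline, and item() refers to the current element of the input array as it is being filtered:

```
Items:     @activity('meta_GetListOfFiles').output.childItems
Condition: @startswith(item().name, 'FactInternetSales_')
```

Any element whose name does not start with FactInternetSales_ evaluates to false and is dropped from the output array.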

As always, thanks for reading my blog!

Azure Data Factory – If Condition activity

July 2, 2018 / Mitchell Pearson / 11 Comments

In part three of my Azure Data Factory series I showed you how the lookup activity could be used to return the output results from a stored procedure. In part one you learned how to use the get metadata activity to return the last modified date of a file. In this blog, we are going to use the if condition activity to compare the output of those two activities. If the last modified date of the file is greater than the last execution date (last time the file was processed) then the copy activity will be executed. In other words, the copy activity only runs if new data has been loaded into the file, currently located on Azure Blob Storage, since the last time that file was processed.

Check out the following links if you would like to review the previous blogs in this series:

  • Check out part one here: Azure Data Factory – Get Metadata Activity
  • Check out part two here: Azure Data Factory – Stored Procedure Activity
  • Check out part three here: Azure Data Factory – Lookup Activity


Setup and configuration of the If Condition activity

For this blog, I will be picking up from the pipeline in the previous blog post. This pipeline already has the get metadata activity and the lookup activity, so I am going to add the if condition activity and then configure it to read the output parameters from the two previous activities.

  1. Expand the category Iteration & Conditionals in the activities pane.
  2. Select the if condition activity.
  3. Drag the if condition activity from the activities pane and drop it into the pipeline.

[Image: the If Condition activity added to the pipeline]

The next step is to configure the if condition activity to only execute after the lookup and get metadata activities complete successfully. This can be accomplished by using the built-in constraints. The default constraint is set to success. This can be changed by simply selecting the constraint and then right-clicking on it. There are currently four options available for the constraints:

  1. Successful – default behavior
  2. Failure
  3. Completed
  4. Skipped

Configuring the constraints between activities is quite simple. Take a look at the animated gif below:

[Animation: connecting the activities with success constraints]

Now it’s time to set up the configuration of the if condition activity:

  1. With the if condition activity selected, navigate to the properties pane and rename the activity:
    • Name: Check if file is new

Adding a parameterized expression in Azure Data Factory

Now it is time to configure the settings tab for the if condition activity. The settings tab requires an expression that evaluates to either true or false. In this example, the expression needs to compare the output parameters from each of the previous tasks to determine if the file is new since the last load time. The "Add dynamic content" menu will help with building the expression for you; however, it will not give you the full path to the output parameter. In the previous blog post, I showed how you could identify the exact output parameter names after the debug phase. Let's take another look at the output results:

[Image: output results from the previous blog post]

Let’s jump right in and build the parameterized expression:

  1. Select the settings tab from the properties window
  2. Click in the expression box
  3. Click the hyperlink that appears below the expression box “Add dynamic content”.

[Image: the Add dynamic content link]

System Variables and Functions in Azure Data Factory

In the Add Dynamic Content window there are some built-in system variables and functions that can be utilized when building an expression. In this blog post I am going to use the built-in function greaterOrEquals. This function will allow us to compare the two dates from the output parameters of our previous activities (Last Modified date and Last Execution date).

  1. Expand the functions category
  2. Next click to expand logical functions.
  3. Finally, click on the function greaterOrEquals, this function will now appear in the expression box.

[Image: the greaterOrEquals function under Logical Functions]

Now it’s time to finish building out the expression. The built in function, greaterOrEquals, expects two parameters separated by a comma. This is where I will insert the output parameters from the lookup and get metadata activities. As previously mentioned, the dynamic content window will help in referencing those outputs. In the animated gif below, you can see that I use the activity outputs to begin the parameter reference. Then I manually complete the expression by adding the specific parameter names. Remember, that we obtained these names by looking at the output of each activity after debugging the pipeline.

[Animation: building the parameterized expression]
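Written out, the completed expression looks roughly like this; the activity names (Get Metadata1 and Get Last Load Date) and the output property names (lastModified and firstRow.ExecutionDate) are the ones from my pipeline and debug output, so adjust them to match your own:

```
@greaterOrEquals(
    activity('Get Metadata1').output.lastModified,
    activity('Get Last Load Date').output.firstRow.ExecutionDate
)
```

If the file's last modified date is on or after the last execution date, the expression returns true and the file is treated as new.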

Adding activities to the If Condition activity

The final step is to add activities to the if condition activity. You have two options under the activities tab. These options are If True activities and If False activities. In other words, what activities would you like to perform if the expression evaluates to true? Alternatively, what activities would you like to perform if the expression evaluates to false? For the sake of simplicity I will add a wait task to each activity condition. In the next blog post in this series I will replace the wait activity with the copy activity.

  1. Click on the Activities tab found in the properties window.
  2. Click the box “Add If True Activity”
  3. This will open a pipeline that is scoped only to the if condition activity.

[Image: Add If True Activity]

  1. Add the Wait activity to the new pipeline.
  2. I named the activity wait_TRUE to help during debug and validation.

Also, pay special attention to the breadcrumb link across the top of this pipeline. This makes navigation easy and also helps identify what scope you are currently developing in. As mentioned previously, the wait activity that was added is scoped to the if condition activity and runs only if the expression evaluates to true.

[Image: the Wait activity scoped to If True]

I want to add a final activity before debugging the pipeline. I want to add a wait activity to the if condition if the expression evaluates to false. To do this I have to navigate back to the if condition activity and select If False Activities under the activities property.

  1. I will use the breadcrumb link to navigate back to the main pipeline.
  2. Then select the Activities tab for the If Condition activity.
  3. Finally, click the box for “Add If False Activity”.
  4. Add a wait activity to this pipeline and then name it wait_FALSE

[Image: breadcrumb navigation]

With everything set up, it's now time to debug the pipeline. Since the last modified date of the file is 06/06/2018 and the last execution date is 06/13/2018, I would expect the wait activity defined within the If Condition's If False pipeline to be executed. As you can see from the results below, the wait_FALSE activity was executed.
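To make that concrete, with these dates the expression conceptually evaluates as follows (date literals simplified for illustration):

```
greaterOrEquals('2018-06-06', '2018-06-13')  =>  false  =>  If False activities execute
```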

[Image: debug results showing that wait_FALSE executed]

Thanks for checking out my blog!

Azure Data Factory – Lookup Activity

June 25, 2018 / Mitchell Pearson / 9 Comments

In the next few posts of my Azure Data Factory series I want to focus on a couple of new activities. Specifically the Lookup, If Condition, and Copy activities. The copy activity in this pipeline will only be executed if the modified date of a file is greater than the last execution date. In other words, the copy activity only runs if new data has been loaded into the file located on Azure Blob Storage. The following diagram provides a visualization of the final design pattern.

  • Check out part one here: Azure Data Factory – Get Metadata Activity
  • Check out part two here: Azure Data Factory – Stored Procedure Activity

[Image: the final design pattern]

Setting up the Lookup Activity in Azure Data Factory v2

If you come from a SQL background, this next step might be slightly confusing to you, as it was for me. For this demo we are using the lookup activity to execute a stored procedure instead of using the stored procedure activity. The reason we are not using the stored procedure activity is because it currently does not produce any output parameters, and therefore the output of a stored procedure cannot be used in the pipeline. Maybe this is something that Microsoft will add as functionality in the future, but for now the lookup activity will get the job done!

  1. First, I am going to drag the Lookup activity into the pipeline along with the Get Metadata activity from the previous blog posts.
  2. In the properties window I changed the name of the task to “Get Last Load Date” to make it more descriptive.

[Image: the Lookup activity added to the pipeline]

  1. After the name has been assigned from the previous step, select the Settings tab.
  2. Choose a "Source Dataset". A dataset must be selected even though we are not returning the results from the selected dataset; we are going to return results from a stored procedure. I have chosen a dataset that links to my Azure SQL DB where my control table and stored procedure currently exist.
  3. Finally, click the radio button for Stored Procedure and then choose your stored procedure from the drop down list.

[Image: the Lookup activity settings configuration]

The stored procedure referenced in the previous step is a very simple stored procedure that is returning the last ExecutionDate from my Control Table. Keep in mind that this code would typically be a little more robust as this control table would have many rows for many different sources. For this blog I have kept the code simple to demo the functionality of the lookup task. Take a look at the stored procedure code below:

[Image: the stored procedure code]
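Since the code appears only as a screenshot, here is a minimal sketch of what such a stored procedure might look like; the procedure, table, and column names (dbo.GetLastExecutionDate, dbo.ControlTable, ExecutionDate) are assumptions for illustration:

```sql
CREATE PROCEDURE dbo.GetLastExecutionDate
AS
BEGIN
    -- Return the most recent execution date from the control table.
    -- A production version would typically filter by source or file name.
    SELECT MAX(ExecutionDate) AS ExecutionDate
    FROM dbo.ControlTable;
END
```

Because the lookup activity returns the first row by default, this single value then surfaces in the activity output as firstRow.ExecutionDate.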

Now that the lookup component has been configured it’s time to debug the pipeline and validate the output results of our activities.

[Image: debugging the pipeline]

After a successful run you can validate the output parameters by clicking on the buttons highlighted in the image below; this was explained in more detail back in the first post of this blog series.

[Image: the output results]

In the next blog in this series I will outline how to use the output parameters from these activities using the If Condition activity. Thanks for reading my blog!

Azure Data Factory – Stored Procedure activity

June 18, 2018 / Mitchell Pearson / 4 Comments

Welcome to part two of my blog series on Azure Data Factory. In the first post I discussed the get metadata activity in Azure Data Factory. Now you are going to see how to use the output parameter from the get metadata activity and load that into a table on Azure SQL Database.

Check out part one here: Azure Data Factory – Get Metadata Activity

[Image: design pattern]

In this blog post you are going to learn the following:

  1. How to execute a stored procedure in Azure Data Factory.
  2. How to use the output parameters from the Get Metadata task as input parameters for a stored procedure.

Setting up the Stored Procedure Activity in Azure Data Factory

I am going to pick up where I left off in the last blog and add the stored procedure activity to my existing pipeline.

  1. Add the stored procedure activity to the pipeline; it's found under "General".
  2. Next, connect the two activities, the stored procedure will only execute following a successful completion of the get metadata activity.

[Image: the connected activities]

  1. Click on the stored procedure activity to get to the properties window and then click on “SQL Account”.
  2. Under SQL account select the linked service to your Azure SQL Database. If you haven’t already created a linked service, click + new to create a new one.

[Image: the SQL Account configuration]

  1. Now click the “Stored Procedure” tab in the properties window.
  2. Choose your stored procedure from the dropdown menu. This dropdown will display all stored procedures on your Azure SQL DB. The stored procedure in this blog is inserting the modified date of the file into a table.
  3. Click on import parameter. This will import any parameters found in your stored procedure. Notice that the parameter ModifiedDate is returned in the parameter list.

[Image: the stored procedure configuration with the imported ModifiedDate parameter]
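As a hedged sketch, a stored procedure like the one described here, one that accepts the modified date and inserts it into a table, might look like the following; the procedure and table names (dbo.InsertModifiedDate, dbo.FileLoadLog) are assumptions for illustration:

```sql
CREATE PROCEDURE dbo.InsertModifiedDate
    @ModifiedDate DATETIME
AS
BEGIN
    -- Log the file's last modified date so later runs can compare against it.
    INSERT INTO dbo.FileLoadLog (ModifiedDate)
    VALUES (@ModifiedDate);
END
```

The @ModifiedDate parameter is what "Import parameter" picks up and exposes in the activity's parameter list.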

Dynamically setting the Value

The last step is to now pass a value into the parameter ModifiedDate so that the table can be properly updated. The Modified Date value produced from the Get Metadata activity is the value we want to pass into the stored procedure. You can manually type in the output parameter reference or you can use the option to add dynamic content.

  • Add dynamic content by either clicking the hyperlink “add dynamic content [Alt + P]” or by clicking in the Value box and then using the hotkey [Alt + P].

[Image: Add dynamic content]

  1. Scroll to the very bottom of this dynamic content list and then click "Get Metadata1" under activity outputs. Unfortunately, this does not generate the full expression for returning the last modified date; we still need to specify which output we want here.
  2. Complete the expression by adding .lastModified.
  3. Click Finish.
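Assuming the default activity name Get Metadata1, the completed value looks like this:

```
@activity('Get Metadata1').output.lastModified
```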

Tip: Getting the exact name of the output parameter can be challenging; we discussed how to find the property name in the get metadata blog post. If you missed that blog, or forgot, now would be a good time to go back and do a quick review.

[Image: the completed expression ending in .lastModified]

As the last step, debug and test your pipeline! Thanks for reading this blog and please check back often!

Azure Data Factory – Get Metadata Activity

June 11, 2018June 11, 2018 / Mitchell Pearson / 7 Comments

Welcome to part one of a new blog series I am beginning on Azure Data Factory. In this first post I am going to discuss the get metadata activity in Azure Data Factory. In this post you are going to see how to use the get metadata activity to retrieve metadata about a file stored in Azure Blob storage and how to reference the output parameters of that activity. In part two of this blog series I am going to show you how to use the stored procedure activity to load the metadata information into a table in Azure SQL Database. Take a look at the below design pattern:

[Image: design pattern]

In this blog post you are specifically going to learn the following three items:

  1. Set up and configuration of the activity.
  2. How to read the outputs.
  3. How to reference output parameters from the Get Metadata activity.

Part 1: Setting up the Get Metadata activity.

First, I am going to create a new pipeline and then add the Get Metadata activity to the pipeline.

[Image: new pipeline with the Get Metadata activity]

Next, I am going to set up and configure the activity to read from a file I have in Azure Blob storage.

Please note, for this post I assume you know how to create a dataset in Azure Data Factory. If you do not, kindly let me know and I can throw together a quick blog on how that is done!

  • With the Get Metadata activity selected, complete the following tasks:
    1. Click on Dataset in the property window.
    2. Select your dataset from the dropdown, or create a new one that points to your file.
    3. Select the property Last Modified from the fields list.
    4. Select the property Size from the fields list.
    5. Select any other properties you would like to get information about.


Part 2: How to read the outputs from the Get Metadata activity

Now that the activity has been configured, it’s time to run it in debug mode to validate the output parameters.

[Image: running the pipeline in debug mode]

Once debug completes you can now take a look at the output of the debug execution for any of the activities in your pipeline. We only have the one activity in this example. Click on the output to see the output values for the items selected:

Tip: If you don’t see the output of the debug operation, click in the background of the pipeline to deselect any activities that may be selected. The output of the debug operation is a property on the pipeline, not any particular activity.

[Image: the debug output values]
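As a rough sketch, the output for the selected properties comes back as JSON along these lines (the values here are illustrative; yours will reflect your own file):

```json
{
    "lastModified": "2018-06-06T12:00:00Z",
    "size": 1024
}
```

Note that the property names are camelCase (lastModified, size), which matters when referencing them later in the pipeline.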

Part 3: How to reference the output parameters

For me, this was the hard part. I discovered early on that there is no "Output Parameter" option defined on any of the activities, something I just expected since I come from a background of SQL and SSIS. However, not all is lost; you will find that referencing these output parameters is not that difficult, and they follow a basic pattern you can work with. But that will come after we cover the basics!

For now, let’s take a look at the basic pattern:

@activity('Get Metadata1').output

This pattern can be broken down into three basic parts.

  1. @activity – specifies you are getting information from an activity in the pipeline.
  2. 'Get Metadata1' – This is the name of the activity that you want to get information from.
  3. .output – you are looking for an output from the specified activity.

Unfortunately this part is not complete; now you have to specify exactly which output parameter you want, and you have to figure out how to call that parameter. For example, do you want to retrieve the Last Modified date or the Size? What specific name do you use to call that parameter? An expression like @activity('Get Metadata1').output.Last Modified won't work because Last Modified is not the correct name of the output parameter, so the challenge is figuring out what that output parameter name is so you can use it somewhere else in your pipeline.

So how do you get the specific name of the output parameters?

You can get the specific name of the output parameters by taking a look at the output results of the Debug operation. Take another look at the output results and you will see the exact name needed for the output parameter reference. Last Modified is going to be lastModified and the final code to reference the output parameter will look like the following:

@activity('Get Metadata1').output.lastModified

[Image: the output parameter names]

As always, thanks for checking out my blog! Be sure to check out the other blogs in this series to get a better understanding of how to use the output parameters in other activities.
