Azure Data Factory – Copy Data Activity

July 30, 2018 / Mitchell Pearson / 1 Comment

In this blog, we are going to review the Copy Data activity. The Copy Data activity can be used to copy data among data stores located on-premises and in the cloud.

In part four of my Azure Data Factory series, I showed you how you could use the If Condition activity to compare the output parameters from two separate activities. We determined that if the last modified date of the file (located in blob storage) was greater than the last load time then the file should be processed. Now we are going to proceed with that processing using the Copy activity. If you are new to this series you can view the previous posts here:

Check out part one here: Azure Data Factory – Get Metadata Activity
Check out part two here: Azure Data Factory – Stored Procedure Activity
Check out part three here: Azure Data Factory – Lookup Activity
Check out part four here: Azure Data Factory – If Condition Activity

If Condition Activity – If True activities

As mentioned in the intro, we are going to pick up where we left off in the last blog post. Remember that the If Condition activity evaluates an expression and returns either true or false. If the result is true then one set of activities is executed, and if the result is false then another set of activities is executed. In this blog post we are going to focus on the If True activities.

First, select the If Condition activity and then click on Activities in the properties pane, highlighted in yellow in the image below.
Next, click on the box “Add if True Activity”.

This will launch a new designer window, and this is where we will be configuring our Copy Data activity.

Copy Data Activity, setup and configuration

Fortunately, the Copy Data activity is rather simple to set up and configure. This is a good thing, since it is likely you will be using this activity quite often. The first thing we need to do is add the activity to our pipeline. Select the Copy Data activity from the Data Transformation category and add it to the pipeline.

Now we need to set up the source and the sink datasets, and then map those datasets. You can think of the sink dataset as the destination for all intents and purposes.

Select the copy data activity and then click on the Source tab found in the properties window. For the dataset I am simply going to choose the same dataset we used in our first blog in this series.
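To make the source and sink idea concrete, here is a minimal Python sketch that builds a dictionary shaped like the JSON you would see for a Copy activity in the pipeline's code view. The activity name and both dataset names are hypothetical, and the source and sink type names (BlobSource, SqlSink) are assumptions that vary by connector and service version, so treat this as an illustration rather than the exact definition from this pipeline.

```python
import json


def copy_activity(name, source_dataset, sink_dataset):
    """Build a dict shaped like a Copy activity in a pipeline definition."""
    return {
        "name": name,
        "type": "Copy",
        # The source dataset is referenced under "inputs" ...
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        # ... and the sink (destination) dataset under "outputs".
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "BlobSource"},  # assumed: file in blob storage
            "sink": {"type": "SqlSink"},       # assumed: SQL database table
        },
    }


# Hypothetical names, used only for illustration.
activity = copy_activity("Copy emp file", "BlobEmpFile", "AzureSqlEmpTable")
print(json.dumps(activity, indent=2))
```

The point of the sketch is simply that the activity does not hold connection details itself; it only points at two datasets by name.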

Creating the Sink dataset in Azure Data Factory

Now it’s time to set up the Sink (destination) dataset. My destination dataset in this scenario is going to be an Azure SQL Database. I am going to click on Sink and then I will click on + New.

When you click on + New, this will launch a new window with all your data store options. I am going to choose Azure SQL Database.

This will launch a new tab in Azure Data Factory where you can now configure your dataset. In the properties window select “Connection”. Here you will select your linked service, which is your connection to your database. If you haven’t already created a linked service to your database, click on + New. Clicking + New will open a new window where you can input the properties for your linked service. Take a look at the screenshot below:

Once the linked service has been created I will select the table I want to load. All available tables will show up in the drop-down list. I will select my dbo.emp table from the drop-down.
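The relationship between the linked service and the dataset can be sketched in Python as two dictionaries shaped like their JSON definitions. The names AzureSqlDatabaseLS and AzureSqlEmpTable are hypothetical, the connection string is deliberately left as a placeholder, and the exact property layout (for example tableName) is an assumption that may differ between service versions. Only the table name dbo.emp comes from this post.

```python
import json

# Hypothetical linked service: the connection to the database.
linked_service = {
    "name": "AzureSqlDatabaseLS",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            # Placeholder only; never hard-code real credentials.
            "connectionString": "<connection string>",
        },
    },
}

# Hypothetical sink dataset: one table reached through that linked service.
sink_dataset = {
    "name": "AzureSqlEmpTable",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": linked_service["name"],
            "type": "LinkedServiceReference",
        },
        "typeProperties": {"tableName": "dbo.emp"},
    },
}

print(json.dumps(sink_dataset, indent=2))
```

The dataset names the table and points at the linked service; the linked service is the only place that knows how to connect.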

The final step for the dataset is to import the schema, which will be used when we perform the mapping between the source and sink datasets. Next, I am going to click on the Schema tab. On the Schema tab I will select Import Schema. This actually returns a column that I don’t need, so I will also remove that column from the schema. Take a look at steps three and four in the following screenshot:

Copy Data activity – Mapping

Now that the source and sink datasets have been configured, it’s time to finish configuring the Copy Data activity. The final step is to map these two datasets. Now I will navigate back to the pipeline and click on the Copy Data activity. From the properties window I will select the Mapping tab and then import the schemas, which brings in the schemas from both the source and the destination. This step also attempts to map the source to the destination automatically, so it is important to verify that the mapping is correct. Unfortunately, my source dataset was a file and it did not have any column headers, so Azure Data Factory automatically created the column headers Prop_0 and Prop_1 for my first and last name columns. Take a look at the following screenshot:
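Since the automatic mapping needs to be verified, here is a small Python sketch of that check. Prop_0 and Prop_1 come from this post; the sink column names FirstName and LastName are hypothetical stand-ins for the columns in dbo.emp. The translator dictionary at the end shows one shape the mapping can take in the activity JSON, and that shape is an assumption, since the format has changed across service versions.

```python
def check_mapping(mapping, source_columns, sink_columns):
    """Return a list of problems found in a source-to-sink column mapping."""
    problems = []
    # Every source column should be mapped to something.
    for src in source_columns:
        if src not in mapping:
            problems.append(f"source column {src} is not mapped")
    # Every mapped target should exist in the sink schema.
    for src, dst in mapping.items():
        if dst not in sink_columns:
            problems.append(f"{src} maps to unknown sink column {dst}")
    return problems


source_columns = ["Prop_0", "Prop_1"]      # generated for the headerless file
sink_columns = ["FirstName", "LastName"]   # hypothetical dbo.emp columns
mapping = {"Prop_0": "FirstName", "Prop_1": "LastName"}

problems = check_mapping(mapping, source_columns, sink_columns)
print(problems)

# One possible JSON shape for the mapping (an assumption, version dependent).
translator = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"name": src}, "sink": {"name": dst}}
        for src, dst in mapping.items()
    ],
}
```

An empty list means every generated source column lands in a column that exists on the sink side.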

This was a simple application of the Copy Data activity. In a future blog post I will show you how to parameterize the datasets to make this process dynamic. Thanks for reading my blog!

Quick Tips – Updating Parameters from the PBI Service

July 23, 2018 / Mitchell Pearson

In this quick tip I want to share how you can update Parameters in Power BI from the service. Previously, this was not an option and Parameters could only be updated from Power BI Desktop.

Updating Parameters in Power BI

To update parameters, navigate to the Datasets section in the Power BI Service. Next, click on the schedule refresh icon. Updating parameters is done from the schedule refresh window.

Click on Parameters from the schedule refresh window. Once expanded, you will see the available parameters. Simply update the parameters and you’re done!

Thanks for reading!

Quick Tips – Export data from Power BI using R

July 15, 2018 (updated October 31, 2018) / Mitchell Pearson / 7 Comments

One of my favorite tricks in Power BI is to use R integration to quickly export data outside of Power BI. How to do this is also a very popular question among Power BI enthusiasts. Once users realize the true capabilities and ease of the Power Query Editor for transforming and cleaning data, they want to clean up their files using Power BI. The challenge then becomes: how do I get the data out?

Update: Video embedded at bottom of blog post.

Prerequisite – R

If you want to test this method out you need to install R on your machine. For your convenience, see the following link:

https://mran.microsoft.com/open/

Export data from Power BI using R

Like all of my quick tip blogs, this will be quick :)

You will export data from the Power Query Editor. To launch the Power Query Editor click Edit Queries from the Home ribbon in Power BI.

Inside the Power Query Editor, click the transform ribbon and then click on R.

Once you click on the large letter R seen in the screenshot below a new window will open. Now you just have to type in some basic code. Here is the pseudo code:

write.csv(dataset, <destination>)

That’s it! Click OK and check your folder for the file. Notice that I used two backslash characters for each separator in the file path. This is required because a single backslash is the escape character inside an R string. The other question I get is whether I can write the results to an Excel file or to a SQL Database table, and the answer is yes! R has packages that make particular tasks easier, and if you want to write to an Excel file or a SQL Server table you would need to install those packages. My hope is to do some follow-up posts around this topic.

Thanks for checking out my blog!

Azure Data Factory – If Condition activity

July 2, 2018 / Mitchell Pearson / 11 Comments

In part three of my Azure Data Factory series I showed you how the lookup activity could be used to return the output results from a stored procedure. In part one you learned how to use the get metadata activity to return the last modified date of a file. In this blog, we are going to use the if condition activity to compare the output of those two activities. If the last modified date of the file is greater than the last execution date (last time the file was processed) then the copy activity will be executed. In other words, the copy activity only runs if new data has been loaded into the file, currently located on Azure Blob Storage, since the last time that file was processed.

Check out the following links if you would like to review the previous blogs in this series:

Check out part one here: Azure Data Factory – Get Metadata Activity
Check out part two here: Azure Data Factory – Stored Procedure Activity
Check out part three here: Azure Data Factory – Lookup Activity

Setup and configuration of the If Condition activity

For this blog, I will be picking up from the pipeline in the previous blog post. This pipeline already has the get metadata activity and the lookup activity, so I am going to add the if condition activity and then configure it to read the output parameters from those two activities.

Expand the category Iteration & Conditionals in the activities pane.
Select the if condition activity.
Drag the if condition activity from the activities pane and drop it into the pipeline.

The next step is to configure the if condition activity to only execute after the lookup and get metadata activities complete successfully. This can be accomplished by using the built-in constraints. The default constraint is set to success. This can be changed by selecting the constraint and then right-clicking on it. There are currently four options available for the constraints:

Successful – default behavior
Failure
Completed
Skipped
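In the pipeline's JSON, these constraints show up as dependency conditions on the downstream activity. The Python sketch below builds that structure for the if condition activity, which should wait on both upstream activities. The activity names are hypothetical, since the names used in the earlier posts are not repeated here, and the JSON condition values (Succeeded, Failed, Completed, Skipped) differ slightly from the labels shown in the designer.

```python
# Condition values as they appear in the pipeline JSON.
VALID_CONDITIONS = {"Succeeded", "Failed", "Completed", "Skipped"}


def depends_on(activity_name, condition="Succeeded"):
    """Build one dependency entry; the default mirrors the default constraint."""
    if condition not in VALID_CONDITIONS:
        raise ValueError(f"unknown dependency condition: {condition}")
    return {"activity": activity_name, "dependencyConditions": [condition]}


# Hypothetical activity names for the get metadata and lookup activities.
if_condition_dependencies = [
    depends_on("Get file metadata"),
    depends_on("Lookup last load"),
]

print(if_condition_dependencies)
```

With both entries set to Succeeded, the if condition activity runs only when both upstream activities finish without error.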

Configuring the constraints between activities is quite simple. Take a look at the animated gif below:

Now it’s time to set up the configuration of the if condition activity:

With the if condition activity selected, navigate to the properties pane and rename the activity:
- Name: Check if file is new

Adding a parameterized expression in Azure Data Factory

Now it is time to configure the settings tab for the if condition activity. The settings tab requires an expression that evaluates to either true or false. In this example, the expression needs to compare the output parameters from each of the previous tasks to determine if the file is new since the last load time. The “Add dynamic content” menu will help with building the expression for you. However, it will not give you the full path to the output parameter. In the previous blog post, I showed how you could identify the exact output parameter names after the debug phase. Let’s take another look at the output results:

Let’s jump right in and build the parameterized expression:

Select the settings tab from the properties window
Click in the expression box
Click the hyperlink that appears below the expression box “Add dynamic content”.

System Variables and Functions in Azure Data Factory

In the Add Dynamic Content window there are some built-in system variables and functions that can be utilized when building an expression. In this blog post I am going to use the built-in function greaterOrEquals. This function will allow us to compare the two dates from the output parameters of our previous activities (Last Modified date and Last Execution date).

Expand the functions category
Next click to expand logical functions.
Finally, click on the function greaterOrEquals. This function will now appear in the expression box.

Now it’s time to finish building out the expression. The built-in function, greaterOrEquals, expects two parameters separated by a comma. This is where I will insert the output parameters from the lookup and get metadata activities. As previously mentioned, the dynamic content window will help in referencing those outputs. In the animated gif below, you can see that I use the activity outputs to begin the parameter reference. Then I manually complete the expression by adding the specific parameter names. Remember that we obtained these names by looking at the output of each activity after debugging the pipeline.
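To show what the finished expression might look like, here is a short Python sketch that assembles the expression string from its parts. The activity names and the lookup column name LastExecutionDate are hypothetical. The property paths output.lastModified and output.firstRow are what I would expect the get metadata and lookup activities to return, but confirm the exact names against your own debug output.

```python
def activity_output(activity_name, path):
    """Reference one property in the output of a previous activity."""
    return f"activity('{activity_name}').output.{path}"


def greater_or_equals(left, right):
    """Wrap two references in the greaterOrEquals function."""
    return f"@greaterOrEquals({left}, {right})"


# Hypothetical activity and column names; check them against debug output.
expression = greater_or_equals(
    activity_output("Get file metadata", "lastModified"),
    activity_output("Lookup last load", "firstRow.LastExecutionDate"),
)

print(expression)
```

The first argument is the file's last modified date and the second is the last execution date, so the expression is true when the file has changed at or after the last load.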

Adding activities to the If Condition activity

The final step is to add activities to the if condition activity. You have two options under the activities tab. These options are If True activities and If False activities. In other words, what activities would you like to perform if the expression evaluates to true? Alternatively, what activities would you like to perform if the expression evaluates to false? For the sake of simplicity I will add a wait activity to each condition. In the next blog post in this series I will replace the wait activity with the copy activity.

Click on the Activities tab found in the properties window.
Click the box “Add If True Activity”
This will open a pipeline that is scoped only to the if condition activity.

Add the Wait activity to the new pipeline.
I named the activity wait_TRUE to help during debug and validation.

Also, pay special attention to the breadcrumb link across the top of this pipeline. This makes navigation easy and also helps identify what scope you are currently developing in. As mentioned previously, the wait activity that was added is scoped to the if condition activity and runs only if the expression evaluates to true.

I want to add a final activity before debugging the pipeline. I want to add a wait activity to the if condition if the expression evaluates to false. To do this I have to navigate back to the if condition activity and select If False Activities under the activities property.

I will use the breadcrumb link to navigate back to the main pipeline.
Then select the Activities tab for the If Condition activity.
Finally, click the box for “Add If False Activity”.
Add a wait activity to this pipeline and then name it wait_FALSE
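Put together, the if condition activity now holds one wait activity per branch. The Python sketch below builds a dictionary shaped like that activity's JSON. The names Check if file is new, wait_TRUE and wait_FALSE come from this post; the one-second wait time is an assumption, and the expression is left as a placeholder rather than guessing the exact output parameter names.

```python
import json


def wait_activity(name, seconds=1):
    """Build a dict shaped like a Wait activity (wait time is an assumption)."""
    return {
        "name": name,
        "type": "Wait",
        "typeProperties": {"waitTimeInSeconds": seconds},
    }


if_condition = {
    "name": "Check if file is new",
    "type": "IfCondition",
    "typeProperties": {
        "expression": {
            # Placeholder arguments; fill in from your own debug output.
            "value": "@greaterOrEquals(<last modified>, <last execution>)",
            "type": "Expression",
        },
        # Each branch is its own list of activities, scoped to this activity.
        "ifTrueActivities": [wait_activity("wait_TRUE")],
        "ifFalseActivities": [wait_activity("wait_FALSE")],
    },
}

print(json.dumps(if_condition, indent=2))
```

This nesting is what the breadcrumb link reflects in the designer: the branch activities live inside the if condition activity rather than in the main pipeline.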

With everything set up it’s now time to debug the pipeline. Since the last modified date of the file is 06/06/2018 and the last execution date is 06/13/2018, I would expect the wait activity defined within the If Condition's If False pipeline to be executed. As you can see from the results below, the wait_FALSE activity was executed.
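The same decision can be traced in a few lines of Python. This is only a simulation of the comparison using the two dates from this run, not how Azure Data Factory evaluates the expression internally.

```python
from datetime import date


def branch_to_run(last_modified, last_execution):
    """Mimic greaterOrEquals(last_modified, last_execution) and pick a branch."""
    if last_modified >= last_execution:
        return "wait_TRUE"
    return "wait_FALSE"


# Dates from this run: file modified 06/06/2018, last execution 06/13/2018.
result = branch_to_run(date(2018, 6, 6), date(2018, 6, 13))
print(result)  # wait_FALSE
```

Because the file was last modified before the last execution date, the comparison is false and the If False branch runs.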