Azure Data Factory – Copy Data Activity

In this blog, we are going to review the Copy Data activity. The Copy Data activity can be used to copy data among data stores located on-premises and in the cloud.

In part four of my Azure Data Factory series, I showed you how to use the If Condition activity to compare the output parameters from two separate activities. We determined that if the last modified date of the file (located in blob storage) was greater than the last load time, then the file should be processed. Now we are going to proceed with that processing using the Copy Data activity. If you are new to this series, you can view the previous posts before continuing.

If Condition Activity – If True activities

As mentioned in the intro, we are going to pick up where we left off in the last blog post. Remember that the If Condition activity evaluates an expression and returns either true or false. If the result is true, one set of activities is executed; if the result is false, another set of activities is executed. In this blog post we are going to focus on the If True activities.

  • First, select the If Condition activity and then click on Activities in the properties pane, highlighted in yellow in the image below.
  • Next, click on the box “Add If True Activity”.

[Screenshot: the If Condition activity’s Activities tab, with the Add If True Activity box]

This will launch a new designer window, and this is where we will be configuring our Copy Data activity.

[Screenshot: the If True activities designer window]

Copy Data activity – setup and configuration

Fortunately, the Copy Data activity is rather simple to set up and configure. That’s a good thing, since you will likely be using this activity quite often. The first thing we need to do is add the activity to our pipeline. Select the Copy Data activity from the Data Transformation category and add it to the pipeline.

[Screenshot: the Copy Data activity added to the pipeline]

Now we need to set up the source and the sink datasets, and then map those datasets. You can think of the sink dataset as the destination, for all intents and purposes.

Select the Copy Data activity and then click on the Source tab in the properties window. For the dataset, I am simply going to choose the same dataset we used in the first blog in this series.

[Screenshot: the Source tab with the source dataset selected]

Creating the Sink dataset in Azure Data Factory

Now it’s time to set up the Sink (destination) dataset. My destination dataset in this scenario is going to be an Azure SQL Database. I am going to click on Sink and then I will click on + New.

[Screenshot: the Sink tab]

When you click on + New, this will launch a new window with all your data store options. I am going to choose Azure SQL Database.

[Screenshot: the list of data store options for the new dataset]

This will launch a new tab in Azure Data Factory where you can configure your dataset. In the properties window, select “Connection”; this is where you choose your linked service, which is the connection to your database. If you haven’t already created a linked service to your database, click + New. This opens a new window where you can input the properties for your linked service. Take a look at the screenshot below:

[Screenshot: the Connection tab and the new linked service window]

Once the linked service has been created, I will select the table I want to load. All available tables will show up in the drop-down list; I will select my dbo.emp table.

[Screenshot: the table drop-down with dbo.emp selected]

The final step for the dataset is to import the schema; this will be used when we map the source and sink datasets. I am going to click on the Schema tab and select Import Schema. In my case this returns a column I don’t need, so I will remove that column from the schema as well. Take a look at steps three and four in the following screenshot:

[Screenshot: the Schema tab, showing the import schema and remove column steps]

Copy Data activity – Mapping

Now that the source and sink datasets have been configured, it’s time to finish configuring the Copy Data activity by mapping the two datasets. I will navigate back to the pipeline and click on the Copy Data activity. In the properties window I will select the Mapping tab and then import the schemas, which pulls in the schemas from both the source and the destination. This step also attempts to map the source to the destination automatically, so it is important to verify that the mapping is correct. My source dataset was a file without column headers, so Azure Data Factory automatically created the column names Prop_0 and Prop_1 for my first and last name columns. Take a look at the following screenshot:

[Screenshot: the Mapping tab, with Prop_0 and Prop_1 mapped to the destination columns]

This was a simple application of the Copy Data activity. In a future blog post I will show you how to parameterize the datasets to make this process dynamic. Thanks for reading my blog!

Quick Tips – Updating Parameters from the Power BI Service

In this quick tip I want to share how you can update parameters in Power BI from the service. Previously this was not an option; parameters could only be updated from Power BI Desktop.

Updating Parameters in Power BI

To update parameters, navigate to the Datasets section in the Power BI Service. Next, click on the schedule refresh icon. Updating parameters is done from the schedule refresh window.

[Screenshot: the schedule refresh icon in the Datasets section]

Click on Parameters in the schedule refresh window. Once expanded, you will see the available parameters; simply update them and you’re done!

[Screenshot: the Parameters section in the schedule refresh window]

Thanks for reading!

Quick Tips – Export data from Power BI using R

One of my favorite tricks in Power BI is using the R integration to quickly export data out of Power BI. This is also a very popular question among Power BI enthusiasts: once users realize the true capabilities and ease of the Power Query Editor for transforming and cleaning data, they want to clean up their files using Power BI. The challenge then becomes: how do I get the data out?

Prerequisite – R

If you want to test this method out, you need to install R on your machine; for your convenience, see the following link:

https://mran.microsoft.com/open/

Export data from Power BI using R

Like all of my quick tip blogs, this will be quick.

You will export data from the Power Query Editor. To launch the Power Query Editor click Edit Queries from the Home ribbon in Power BI.

[Screenshot: Edit Queries on the Home ribbon]

Inside the Power Query Editor, click on the Transform ribbon and then click on R.

[Screenshot: the R button on the Transform ribbon]

Once you click on the large letter R shown in the screenshot above, a new window will open. Now you just have to type in some basic code. Here is the pseudo code:

write.csv(dataset, <destination>)  # e.g. write.csv(dataset, "C:\\Temp\\output.csv")

[Screenshot: the R script window with the write.csv code]

That’s it! Click OK and check your folder for the file. Notice that I used two backslash characters in the file path; this is required because the backslash is an escape character in R. The other question I get is: can I write the results to an Excel file or to a SQL database table? The answer is… yes! R has packages that make particular tasks easier, and if you want to write to an Excel file or a SQL Server table you would need to install those packages. My hope is to do some follow-up posts around this topic.

[Screenshot: the exported file in the destination folder]


Thanks for checking out my blog!

Quick Tips – Connecting to a SharePoint Folder in Power BI

Recently, when teaching the Advanced Power BI course by Pragmatic Works, I had a student ask how to use the SharePoint folder connector in Power BI. She had been struggling to get it to work correctly. After trying a few different URLs, I could see why she was having a problem. Since you’re reading this, you are probably running into the same kind of issue. The error message is actually pretty good, but a little confusing. When the error says “the site’s root URL only,” I took this to mean the root URL of SharePoint itself; however, it actually means the URL of the site within SharePoint.

The URL isn’t valid. Please enter the site’s root URL only.

[Screenshot: the SharePoint folder dialog showing the error message]

Let’s take a look at how you can properly connect to a SharePoint Folder in Power BI and extract the files.

Connecting to a SharePoint Folder in Power BI

For my first attempt, I tried connecting to the root of my SharePoint site.

https://pragmaticworksonline.sharepoint.com

[Screenshot: the SharePoint folder dialog with the root URL]

This didn’t work.

[Screenshot: only the Shared Documents folder is returned]

As you can see in the above image, this only allowed access to the Shared Documents folder on the root of the SharePoint site.

Connecting to a specific SharePoint site

Next, I tried connecting directly to the root of a particular site. I connected to the training site on SharePoint using the site’s URL (site URLs typically take the form https://<tenant>.sharepoint.com/sites/<siteName>):

[Screenshot: the SharePoint folder dialog with the training site’s URL]

This worked! The SharePoint folder option traverses subfolders, just like the standard Folder connector. Therefore, if you are looking for a particular directory, you can simply apply a filter on the Folder Path column in the Power Query Editor.

Thanks for reading this blog!

Unexpected Totals in DAX (Part 3)

At some point or another, every DAX author realizes that the total row is not necessarily the sum of each row/cell in a given column. This can be quite confusing, and it is exactly why I am writing a third blog post dedicated to the total row. If you missed part one or part two in this series, you can find them below:

Part 1: Unexpected Totals in DAX (Part 1)

Part 2: Unexpected Totals in DAX (Part 2)

Problem

In this blog, we want to return YTD Sales for previous months and Forecast YTD Sales for the current month. Therefore, there should be only one measure, and that measure should return either YTD Sales or Forecast YTD Sales based on the month.

In the following table there are three measures. The first measure is YTD Sales, which tracks the actual sales. The second measure is Forecast YTD Sales, which is the forecasted sales. The third measure is the Dynamic Measure; this is the proposed measure designed to replace the other two. However, its total row produces unexpected results, which makes it perfect for this blog post:

[Screenshot: table with the YTD Sales, Forecast YTD Sales, and Dynamic Measure columns]

In the above screenshot, the total row for the dynamic measure is displaying $19,582,120 but the sum of all the rows is actually $28,164,680. Let’s take a look at the dynamic measure and figure out why the total row is not what we might expect.

Dynamic Measure

The “Dynamic Measure” returns the YTD Sales for completed months and the Forecast YTD Sales for the current month. Let’s take a quick look at the measure to better understand what is going on. Please note, for this example we assume there are sales for every day of the year.

[Screenshot: the Dynamic Measure formula]

  • A = Two variables to make the code more readable (See descriptions below):
    • LastSaleDate = Return the last day the company had a sale (in the current filter context)
    • LastDayOfMonth = Return the last day from the date table (in the current filter context)
  • B = The logical test performed by the IF function.
    • If the two variables don’t match, then it is the current month and [Forecast YTD Sales] is returned.
    • If the two variables do match, then it is a completed month and [YTD Sales] is returned.
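
Since the formula only appears as a screenshot above, here is a sketch of what the measure might look like; the Sales[OrderDate] and 'Date'[Date] names are assumptions for illustration, not necessarily the model’s actual names:

Dynamic Measure =
VAR LastSaleDate = MAX ( Sales[OrderDate] )     -- A: last day with a sale in the current filter context
VAR LastDayOfMonth = MAX ( 'Date'[Date] )       -- A: last day in the date table in the current filter context
RETURN
    IF (
        LastSaleDate <> LastDayOfMonth,         -- B: the dates differ, so the month is still in progress
        [Forecast YTD Sales],                   -- current month: return the forecast
        [YTD Sales]                             -- completed month: return actuals
    )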

Filter Context of the Total Row

The total row executes the dynamic measure within the filter context of the total row. So, what is the last day in the date table within that filter context? At the total row it is 12/31/2008, because a filter on the report page restricts the report to the year 2008. The last day with a valid sale was 6/20/2008. Since 12/31/2008 does not equal 6/20/2008, the calculation returns Forecast YTD Sales, which at the total level is $19,582,120.60.

Steps to solve this problem:

  1. Determine if the calculation is at the total row.
  2. For the total row perform the Dynamic Measure calculation for each month in the table separately.
  3. Sum the results for each month.

However, steps two and three are a little more complex than they sound, so to complete the solution we need to introduce two new functions in the DAX language: VALUES and SUMX. The great news is that this is a design pattern you will be able to use in many different scenarios!

The VALUES function in DAX

MSDN Definition:
Returns a one-column table that contains the distinct values from the specified table or column.

Remember step 2 from above? We want to execute our dynamic measure against each month, and the VALUES function will get us the distinct list of months. Let’s take a look at the VALUES function in action. I have created a table using the following formula to display the results:

[Screenshot: the DAX formula used to create the table]

[Screenshot: the distinct list of months returned by VALUES]
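
As a sketch, the calculated table behind that screenshot is a one-liner along these lines (the 'Date'[Month] column name is an assumption):

Months In Context = VALUES ( 'Date'[Month] )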

The VALUES function returned a distinct list of months. This is exactly what we need. Now, how do we execute our measure against each row in this table and then SUM all the results at the very end to get the expected value at the total row? SUMX!!

Working with the SUMX Function in DAX

MSDN Definition: Returns the sum of an expression evaluated for each row in a table.

Syntax:

SUMX ( <table>, <expression> )

Perfect! The SUMX function will iterate over the list of months and execute the dynamic measure against each month. Once that process has completed, SUMX will sum the results returned for each month.

SUMX accepts two parameters: the first parameter is a table and the second is an expression. The table returned from the VALUES function is passed as the first parameter, and the dynamic measure is passed as the second parameter (the expression). Let’s take a look at the final solution.

Solution

The following solution now returns the expected results for the total row:

[Screenshot: the final DAX solution]
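
Here is a sketch of that final pattern, reusing the assumed names from earlier. Note that the variables are wrapped in CALCULATE so that each month being iterated is transitioned into the filter context:

Dynamic Measure =
SUMX (
    VALUES ( 'Date'[Month] ),                                 -- iterate over each distinct month
    VAR LastSaleDate = CALCULATE ( MAX ( Sales[OrderDate] ) )
    VAR LastDayOfMonth = CALCULATE ( MAX ( 'Date'[Date] ) )
    RETURN
        IF ( LastSaleDate <> LastDayOfMonth, [Forecast YTD Sales], [YTD Sales] )
)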

Here are the results, pay special attention to the total row:

[Screenshot: the results, with the total row now showing the expected value]

Thanks for reading my post!

Unexpected Totals in DAX (Part 2)

In a previous blog post we discussed how to replace the total row with a blank value, primarily to eliminate confusion. You can find the original post here: Unexpected Totals in DAX (Part 1)

In this post we want to go a step further and replace the total row with our own DAX calculation.

[Screenshot: matrix showing a total of 8,691 for new homes on market]

Our goal is to replace the value of 8,691 with the value of 1,005. The value of 1,005 is the total number of new homes listed on the market in 2016, which in our case is the last (current) year of the data set we are looking at. This is slightly more challenging, though, because our date table goes out to the year 2018. But just wait, we will get to that shortly.

Let’s take a look at the steps to solve this problem:

  1. Identify if the calculation is at the total row; we will use HASONEVALUE as we did in the previous blog post.
  2. Determine the MAX year in the data set with homes on market, not the last year in your date table, but the last year with actual homes listed on the market.
  3. Write a calculation that returns the New Homes on Market for the last year in the data set.

I am going to create a new measure so we can look at the two measures side by side.

[Screenshot: the modified measure using HASONEVALUE]

I have modified the original DAX calculation. Here we first check whether we are at the total row using HASONEVALUE; if we are, then we return blank. If this doesn’t make sense, please stop and go back to Part 1, where I cover it in detail.

  • A = Check to see if the current filter context has one value; if not, then we are at the total row.
  • B = If there is more than one year in the filter context, replace the value with a blank.
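
A sketch of the measure at this stage, assuming a Listings table and a 'Date'[Year] column (illustrative names, not necessarily the post’s actual model):

New Homes on Market =
IF (
    HASONEVALUE ( 'Date'[Year] ),    -- A: is there exactly one year in the filter context?
    COUNTROWS ( Listings ),          -- a single year: count the homes listed
    BLANK ()                         -- B: more than one year (the total row): return blank
)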

Determine the last year with homes listed

Now that the total row has been identified, it’s time to author a DAX formula that returns the total homes listed for the last year in our data. The last year with homes listed in our data is 2016, so we need a DAX formula that reads like this:

Return the number of new homes listed in 2016.

Now, technically we can’t hard-code the year 2016 in our formula, because we know it would no longer work once we move into the next year; we need our DAX formula to be dynamic and change with the years. Here is our first attempt at getting the MAX year.

[Screenshot: the first attempt, using MAX to get the year]

  • A = Return the MAX year in the Filter Context
  • B = Return the MAX year from the variable at the total row.
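
Sketched out, that first attempt looks something like this (same illustrative names):

New Homes on Market =
VAR MaxYear = MAX ( 'Date'[Year] )   -- A: the MAX year in the filter context
RETURN
    IF (
        HASONEVALUE ( 'Date'[Year] ),
        COUNTROWS ( Listings ),
        MaxYear                      -- B: display the variable at the total row for validation
    )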

[Screenshot: the total row now displays the max year, 2018]

For demo and validation purposes, I am displaying the results of the variable (the max year) in the total row. Here we can see that the results are perhaps not what we would have expected: the year 2018 is the last year in our date table, but not the last year that new homes were listed. One way to get the last year with homes listed is to use the function LASTNONBLANK.

LASTNONBLANK

For the sake of brevity I won’t cover LASTNONBLANK in depth here; I will do a separate blog series on semi-additive measures. Let’s rewrite the DAX formula:

[Screenshot: the measure rewritten with LASTNONBLANK]
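
With the same illustrative names, the rewritten measure looks roughly like this; LASTNONBLANK returns the last year for which the count expression is not blank:

New Homes on Market =
VAR MaxYear =
    LASTNONBLANK ( 'Date'[Year], CALCULATE ( COUNTROWS ( Listings ) ) )  -- last year that actually has listings
RETURN
    IF (
        HASONEVALUE ( 'Date'[Year] ),
        COUNTROWS ( Listings ),
        MaxYear                      -- still displaying the variable for validation
    )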

Using LASTNONBLANK, we now get the last year in which new homes were listed. See the results below:

[Screenshot: the total row now displays 2016]

Perfect! The hard part is over. Now we simply count the rows where the year of the home’s listing date equals 2016.

[Screenshot: the final measure]
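
Putting it all together, a sketch of the final measure (again with assumed table and column names):

New Homes on Market =
VAR MaxYear =
    LASTNONBLANK ( 'Date'[Year], CALCULATE ( COUNTROWS ( Listings ) ) )
RETURN
    IF (
        HASONEVALUE ( 'Date'[Year] ),
        COUNTROWS ( Listings ),                      -- individual year rows work as before
        CALCULATE (
            COUNTROWS ( Listings ),                  -- total row: count only the last year with listings
            FILTER ( ALL ( 'Date'[Year] ), 'Date'[Year] = MaxYear )
        )
    )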

Here is the final result:

[Screenshot: the final result, with the total row showing 1,005]

As always, thanks for reading and I hope this helped!