Working with Averages in DAX using AVERAGEX, SUMMARIZE and DATESINPERIOD

In this blog post, we are going to dive into the world of averages and how to solve problems when calculating averages by using CALCULATE, SUMMARIZE and AVERAGEX. Specifically, we look at creating an average calculation across months and as you will see in the video below, this can be a challenge on multiple levels!

Working with SUMMARIZE and SUMMARIZECOLUMNS in DAX

Our goal in this video is to return the average sales for the last 3 months, see Figure-1 below.

SNAGHTML9292404

Figure 1 – Average of last 3 months of sales

Our first attempt at solving this problem may be to just use the expression AVERAGE([Sales Amount]) as seen in Figure-2 below. However, the AVERAGE function will first return the AVERAGE of all the transactions for the month and therefore the results of the following calculation are not quite as expected:

image

Figure 2 – Average of Sales Amount, incorrect calculation.

The result of the expression in Figure 2 can be seen in the following screenshot. Ultimately, average is returning the SUM of the Sales Amount divided by the number of transactions for each month.

SNAGHTML92fc901

Figure 3– Average of Sales Amount

Now that we understand this behavior and what is occurring, we can take a direct approach at solving the problem. First, we need to create a virtual table that contains the months along with their total sales. In order to create this aggregated table we can use SUMMARIZE or SUMMARIZECOLUMNS. In this scenario I will use SUMMARIZE because SUMMARIZECOLUMNS doesn’t quite work correctly within a modified filter context and we will need to change the default filter context to get 3 months in our aggregate table.

The following expression will return an aggregate table with the Year, Month and Total Sales.

image

Figure 4 – Creating a virtual, aggregated table with SUMMARIZE

SUMMARIZE is a function that will return a table expression and therefore you would get an error message if you tried to just put this expression in a calculated measure. However, you could test this out or “DEBUG” this expression by putting it in a calculated table or running the expression from Dax Studio. See the results of this expression from Dax Studio below in Figure 5:

image

Figure 5 – A view of the virtualized table created by SUMMARIZE

Working with AVERAGEX, CALCULATE and DATESINPERIOD

There are a few challenges still left to solve to get our desired result. First, you can’t use the AVERAGE function on a table or table expression, you must provide a column and therefore the result produced by SUMMARIZE here cannot be used by AVERAGE. However, X Functions like AVERAGEX accept a table or table expression as their first parameter and therefore work perfectly with this virtual table.

The next challenge is that the virtual table is returned within the current filter context and therefore we would always be returning the AVERAGE of the current month and only the current month. UNLESS, we use the CALCULATE function to evaluate the expression within a modified filter context, which is exactly what we would do! The calculation in Figure 6 below is the expression that returns the desired result seen in Figure 1.

SNAGHTML94d9dd9

Figure 6 – Final calculation

Managed Virtual Networks and Private Endpoints in Azure Synapse and Azure Data Factory

Hey everyone, back in October I did a 3 hour live stream on YouTube for introduction to Azure. A big part of that 3 hours focused on Azure Data Factory. In this post, I am responding to one of the questions that I received during that live stream with a blog accompanied with a YouTube video.

Is there a way to create a secure connection between Azure Data Factory and Azure SQL DB?

Check out my YouTube video showing how to set up Managed Virtual Networks and Private Endpoints:

Let’s first take a look at the two methods I discussed in the live stream. I showed how to add the IP address of the Azure VM that was making the connection from Azure Data Factory. The method of using the IP address is problematic because the IP address is not static and will change. Therefore, adding the IP Address is a not a permanent fix. The second method I showed was that you could turn on Allow Azure Services. This will work, but….. many companies consider this a bit of a security risk.

When Allow All Azure services is enabled, any Azure Resource can try to authenticate to your Azure SQL DB and that’s a problem for many organizations.

Managed Virtual Network (V-Net) connections and Private End Points in Azure Data Factory

Creating private end points to all your services in Azure is recommended as a best practice and therefore we will be covering the necessary steps here.

image

Creating a secure connection between your Azure services is a 3 step process.

  1. Create an Azure Integration runtime and enable Virtual Network Configuration
  2. Create a Managed Private Endpoint to the Azure Service (Azure SQL DB, Azure Storage, ect..)
  3. Approve the private endpoint request through the Private Link Center

Azure Integration Runtime with Managed VNET in ADF and Synapse

Integration runtimes are the compute that is used to move resources. You are billed based on the amount of Data Integration Units (DIUs) that are used during the data movement process. To securely move your data in a managed virtual network, you first need to make sure that your Azure Integration runtime is created within a managed virtual network. This can be configured when you are provisioning the Data Factory for the first time or later from manage tab.

Note: At the time of this writing/video, Azure Synapse workspaces require that you configure this property when you are provisioning the Synapse resource. If you create your Synapse workspace and you do not enable virtual network configuration, you will not be able to enable it after the fact. Here is a screenshot from the Microsoft documentation on this:

Here are the steps to create an Integration Runtime within a Managed Virtual Network.

  1. Select the manage tab in Data Factory / Synapse
  2. Click on +New
  3. Select Azure when prompted.
  4. On the next screen, name your Integration Runtime and enable Virtual Network Configuration
  5. Click Create.

image

image

How to create Managed Private Endpoints

Once the Integration Runtime with the Managed Virtual Network has been created, you need to create managed private endpoints. Your private endpoint is a private IP address connecting your ADF and Synapse pipelines to a specific resource. Therefore you will create a private endpoint for each data store (Blob, ADLS, Azure SQL DB) that you wish to securely connect to.

To create a managed private endpoint in Azure Data Factory and Synapse, go to your Manage hub, then click on Managed private endpoints, then click New. Keep in mind, this will be disabled and not available until after you have created the Integration Runtime with the managed virtual network.

image

Next, choose the resource in Azure that you want to connect to.

Azure Private Link Center and Approving Private Endpoints

Once the private endpoint has been created it will be in a “Pending” state. This will need to be approved. You can approved a private endpoint from the specific resource or you can go to the Azure Private Link center.

In Azure, search for Private Link and then select Private Link from the list of services returned.

image

Once in the Private Link Center go to Pending Connections, from here, you can approve, reject or remove any connections that may be pending. In my screenshot I don’t have any pending connections because I approved them in the video!

image

Wrapping it up

If you’re like me, networking is a tough topic, I come from a background of writing code, developing solutions and performance tuning. In on-prem development scenarios I let the specialist handle things like networking. With Azure the developer can branch out and learn new things! I hope you enjoyed this blog / video series. Thanks for reading!