Managed Virtual Networks and Private Endpoints in Azure Synapse and Azure Data Factory

Hey everyone, back in October I did a three-hour live stream on YouTube introducing Azure. A big part of those three hours focused on Azure Data Factory. In this post, I am responding to one of the questions I received during that live stream, with a blog post accompanied by a YouTube video.

Is there a way to create a secure connection between Azure Data Factory and Azure SQL DB?

Check out my YouTube video showing how to set up Managed Virtual Networks and Private Endpoints:

Let’s first take a look at the two methods I discussed in the live stream. The first method I showed was adding the IP address of the Azure VM that was making the connection from Azure Data Factory. Using the IP address is problematic because the IP address is not static and will change; therefore, adding the IP address is not a permanent fix. The second method I showed was turning on Allow Azure services. This will work, but many companies consider it a bit of a security risk.

When Allow Azure services is enabled, any Azure resource can try to authenticate to your Azure SQL DB, and that’s a problem for many organizations.

Managed Virtual Network (VNet) connections and Private Endpoints in Azure Data Factory

Creating private endpoints to all of your services in Azure is recommended as a best practice, so we will cover the necessary steps here.

[screenshot]

Creating a secure connection between your Azure services is a three-step process.

  1. Create an Azure Integration Runtime and enable Virtual Network Configuration
  2. Create a Managed Private Endpoint to the Azure service (Azure SQL DB, Azure Storage, etc.)
  3. Approve the private endpoint request through the Private Link Center

Azure Integration Runtime with Managed VNet in ADF and Synapse

Integration runtimes are the compute used to move data. You are billed based on the number of Data Integration Units (DIUs) used during the data movement process. To securely move your data in a managed virtual network, you first need to make sure that your Azure Integration Runtime is created within a managed virtual network. This can be configured when you are provisioning the Data Factory for the first time, or later from the Manage tab.

Note: At the time of this writing/video, Azure Synapse workspaces require that you configure this property when you are provisioning the Synapse resource. If you create your Synapse workspace and you do not enable virtual network configuration, you will not be able to enable it after the fact. Here is a screenshot from the Microsoft documentation on this:

Here are the steps to create an Integration Runtime within a Managed Virtual Network.

  1. Select the Manage tab in Data Factory / Synapse.
  2. Click +New.
  3. Select Azure when prompted.
  4. On the next screen, name your Integration Runtime and enable Virtual Network Configuration.
  5. Click Create.
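Behind the portal UI, the steps above produce an Integration Runtime definition in the factory's JSON. The sketch below shows roughly what that definition looks like; the runtime name is a placeholder, not from the original setup:

```json
{
    "name": "IR-ManagedVNet",
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "AutoResolve"
            }
        },
        "managedVirtualNetwork": {
            "type": "ManagedVirtualNetworkReference",
            "referenceName": "default"
        }
    }
}
```

The `managedVirtualNetwork` reference is the piece that the Virtual Network Configuration toggle adds; without it, the runtime is a regular Azure IR.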

[screenshot]

[screenshot]

How to create Managed Private Endpoints

Once the Integration Runtime with the Managed Virtual Network has been created, you need to create managed private endpoints. A private endpoint is a private IP address that connects your ADF and Synapse pipelines to a specific resource. Therefore, you will create a private endpoint for each data store (Blob, ADLS, Azure SQL DB) that you wish to securely connect to.

To create a managed private endpoint in Azure Data Factory or Synapse, go to the Manage hub, click Managed private endpoints, and then click New. Keep in mind, this option is disabled and unavailable until after you have created the Integration Runtime with the managed virtual network.

[screenshot]

Next, choose the resource in Azure that you want to connect to.
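For reference, here is a rough sketch of what a managed private endpoint definition looks like in JSON for an Azure SQL DB target. The subscription, resource group, and server names are placeholders:

```json
{
    "name": "mpe-azuresqldb",
    "properties": {
        "privateLinkResourceId": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Sql/servers/<server-name>",
        "groupId": "sqlServer"
    }
}
```

The `groupId` identifies the sub-resource you are connecting to, and it differs per service (for example, `sqlServer` for Azure SQL DB versus `blob` for Blob Storage).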

Azure Private Link Center and Approving Private Endpoints

Once the private endpoint has been created, it will be in a “Pending” state and will need to be approved. You can approve a private endpoint from the specific resource, or you can go to the Azure Private Link Center.

In Azure, search for Private Link and then select Private Link from the list of services returned.

[screenshot]

Once in the Private Link Center, go to Pending Connections. From here, you can approve, reject, or remove any connections that may be pending. In my screenshot I don’t have any pending connections because I approved them in the video!

[screenshot]

Wrapping it up

If you’re like me, networking is a tough topic. I come from a background of writing code, developing solutions, and performance tuning; in on-prem development scenarios I let the specialists handle things like networking. With Azure, the developer can branch out and learn new things! I hope you enjoyed this blog / video series. Thanks for reading!

Working with Parameters and Expressions in Azure Data Factory

Hey All! Just in time for the holidays, I’m uploading another Azure Data Factory video to YouTube. In this video we look specifically at how to use parameters in Azure Data Factory to make your datasets and pipelines dynamic and reusable! In addition to parameters and expressions, we also take a look at the Lookup, ForEach, and Execute Pipeline activities. This video is very informative, touches on a lot of different pieces, and I hope you enjoy it!

Parent / Child Design Pattern in ADF

The Parent / Child design pattern has been a popular design pattern for ETL processes for many years. It gives you compartmentalization of your code, making it more reusable, and also gives you the ability to easily scale up and down by increasing or decreasing the parallelization of your workers.

In this scenario, the parent pipeline determines what work needs to be done and then passes that work to the worker pipeline. In my video I show how to use the Lookup activity in Azure Data Factory to get the list of items to process so that list can then be iterated over.

Azure Data Factory Lookup and ForEach activities

[screenshot]

This scenario retrieves the work from an Azure SQL Database, so I use the Lookup activity to retrieve it. However, keep in mind that many different activities could serve this purpose; for example, I could use the Get Metadata activity to get the list of files in a folder that need to be processed. See my video on the Get Metadata activity here:

https://www.youtube.com/watch?v=zm7ybXmUZV0

The ForEach activity is used to iterate over the list / array created by the Lookup activity. The ForEach activity will perform a set of activities, defined by you, for each item returned in the list or array provided.

In this example, we are simply executing the worker pipeline, passing in the schema and table name of the current table we want to process!
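Putting the Lookup, ForEach, and Execute Pipeline pieces together, the ForEach definition looks roughly like the sketch below. The activity, pipeline, and parameter names here are made up for illustration and are not necessarily the ones used in the video:

```json
{
    "name": "ForEachTable",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@activity('GetTableList').output.value",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "ExecuteWorkerPipeline",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": {
                        "referenceName": "PL_Worker",
                        "type": "PipelineReference"
                    },
                    "waitOnCompletion": true,
                    "parameters": {
                        "SchemaName": "@item().SchemaName",
                        "TableName": "@item().TableName"
                    }
                }
            }
        ]
    }
}
```

The `items` expression points at the Lookup activity's output array, and `@item()` refers to the current element of that array on each iteration.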

Execute Pipeline activity in ADF

The child pipeline, or worker pipeline, is where all the magic happens. In development, we make the datasets dynamic by parameterizing their connection so that the connection can be changed at run time based on the parameters passed in from the parent to the child. In the following screenshot you can observe that the dataset is connecting to an Azure SQL Database, and the schema and table name are determined by parameters at run time!
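A parameterized dataset of this kind looks roughly like the following JSON sketch (dataset, linked service, and parameter names are placeholders):

```json
{
    "name": "DS_AzureSql_Dynamic",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "LS_AzureSqlDb",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "SchemaName": { "type": "string" },
            "TableName": { "type": "string" }
        },
        "typeProperties": {
            "schema": { "value": "@dataset().SchemaName", "type": "Expression" },
            "table": { "value": "@dataset().TableName", "type": "Expression" }
        }
    }
}
```

Because the schema and table are expressions over `@dataset()` parameters, a single dataset can point at any table in the database, with the actual values supplied by the parent pipeline at run time.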

[screenshot]

For specifics on the setup, orchestration, and execution of this design pattern, watch the video. As always, thanks for reading my blog and have a great week!

Introduction to Wrangling Data Flows in Azure Data Factory

Hello! It’s been a while since I’ve done a video on Azure Data Factory. To get back into the flow of blogging on ADF, I will be starting with Data Flows, specifically Wrangling Data Flows.

The video can be seen here:

What are Wrangling Data Flows in Azure Data Factory?

Wrangling Data Flows are a method of easily cleaning and transforming data at scale. Huh?

Wrangling Data Flows use the M query language and the UI experience provided by the Power Query Editor in Power BI Desktop. This is a brilliant move by Microsoft to include this technology in Azure Data Factory. Just think of the hundreds of millions of people who currently transform and clean their data in Excel or Power BI Desktop. Now they can take their self-service ETL (extract, transform, and load) skills to the enterprise level with ADF.

What makes it scalable? Power Query Editor at Scale.

Wrangling Data Flows allow the developer to use the graphical user interface to do all the hard work with minimal to no code. In the background, all of your UI steps are converted to the M language. At runtime, Azure Data Factory translates that M code to Spark and runs your data flow against big data clusters. This means that as your data volumes grow, you should experience consistent performance!
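To make that concrete, here is a small, hypothetical example of the kind of M script the Power Query editor builds up as you click through the UI: each UI action (here, filtering rows and renaming a column) becomes another step in the `let` expression. In a real wrangling data flow the `Source` step would reference your ADF dataset rather than inline records:

```m
let
    // Stand-in for the dataset source step
    Source = Table.FromRecords({
        [OrderId = 1, Amount = 250],
        [OrderId = 2, Amount = 0]
    }),
    // "Filter rows" in the UI becomes a Table.SelectRows step
    FilteredRows = Table.SelectRows(Source, each [Amount] > 0),
    // "Rename column" in the UI becomes a Table.RenameColumns step
    Renamed = Table.RenameColumns(FilteredRows, {{"Amount", "OrderAmount"}})
in
    Renamed
```

It is this generated M, not anything you hand-write, that ADF translates to Spark at runtime.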

Are there any limitations with Wrangling Data Flows?

Yes… quite a few, actually. Wrangling Data Flows are still in preview at the time of this blog and the related YouTube video, and currently there are quite a few operations that just aren’t supported. The most obvious of these are promoting header rows and pivoting data. I hope these features will be available once the product reaches general availability (GA).

https://docs.microsoft.com/en-us/azure/data-factory/wrangling-data-flow-functions#known-unsupported-functions

As always, thank you for reading my blog and watching my YouTube videos! Have a great day!!

Azure Data Factory–Rule Based Mapping and This($$) Function

Hello! This is the eighth video in a series of videos that will be posted on Azure Data Factory! Feel free to follow this series and other videos I post on YouTube! Remember to like, subscribe, and encourage me to keep posting new videos!

Schema flexibility and late schema binding really separate Azure Data Factory from its on-prem rival, SQL Server Integration Services (SSIS). This video focuses on leveraging the capability of flexible schemas and how rules can be defined to map changing column names to the sink.

Rule Based Mapping

Rule-based mapping in ADF allows you to define rules that map incoming columns in a data flow to a specific column. For example, you can map any column with ‘date’ anywhere in its name to a column named ‘Order_Date’. This ability to define rule-based mappings allows for very flexible and reusable data flows. In the video below, I walk through and explain how to set this up inside of a Select transform. Enjoy!
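In the Select transform, a rule-based mapping is just a pair of expressions: a matching condition over the incoming columns and an output name. A sketch of the ‘date’ example above might look like this (the column naming here is illustrative, not the exact demo from the video):

```
Matching condition:  instr(lower(name), 'date') > 0
Name as:             'Order_Date'
```

Here `name` refers to the name of each incoming column being tested, so any column whose name contains ‘date’ is mapped to the Order_Date output column.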

This ( $$ ) Function in a Derived transform and a Select Transform

The this ($$) function simply returns the name of the current column or its value, depending on where it is used. In this video I show two use cases: one in a Select transform and one in a Derived Column transform.
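As a sketch of the Derived Column case: a column pattern can match every string column and apply the same expression to each, with $$ standing in for the current column. This is an illustrative pattern, not the exact one built in the video:

```
Match:        type == 'string'
Column name:  $$
Expression:   trim($$)
```

In the name field, $$ keeps the matched column's existing name; in the expression, $$ is the matched column's value, so every string column gets trimmed in one rule.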

Video Below:

If you like what you see and want more structured end to end training then check out the training offerings for Pragmatic Works! https://pragmaticworks.com/training

Azure Data Factory–Executing an ADF Pipeline from Azure Logic Apps

Hello! This is the seventh video in a series of videos that will be posted on Azure Data Factory! Feel free to follow this series and other videos I post on YouTube! Remember to like, subscribe, and encourage me to keep posting new videos!

This video in the series highlights Azure Data Factory integration with Azure Logic Apps!

Azure Logic Apps

Azure Logic Apps is a great way of extending the capabilities of different services in Azure. In this video I take a look at how we can use Azure Logic Apps to perform a wide array of event-based triggers for a Data Factory pipeline.

Azure Logic Apps – Create a pipeline run (Executing a Data Factory Pipeline)

We are going to execute a Data Factory pipeline run using the “Create a pipeline run” action in Azure Logic Apps. The biggest challenge here is learning how to pass parameters into your Data Factory pipeline! I show this in the video, but I will also provide the code snippet here for reference. This can be modified as necessary! Enjoy.

[screenshot]
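Since the snippet above only survives as a screenshot, here is a hedged reconstruction of the kind of JSON you pass into the action's Parameters field. The names and values are made up; they must match the parameters actually defined on your pipeline:

```json
{
    "SchemaName": "dbo",
    "TableName": "Orders"
}
```

The Logic Apps action forwards this object to the pipeline run, and each top-level property is matched to a pipeline parameter of the same name.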

Video Below:

If you like what you see and want more structured end to end training then check out the training offerings for Pragmatic Works! https://pragmaticworks.com/training

Azure Data Factory–Web Activity / Azure Logic Apps

Hello! This is the fifth video in a series of videos that will be posted on Azure Data Factory! Feel free to follow this series and other videos I post on YouTube! Remember to like, subscribe, and encourage me to keep posting new videos!

This video in the series highlights Azure Data Factory integration with Azure Logic Apps!

Web Activity in ADF v2

The web activity within Azure Data Factory allows you to call a custom REST endpoint from an ADF control flow. In this video we make a POST API Method call to Azure Logic Apps.
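A Web activity that POSTs to a Logic App looks roughly like the sketch below. The URL shape and body are placeholders; in practice you paste in the HTTP trigger URL (including its SAS query string) from your Logic App:

```json
{
    "name": "CallLogicApp",
    "type": "WebActivity",
    "typeProperties": {
        "url": "https://prod-00.eastus.logic.azure.com/workflows/<workflow-id>/triggers/manual/paths/invoke?<sas-query-string>",
        "method": "POST",
        "headers": { "Content-Type": "application/json" },
        "body": {
            "pipelineName": "@{pipeline().Pipeline}",
            "message": "Copy completed"
        }
    }
}
```

The `@{...}` string-interpolation syntax embeds ADF expression results in the body, so the Logic App receives run-time values such as the calling pipeline's name.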

Azure Logic Apps

Azure Logic Apps is a great tool for building automated workflows, and it integrates really well with Azure Data Factory!

Video Below:

If you like what you see and want more structured end to end training then check out the training offerings for Pragmatic Works! https://pragmaticworks.com/training

Azure Data Factory–Lookup and If Condition activities (Part 3)

Hello! This is the third video in a series of videos that will be posted on Azure Data Factory! Feel free to follow this series and other videos I post on YouTube! Remember to like, subscribe, and encourage me to keep posting new videos!

This video in the series leverages the Lookup and If Condition activities to return a set of results and then determine which operation should occur next, based on an expression within the control flow. This is a great demo for learning new activities, expressions, and referencing output parameters. Below are a couple of quick highlights on each of the activities featured here, and then a link to the ADF video!

Lookup Activity in ADF v2

The Lookup activity within Azure Data Factory allows you to execute a stored procedure and return an output. Interestingly, the Stored Procedure activity does not return any outputs.

  • Leverage the Lookup activity to execute SQL Code or a Stored procedure and return an output.

If Condition Activity in Azure Data Factory

  • Leverage the If Condition activity and ADF Expression language to help control operations in the ADF Control Flow
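Wiring the two activities together, an If Condition that branches on the Lookup's result looks roughly like this sketch. It assumes a Lookup activity named 'LookupRowCount' with "First row only" enabled and a column named RowCnt; all of those names are hypothetical:

```json
{
    "name": "CheckRowCount",
    "type": "IfCondition",
    "typeProperties": {
        "expression": {
            "value": "@greater(activity('LookupRowCount').output.firstRow.RowCnt, 0)",
            "type": "Expression"
        }
    }
}
```

The activities to run on the true and false branches are then defined under the If Condition's True and False activity lists in the designer.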

Video Below:

If you like what you see and want more structured end to end training then check out the training offerings for Pragmatic Works! https://pragmaticworks.com/training

Azure Data Factory – Stored Procedure Activity (Part 2)

Hello! This is the second video in a series of videos that will be posted on Azure Data Factory! Feel free to follow this series and other videos I post on YouTube! Remember to like, subscribe, and encourage me to keep posting new videos!

First blog in series: Azure Data Factory – Metadata Activity

Stored Procedure Activity in ADF v2

  • Writing data to an Azure SQL Database via a stored procedure.
  • Populating input parameters from the output properties of other activities in ADF.
  • Limitations for the Stored Procedure activity
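The second bullet above — feeding stored procedure parameters from another activity's output — can be sketched like this. The activity, procedure, and parameter names are made up for illustration; the outputs referenced (itemName, lastModified) assume a preceding Get Metadata activity:

```json
{
    "name": "LogFileInfo",
    "type": "SqlServerStoredProcedure",
    "typeProperties": {
        "storedProcedureName": "[dbo].[InsertFileAudit]",
        "storedProcedureParameters": {
            "FileName": {
                "value": "@{activity('GetFileMetadata').output.itemName}",
                "type": "String"
            },
            "LastModified": {
                "value": "@{activity('GetFileMetadata').output.lastModified}",
                "type": "String"
            }
        }
    }
}
```

Each parameter value is an expression over the upstream activity's output, evaluated at run time just before the procedure is called.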

Video Below:

If you like what you see and want more structured end to end training then check out the training offerings for Pragmatic Works! https://pragmaticworks.com/training