Azure Data Factory – Copy Data Activity

In this blog, we are going to review the Copy Data activity, which can be used to copy data between data stores located on-premises and in the cloud.

In part four of my Azure Data Factory series, I showed you how to use the If Condition activity to compare the output parameters from two separate activities. We determined that if the last modified date of the file (located in blob storage) was greater than the last load time, then the file should be processed. Now we are going to proceed with that processing using the Copy Data activity. If you are new to this series you can view the previous posts here:

If Condition Activity – If True activities

As mentioned in the intro, we are going to pick up where we left off in the last blog post. Remember that the If Condition activity evaluates an expression and returns either true or false. If the result is true, one set of activities is executed; if the result is false, another set of activities is executed. In this blog post we are going to focus on the If True activities.
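
Before we jump into the designer, it may help to see what the If Condition boils down to behind the scenes. Everything you configure in Azure Data Factory is stored as JSON. Below is a rough sketch only, written as a Python dict; the activity names and the lookup of the last load time are illustrative assumptions, not the exact definitions from this series.

# Rough sketch of the If Condition activity's underlying JSON, written as a
# Python dict. The activity names ('Get Metadata1', 'Get Last Load Time',
# 'Copy Data1') are illustrative assumptions, not the names from this series.
if_condition_activity = {
    "name": "If Condition1",
    "type": "IfCondition",
    "typeProperties": {
        "expression": {
            "type": "Expression",
            # True when the file's last modified date is newer than the last load time
            "value": "@greater(activity('Get Metadata1').output.lastModified, "
                     "activity('Get Last Load Time').output.firstRow.LastLoadTime)",
        },
        # The Copy Data activity we configure in this post runs on the True branch
        "ifTrueActivities": [{"name": "Copy Data1", "type": "Copy"}],
    },
}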

  • First, select the If Condition activity and then click on activities from the properties pane, highlighted in yellow in the image below.
  • Next, click on the box “Add if True Activity”.

image

This will launch a new designer window, and this is where we will be configuring our Copy Data activity.

image

Copy Data Activity, setup and configuration

Fortunately, the Copy Data activity is rather simple to set up and configure. This is a good thing, since it is likely you will be using this activity quite often. The first thing we need to do is add the activity to our pipeline. Select the Copy Data activity from the Data Transformation category and add it to the pipeline.

image

Now we need to set up the source and sink datasets, and then map those datasets. You can think of the sink dataset as the destination, for all intents and purposes.

Select the Copy Data activity and then click on the Source tab in the properties window. For the dataset, I am simply going to choose the same dataset we used in the first blog in this series.

image

Creating the Sink dataset in Azure Data Factory

Now it’s time to set up the Sink (destination) dataset. My destination dataset in this scenario is going to be an Azure SQL Database. I am going to click on Sink and then I will click on + New.

image

When you click on + New, this will launch a new window with all your data store options. I am going to choose Azure SQL Database.

image

This will launch a new tab in Azure Data Factory where you can now configure your dataset. In the properties window, select “Connection”; here you will select your linked service, which is your connection to your database. If you haven’t already created a linked service to your database, click on + New. Clicking + New will open a new window where you can input the properties for your linked service. Take a look at the screenshot below:

image
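
For reference, the linked service you create here boils down to a small JSON definition. Here is a minimal sketch, again written as a Python dict; the server, database, and credential values are placeholders, not values from this walkthrough.

# Minimal sketch of an Azure SQL Database linked service definition;
# the server, database, user, and password values are all placeholders.
azure_sql_linked_service = {
    "name": "AzureSqlDatabaseLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": (
                "Server=tcp:<yourserver>.database.windows.net,1433;"
                "Database=<yourdatabase>;"
                "User ID=<username>;Password=<password>;"
            )
        },
    },
}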

Once the linked service has been created, I will select the table I want to load. All available tables will show up in the drop-down list. I will select my dbo.emp table from the drop-down.

image

The final step for the dataset is to import the schema; this will be used when we perform the mapping between the source and sink datasets. Next, I am going to click on the Schema tab. On the Schema tab I will select Import Schema; this actually returns a column that I don’t need, so I will remove that column from the schema as well. Take a look at steps three and four in the following screenshot:

image
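
To tie the pieces together, here is a rough sketch of what the finished sink dataset might look like behind the scenes. The linked service name and the column list are my assumptions based on this walkthrough, not the exact definition.

# Rough sketch of the sink dataset's JSON, written as a Python dict; the
# linked service name and column list are assumptions for illustration.
sink_dataset = {
    "name": "AzureSqlTable1",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "AzureSqlDatabaseLinkedService",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {"tableName": "dbo.emp"},
        # The imported schema, after removing the column we don't need
        "structure": [
            {"name": "FirstName", "type": "String"},
            {"name": "LastName", "type": "String"},
        ],
    },
}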

Copy Data activity – Mapping

Now that the source and sink datasets have been configured, it’s time to finish configuring the Copy Data activity. The final step is to map the two datasets. I will navigate back to the pipeline and click on the Copy Data activity. From the properties window I will select the Mapping tab and then click Import Schemas, which imports the schemas from both the source and the destination. This step also attempts to map the source to the destination automatically, so it is important to verify that the mapping is correct. My source dataset was a file without column headers, so Azure Data Factory automatically created the column headers Prop_0 and Prop_1 for my first and last name columns. Take a look at the following screenshot:

image
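
Behind the scenes, the mapping you define on this tab becomes a “translator” section inside the Copy activity’s JSON. Here is a sketch as a Python dict; the sink column names are my assumption based on the dbo.emp table.

# Sketch of the Copy activity's column mapping ("translator") section.
# The sink column names are assumptions based on the dbo.emp table.
copy_activity_translator = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"name": "Prop_0"}, "sink": {"name": "FirstName"}},
        {"source": {"name": "Prop_1"}, "sink": {"name": "LastName"}},
    ],
}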

This was a simple application of the Copy Data activity. In a future blog post I will show you how to parameterize the datasets to make this process dynamic. Thanks for reading my blog!


Quick Tips – Export data from Power BI using R

One of my favorite tricks in Power BI is to use R integration to quickly export data out of Power BI. This is also a very popular question among Power BI enthusiasts: once users realize the true capabilities and ease of use of the Power Query Editor for transforming and cleaning data, they want to clean up their files using Power BI. The challenge then becomes: how do I get the data out?

Update: Video embedded at bottom of blog post.

Prerequisite – R

If you want to test this method out, you need to install R on your machine. For your convenience, see the following link:

https://mran.microsoft.com/open/

Export data from Power BI using R

Like all of my quick tip blogs, this will be quick.

You will export data from the Power Query Editor. To launch the Power Query Editor click Edit Queries from the Home ribbon in Power BI.

image

Inside the Power Query Editor, click the Transform ribbon and then click on R.

image

Once you click on the large letter R seen in the screenshot below, a new window will open. Now you just have to type in some basic code. Here is the pseudocode:

write.csv(dataset, "<destination>")

For example: write.csv(dataset, "C:\\Temp\\output.csv")

image

That’s it! Click OK and check your folder for the file. Notice that I used two backslash characters in the file path; this is required because the backslash is an escape character in R. The other question I get is: can I write the results to an Excel file or to a SQL database table? The answer is… yes! R has packages that make particular tasks easier; if you want to write to an Excel file or a SQL Server table you will need to install those packages. My hope is to do some follow-up posts around this topic.

image


Thanks for checking out my blog!

SSIS Performance Tuning

This is my much overdue follow-up blog from the presentation I gave to the New York SQL Server User Group. In this post, I am going to provide some additional resources to supplement the presentation. Check out the following blog posts on different performance tuning techniques that can be used with SSIS.

SSIS Performance Tuning the For Each Loop. Retain Same Connection.

https://mitchellpearson.com/2015/01/12/ssis-performance-tuning-the-for-each-loop-retain-same-connection/

Use File Cache and the Cache Transform to help mitigate network issues.

https://mitchellpearson.com/2014/06/30/use-file-cache-and-the-cache-transform-to-help-mitigate-network-issues/

Create efficient Range Lookups with the Task Factory Advanced Lookup Transform

https://mitchellpearson.com/2014/06/27/create-efficient-range-lookups-with-the-task-factory-advanced-lookup-transform/

Load thousands of files in SSIS lightning fast! MultiFlatFile Connection Manager.

https://mitchellpearson.com/2014/06/27/load-thousands-of-files-in-ssis-lighting-fast-multiflatfile-connection-manager/

Why are Asynchronous components so bad in SSIS?

http://tinyurl.com/ycnvr5ef

Using cache transforms in SSIS to improve Performance (Part 1)

https://mitchellpearson.com/2014/06/20/using-cache-transforms-in-ssis-to-improve-performance-part-1/

Use the aggregate transform in SSIS to outperform T-SQL!

https://mitchellpearson.com/2014/06/22/use-the-aggregate-transform-in-ssis-to-improve-performance-of-t-sql/

Advanced DAX: Problem, Design, Solution

Thanks for watching my latest webinar on DAX. If you missed the webinar you can watch the recording here:

http://pragmaticworks.com/Training/Details/Advanced-DAX-Problem-Design-Solution

I didn’t have a lot of questions from this webinar, but I will answer the ones I did have.

Question: What is the difference between the FIND function and the SEARCH function?

Answer: The FIND function is case-sensitive and the SEARCH function is case-insensitive. In the webinar I used the FIND function in my first demo, but it could have been replaced with the SEARCH function since I was not concerned with case sensitivity.

Question: Where did you find the School Grade demo that you used in your webinar?

Answer: The data I used in the webinar can be found at the link below:

http://schoolgrades.fldoe.org/reports/#2014

Question: Might there be an upcoming webinar regarding DAX as it pertains to SSAS cubes rather than Power BI?

Answer: I am not aware of any upcoming webinars that will be specific to Tabular cubes. It is important to point out that the examples used in this webinar are also applicable to SSAS Tabular cubes, since Tabular uses the same backend engine as Power BI Desktop (xVelocity). If you implement these solutions in SSAS Tabular, the consumption of the data would be achieved through Excel, and the necessary filters and slicers would be applied there instead of in the report view provided in Power BI Desktop.

Thanks again!

Advanced DAX Webinar

Hey Everyone,

Thanks for attending the free one-hour DAX webinar by Pragmatic Works. In this blog I am going to focus on the primary questions that I saw come up over and over again in the chat logs:

  • How come the totals for Forecast and Dynamic are the same?
  • The Grand Total is different as well.
  • Are we sure all those totals are right for the MLS data?

The totals generated in DAX are a common source of confusion, especially if you are still learning about filter context. Totals do NOT simply add up the values that appear in the rows of the report. This is confusing because most of the time it looks like that is exactly what is occurring in the background.

If the totals cell in your table does not sum the rows, then how does it come up with a total? Great question! The total cell also executes the DAX formula defined in your measure, just in its own filter context. Let’s start with a visual, one from the webinar that prompted the questions leading to this post. Notice in this picture you can clearly see that the “TOTAL” is not the sum of the rows. The values represented in the total row look completely wrong; however, they are not.

image

Remember this question? How come the totals for Forecast and Dynamic are the same? Let’s try to answer this question now. We can begin with Forecast YTD Sales.

Forecast YTD Sales is a simple calculation in the model: Prior YTD Sales * 2. In the screenshot below I have added the calculated measure Prior YTD Sales to the table so we can visualize exactly how the total cell is being calculated.

The total value is $19,582,120 which is $9,791,060 * 2.

image

Remember that in our original table there was only data up until June of 2008. So why is the total row taking Prior YTD Sales from December? Why is the total calculation taking the value of $9,791,060 from the Prior YTD Sales measure instead of $3,037,501 (June)? The table visual only displayed data up until June, but that was only because we don’t have any sales past June of 2008. The total row is looking at the entire 2008 year and therefore gets the Prior YTD Sales value as of the last date in 2008 ($9,791,060). Although this may not be the desired behavior, it is the correct behavior.
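
If it helps, here is a toy illustration of the idea in Python. The numbers are made up; the point is only that the total cell re-evaluates the measure in its own (year-level) filter context rather than summing the visible rows.

# Toy illustration (made-up numbers, not the webinar's model) of why a DAX
# total row is not the sum of the rows: every cell, including the grand
# total, re-evaluates the measure in its own filter context.
prior_ytd_sales = {"Apr": 2000, "May": 2500, "Jun": 3000, "Dec": 9800}

def forecast_ytd_sales(month):
    # Forecast YTD Sales = Prior YTD Sales * 2 (the measure from the post)
    return prior_ytd_sales[month] * 2

# Row cells: the measure evaluated month by month
row_values = [forecast_ytd_sales(m) for m in ("Apr", "May", "Jun")]

naive_total = sum(row_values)             # what people expect: 15000
# The total cell evaluates over the whole year, where Prior YTD Sales
# returns its value as of the last date of the year (December)
actual_total = forecast_ytd_sales("Dec")  # what DAX does: 19600

print(naive_total, actual_total)          # the two rarely agree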

How come the totals for Forecast and Dynamic are the same?

Well, all of that is great, but it still doesn’t explain why Forecast YTD Sales has the same value as our Dynamic Measure at the total row. The Dynamic Measure calculation always takes YTD Sales unless there are no sales on the last day of the current month, in which case it takes Forecast YTD Sales. This was explained in the webinar, so please revisit that if you don’t remember the why behind this calculation.

IF(
    LastSaleDate <> LastDayOfMonth,
    [Forecast YTD Sales],
    [YTD Sales])

At the total row the calculation would read like this:

IF(
    “6/20/2008” <> “12/31/2008”,
    [Forecast YTD Sales],
    [YTD Sales])

Based on this interpretation of the measure, we can see that the last day a sale took place in our model does not equal the last date in our model for the current year. That means the total row is going to display the measure [Forecast YTD Sales], which explains why we see the following values in our total row:

image

Once again, understanding the total row is not easy, especially if you are transitioning to DAX from working with Excel. If the total row seems incorrect, it’s important to validate the calculated measure at the total level.

Thanks!

Move Files in SSIS Using the Execute Process Task

Recently I found myself redesigning an ETL process for my client to improve performance. The client receives hundreds of thousands of files daily that are pushed incrementally throughout the day. We wanted to take advantage of the Multi Flat File connection manager for efficiently and quickly processing these files. So far so good, right?

Not sure what the Multi Flat File connection manager is? Check out my blog here: https://mitchellpearson.com/2014/06/27/load-thousands-of-files-in-ssis-lighting-fast-multiflatfile-connection-manager/

First, in order to really use this connection manager I needed to move the files from the original “landing zone” (the directory where files are being pushed) to a controlled working directory. So how do I do this quickly and efficiently? Glad you asked! With an Execute Process Task and the Windows move command, of course!

Walkthrough:

  1. Create a new SSIS Package.
  2. Pull the Execute Process Task into the control flow.
  3. Open the Execute Process Task and click on the Process tab (on the left).
  4. For the Executable property, point to the Windows command prompt, found here:
    1. C:\Windows\System32\cmd.exe
  5. Finally, it’s time to issue our Windows move command. Here is the breakdown in five parts (the assembled command follows this list):
    1. /C – required when running Windows commands from the Execute Process Task
    2. move – the Windows move command, read more here:
    3. /Y – suppresses prompting to confirm that you want to overwrite an existing destination file.
    4. Source directory with wildcard filter: C:\Blogs\1000_Files_Test\Test*
    5. Destination directory: C:\Blogs\1000_Files_Test_Move\
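
Putting the five parts together, the complete Arguments value for the Execute Process Task looks like this (using the example directories above):

/C move /Y C:\Blogs\1000_Files_Test\Test* C:\Blogs\1000_Files_Test_Move\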

image

Thanks and enjoy!

Upgrading Script Tasks in SSIS – “Target of Invocation” Error on Script Task

Recently I was working on a project with @SQLDusty (Dustin Ryan), and we were upgrading some projects from SQL Server 2012 to 2014. One of the script tasks that we were using to dynamically create and process cube partitions was failing. The worst part was that the error message was completely generic, so it wasn’t immediately apparent what was wrong. After a little troubleshooting we were able to find the root cause and solve the issue.

Below is the error message we were getting from the SSISDB:

image

The problem was that our .NET reference for Analysis Management Objects was 11.0 (2012) when it needed to be 12.0 (2014).

Here is a screenshot of the .NET reference as seen in the Solution Explorer.

image

This screenshot of the Properties window displays the version number for my .NET reference. Also pay close attention to the Description box; this description is what you actually look for when adding the Microsoft.AnalysisServices reference to your script task.

image

So now all we need to do to fix this problem is remove the current reference and add the correct reference back in.

image

1) From the Solution Explorer inside your script task component, right-click on References and select “Add Reference…”

image

2) From the new window that opens, select Assemblies and then Extensions:

image

3) Finally, select Analysis Management Objects 12.0.0.0 as seen below and add it to your references.

image

This fixed our problem and if you are also going through the upgrade process I hope this helps you as well!

Download Files in SSIS with an HTTP or HTTPS connection.

Recently I wanted to download historical stock information from the internet and store it in a data warehouse. Using the Task Factory Download File Task, I was able to dynamically download thousands of CSV files with historical stock information and trends for my data warehouse.

The Download File Task allows you to download files over an HTTP connection from inside an SSIS package, and it also has support for HTTPS; I will show you where to enter credentials.
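
For context, the task is ultimately performing an HTTP GET and saving the response to disk. Outside of SSIS, the same operation could be sketched in a few lines of Python; the URL and file name here are placeholders, not the Yahoo Finance address used later in the demo.

# Plain-Python equivalent of the download operation, for context only;
# the source URL and local file name below are placeholders.
import urllib.request

source_url = "http://example.com/data/historical_prices.csv"
urllib.request.urlretrieve(source_url, "historical_prices.csv")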

If you want to walk through this example and don’t currently have Task Factory, you can download a free 14-day trial here: Free Trial Download

Demo Time:

So in this example I am going to download a CSV file from Yahoo Finance with historical price information for FedEx.

  • Create a new package in SSIS and pull the Task Factory Download File Task into the control flow.
  • Now we need to create our HTTP Connection manager.
  • Inside the Download File Task select “Create New HTTP Connection…”

image

  • Creating the connection manager is very easy, especially if you are not using HTTPS.
    • Server URL: (This is the domain that is hosting the file.)
    • Use Credentials: (Select this box and proceed with your username and password if credentials are required.)
    • Test Connection: (Test the connection before proceeding.)
  • For Server URL I used: http://finance.yahoo.com
  • Click OK. The connection manager has now been created successfully.

image

  • Now we need the exact location of the file we want to download.

image

  • Now that we have the exact location of the file, we can finish setting up the Download File Task.
  • Paste the file location into the section “Enter Path of File To Download:”
    • Note: I am simply specifying a text value for this walkthrough. You can use variables and expressions with this task to make it as dynamic as you need.

image

  • The final step in this process is to specify the location where the file will be stored once downloaded. Once again I am using “Text Value”.
  • “Enter File Path For Local File:” Enter the specific file location. (See the screenshot.)

image

Now execute the package and there you go!

image

Thanks for looking!

SSIS Email Connection Information for SMTP, IMAP, and POP3 connections.

Quite often when performing a demo, teaching, or building SSIS packages, I need SMTP, IMAP, or POP3 connection information. Each time, I either have to find my own local document that has this information stored or search the web, so I’ve decided to post the information here for my own convenience, and perhaps yours as well!

If you have some providers you would like to add to this, just email me at mpearson@pragmaticworks.com. Thanks!

Note: Task Factory by Pragmatic Works offers three separate email tasks for use in SSIS. If you are interested you can learn more about Task Factory here: http://pragmaticworks.com/Products/Task-Factory

Yahoo Configuration Settings

SMTP

  • Mail Server:     smtp.mail.yahoo.com
  • Port:                465
  • Encryption:      SSL

IMAP

  • Mail Server:    imap.mail.yahoo.com
  • Port:               993
  • TLS/SSL:       Yes

POP3

  • Mail Server:     pop.mail.yahoo.com
  • Port:                995
  • Encryption:     SSL

Gmail Configuration Settings

SMTP

  • Mail Server:      smtp.gmail.com
  • Port:                 587
  • Encryption:       TLS

Office 365 Configuration Settings

SMTP

  • Server name:     smtp.office365.com
  • Port:                  587
  • Encryption:        TLS

IMAP

  • Server name:    outlook.office365.com
  • Port:                 993
  • Encryption:      SSL

POP3

  • Server name:   outlook.office365.com
  • Port:                995
  • Encryption:      SSL 
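
If you ever want to sanity-check one of these SMTP entries outside of SSIS, a few lines of Python will do it. Here is a minimal sketch using the Office 365 values above; the user name and password are placeholders.

# Minimal SMTP connection test using the Office 365 settings listed above;
# the user name and password are placeholders.
import smtplib

with smtplib.SMTP("smtp.office365.com", 587, timeout=60) as server:
    server.starttls()  # upgrade the connection to TLS
    server.login("user@yourdomain.com", "<password>")
    print("SMTP login succeeded")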

Build a notification framework to send SMS text messages and emails in SSIS!

Of the over 40 components in Task Factory, the TF Advanced Email and SMS Task is one of my favorites. In this blog I am going to discuss the rich features this task offers and provide a walkthrough of setting it up to send an email or SMS text message upon package failure.

If you like this blog and you want to try out a free trial of Task Factory you can find it here: http://pragmaticworks.com/Products/Task-Factory/TrialDownload

Feature Highlights

  • Supports Email and SMS messages.
  • Can send emails with attachments.
  • Built-in HTML editor for HTML email.
  • Allows use of variables as placeholders in the Email Body so variables can be automatically replaced when an email is sent!
  • Expressions can be set on properties to make the package dynamic.

Note: We will be using SMTP connection information to set up our email connection manager. Generally, a basic Bing/Google search will return the connection information required; for example, “Yahoo SMTP connection info” or “Gmail SMTP connection info”.

Ok let’s get started with our walkthrough.

  1. Please open up a new SSIS package.
  2. Next, click on the Event Handlers tab at the top.
    1. Don’t change the executable; by default it is set at the package level.
    2. Also leave OnError as the Event Handler.

image

  3. Next, click on the hyperlink “Click here to create an ‘OnError’ event handler for executable ‘<Executable Name>’”.

image

  4. Now pull in the Task Factory Advanced Email and SMS Task.
  5. Next we will set up a new SMTP connection manager.
    1. Select the drop-down box next to SMTP Connection and then select “Create New Email Connection”.

image

  6. On the General tab, fill out the following information:
    • Protocol Type: (must be SMTP)
    • Mail Server: (the mail server will differ by mail provider; I’m using Office 365 settings)
    • User Name:
    • Password:
  7. On the Advanced tab, fill out the following:
    • Server Port: (once again, this changes per mail provider)
    • Type of encrypted connection: TLS, SSL, or NONE
    • Timeout in Seconds: (default is 60)
  8. Finally, test the connection to validate it and click OK to close.

image

image

image

Setting up SMS Providers and Emails

In this section I will demo how easy it is to set up the component to send either an SMS or Email message. Note all of this can be set up dynamically using variables and expressions!

  1. Select the drop-down for “To:” and select either Email or SMS. Here I selected Email and then manually entered the email address.
    • Please note you can send emails to multiple recipients by separating them with a semicolon (;).

image

  2. Ok, next let’s override our email option and select SMS.
    • First, enter the phone number.
    • Next, select the provider (the “Provider” field is required for SMS messages).
    • Finally, select the add icon; see the screenshot below:

image

Dynamically set the body of the email and subject using expressions:

In this section we are going to use expressions for the body and subject of the email. Note that expressions can be set on virtually every property of this component.

  1. Select the “Expressions” tab found at the bottom left of the TF Advanced Email editor.

image

  2. From the properties window, select “Subject” from the drop-down.
  3. In the “Expression” box, select the ellipsis button to open the expression editor.
  4. Here is where we can use the SSIS expression language and variables to make our subject dynamic at run time, for example: "Package " + @[System::PackageName] + " failed!" (see the third screenshot below).

image

image

image

  5. Once you have completed your expression for the Subject, click OK to close the expression editor. Now, from the properties window, select “Message” and once again select the ellipsis button to open the expression editor. Inside an OnError event handler you can reference variables such as @[System::ErrorDescription] to include the actual error text in the message body.

image

Now when I execute the package, if an OnError event occurs in the package, and only if it occurs, I will receive an email or text message!

Here is the email I received upon failure of the package, pretty cool!

image

Thanks for looking!