Get list of subfolders in SSIS with SSIS Script Task!

Recently I needed to get a list of subfolders in SSIS. I also only wanted to bring back a list of subfolders that had the date appended to the end. So let’s walk through how to do this in Script Task using c#.

image

First we need to create two variables in SSIS:

image

Next bring a script task into the control flow and open it up for editing.

  • Select C# for the script language.
  • ReadWriteVariables select objDirectoryList

image

Now select Edit Script

  • Under the section “public void Main()” enter the following code:
    • Of course replace the directory location with your directory.
Dts.Variables["objDirectoryList"].Value = 
       System.IO.Directory.GetDirectories(@"C:\Blogs\RootFolder\","*20*",AllDirectories);
Dts.TaskResult = (int)ScriptResults.Success;
        }

        public SearchOption AllDirectories { get; set; }

Essentially all I’m doing here is populating the object variable objDirectoryList with all the subdirectories in the folder C:\Blogs\RootFolder\. Notice that I have also added a filter here “*20*”. If you do not wish to have a filter and  you want to bring back all subdirectories then just replace my filter with “*”.

Finally we can now iterate through this object variable in a FELC.

image

Below you can see the two directories that are returned from the script task above.

image

image

Thanks for looking, hope this helps.

Advertisement

SSIS Performance Tuning the For Each Loop. Retain Same Connection.

With the Retain Same Connection property I was recently able to more than double the performance of my SSIS package for a client. The package was looping over hundreds of thousands of files and logging the file names into a table to be later processed.

Retain Same Connection is a property setting found on connection managers. By default this property is set to false which means that each time the connection manager is used the connection is opened and subsequently closed. However in a situation like mine this can significantly degrade overall performance as the package has to open and close that connection hundreds of thousands of times. In this blog I’m going to set up a very basic example and walk through setting up this connection manager.

Here I have set up two connection managers. Both connection managers point to the same database. It’s important to note that if you are using Project Level Connection managers in SQL 2012 that setting this property inside any one package will persist across all packages. Therefore I create two connection managers.

image

Inside the package I am simply using a for each loop task to loop through a list of files in a directory and then I load the file names into a table using an Execute SQL task. For demo purposes I have two examples in one package.

image

  1. The FELC on the left takes 30 Seconds to run.
  2. The FELC on the right takes 11 Seconds to run.

Let’s now discuss how and where we can set this property.

Right click on your connection manager found in the connection managers pain inside the package and select Properties.

image

Inside the properties window find the property “RetainSameConnection” and set the value to “True”. Now the connection will remain open for the duration of the package.

image

As always thanks for looking!

Download Emails and Email attachments in SSIS.

Ever need to download emails and attachments using SSIS? If so you have come to right the place. Task Factory, a suite of custom SSIS components by Pragmatic Works has a component called the Email Source. The component supports both POP3 and IMAP and is really easy to set up, so let’s get started!

There is also full support for filtering messages based on sender, message to, subject, date received, body and priority.  This way you can filter down to only the messages you want to load.

If you are one of the poor unfortunate souls that don’t yet have Task Factory you can give it a try with a free trial download here: http://pragmaticworks.com/Products/Task-Factory

Walkthrough:

  1. Create a new package and bring in a data flow task.
  2. Inside the Data Flow Task pull in the TF Email Source component.
  3. Open the TF Email Source. Once open we need to create our connection manager. You can choose between POP3 or IMAP for reading emails. I prefer IMAP because it allows you to choose specifically what folder in your email you want to read from. More on this later.
  4. image
  5. On the general tab of the connection manager fill in the following items:
        1. Protocol Type
        2. Mail Server
        3. User Name
        4. Password
  6. image
  7. On the advanced tab of the connection manager fill in the following items:
        1. Server Port
        2. Encrypted Connection
        3. Root Folder Path (Directory to read from)
  8. image
  9. Click ok to close out the connection manager.
  10. Next select the location to save attachments. “Attachments Directory”
  11. I created a variable to store the message ids in. I won’t be using this object variable in this blog.
  12. Finally if we want to add a filter we can do that below as well.
  13. image
  14. Now simply connect the email source to a destination component.

Does cube partition exists? Script task in SSIS explained!

The first necessary step in dynamically creating cube partitions is to determine if the partition already exists. If the partition does not exists then we create the partition, if it does exists we simply process the existing partition. Recently I gave a free webinar online at PragmaticWorks.com and a few of the attendees wanted more details on the script task I used, so here we go. It’s also important to note that I used a daily partitioning strategy for the webinar. This is important because the naming conventions of my partitions included the partition name and date that I used as a filter. For example:

image

In this blog post I will walk you through step by step setting up this script task. Please see below. First the setup:

  • Create the following variables, if you are in 2012 or 2014 create these as package parameters. We will use these parameters to establish our connection to Analysis Services and then build out the partition name dynamically.
    • image
  • Now we need to create another variable, this variable will be used in our control flow to determine if we should create the partition or process the partition, we will get to this part in a bit.

image The Script Task: (Setup) Now that we have created our variables and have that part set up we are going to build our script task.

  • Step 1) Drag a script task component from the toolbox into your control flow.
  • Step 2) Open the task for editing and choose C# as the script language.
  • Step 3) ReadOnlyVariables: Select all the variables from Step 1 in the setup section.
  • Step 4) ReadWriteVariables: Select the variable intDoesPartitionExists
  • Step 5) Finally, click the Edit Script button.

The Script Task: (Code)

  • Step 1) Add the Microsoft.AnalysisServices reference.
          1. From the solution explorer right click on references and select “new reference”.
          2. From the Add Reference window select .NET tab
          3. Find and select Analysis Management Objects and click ok.
    • Step 2) Add Microsoft.Analysis Services as a namespace. Screenshot below:
      • image
  • Step 3)  Finally we can add our code. In this section we are going to use the parameters previously created to connect to our cube and then check to see if a partition exists. For this blog we simply hardcoded these values for demonstration purposes but typically we would set these values dynamically. If the partition exists I populate the variable intDoesPartitionExists with a value of 1 for True. If the partition does not exists I populate it with the value of 0 for false. I then use this variable in the control flow to determine whether to process or create partitions.
  • Step 4) Find the section titled “public void main() and paste the following code where it says Add your code here
// TODO: Add your code here

            string serverName = "";
            string databaseName = "";
            string cubeName = "";
            string measuregroupName = "";
            string partitionname = "";
            string Date = "";


            //Retrieve the server, database, cube, etc. details from the variables
            serverName = Dts.Variables["ServerName"].Value.ToString();
            databaseName = Dts.Variables["DatabaseName"].Value.ToString();
            cubeName = Dts.Variables["CubeName"].Value.ToString();
            measuregroupName = Dts.Variables["MeasureGroupName"].Value.ToString();
            partitionname = Dts.Variables["MeasureGroupPartitionName"].Value.ToString();
            Date = Dts.Variables["Date"].Value.ToString();

            //Build the connection string
            string ConnectionString = "Data Source=" + serverName + ";Catalog=" + databaseName + ";";

            //Connect to the server and select the database, cube, etc.
            Server srv = new Server();
            srv.Connect(ConnectionString);
            Database db = srv.Databases.FindByName(databaseName);
            Cube cb = db.Cubes.FindByName(cubeName);
            MeasureGroup mg = cb.MeasureGroups.FindByName(measuregroupName);

            if (mg.Partitions.ContainsName(partitionname + "_" + Date))
            {
                Dts.Variables["intDoesPartitionExists"].Value = 1;
            }
            else
            {
                Dts.Variables["intDoesPartitionExists"].Value = 0;
            }

            srv.Disconnect();
            srv.Dispose();

You can find my webinar on Processing SSAS with SSIS at the following link: http://pragmaticworks.com/Training/FreeTraining/ViewWebinar/WebinarID/1714 Thank you for looking!

Processing SSAS Cubes with SSIS –Q & A

For those of you who attended my webinar on Processing SSAS cubes with SSIS, thank you! For those of you who didn’t the video is still available here: http://pragmaticworks.com/Training/FreeTraining/ViewWebinar/WebinarID/1714.

Below are a few of the questions I received in the webinar. There will be a couple of follow up blogs for the more detailed questions.

Q: Can you please share the script used to generate the partitions?

A: I will post a follow up blog in the next couple of days with step by step instructions. Sharing my exact script would not be helpful. I will update this post with the additional blog.

Q: How are you stopping through your SSIS Steps within package execution?

A: Breakpoints! I used breakpoints to pause the package during execution so I could explain exactly what was going on. To enable breakpoints on a task inside SSIS you simply right click on the task and then select edit breakpoints. I showed this at the end of the webinar during Q&A.

Q: What are our best logging options if we do all of our cube processing with script tasks instead of the native components?

A: Your best logging options are going to be setting up a Profiler Trace or Extended events. I briefly explained setting up a trace at the end of my webinar.

Q: Does the C# script task need the connection string information to query SSAS for the partition information?

A: Yes. In my webinar I showed how to dynamically create partitions for your cube inside of SSIS. Before creating them I used a script task to see if the partition already existed. I have posted a follow up blog below:

https://mitchellsql.wordpress.com/2014/11/18/does-cube-partition-exists-script-task-in-ssis-explained/

Q: How are the aggregation designs handled on the dynamic partitions?

A: This is handled in the XMLA by specifying the Aggregation ID. I showed this at the end of my webinar during the Q&A section and I will also include this in a follow up blog on creating the XMLA for dynamic partitions.

Q: Is this applicable to tabular cube processing?

A: Yes. The analysis services processing task processes tabular models, cubes, dimensions, and mining models.

Once again thank you for attending our free webinar. Until next time!

Quickly encrypt and decrypt files with the TF PGP task for SSIS!

Business Problem

I recently had to help a client with encrypting a file before uploading that file to an FTP. Thanks to the PGP task in Task Factory we were able to fulfill this requirement in a matter of minutes.

Walkthrough

In this example I will walk you through quickly creating public and private keys and then using those keys to encrypt the file.

    1. Create two variables. One for the source location and one for the destination:
    2. image
    3. From the SSIS Toolbox drag and drop the PGP Task into the control flow and then open it for editing.
    4. For “What action will this perform?” choose Encrypt File.
    5. Click Generate Key to create your own public and private keys. Note: Public keys are used for encryption and private keys are used for decryption.
    6. image
    7. Now to fill out some basic information. The private key created will require a password, please remember to store this password somewhere safe, i.e. some kind of password vault. This will be required to decrypt files.
    8. image
    9. Now that we have generated PGP keys we can now encrypt our files in just a couple steps. A screenshot of all steps below is listed at the bottom.
    10. For Section 1:
      1. Select “File location is stored in a variable”
      2. Select “strFileName” for the variable
    11. Section 2:
      1. Select “Destination location is stored in a variable”
      2. Select “strFileName_Encrypt” for the variable.
      3. Select the checkbox for Overwrite the destination file if it already exists
    12. Section 3:
      1. Select connection manager: Select “Public”.
        1. Note: The connection manager “Public” was added to the package automatically when we generated new keys. Public is the name I gave to my public key ring.
      2. Finally select the Public Key you wish to use for encryption. We only have one key on our key ring.

I have provided three screenshots below.

    1. Screenshot 1 – Final Configuration
    2. Screenshot 2 – File before PGP Encryption
    3. Screenshot 3 – File after PGP Encryption

Screenshot of final configuration:

image

 

image

 

As always thanks for looking and your feedback is appreciated!

Get a List of Files on an FTP Server and then process with Foreach Loop Container

Today we are going to address a very common FTP question: “How can I get a list of the files that are on my FTP Server?”

This is a task that can be accomplished very easily with the Task Factory Secure FTP component and this walk through will provide step by step directions. If you are interested in Task Factory and the more than 40 tasks you can find the free trial here:

Here we will use the “Get list of files” option and store all the files for a given directory into an object variable. We will use a Foreach loop container to iterate through the object variable and then an Execute SQL task to write each file name to a table in our database. This walkthrough is broken down into three sections.

  • Part 1 – Setting up the Secure FTP component to get List of Files
  • Part 2 – Setting up the For Each Loop to read the the object variable
  • Part 3 – (Optional) Setting up the Execute SQL Task to write the file names to a table.

Part 1 – Configuring the Secure FTP site to Get List of Files from server.

  1. Start a new package and drag the Secure FTP task into your control flow from the toolbox.
  2. Next right click on the Secure FTP Task and click edit.
  3. Steps 4-7 below are in screenshot 1
  4. What will this task do? Select “Get list of files” from the drop down menu.
  5. For SFTP Connection Manager either choose an existing connection manager or create a new one if don’t have one set up.
  6. For your Directory location on server you have two options:
    1. First you can choose to click the ellipsis button and hard code the directory location.
    2. Second you can choose to store the directory location in a variable
  7. Next create an object variable to store the list of file names in from the server. (objFilesList)

Screenshot 1

Part 2 – Configuring the For Each Loop to iterate through the object variable

  1. Pull a Foreach Loop container into the Control Flow and drag the precedence constraint from the Secure FTP task to the Foreach loop container.
  2. Steps 3-6 can be found in Screenshot 2
  3. Open the For Each container and click on the Collection tab.
  4. For “Enumerator” choose “Foreach From Variable Enumerator” from the drop down.
  5. Next select the object variable you specified in the Secure FTP component. (objFilesList)
  6. Finally click the Variable Mappings tab.

Screenshot 2

  1. On the Variable Mappings tab select a string variable to store the file name in. (strFileName)
  2. Make sure Index is set to 0. (Screenshot 3)
  3. Click Ok.

Screenshot 3

Optional: (This section simply shows how to map variables in an Execute SQL Task and helps with visualization of each iteration of the Foreach Loop through the list of files on our FTP.)

Part 3 – Setting up the Execute SQL Task to read the string variable

(This section has been added for completeness and is fully optional.)

Note: For this example I have created a simple table with four columns:

CREATE TABLE [dbo].[FilesOnSftp](

[ID] [int] IDENTITY(1,1) NOT NULL,

[Directory] [varchar](50) NULL,

[FileName] [varchar](50) NULL,

[DateLoaded] [date] NULL)

  1. Pull an Execute SQL Task from the toolbox into the Foreach Loop container.
  2. Create a new variable called strSftpDirectory and give it a value of /

  3. Right click on the Execute SQL Task and click Edit to bring up the Editor
  4. For “Connection” choose the connection manager to your database.
  5. Click and highlight “SQLStatement” and then click the Ellipsis button that appears on the right.

Screenshot 4

  1. Clicking the Ellipsis button will give you a pop up window, this is where we enter our Insert SQL Statement. Here is the syntax I used if you are following along with the tutorial:

    INSERT INTO dbo.FilesOnSftp

([Directory], [FileName], [DateLoaded])

VALUES(?, ?,GETDATE())

  1. Now its time to map the parameters. Click on the Parameter Mapping tab. (Screenshot 5)
  2. Click the Add button and then select “strSftpDirectory” for Variable Name.
  3. For data type choose Varchar and change the Parameter Name to 0.
  4. Click the Add button a second time and select strFileName for the Variable Name.
  5. For data type choose Varchar and change the Parameter Name to 1. Click ok to save changes.

    Screenshot 5

How to flatten data using the Pivot Transform in SSIS.

The Pivot Transform is a very powerful tool in SSIS. It’s a component that is overlooked and isn’t the first thing to come to mind when you need it most.

Imagine you have a table that has an employee listing for each type of phone number. For example, you have one row for business phone, personal phone, and cell phone. We want to clean this data up and have only one row per employee. Well this is the situation I was presented with recently. The great news is with the Pivot Transform in SSIS this can be accomplished fairly easy. The table below is an simplified example of the data we had to work with.

EmployeeID PhoneType PhoneNumber
1 Business 123-987-6543
1 Personal 231-789-3456
1 Cell 312-897-5643
2 Business 321-978-6534
2 Personal 132-798-5634
2 Cell 213-879-3564

 

Our end product will look like this:

EmployeeID BusinessPhone PersonalPhone CellPhone
1 123-987-6543 231-789-3456 312-897-5643
2 321-978-6534 132-798-5634 213-879-3564

 

Step 1) Set up a data flow task in the control flow.

Step 2) Open up the data flow and pull in your source component. For this example I created a table in SQL Server.

Step 3) Drag in a Pivot Transform and link it to the source component.

Step 4) Configure the Pivot Transform

I was going to go through step by step instructions on how to configure the Pivot Transform. However, there is already an awesome blog out there by Devin Knight. Please click on the link below to view his blog post.

STEP 3 in Devin’s post explains how to configure the Pivot Transform.

https://devinknightsql.com/2009/06/15/better-know-a-ssis-transform-the-pivot-transform/http://bidn.com/blogs/DevinKnight/ssis/85/better-know-a-ssis-transform-%E2%80%93-the-pivot-transform

 

Thanks for Looking!

Quickly remove invalid XML characters! (Dimension Processing Error: Illegal XML Character)

Recently I was using the Dimension Processing task inside of SSIS to perform a process add to one of my dimensions. I then found out that I had some bad data, more specifically, I had some invalid XML characters in my source data and this was causing the dimension processing task to fail. Below is a screenshot that includes the invalid character, after Mitchell before =.

image

Error Message:

image

Business Problem:

I needed to remove invalid XML characters from source data so that I could use the dimension processing task to perform a process add on the dimension.

Solution:

Sadly, I originally began this process by trying to write my own script transform to remove the invalid XML characters. Unfortunately the script I was using (Found on the internet) apparently did not account for all characters as my processing task continued to fail with the same error.

So what did I do? I turned to Task Factory of course! To solve this problem I used the Task Factory replace unwanted characters transform.

Walkthrough:

    1. Bring the Task Factory Replace Unwanted Characters Transform into your data flow and place it between your source and destination:
    2. image
    3. Next open up the Task Factory component and you will see a list of your columns on the left.
    4. On the right you will see the column “Choose Action”. Click this drop down menu for the column you wish to cleanse. This menu will provide you with 4 different cleansing actions! (List below)
      1. image
    5. Select “Replace invalid characters that cannot be part of an XML Document.

image

Conclusion:

Now when I execute my package you will notice the illegal XML character is gone!

image

Thank you Task Factory Team!

For Loop Container in SSIS–Does File Exist (Part 2)

In this post I am going to explain how to set up and use the For Loop container inside SSIS in a real world scenario. In a previous post I showed how to set up a script task inside SSIS to check if a File Exist. Well if the file doesn’t exists I want to keep checking over and over again until that file does exists. The For Loop container is just what I need. Let’s get started.

This post assumes you are building on my previous post, found here: Does File Exist

Walkthrough

Open the package created in Part 1 and pull a For Loop container task into the control flow.

  1. Drag the script task created in part 1 into the For Loop container.
  2. Right click on the For Loop Container and click “edit”.
  3. Under For Loop properties you will see three options.
    1. InitExpression – This value is optional. Here you can assign an initial value. (More on this shortly)
    2. EvalExpression – This value is required. As long as this expression evaluates to true the loop will continue to loop. Once it evaluates to false the loop will stop.
    3. AssignExpression – This value is optional. This value is used to change the value of the variable that you use in the EvalExpression. For example you can add 1 to each iteration of the loop.
  4. For InitExpression enter the following: @bolDoesFileExist = False.
    1. Notice I used one (=) sign here. In SSIS you use one = for assigning values.
  5. For EvalExpression enter the following: @bolDoesFileExist == False.
    1. As long as this expressions evaluates to true the loop will continue. Once the variable no longer is false (meaning that the file does exists) the loop will stop.
  6. For Assign Expression, leave this value empty. Click ok to close out the For Loop Editor.
  7. image
  8. That’s it!

You might want to also add a delay between loops, maybe 5 seconds maybe 5 minutes, completely up to you. But here is an easy way you can add that functionality.

Pull an Execute SQL Task into your For Loop container.

  1. Choose OLEDB for connection type
  2. Select any OLEDB connection, doesn’t matter which one you choose.
  3. Select Direct Input for SQLSourceType and type the following code for your delay statement:
      • WAITFOR DELAY ‘00:00:05’

image

Here is how the final package will look:

image

Thanks for looking! Enjoy.