Quickly encrypt and decrypt files with the TF PGP task for SSIS!

Business Problem

I recently had to help a client with encrypting a file before uploading that file to an FTP. Thanks to the PGP task in Task Factory we were able to fulfill this requirement in a matter of minutes.

Walkthrough

In this example I will walk you through quickly creating public and private keys and then using those keys to encrypt the file.

    1. Create two variables. One for the source location and one for the destination:
    2. image
    3. From the SSIS Toolbox drag and drop the PGP Task into the control flow and then open it for editing.
    4. For “What action will this perform?” choose Encrypt File.
    5. Click Generate Key to create your own public and private keys. Note: Public keys are used for encryption and private keys are used for decryption.
    6. image
    7. Now to fill out some basic information. The private key created will require a password, please remember to store this password somewhere safe, i.e. some kind of password vault. This will be required to decrypt files.
    8. image
    9. Now that we have generated PGP keys we can now encrypt our files in just a couple steps. A screenshot of all steps below is listed at the bottom.
    10. For Section 1:
      1. Select “File location is stored in a variable”
      2. Select “strFileName” for the variable
    11. Section 2:
      1. Select “Destination location is stored in a variable”
      2. Select “strFileName_Encrypt” for the variable.
      3. Select the checkbox for Overwrite the destination file if it already exists
    12. Section 3:
      1. Select connection manager: Select “Public”.
        1. Note: The connection manager “Public” was added to the package automatically when we generated new keys. Public is the name I gave to my public key ring.
      2. Finally select the Public Key you wish to use for encryption. We only have one key on our key ring.

I have provided three screenshots below.

    1. Screenshot 1 – Final Configuration
    2. Screenshot 2 – File before PGP Encryption
    3. Screenshot 3 – File after PGP Encryption

Screenshot of final configuration:

image

 

image

 

As always thanks for looking and your feedback is appreciated!

Get a List of Files on an FTP Server and then process with Foreach Loop Container

Today we are going to address a very common FTP question: “How can I get a list of the files that are on my FTP Server?”

This is a task that can be accomplished very easily with the Task Factory Secure FTP component and this walk through will provide step by step directions. If you are interested in Task Factory and the more than 40 tasks you can find the free trial here:

Here we will use the “Get list of files” option and store all the files for a given directory into an object variable. We will use a Foreach loop container to iterate through the object variable and then an Execute SQL task to write each file name to a table in our database. This walkthrough is broken down into three sections.

  • Part 1 – Setting up the Secure FTP component to get List of Files
  • Part 2 – Setting up the For Each Loop to read the the object variable
  • Part 3 – (Optional) Setting up the Execute SQL Task to write the file names to a table.

Part 1 – Configuring the Secure FTP site to Get List of Files from server.

  1. Start a new package and drag the Secure FTP task into your control flow from the toolbox.
  2. Next right click on the Secure FTP Task and click edit.
  3. Steps 4-7 below are in screenshot 1
  4. What will this task do? Select “Get list of files” from the drop down menu.
  5. For SFTP Connection Manager either choose an existing connection manager or create a new one if don’t have one set up.
  6. For your Directory location on server you have two options:
    1. First you can choose to click the ellipsis button and hard code the directory location.
    2. Second you can choose to store the directory location in a variable
  7. Next create an object variable to store the list of file names in from the server. (objFilesList)

Screenshot 1

Part 2 – Configuring the For Each Loop to iterate through the object variable

  1. Pull a Foreach Loop container into the Control Flow and drag the precedence constraint from the Secure FTP task to the Foreach loop container.
  2. Steps 3-6 can be found in Screenshot 2
  3. Open the For Each container and click on the Collection tab.
  4. For “Enumerator” choose “Foreach From Variable Enumerator” from the drop down.
  5. Next select the object variable you specified in the Secure FTP component. (objFilesList)
  6. Finally click the Variable Mappings tab.

Screenshot 2

  1. On the Variable Mappings tab select a string variable to store the file name in. (strFileName)
  2. Make sure Index is set to 0. (Screenshot 3)
  3. Click Ok.

Screenshot 3

Optional: (This section simply shows how to map variables in an Execute SQL Task and helps with visualization of each iteration of the Foreach Loop through the list of files on our FTP.)

Part 3 – Setting up the Execute SQL Task to read the string variable

(This section has been added for completeness and is fully optional.)

Note: For this example I have created a simple table with four columns:

CREATE TABLE [dbo].[FilesOnSftp](

[ID] [int] IDENTITY(1,1) NOT NULL,

[Directory] [varchar](50) NULL,

[FileName] [varchar](50) NULL,

[DateLoaded] [date] NULL)

  1. Pull an Execute SQL Task from the toolbox into the Foreach Loop container.
  2. Create a new variable called strSftpDirectory and give it a value of /

  3. Right click on the Execute SQL Task and click Edit to bring up the Editor
  4. For “Connection” choose the connection manager to your database.
  5. Click and highlight “SQLStatement” and then click the Ellipsis button that appears on the right.

Screenshot 4

  1. Clicking the Ellipsis button will give you a pop up window, this is where we enter our Insert SQL Statement. Here is the syntax I used if you are following along with the tutorial:

    INSERT INTO dbo.FilesOnSftp

([Directory], [FileName], [DateLoaded])

VALUES(?, ?,GETDATE())

  1. Now its time to map the parameters. Click on the Parameter Mapping tab. (Screenshot 5)
  2. Click the Add button and then select “strSftpDirectory” for Variable Name.
  3. For data type choose Varchar and change the Parameter Name to 0.
  4. Click the Add button a second time and select strFileName for the Variable Name.
  5. For data type choose Varchar and change the Parameter Name to 1. Click ok to save changes.

    Screenshot 5

Quickly remove invalid XML characters! (Dimension Processing Error: Illegal XML Character)

Recently I was using the Dimension Processing task inside of SSIS to perform a process add to one of my dimensions. I then found out that I had some bad data, more specifically, I had some invalid XML characters in my source data and this was causing the dimension processing task to fail. Below is a screenshot that includes the invalid character, after Mitchell before =.

image

Error Message:

image

Business Problem:

I needed to remove invalid XML characters from source data so that I could use the dimension processing task to perform a process add on the dimension.

Solution:

Sadly, I originally began this process by trying to write my own script transform to remove the invalid XML characters. Unfortunately the script I was using (Found on the internet) apparently did not account for all characters as my processing task continued to fail with the same error.

So what did I do? I turned to Task Factory of course! To solve this problem I used the Task Factory replace unwanted characters transform.

Walkthrough:

    1. Bring the Task Factory Replace Unwanted Characters Transform into your data flow and place it between your source and destination:
    2. image
    3. Next open up the Task Factory component and you will see a list of your columns on the left.
    4. On the right you will see the column “Choose Action”. Click this drop down menu for the column you wish to cleanse. This menu will provide you with 4 different cleansing actions! (List below)
      1. image
    5. Select “Replace invalid characters that cannot be part of an XML Document.

image

Conclusion:

Now when I execute my package you will notice the illegal XML character is gone!

image

Thank you Task Factory Team!

For Loop Container in SSIS–Does File Exist (Part 2)

In this post I am going to explain how to set up and use the For Loop container inside SSIS in a real world scenario. In a previous post I showed how to set up a script task inside SSIS to check if a File Exist. Well if the file doesn’t exists I want to keep checking over and over again until that file does exists. The For Loop container is just what I need. Let’s get started.

This post assumes you are building on my previous post, found here: Does File Exist

Walkthrough

Open the package created in Part 1 and pull a For Loop container task into the control flow.

  1. Drag the script task created in part 1 into the For Loop container.
  2. Right click on the For Loop Container and click “edit”.
  3. Under For Loop properties you will see three options.
    1. InitExpression – This value is optional. Here you can assign an initial value. (More on this shortly)
    2. EvalExpression – This value is required. As long as this expression evaluates to true the loop will continue to loop. Once it evaluates to false the loop will stop.
    3. AssignExpression – This value is optional. This value is used to change the value of the variable that you use in the EvalExpression. For example you can add 1 to each iteration of the loop.
  4. For InitExpression enter the following: @bolDoesFileExist = False.
    1. Notice I used one (=) sign here. In SSIS you use one = for assigning values.
  5. For EvalExpression enter the following: @bolDoesFileExist == False.
    1. As long as this expressions evaluates to true the loop will continue. Once the variable no longer is false (meaning that the file does exists) the loop will stop.
  6. For Assign Expression, leave this value empty. Click ok to close out the For Loop Editor.
  7. image
  8. That’s it!

You might want to also add a delay between loops, maybe 5 seconds maybe 5 minutes, completely up to you. But here is an easy way you can add that functionality.

Pull an Execute SQL Task into your For Loop container.

  1. Choose OLEDB for connection type
  2. Select any OLEDB connection, doesn’t matter which one you choose.
  3. Select Direct Input for SQLSourceType and type the following code for your delay statement:
      • WAITFOR DELAY ‘00:00:05’

image

Here is how the final package will look:

image

Thanks for looking! Enjoy.

SSIS Script Task–Does File Exists (Part 1)

So in this blog I am going to show you how you can write a quick Script Task inside of SSIS to see if a file exists. In my next blog in this two part series I will show you how you can then loop over this script task until the file does exist.

You can find Part 2 in this series here:

Business Problem:

So you know that you are going to receive a file that you need to process, and you want to process and load that file the moment it “exists”. If you run your package before the file exists the package will fail, so you want to first check to see if the file exist and then run your data flow task that will subsequently load the file.

Walkthrough

Open up your Visual Studios environment and create a new package.

  1. Pull the Script Task component into the data flow.
  2. First let’s create two variables, we will use these in our script task.
  3. image
    1. The value for the strFileName variable is the location of the file, please change yours as needed.
  4. Open the script task and make sure the Script Language is set to Visual Basic.
  5. Select “strFileName” for the read only variables.
  6. Select ‘”bolDoesFileExist” for read/write variables.
  7. image
  8. Click “Edit Script…” to open visual studios.
  9. Ok. Inside the VB project we first need to add a new class of code.
  10. In the Imports section add the following code: Imports System.IO
  11. image
  12. Now we enter our VB code. Find the section titled “Public Sub Main” and type the following code where it says ‘Add Code Here’.
  13. Dts.Variables(“bolDoesFileExist”).Value = File.Exists(Dts.Variables(“strFileName”).Value)
  14. Here is a screenshot of the final result:
  15. image
  16. Now simply click the x at the top right to close the visual studio project to close it out, you don’t have to click save.
  17. Back in the script task editor main screen click ok to compile the code. All Done!

 

Explanation of Visual Basic Code

The code inside the script task is in two parts.

  • The first part is the function “File.Exists”. The file exists function is reading the value of the strFileName variable and checking to see if that file exists. It will return a value of either True or False.
  • The second part is writing the result of the File Exists function to the variable bolDoesFileExist. So everytime this package runs the value of that variable will change based on whether that file does or does not exist.

Thanks for looking and I hope this helps. In my next post I will explain how you can loop over this task over and over again until the file does exist.

Remove unwanted characters in SSIS with Task Factory!

Recently I was tasked with removing special characters that randomly showed up in one column of a data set, these characters were coming from a proprietary source. The characters were the hash tag (#) and semi-colon (;). One or both characters would appear at the beginning, end or middle of the column. There are a few different ways you could deal with this type of situation, each of them varying in their degree of complexity and difficulty.

Fortunately for the customer we provided a very easy and quick solution. In this short blog we are going to take a look at the Data Cleansing transform offered in Task Factory. First let’s take a look at a before and after picture so you can visualize the data and end result.

Setting up the component

  1. Pull the Task Factory Data Cleansing Transform into your Data Flow.
  2. Connect your source data to the Data Cleansing transform.
  3. Right click on the Data Cleansing Transform and click Edit.
  4. First select the column you would like to “Replace”.
  5. Next select Replace Specified characters or words with user defined value. (Screenshot 1)

  6. After you select an action you will receive a parameter box where you can specify the characters you would like to replace and what you want to replace them with.

  7. Now we will finish configuring the rest of the component. Please see screenshot below for final settings.

  8. Finally click ok to save your settings and execute the package!

Conclusion:

That’s it. I tried to drag it out as much as I could but some things are just that simple. Take a look at some of the other actions included and enjoy! As always thanks for looking.

SSIS Performance Tuning Webinar–Q & A

Thank you everyone for attending my webinar on SSIS Performance Tuning, if you missed that presentation you can watch it here: http://pragmaticworks.com/LearningCenter/FreeTrainingWebinars/PastWebinars.aspx?ResourceId=683

Below are some of the questions that I received from the Webinar:

Q: Can you give some performance tips in using script tasks too?

A: Yea, don’t use them! Script tasks can be really bad for performance and many times I have been able to replace someone else’s code with native SSIS components. For example, I had situation where the client was using some very complicated .net code to parse out columns based on logic. The package was taking over an hour to process 1 million records. I replaced this with some conditional splits and derived columns and it now runs in 3 minutes.

Q:  I am assuming that the file formats must be the same for all files when using the MultiFlatFile transform, correct?

A: You are absolutely correct. The metadata in each file must match.

Q: PW delivers a ‘TF Advanced Lookup Cache Transform” component.  What are the benefits of using this component over the Cache Transform covered earlier?  It seems that the TF components cannot make use of the same result set when the data set is role based.

A: For basic caching of data I would use the native SSIS cache transform. The major advantage you get from the Task Factory component is you can do very difficult range lookups with ease and  they will perform at a high level. Please see my blog post on this.

Q: What version of SQL Server is being used?

A: I was using SQL 2012, but everything in the presentation is applicable to 2005 and 2008.

Q: With the multi flatfile connection manager can you specify specific types?

A:  Yes, the wild card character can be anywhere in the connection string property. So you could do test*.txt to only pull in text files where the file name begins with test.

Q: Why would you ever not use table or view (fast load) option in the OLEDB Destination?

A: Well I personally would always use that option. However, with the fast load option all records are committed for the entire batch. So if there is a record that is bad and causes the failure you will not know which record caused the error. With table or view option each record is committed individually so you know exactly which record caused the failure.

Q: is the volume on?

A: Sorry can you ask that again, I couldn’t hear you….

Thanks again!

How to solve a a tricky SSIS Problem with the TF Surrogate Key transform!

Recently I was able to solve a really tricky problem in SSIS using the TF Surrogate key transform. Let’s discuss the business problem:

The client was using some complicated .net code to loop through each line in a file and look for a very specific string of code. Once the string was found they then wanted to return all the values found six rows below that row. This process continues through the file. A picture is worth a thousand words:

image

So now that we understand the business problem we can solve this with SSIS. The solution is actually quite simple. Essentially we need to get the string we are searching for (5909045743) and the row with the values on the same row so we can let SSIS work its’ magic. This is where the Surrogate Key transform comes in.

To solve this problem I used not one but two Flat File source components in my data flow and I brought the same file in twice. I then used the surrogate key transform to assign row numbers to the rows coming from each flat file source component.

  • For the first flat file source I started the row number at 6 and then incremented by one.
  • For the second flat file source I started the row number at 1 and incremented by one.
  • Setting up the surrogate key transform is extremely easy, 3 easy steps, see below:

image

Now I use the merge join component in SSIS and join the two data sets on the newly created SK_Columns. This will put our search string on the same row with the values we were looking for.

Below is a picture of the data flow thus far:

image

And here is a screenshot of the new result set after the Merge Join transform, notice how the two rows are now on the same row. From here we can do some basic SSIS operations to split out all the records we don’t need and then we can use a derived column transform to parse out the values in “Column 2”.

image

Thanks for looking!

How to properly sort your source query in SSIS.

Unfortunately when you have a sorted result set in SSIS using T-SQL this sorting is not automatically picked up in the metadata. This subsequently causes additional issues with components like the Merge Join that expect the result set to be sorted and the metadata to pick this up. Because of this fact there are two additional steps that must be performed in the OLE DB Source.

    1. Add your sorted select statement to your OLE DB source.
    2. Right click and go to Show Advanced Editor.
    3. Click the last tab on top, “Input and Output Properties”.
    4. Click OLE DB Source Output
    5. Change the IsSorted property to True, the default is false.
    6. image
    7. Now expand the tree view for “OLE DB Source Output” and expand “Output Columns”
    8. Find the column that you sorted in your result set and assign it a SortKeyPosition of 1.
    9. image

Thanks for looking.

Use File Cache and the Cache Transform to help mitigate network issues.

Why does the Cache Transform in SSIS have a file cache option? When would I use this, especially when compared with using the default option which is to store the results in memory? One downside of the default option of storing the results in memory is that the cache is cleared immediately once the package is completed and that memory cache cannot be shared among other packages.

However, the File connection option will allow the results to be reused among multiple packages. But hold up, that’s not quite as good as you might think, each package still must read from the file and then subsequently load that file into memory!

So why are we discussing this if it’s essentially doing the same thing as the default option? Good point, and well received, did you read the title?

Imagine now that you have a rather large table that you need to access many times during your nightly ETL, this is very common in data warehousing situations. You can now lessen the effect of a poor performing network by using the file cache option in the cache transform. You can bring the data across the network only once, load it into a file, and in the future you can reference that file on your local machine and avoid  going across the network. As you can imagine this can be quite advantageous!

Set up is quite simple as well, see below:

    1. Bring the cache transform into your data flow and connect it to your source.
    2. Open the cache transform for editing.
    3. Select NEW for cache connection manager.
    4. image
    5. Select “Use file Cache” and then browse to the location where the file will be stored.
    6. Next click the Column tab at the top and specify at least one index column.
    7. image
    8. Click OK, map the columns, Click OK to close out the transform.

Now that the initial set up is complete I will demonstrate how to reference this file:

    1. In the next data flow I set up an OLE DB source and connection that output to a Lookup transform.
    2. Inside the lookup transform select “Cache Connection Manager” on the general screen.
    3. image
    4. On the connection screen select your cache connection manager from the drop down list.
    5. image
    6. And that’s it!, set up the rest of  your lookup transform like you normally would.

Thanks for looking!