SSIS Performance Tuning Webinar – Q&A

Thank you, everyone, for attending my webinar on SSIS Performance Tuning. If you missed the presentation, you can watch it here: http://pragmaticworks.com/LearningCenter/FreeTrainingWebinars/PastWebinars.aspx?ResourceId=683

Below are some of the questions I received during the webinar:

Q: Can you give some performance tips for using script tasks too?

A: Yea, don’t use them! Script tasks can be really bad for performance, and many times I have been able to replace someone else’s code with native SSIS components. For example, I had a situation where the client was using some very complicated .NET code to parse out columns based on business logic. The package was taking over an hour to process 1 million records. I replaced the code with some Conditional Splits and Derived Columns, and it now runs in 3 minutes.
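To be clear, the fix used native Conditional Split and Derived Column components rather than code, but the underlying principle is to express the parsing declaratively instead of looping row by row. As a rough T-SQL analogue of that kind of logic (the table, column names, and prefix rule here are hypothetical):

    -- Hypothetical sketch: branch on a record-type prefix and parse a
    -- value, the same shape of logic a Conditional Split plus a Derived
    -- Column can express inside the data flow.
    SELECT RawLine,
           CASE
               WHEN LEFT(RawLine, 3) = 'HDR'
                   THEN SUBSTRING(RawLine, 4, 10)   -- header rows: value follows the prefix
               ELSE SUBSTRING(RawLine, 1, 10)       -- detail rows: value starts at position 1
           END AS ParsedValue
    FROM dbo.StagedFile;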

Q:  I am assuming that the file formats must be the same for all files when using the MultiFlatFile transform, correct?

A: You are absolutely correct. The metadata in each file must match.

Q: PW delivers a “TF Advanced Lookup Cache Transform” component. What are the benefits of using this component over the Cache Transform covered earlier? It seems that the TF components cannot make use of the same result set when the data set is role based.

A: For basic caching of data I would use the native SSIS Cache Transform. The major advantage you get from the Task Factory component is that you can do very difficult range lookups with ease, and they will perform at a high level. Please see my blog post on this.

Q: What version of SQL Server is being used?

A: I was using SQL 2012, but everything in the presentation is applicable to 2005 and 2008.

Q: With the MultiFlatFile connection manager, can you specify specific file types?

A: Yes, the wildcard character can go anywhere in the connection string property. So you could use test*.txt to pull in only text files whose names begin with test.

Q: Why would you ever not use the table or view (fast load) option in the OLEDB Destination?

A: Well, personally I would always use that option. However, with the fast load option, records are committed in batches, so if there is a bad record that causes a failure, you will not know which record caused the error. With the table or view option, each record is committed individually, so you know exactly which record caused the failure.
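To make the trade-off concrete, the behavior is analogous to batched versus row-by-row commits in T-SQL; a rough sketch (the table and file names are hypothetical):

    -- Fast load behaves like a batched bulk insert: each batch is one
    -- transaction, so a single bad row fails its entire batch and the
    -- offending row is not identified.
    BULK INSERT dbo.TargetTable
    FROM 'C:\data\input.txt'
    WITH (FIELDTERMINATOR = ',', BATCHSIZE = 10000);

    -- The table or view option is analogous to committing one row at a
    -- time: slower, but a failure points at the exact row that caused it.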

Q: Is the volume on?

A: Sorry, can you ask that again? I couldn’t hear you…

Thanks again!


How to solve a tricky SSIS problem with the TF Surrogate Key transform!

Recently I was able to solve a really tricky problem in SSIS using the TF Surrogate key transform. Let’s discuss the business problem:

The client was using some complicated .NET code to loop through each line in a file and look for a very specific string. Once the string was found, they wanted to return all the values found six rows below that row. This process continued through the entire file. A picture is worth a thousand words:

[image: sample file showing the search string and the values six rows below]

So now that we understand the business problem, we can solve it with SSIS. The solution is actually quite simple: essentially, we need to get the string we are searching for (5909045743) and the row with the values onto the same row so we can let SSIS work its magic. This is where the Surrogate Key transform comes in.

To solve this problem I used not one but two Flat File Source components in my data flow, bringing the same file in twice. I then used the Surrogate Key transform to assign row numbers to the rows coming from each Flat File Source component.

  • For the first flat file source I started the row number at 6 and incremented by one.
  • For the second flat file source I started the row number at 1 and incremented by one.

Setting up the Surrogate Key transform is extremely easy; three steps, see below:

[image: Surrogate Key transform setup]

Now I use the Merge Join component in SSIS and join the two data sets on the newly created SK_Columns. This puts our search string on the same row as the values we were looking for.
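If it helps to see the pattern outside of SSIS, here is a rough T-SQL sketch of the same idea. It assumes the file has been staged into a hypothetical table dbo.RawLines with an IDENTITY column LineId that preserves the original line order:

    -- Stream 1 mirrors the first flat file source: numbering starts at 6,
    -- so file line 1 gets key 6, line 2 gets key 7, and so on.
    WITH Stream1 AS (
        SELECT LineId + 5 AS SK_Column, LineText AS SearchText
        FROM dbo.RawLines
    ),
    -- Stream 2 mirrors the second flat file source: numbering starts at 1.
    Stream2 AS (
        SELECT LineId AS SK_Column, LineText AS ValueText
        FROM dbo.RawLines
    )
    -- The join plays the role of the Merge Join: it pairs each line with
    -- the line further down the file (line 1 with line 6, and so on).
    SELECT s1.SearchText, s2.ValueText
    FROM Stream1 AS s1
    JOIN Stream2 AS s2
        ON s1.SK_Column = s2.SK_Column
    WHERE s1.SearchText LIKE '%5909045743%';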

Below is a picture of the data flow thus far:

[image: the data flow so far]

And here is a screenshot of the new result set after the Merge Join transform; notice how the search string and its values now appear on the same row. From here we can do some basic SSIS operations to filter out the records we don’t need, and then use a Derived Column transform to parse out the values in “Column 2”.

[image: result set after the Merge Join transform]

Thanks for looking!

Create efficient Range Lookups with the Task Factory Advanced Lookup Transform

There are three generally accepted methods for performing the tricky range lookup in SSIS. A great blog post that outlines all three of these methods and how they perform can be found here. Per that post, the best-performing solution is unfortunately the one that is arguably the most difficult to set up. Fortunately for us, we can get all the performance of scripting without all the work!

Scenario:

We need to perform a lookup against the segment table based on the weekday and time the transaction took place. For example, assume a transaction took place on Tuesday at 15:30:00. This transaction would need to be mapped to SegmentKey 3. Screenshot of the segment table below:

[image: segment table]
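Before walking through the Task Factory setup, it may help to see the lookup expressed as set-based T-SQL; a rough sketch, with hypothetical table and column names:

    -- Hypothetical tables: dbo.Transactions(TransDay, TransTime) and
    -- dbo.Segment(SegmentKey, SegDay, BeginTime, EndTime).
    SELECT t.TransDay,
           t.TransTime,
           s.SegmentKey
    FROM dbo.Transactions AS t
    JOIN dbo.Segment AS s
        ON  s.SegDay    =  t.TransDay    -- the days match
        AND s.BeginTime <= t.TransTime   -- segment begins at or before the transaction
        AND s.EndTime   >= t.TransTime;  -- segment ends at or after the transaction
    -- Per the scenario above, a Tuesday 15:30:00 transaction returns SegmentKey 3.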

Now that we have identified our business requirement, let’s look at how we can use the Advanced Lookup Transform in Task Factory to solve this tricky problem.

  1. For this example we will need two data flow tasks.
    1. The first data flow task will load our segment table into the TF ADV Lookup Cache Transform.
      1. Inside the first data flow task I bring in an OLEDB Source and select my Segment table.
      2. Next I bring in the TF Advanced Lookup Cache transform and connect the two transforms. (Screenshot 1)

[image: Screenshot 1]

      3. Open the TF Cache transform and select “Create New Lookup Cache Transform”. (Screenshot 2)

[image: Screenshot 2]

      4. Click the Input Parameter tab at the top. We are going to add two parameters: one for the day and one for the time of the actual transaction. In the bottom dialog box we are going to define our conditions for when a lookup should occur. For this example we have three criteria:
        1. First, the day of the transaction must equal the day in the Segment table.
        2. Second, the begin time must be less than or equal to the time of the transaction.
        3. Finally, the end time must be greater than or equal to the time of the transaction.

[image: lookup conditions dialog]

      5. Once the criteria have been entered, click OK twice to close out of the cache connection manager.
    2. Now in our second data flow we will pull in our transactional data and perform our lookup. For this example I have written a simple SELECT statement to simulate a transaction:
        SELECT 'Tuesday' AS DayOfWeek, '15:30:00' AS TIME
      1. Now that the source has been set up, we need to bring in the TF Advanced Lookup Transform and open it for editing.
      2. Select our Cache Connection Manager from the drop-down and then map our input columns to our input parameters. Screenshot below.

[image: input column to parameter mappings]

      3. Now execute the package!

[image: package execution results]

Thanks for looking.