Recently I was working on a project where we needed to process upward of 12,000 files per day. As most would expect we used the Foreach Loop container in SSIS to loop through all 12,000 files. The results? A whopping 77 minutes to run. WOW! (Incase you were wondering this is WOW that’s AWEFUL!, not wow that’s great)
I assume the excessive time is here is due to the time it takes to dynamically set the flat file connection manager and close/reopen the connection. I don’t know exactly how long this process takes but hypothetically let’s say it takes 1 second for every 3 iterations of the loop.
Now some simple math. 12,000 files / 3 (files per second) / 60 seconds = 67 minutes.
So how did we get around this problem? Well we tried two different design patterns. The first and the one I will be blogging about here is the “MultiFlatfile” connection manager. The second, more complicated to set up, but extremely efficient can be found here: Anthony Martin.
Let’s performance test the MultiFlatFile connection manager!
For this post I created three packages.
- The first package generates 1000 flat files for testing.
- The second package iterates through all 1000 files. Runtime (22 Minutes)
- The third package uses the multiflatfile connection to bring in all the files instantaneously. (13 Seconds)
The Foreach loop got utterly destroyed in this contest. The multiflatfile connection manager was able to process all 1000 files in only 13 seconds!! Let’s look at how to set this up.
- Right click in the connection manager window and select New Connection > MultiFlatFile > Click Add
- Set up the general screen like your typical Flat File connection, you will need to hardcode an existing file.
- Click ok to close the connection manager editor, verify the connection manager is selected and then open up the properties window.
- In the properties window, scroll down to Expressions > Click the Ellipsis “…” and select “Connection String” from the drop down menu.
- Next click the ellipsis to the right to open up the Expression editor. This part is critical because this is where we set an expression that will determine which files are extracted from the directory. For this simple example my expression was “C:\\Blogs\\1000_Flat_Files\\Test*” (Don’t miss the asterisk here!). The connection manager will now retrieve all files that start with “Test”. Screenshots for set up below.
This is an awesome alternative to the For Each Loop and it’s much easier to set up, however it has some obvious drawbacks with how flexible it is. If you discover this pattern does not perform as well as you hoped or is not flexible enough please review the blog post I linked above by Anthony Martin. It takes a little more set up but offers incredible flexibility and is the best performing solution!
As always, thanks for looking!