There are quite a few different scenarios for when cache transforms can be used to improve usability and performance inside SSIS. In this post I am going to quickly outline a scenario where cache connection managers can greatly improve performance when loading a fact table.
During the pre-execution phase of package execution SSIS “caches” the lookups inside your package into memory, unfortunately if you are using OLE DB connections in your Data Flow Task this caching happens in a serialized format (one after the other). Typically this isn’t a big deal because you may only have one or two large dimension lookups and all others are very small in size and very fast. But what if this isn’t the case? What if you have 5, 6, 7, or more dimensions all with millions of records that you must cache before the package begins execution? Here is how we can get around this little issue.
- We can have all of our lookups cached in parallel if we use the cache transform in a data flow task prior to the loading of the fact load. If you have never used the cache transform before you can find a quick demo here: http://beyondrelational.com/modules/2/blogs/82/posts/13209/msbi-60-ssis-24-data-flow-task-10-cache-transform-task-detailed-information-amp-explanation-with-exa.aspx
- Inside the data flow tasks we will set each source to point to our cache transforms. Each one of the Cache Transforms will require a connection manager to store the data.
- Below is a screenshot of the connection managers created by the Cache transforms:
- Now that the cache connection managers have been set up, you will need to point each of your lookup transforms to the “Cache Connection manager” instead of the default of OLE DB connection manager.
- Now to the last step. Now click on the connection tab and point to the cache connection manager that stores the information for your lookup:
- All other configuration steps are the same as normal. Thanks for looking, and hope this helped.