How to get Line Item Detail information with Totals in T-SQL (Percent of Parent).

Hi All,

This blog post is a follow-up to a question I received when I gave my Advanced T-SQL webinar for Pragmatic Works. If you haven't seen it yet and would like to, you can view the webinar here:

Question: How can we get subtotals in T-SQL using the CTE method you mentioned?

In my webinar I showed how to get totals and subtotals in the same result set as line item detail information (see screenshot below). The method I used involved the OVER clause, which kept the SQL very clean and easy to read. Unfortunately, it is not the best-performing option available, because a window aggregate without framing uses an on-disk worktable. (I have pasted the T-SQL example with the OVER clause at the bottom of this blog post for comparison and reference purposes.)

[screenshot: line item detail with totals and subtotals in a single result set]

Subtotals in SQL with the CTE method:

First of all, can I just preface this by saying I love CTEs (Common Table Expressions)? Let's jump right in and write some code! For this example I'm going to be using the AdventureWorks2012 database, but this should work with all versions of AdventureWorks.

Step 1: Create a CTE with the total information:

WITH Totals AS
(
    -- One row per customer with that customer's grand total
    SELECT CustomerID, SUM(TotalDue) AS Total
    FROM Sales.SalesOrderHeader
    GROUP BY CustomerID
)
SELECT * FROM Totals ORDER BY CustomerID


Results:

[screenshot: Totals CTE output, one total per CustomerID]

Step 2: Create a SQL query with the line item detail information:

SELECT CustomerID, SalesOrderID, OrderDate, TotalDue
FROM Sales.SalesOrderHeader

[screenshot: line item detail result set]

Step 3: Join them together!

WITH Totals AS
(
    SELECT CustomerID, SUM(TotalDue) AS Total
    FROM Sales.SalesOrderHeader
    GROUP BY CustomerID
)
SELECT soh.CustomerID, SalesOrderID, OrderDate, TotalDue, Total
FROM Sales.SalesOrderHeader soh
JOIN Totals t
    ON t.CustomerID = soh.CustomerID

Final Result:

[screenshot: final result, line item detail with each customer's total]
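Since the title of this post mentions percent of parent, it's worth noting that once the customer total is joined in, the percent is just one more expression away. Here is a sketch extending the final query above (the PercentOfParent column name is mine, and NULLIF guards against a zero total):

WITH Totals AS
(
    SELECT CustomerID, SUM(TotalDue) AS Total
    FROM Sales.SalesOrderHeader
    GROUP BY CustomerID
)
SELECT soh.CustomerID, SalesOrderID, OrderDate, TotalDue, Total,
    -- Each line item's share of its customer total (the parent)
    TotalDue / NULLIF(Total, 0) AS PercentOfParent
FROM Sales.SalesOrderHeader soh
JOIN Totals t
    ON t.CustomerID = soh.CustomerID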

As I mentioned above, you can get the same results using the OVER clause in T-SQL. I have pasted the code for that example below:

SELECT
    CustomerID, 
    SalesOrderID, 
    OrderDate, 
    TotalDue, 
    SUM(TotalDue) OVER (PARTITION BY CustomerID) AS CustomerTotal
FROM    
    Sales.SalesOrderHeader

Final Thoughts: The CTE method will generally perform better than the OVER clause method, but it takes more code and work. If the OVER clause gets the job done and performance is not an issue, I would recommend that method to keep the code simpler and easier to read!

Thanks for looking!

Advanced T-SQL Webinar / Free Training

Thank You!

First of all, thank you to everyone who attended my T-SQL presentation on 5/10/2016. We had over 1,400 people registered for the training and 871 people joined and listened in; that's a lot of people! I also received a lot of good and encouraging feedback, so thank you for that as well.

Resources

The number one question I received was whether the T-SQL scripts would be available for download. Of course they are. If you want to download the T-SQL scripts I used for my presentation, you can find them here:

Free Recording:

The full one-hour webinar that I did for Pragmatic Works on 5/11/2016 can be found here:

Questions and Answers:

I just received the list of questions from the webinar, so I will get this section updated in the next couple of days. I wanted to go ahead and post this blog so you could have access to the SQL scripts.

Question:

Les said: BTW, best presentation EVER!!!! Very clear and straight to the point in each case. Congrats!!!

Answer:

Thank you! Best question ever!

Question: Recursive CTEs in SQL

Ken asked: Should the anchor be unique? What happens if there are two rows returned in the anchor member?

Answer:

Hey Ken, the anchor member does not need to be unique here. In our example we specified that the CEO is at the first level, Level 0. If the anchor returns multiple rows, they will all show up at Level 0.
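For anyone who missed the webinar, here is a minimal sketch of the pattern in question. The dbo.Employee table with EmployeeID and ManagerID columns is a stand-in for your own hierarchy table; the anchor member is just a query, so it seeds Level 0 with however many rows it returns:

WITH OrgChart AS
(
    -- Anchor member: top of the hierarchy (every matching row lands at Level 0)
    SELECT EmployeeID, ManagerID, 0 AS Level
    FROM dbo.Employee
    WHERE ManagerID IS NULL
    UNION ALL
    -- Recursive member: walk down the hierarchy one level at a time
    SELECT e.EmployeeID, e.ManagerID, oc.Level + 1
    FROM dbo.Employee e
    JOIN OrgChart oc
        ON oc.EmployeeID = e.ManagerID
)
SELECT * FROM OrgChart ORDER BY Level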

Question: Pivot in SQL

Travis asked: Can you do multiple columns, such as minutes and cost?

Answer:

Travis, I believe you are asking about the pivot example I showed in my webinar. The answer is yes, you can definitely do a double pivot or pivot on multiple columns. I will write a blog post on how to do this in the next week, so please check back!
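While that post is in the works, here is a minimal sketch of the basic single-measure pattern (the dbo.CallLog table and its columns are made up for illustration). Pivoting a second measure such as Cost means repeating this pattern and joining the two results, or pivoting manually with CASE expressions:

SELECT Department, [2014], [2015], [2016]
FROM
(
    -- Source rows: one measure (Minutes) plus the grouping and spreading columns
    SELECT Department, CallYear, Minutes
    FROM dbo.CallLog
) src
PIVOT
(
    SUM(Minutes) FOR CallYear IN ([2014], [2015], [2016])
) pvt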

Question: Merge in SQL vs. SSIS Update

Vineet asked: How does the merge statement compare to updates done in SSIS?

Answer:

The only native built-in update capability we have in SSIS is the OLE DB Command, so I assume this question is about comparing the merge with the OLE DB Command. The merge pattern will perform light years better than doing updates in SSIS using the OLE DB Command, which fires once per row. The merge pattern I showed in the webinar is a very popular design pattern for loading data warehouses.
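For reference, here is a minimal sketch of that kind of merge pattern (the table and column names are illustrative, not from the webinar script): one set-based statement updates matched rows and inserts new ones, instead of issuing an update per row.

MERGE dbo.DimCustomer AS tgt
USING stg.Customer AS src
    ON tgt.CustomerAlternateKey = src.CustomerAlternateKey
WHEN MATCHED THEN
    -- Existing dimension row: refresh its attributes
    UPDATE SET tgt.CustomerName = src.CustomerName,
               tgt.EmailAddress = src.EmailAddress
WHEN NOT MATCHED BY TARGET THEN
    -- Brand new customer: insert it
    INSERT (CustomerAlternateKey, CustomerName, EmailAddress)
    VALUES (src.CustomerAlternateKey, src.CustomerName, src.EmailAddress);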

Question: SubTotals with CTEs in SQL Server

Vineet asked: Can you share CTE examples on your blog?

Answer:

Hi again! You were not the only one to ask this; I have also received a few emails requesting this example. I will write a blog post with an example in the next week, so please check back!

Question: What SQL Books would you recommend?

Answer:

There are a lot of great SQL books out there. I will stick with one author here; I have three books by Itzik Ben-Gan:

  • T-SQL Fundamentals
  • T-SQL Querying
  • T-SQL Programming

How to use the Optimize For hint to force the execution plan you want.

Quite some time back I found myself fighting with an execution plan generated by SQL Server for one of my stored procedures. The plan always returned an estimated row count of 1 when processing for the current day. I won't go into details on why this one specific stored procedure didn't reuse the older cached plans as expected. I will, however, tell you that like most things with SQL Server, there is more than one way to solve the problem.

This method is something I have personally wanted to blog about because it's something I have only used a handful of times, when I just couldn't get the execution plan to work the way I wanted it to. Note that with this hint we are forcing the SQL Server optimizer to use the statistics for the specific variable value that we provide. However, if the table were to grow significantly in the future, we could be hurting performance by forcing a plan that no longer fits the data; that is the drawback to using this hint, so now you know!

Take a look at the two screenshots below. The first shows the estimated rows from the FactInternetSales table and the second shows the estimated execution plan.

[screenshot: estimated row count from the FactInternetSales table]

[screenshot: estimated execution plan]

What I actually want to see in this execution plan is a HASH MATCH, which will perform significantly better for the number of records I will have. Unfortunately, due to out-of-date statistics, I'm getting a bad plan.

So let’s note two things.

  1. First, in most situations the best solution here is simply to update statistics; this should be part of ANY database maintenance plan (it's the one-liner shown after this list).
  2. Second, the example I am using here is not great; I am simply forcing the plans to do what I want for demo purposes.
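
For reference, that statistics refresh really is a one-liner; here is what it looks like against the demo table used below:

-- Refresh statistics on the demo fact table with a full scan of the data
UPDATE STATISTICS dbo.FactInternetSales_Backup WITH FULLSCAN;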

Let’s take a look at the original query:

DECLARE @ShipDate DATE = '1/1/2008'

SELECT 
       [EnglishProductName] AS Product
      ,[SalesOrderNumber]
      ,[OrderDate]
      ,[DueDate]
      ,[ShipDate]
  FROM 
    [dbo].[FactInternetSales_Backup] FIS
  JOIN
    [dbo].[DimProduct] DP
  ON
    DP.ProductKey = FIS.ProductKey
  WHERE ShipDate > @ShipDate

Now we are going to quickly modify this query to use the OPTIMIZE FOR hint. This hint allows us to optimize the execution plan in SQL Server for a specified parameter value. In my case that is a previous date where I know the statistics reflect what I want to see in my execution plan.

Here is the modified query:

DECLARE @ShipDate DATE = '1/1/2008'

SELECT 
       [EnglishProductName] AS Product
      ,[SalesOrderNumber]
      ,[OrderDate]
      ,[DueDate]
      ,[ShipDate]
  FROM 
    [dbo].[FactInternetSales_Backup] FIS
  JOIN
    [dbo].[DimProduct] DP
  ON
    DP.ProductKey = FIS.ProductKey
  WHERE ShipDate > @ShipDate

  OPTION (OPTIMIZE FOR (@ShipDate = '1/1/2005'))
GO

In this query the result set returned will still be for the original value of the variable, '1/1/2008'. However, the SQL Server optimizer will generate the plan using the value in the OPTIMIZE FOR hint on the last line of the query.

Now let’s take a look at our new Estimated Execution plan:

[screenshot: new estimated execution plan showing a Hash Match]

This time we are getting a Hash Match, which is much more appropriate for our table and the number of records that will be queried.

As always, thanks!

SQL Tips! Generate DDL Scripts with the data from the table. (Installment 2)

Welcome back to this series on SQL tips. Sometimes I find myself needing to generate not only the DDL for a table but also the data stored in that table. For example, imagine I have created and populated a few tables for a blog post. It would be much easier if I could share both the table definition and the SQL statement to populate that table, agree? Great!

Scenario

In this scenario I am going to show you how to generate the DDL to create the DimCurrency table from the AdventureWorksDW2012 database. Along with the DDL script I will also show you how you can generate the script to populate the table.

Walkthrough

    1. Open SSMS and connect to the server where your table is located.
    2. Right-click the database where your table is stored; I'm using AdventureWorksDW2012.
    3. Next, click Tasks.
    4. Next, click Generate Scripts.
    5. [screenshot: Tasks –> Generate Scripts menu]
    6. Inside the Generate Scripts wizard, click Next on the Introduction screen.
    7. On the Choose Objects screen, click "Select specific database objects".
    8. Expand the tables list.
    9. Select the DimCurrency table.
    10. [screenshot: DimCurrency selected in the object list]
    11. On the Scripting Options screen I'm just going to load the script into a new query window, so select "Save to new query window". This option is found at the bottom of the screen.
    12. On the same screen, click the "Advanced" button found on the right side.
    13. Now scroll down to "Types of data to script" and change it from "Schema only" to "Schema and data".
    14. [screenshot: Advanced Scripting Options, Types of data to script]
    15. Click OK, then Next, then Next again. The script will be generated in a new query window; click Finish once it has completed. Below is a partial screenshot of the completed product.
    16. [screenshot: generated script in a new query window]
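
To give you an idea of the output, here is an abbreviated sketch of what the generated script looks like for DimCurrency (I'm writing the column definitions and the first data row from memory, so verify against your own generated output):

CREATE TABLE [dbo].[DimCurrency](
    [CurrencyKey] [int] IDENTITY(1,1) NOT NULL,
    [CurrencyAlternateKey] [nchar](3) NOT NULL,
    [CurrencyName] [nvarchar](50) NOT NULL
) ON [PRIMARY]
GO
-- One INSERT per row follows; identity values are preserved
SET IDENTITY_INSERT [dbo].[DimCurrency] ON
INSERT [dbo].[DimCurrency] ([CurrencyKey], [CurrencyAlternateKey], [CurrencyName])
VALUES (1, N'AFA', N'Afghani')
-- ... remaining rows ...
SET IDENTITY_INSERT [dbo].[DimCurrency] OFF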


As always, thanks for looking!

Use the aggregate transform in SSIS to outperform T-SQL!

I know I got your attention with that title. If you speak to any experienced BI developer, you will be told to perform all of your aggregations in T-SQL at the source where possible, not in the SSIS components. To this point I agree, well, most of the time anyway. There are, however, times when SSIS will outperform T-SQL if all the stars align. Let me explain.

Recently I was using a CTE and the ROW_NUMBER function in T-SQL to bring back only distinct customers from a list of transactions. This result set was then used to insert inferred members into the customer dimension, if the customer did not yet exist of course. Once the result set returned the list of distinct customers, we performed a lookup against the customer dimension and loaded the non-matching customers into it.
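For context, the T-SQL dedupe pattern we were replacing looked roughly like this (the staging table and column names here are illustrative, not the actual production objects):

WITH DistinctCustomers AS
(
    SELECT CustomerAlternateKey,
           -- Number each customer's transactions; row 1 is the one we keep
           ROW_NUMBER() OVER (PARTITION BY CustomerAlternateKey
                              ORDER BY TransactionDate DESC) AS rn
    FROM stg.Transactions
)
SELECT CustomerAlternateKey
FROM DistinctCustomers
WHERE rn = 1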

Now let me note, the T-SQL query honestly wasn't that bad; it only took around 3 minutes to run. The bigger problem was that the query used tempdb, and this caused locking issues with a more advanced design pattern (partition switching) that the SQL Server engine could not resolve.

So, we turned to good ole SSIS to help us out.

  • First, bring all records in from the source exactly as they are, all 30 million transactions.
  • Next, perform the lookup against the customer dimension and redirect non-matching rows to the aggregate transform.
  • Finally, use the aggregate transform to group the new customers, then load the distinct values into the customer dimension.

[screenshot: SSIS data flow with source, lookup, and aggregate transform]

This works because we filter out all the existing customers before doing the aggregation in SSIS, so only a small percentage of the records require aggregation. The data flow task to load inferred members runs in approximately 80 seconds, whereas the original T-SQL query alone took over 3 minutes.

Thanks for looking!