Advanced TSQL Takeover

Thanks for joining the 3-hour webinar I did for Pragmatic Works on Advanced TSQL. As could be expected, there were many questions, so in this blog post I will try to answer as many of them as I can.

The TSQL scripts used in the class can be found HERE: Please note that one of the examples requires the MicrosoftStockHistory sample data, and I have provided a script to create and load it.

Common Table Expressions

Question: Why does a common table expression begin with a semi-colon?

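Answer: Strictly speaking, the semicolon doesn't belong to the CTE; it terminates the previous statement. The WITH keyword is used for other purposes in TSQL besides CTEs (table hints and WITH XMLNAMESPACES, for example), so to avoid ambiguity any statement prior to the CTE must be terminated with a semicolon. Writing ;WITH at the start of a CTE is simply a defensive habit that guarantees this.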
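A minimal sketch of where the semicolon matters. The CTE name, the variable, and the values are made up for illustration:

```sql
-- Hypothetical sample values, so the example runs on SQL Server 2008 or later.
DECLARE @MinTotal money = 100;  -- this semicolon is what lets the WITH below start a CTE

WITH BigOrders AS
(
    SELECT OrderID, Total
    FROM (VALUES (1, 50.00), (2, 150.00), (3, 300.00)) AS o (OrderID, Total)
    WHERE Total > @MinTotal
)
SELECT OrderID, Total
FROM BigOrders;   -- returns orders 2 and 3

-- Without the semicolon after the DECLARE, SQL Server reports a syntax error
-- near WITH. Writing ;WITH BigOrders AS (...) is the same fix, applied defensively.
```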

Question: What are the performance differences between CTEs and Temp tables?

Answer: Unfortunately CTEs and temp tables have entirely different use cases, so this is like comparing apples to oranges. CTEs are logical tables used to simplify code; essentially they are like views, because SQL Server doesn't store a CTE's result but expands its definition into the query every time it is referenced. Temp tables, on the other hand, are physically materialized in tempdb. They also have column statistics and can be indexed to optimize your TSQL code. The full answer is much more in depth than what I have provided here, but this is the really, really short answer.
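To make the difference concrete, here is a sketch against AdventureWorks showing the same customer totals written once as a CTE and once as an indexed temp table. The 10,000 threshold and the index are arbitrary choices for illustration:

```sql
-- CTE version: nothing is stored; the definition is expanded into the
-- outer query each time the CTE is referenced (here it is referenced once).
WITH CustomerTotals AS
(
    SELECT CustomerID, SUM(TotalDue) AS Total
    FROM Sales.SalesOrderHeader
    GROUP BY CustomerID
)
SELECT CustomerID, Total
FROM CustomerTotals
WHERE Total > 10000;   -- arbitrary example threshold

-- Temp table version: the aggregate is materialized once in tempdb,
-- can be indexed, and gets its own statistics for later queries.
IF OBJECT_ID('tempdb..#CustomerTotals') IS NOT NULL
    DROP TABLE #CustomerTotals;

SELECT CustomerID, SUM(TotalDue) AS Total
INTO #CustomerTotals
FROM Sales.SalesOrderHeader
GROUP BY CustomerID;

CREATE CLUSTERED INDEX IX_CustomerTotals_CustomerID
    ON #CustomerTotals (CustomerID);

SELECT CustomerID, Total
FROM #CustomerTotals
WHERE Total > 10000;
```

The temp table pays off when the same intermediate result is reused several times; for a single reference the CTE is usually the simpler choice.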

Question: Is there a performance difference with CTE and Derived tables?

Answer: Rarely will you see a performance difference here; personally, I have always seen identical plans when comparing the two. However, I called and talked to the smartest DBA I know, Bradley Ball. There are some situations with self-referencing (recursive) CTEs where the SQL Server optimizer can generate a poor execution plan because the Cardinality Estimator can't use the statistics of the underlying table. This is the same behavior you would see if you were nesting views on top of views. In those situations a derived table or a temp table would be a better option.
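If you want to compare the plans yourself, here is a sketch of the same aggregate written as a CTE and as a derived table. It uses a hypothetical table variable so it runs anywhere; turn on the actual execution plan, and in a simple case like this I would expect the two plans to be identical:

```sql
-- Hypothetical sample data.
DECLARE @Orders TABLE (CustomerID int, TotalDue money);
INSERT INTO @Orders (CustomerID, TotalDue)
VALUES (1, 100), (1, 250), (2, 75);

-- CTE form
WITH Totals AS
(
    SELECT CustomerID, SUM(TotalDue) AS Total
    FROM @Orders
    GROUP BY CustomerID
)
SELECT CustomerID, Total
FROM Totals;

-- Derived table form: the same logic written inline in the FROM clause
SELECT CustomerID, Total
FROM
(
    SELECT CustomerID, SUM(TotalDue) AS Total
    FROM @Orders
    GROUP BY CustomerID
) AS Totals;
```

Both queries return customer 1 with 350.00 and customer 2 with 75.00.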

Window Functions: Framing

Question: When using RANGE for the frame, could the duplicate values be eliminated with the PARTITION BY clause?

Answer: RANGE can generate unusual results when used to calculate running totals. They occur when the ORDER BY column contains duplicate values within a partition: RANGE treats all tied rows as peers, so every tied row gets the same “running total” (the total through the last of the ties). Adding PARTITION BY doesn't fix this on its own, because the ties are still inside each partition. The fix is to either use the ROWS framing option or to choose an ORDER BY column (or combination of columns) that is unique.
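Here's a small sketch of the problem with made-up sales data in which two rows share the same SaleDate. The RANGE column gives both tied rows the same total (60), while the ROWS column, ordered by SaleDate plus SaleID as a tiebreaker, gives a true running total (30, then 60):

```sql
SELECT
    SaleID,
    SaleDate,
    Amount,
    -- RANGE: rows with the same SaleDate are peers and share one total.
    SUM(Amount) OVER (ORDER BY SaleDate
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotalRange,
    -- ROWS with a unique ordering: each row adds exactly one row to the total.
    SUM(Amount) OVER (ORDER BY SaleDate, SaleID
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotalRows
FROM (VALUES
        (1, CAST('2016-01-01' AS date), 10),
        (2, CAST('2016-01-02' AS date), 20),
        (3, CAST('2016-01-02' AS date), 30),
        (4, CAST('2016-01-03' AS date), 40)
     ) AS s (SaleID, SaleDate, Amount)
ORDER BY SaleDate, SaleID;
-- RunningTotalRange: 10, 60, 60, 100
-- RunningTotalRows:  10, 30, 60, 100
```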

Question: Can you call LAG(LAG(ClosePrice)) to get the close price from 2 days previous?

Answer: Not as written; SQL Server doesn't allow one window function to be nested inside another. You don't need to, though. The default behavior of LAG is to go back one row, and that default can be overridden with the optional [offset] parameter. For example, the code here would look like: LAG(ClosePrice, 2) OVER (ORDER BY DATE)
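A sketch with hypothetical prices; the TradeDate and ClosePrice columns stand in for the webinar's MicrosoftStockHistory data. Note that “two days back” here means two rows back, so if the table only holds trading days, weekends and holidays are skipped automatically:

```sql
SELECT
    TradeDate,
    ClosePrice,
    LAG(ClosePrice) OVER (ORDER BY TradeDate)    AS PrevClose,         -- default offset of 1
    LAG(ClosePrice, 2) OVER (ORDER BY TradeDate) AS CloseTwoDaysBack   -- NULL for the first two rows
FROM (VALUES
        (CAST('2016-05-02' AS date), 50.00),
        (CAST('2016-05-03' AS date), 51.50),
        (CAST('2016-05-04' AS date), 49.75),
        (CAST('2016-05-05' AS date), 50.25)
     ) AS p (TradeDate, ClosePrice)
ORDER BY TradeDate;
```

LAG also takes an optional third argument, a default value to return instead of NULL when there is no row that far back.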

Question: Is the performance gain of framing window functions only applicable for the aggregate window functions, i.e. not important for the other window functions?

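Answer: It depends on the function. Framing applies to the aggregate window functions (SUM, AVG, and so on) and to the analytic functions FIRST_VALUE and LAST_VALUE; the ranking functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE) and LAG/LEAD don't accept a frame clause at all. For the functions that do, whenever an ORDER BY is present and no frame is specified, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, and specifying ROWS instead is where the performance gain comes from. We showed a demo of using ROWS with the LAST_VALUE analytic window function in the webinar.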
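Here is a sketch of why the frame matters for LAST_VALUE, using made-up prices. With the default frame the function only looks as far as the current row; an explicit ROWS frame that runs to UNBOUNDED FOLLOWING returns the final close for every row:

```sql
SELECT
    TradeDate,
    ClosePrice,
    -- No explicit frame: with ORDER BY the default is
    -- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the "last" value
    -- is just the current row's close (the dates here are unique).
    LAST_VALUE(ClosePrice) OVER (ORDER BY TradeDate) AS LastCloseDefaultFrame,
    -- Explicit ROWS frame covering the whole set: every row sees the final close.
    LAST_VALUE(ClosePrice) OVER (ORDER BY TradeDate
                                 ROWS BETWEEN UNBOUNDED PRECEDING
                                          AND UNBOUNDED FOLLOWING) AS LastClose
FROM (VALUES
        (CAST('2016-05-02' AS date), 50.00),
        (CAST('2016-05-03' AS date), 51.50),
        (CAST('2016-05-04' AS date), 49.75),
        (CAST('2016-05-05' AS date), 50.25)
     ) AS p (TradeDate, ClosePrice)
ORDER BY TradeDate;
```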

Question: Will there be a bathroom break?

Answer: I don’t plan on making it three hours without taking a break!

Working with XML in SQL Server

Question: Can you query an XML file?

Answer: Absolutely. Querying an XML file is very easy to do, and this is also referred to as shredding XML. I will try to write a specific blog post walking through how to do this in the coming weeks. Please note that I didn’t have enough time to show this even in our extended 3-hour presentation, but it is covered in the Pragmatic Works Advanced TSQL course.
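In the meantime, here's a minimal sketch of shredding with the xml data type's nodes() and value() methods. The XML document is hypothetical, and the file path in the OPENROWSET comment is a placeholder:

```sql
-- Hypothetical XML document; element and attribute names are made up.
DECLARE @Customers xml = N'<Customers>
  <Customer Id="1"><Name>Contoso</Name><City>Seattle</City></Customer>
  <Customer Id="2"><Name>Fabrikam</Name><City>Redmond</City></Customer>
</Customers>';

-- nodes() returns one row per <Customer> element; value() pulls a single
-- scalar out of each row and converts it to a SQL type.
SELECT
    c.value('@Id', 'int')                       AS CustomerId,
    c.value('(Name/text())[1]', 'nvarchar(50)') AS CustomerName,
    c.value('(City/text())[1]', 'nvarchar(50)') AS City
FROM @Customers.nodes('/Customers/Customer') AS t (c);

-- To load the XML from a file instead, one option is OPENROWSET(BULK ...),
-- for example (placeholder path):
-- SELECT CAST(BulkColumn AS xml) AS Doc
-- FROM OPENROWSET(BULK 'C:\Temp\Customers.xml', SINGLE_BLOB) AS x;
```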

Question: Can you create element centric XML but include some columns as Attributes?

Answer: This can be done easily by using FOR XML PATH mode. PATH mode generates element-centric XML by default, in contrast to AUTO and RAW modes, which generate attribute-centric XML. With PATH mode you can turn any column into an attribute by simply giving it an alias preceded by the @ symbol, for example FirstName AS [@FirstName].
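A sketch with hypothetical person rows. PersonID and FirstName become attributes because their aliases start with @, while LastName and City stay elements. One thing to watch: in PATH mode the attribute columns must come before the element columns at the same level, or SQL Server raises an error.

```sql
SELECT
    p.PersonID  AS [@PersonID],    -- alias starting with @ becomes an attribute
    p.FirstName AS [@FirstName],
    p.LastName,                    -- plain columns become child elements
    p.City
FROM (VALUES
        (1, N'Ken',   N'Sanchez', N'Seattle'),
        (2, N'Terri', N'Duffy',   N'Redmond')
     ) AS p (PersonID, FirstName, LastName, City)
ORDER BY p.PersonID
FOR XML PATH('Person'), ROOT('People');

-- Returns (line breaks added here for readability):
-- <People>
--   <Person PersonID="1" FirstName="Ken"><LastName>Sanchez</LastName><City>Seattle</City></Person>
--   <Person PersonID="2" FirstName="Terri"><LastName>Duffy</LastName><City>Redmond</City></Person>
-- </People>
```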

On-Demand Training

Question: Can we see a syllabus of the on demand courses and what material is presented in each class/section without signing up for a trial?

Answer: Please visit http://pragmaticworks.com/Training/On-Demand-Training. Once there, scroll down to the bottom of the page and click one of the individual class names to get the class outline and other information.

Question: Does the on-demand training include the ability to ask questions when something isn’t clear?

Answer: Yes, you can send questions directly to our training director. Another great feature of our On-Demand training platform is that it comes with two free hours of virtual mentoring: one-on-one time with a Pragmatic Works consultant.

How to get Line Item Detail information with Totals in T-SQL (Percent of Parent).

Hi All,

This blog post is a follow-up to a question I received when I gave my Advanced TSQL Webinar for Pragmatic Works. If you haven’t seen it yet, you can view the webinar here: 

Question: How can we get SubTotals in TSQL using the CTE method you mentioned?

In my webinar I showed how to get totals and subtotals in the same result set as line item detail information (see screenshot below). The method I used involved the OVER clause, and it kept the SQL very clean and easy to read. Unfortunately, this method is not the best performing option available, because SQL Server typically implements a window aggregate like this with a Table Spool that writes to a worktable in tempdb. (I have pasted the TSQL example with the OVER clause at the bottom of this blog post for comparison and reference purposes.)

image

Sub Totals in SQL with CTE method:

First of all, can I just preface this by saying I love CTEs (Common Table Expressions)? Let’s jump right in and write some code! For this example I’m going to be using the AdventureWorks2012 database, but this should work with other versions of AdventureWorks as well.

Step 1: Create a CTE with the total for each customer: 

WITH TOTALS AS
(
    SELECT CustomerID, SUM(TotalDue) AS Total
    FROM Sales.SalesOrderHeader
    GROUP BY CustomerID
)
SELECT * FROM Totals ORDER BY CustomerID

 

Results:

image

Step 2: Create a SQL query with the line item detail information.

SELECT CustomerID, SalesOrderID, OrderDate, TotalDue
FROM Sales.SalesOrderHeader

image

Step 3: Join them together!

WITH TOTALS AS
(
    SELECT CustomerID, SUM(TotalDue) AS Total
    FROM Sales.SalesOrderHeader
    GROUP BY CustomerID
)
SELECT soh.CustomerID, SalesOrderID, OrderDate, TotalDue, Total
FROM Sales.SalesOrderHeader soh
JOIN Totals t
    ON t.CustomerID = soh.CustomerID

Final Result:

image
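To get the percent of parent from the title, one more column divides each order by its customer's total. This is a sketch against AdventureWorks; the decimal cast and the NULLIF guard are my own additions, not part of the webinar code:

```sql
WITH Totals AS
(
    SELECT CustomerID, SUM(TotalDue) AS Total
    FROM Sales.SalesOrderHeader
    GROUP BY CustomerID
)
SELECT
    soh.CustomerID,
    soh.SalesOrderID,
    soh.OrderDate,
    soh.TotalDue,
    t.Total,
    -- Each order's share of its customer's total. The cast to decimal gives the
    -- division more decimal places to work with than money's four; NULLIF turns
    -- a zero total into NULL instead of a divide-by-zero error.
    CAST(soh.TotalDue AS decimal(19, 4)) * 100 / NULLIF(t.Total, 0) AS PercentOfCustomerTotal
FROM Sales.SalesOrderHeader soh
JOIN Totals t
    ON t.CustomerID = soh.CustomerID
ORDER BY soh.CustomerID, soh.SalesOrderID;
```

With the OVER clause version, the same column would be TotalDue * 100 / SUM(TotalDue) OVER (PARTITION BY CustomerID).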

As I mentioned above you can get the same results using the OVER Clause in TSQL. I have pasted the code below for that example:

SELECT
    CustomerID, 
    SalesOrderID, 
    OrderDate, 
    TotalDue, 
    SUM(TotalDue) OVER(Partition By CustomerID) AS CustomerTotal
FROM    
    Sales.SalesOrderHeader

Final Thoughts: This method will generally perform better than the OVER clause method, but it takes a bit more code and work. If the OVER clause gets the job done and performance is not an issue, I would recommend using it to keep the code simpler and easier to read!

Thanks for looking!

Advanced T-SQL Webinar / Free Training

Thank You!

First of all, thank you to everyone who attended my T-SQL presentation on 5/10/2016. We had over 1,400 people registered for the training and 871 people joined and listened in; that’s a lot of people! I also received a lot of good and encouraging feedback, so thank you for that as well.

Resources

The number one question I received was whether the T-SQL scripts would be available for download. Of course they will be. If you want to download the T-SQL scripts that I used for my presentation, you can find them here:

Free Recording:

The full one hour webinar that I did for Pragmatic Works on 5/11/2016 can be found here:

Questions and Answers:

I just received the list of questions from the webinar so I will get this section updated in the next couple days. I wanted to go ahead and post this blog so you could have access to the SQL Scripts.

Question:

Les said: BTW, best presentation EVER!!!! Very clear and straight to the point in each case. Congrats!!!

Answer:

Thank you! Best question ever!

Question: Recursive CTEs in SQL

Ken asked: Should the anchor be unique? What happens if two rows are returned by the anchor member?

Answer:

Hey Ken, the anchor member does not need to be unique. In our example we were specifying that the CEO is at the first level, or Level 0. If the anchor member returns multiple rows, each of them shows up at Level 0, and the recursive member then builds out the hierarchy beneath each one.
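A sketch with a hypothetical org chart that has two top-level rows (ManagerID IS NULL). Both come back from the anchor member at Level 0, and each one's reports are built out beneath it:

```sql
-- Hypothetical org chart with two rows that have no manager.
DECLARE @Employees TABLE (EmployeeID int, ManagerID int NULL, Name nvarchar(50));
INSERT INTO @Employees (EmployeeID, ManagerID, Name)
VALUES (1, NULL, N'CEO'),
       (2, NULL, N'Chairman'),
       (3, 1,    N'VP Sales'),
       (4, 3,    N'Sales Rep'),
       (5, 2,    N'Board Secretary');

WITH OrgChart AS
(
    -- Anchor member: returns BOTH top-level rows, so both land at Level 0.
    SELECT EmployeeID, ManagerID, Name, 0 AS Level
    FROM @Employees
    WHERE ManagerID IS NULL

    UNION ALL

    -- Recursive member: attaches each employee to the row above it.
    SELECT e.EmployeeID, e.ManagerID, e.Name, oc.Level + 1
    FROM @Employees e
    JOIN OrgChart oc
        ON e.ManagerID = oc.EmployeeID
)
SELECT EmployeeID, ManagerID, Name, Level
FROM OrgChart
ORDER BY Level, EmployeeID;
-- Level 0: employees 1 and 2; Level 1: employees 3 and 5; Level 2: employee 4
```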

Question: Pivot in SQL

Travis asked: Can you do multiple columns, such as minutes and cost?

Answer:

Travis I believe you are asking about the pivot example that I showed in my webinar. The answer is yes, you can definitely do a double pivot or pivot on multiple columns. I will write a blog on how to do this in the next week, so please check back!
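Until that post is up, here's one way to do it. This sketch uses conditional aggregation (a cross-tab) rather than the PIVOT operator, because PIVOT accepts only a single aggregate; pivoting two measures with PIVOT itself means running two PIVOTs and joining them. The call data is hypothetical:

```sql
-- Hypothetical call data: minutes and cost per customer per month.
DECLARE @Calls TABLE (CustomerID int, CallMonth char(3), Minutes int, Cost decimal(10, 2));
INSERT INTO @Calls (CustomerID, CallMonth, Minutes, Cost)
VALUES (1, 'Jan', 120, 12.00),
       (1, 'Feb',  90,  9.50),
       (2, 'Jan',  30,  3.25),
       (2, 'Feb',  45,  4.75),
       (2, 'Feb',  15,  1.50);

-- One CASE expression per (month, measure) pair turns rows into columns
-- for both measures in a single pass over the table.
SELECT
    CustomerID,
    SUM(CASE WHEN CallMonth = 'Jan' THEN Minutes ELSE 0 END) AS JanMinutes,
    SUM(CASE WHEN CallMonth = 'Jan' THEN Cost    ELSE 0 END) AS JanCost,
    SUM(CASE WHEN CallMonth = 'Feb' THEN Minutes ELSE 0 END) AS FebMinutes,
    SUM(CASE WHEN CallMonth = 'Feb' THEN Cost    ELSE 0 END) AS FebCost
FROM @Calls
GROUP BY CustomerID
ORDER BY CustomerID;
-- Customer 1: 120, 12.00, 90, 9.50
-- Customer 2:  30,  3.25, 60, 6.25
```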

Question: Merge in SQL vs. SSIS Update

Vineet asked: How does the MERGE statement compare to updates done in SSIS?

Answer:

The only native, built-in update capability that we have in SSIS is the OLE DB Command, so I assume this question is about comparing MERGE with the OLE DB Command. The MERGE pattern will perform light years better than doing updates in SSIS with the OLE DB Command, because the OLE DB Command executes its statement once for every row in the data flow, while MERGE handles the whole set in a single statement. The MERGE pattern I showed in the webinar is a very popular design pattern for loading data warehouses.
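For readers who haven't seen the pattern, here's a minimal upsert sketch with MERGE. It is not the exact script from the webinar; the temp tables and data are hypothetical:

```sql
-- Hypothetical dimension (target) and staging (source) tables.
IF OBJECT_ID('tempdb..#DimCustomer') IS NOT NULL DROP TABLE #DimCustomer;
IF OBJECT_ID('tempdb..#StageCustomer') IS NOT NULL DROP TABLE #StageCustomer;

CREATE TABLE #DimCustomer   (CustomerID int PRIMARY KEY, Name nvarchar(50), City nvarchar(50));
CREATE TABLE #StageCustomer (CustomerID int PRIMARY KEY, Name nvarchar(50), City nvarchar(50));

INSERT INTO #DimCustomer (CustomerID, Name, City)
VALUES (1, N'Contoso', N'Seattle'), (2, N'Fabrikam', N'Redmond');
INSERT INTO #StageCustomer (CustomerID, Name, City)
VALUES (2, N'Fabrikam', N'Bellevue'), (3, N'Northwind', N'Tacoma');

-- One set-based statement: update rows that changed, insert rows that are new.
-- Note: <> does not treat NULLs as changes; nullable columns need extra handling.
MERGE #DimCustomer AS tgt
USING #StageCustomer AS src
    ON tgt.CustomerID = src.CustomerID
WHEN MATCHED AND (tgt.Name <> src.Name OR tgt.City <> src.City) THEN
    UPDATE SET tgt.Name = src.Name,
               tgt.City = src.City
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, Name, City)
    VALUES (src.CustomerID, src.Name, src.City);  -- MERGE must end with a semicolon

SELECT CustomerID, Name, City
FROM #DimCustomer
ORDER BY CustomerID;
-- 1 Contoso Seattle / 2 Fabrikam Bellevue / 3 Northwind Tacoma
```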

Question: SubTotals with CTEs in SQL Server

Vineet asked: Can you share CTE examples on your blog?

Answer:

Hi again, you were not the only one to ask this question as I have also received a few emails requesting this example. I will write a blog post with an example on this in the next week, so please check back!

Question: What SQL Books would you recommend?

Answer:

There are a lot of great SQL Books out there. I will stick with one author here. I have three books by Itzik Ben-Gan.

  • T-SQL Fundamentals
  • T-SQL Querying
  • T-SQL Programming

SQL Tips! How to edit the query for Edit TOP 200 Rows in Management Studio (SSMS)

It has been some time since I wrote one of my SQL Tips blogs. I like to use the “Edit Top 200 Rows” functionality in SSMS when making quick, but different, updates across similar records in SQL Server. Did you know you can edit the query behind the Edit Top 200 Rows window so that it returns just the records you’re interested in updating? We can do this in just a few easy steps.

First open up the “Edit Top 200 Rows” window in SSMS for the table you want to edit.

image

Second, right-click the top-left corner of the results grid and select Pane > SQL from the menu.

image

Next, you’ll see a SQL statement that you can edit. Here I simply added the clause WHERE MakeFlag = 'TRUE'.

image

Once the SQL has been updated, it needs to be executed. Right-click the top-left corner of the grid again, and this time select “Execute SQL”.

image

Below you can see the new result set available now for editing!

image

As always thanks for looking.

How to use the Optimize For hint to force the execution plan you want.

Quite some time back I found myself fighting with an execution plan generated by SQL Server for one of my stored procedures. The plan always showed an estimated row count of “1” when processing the current day. I won’t go into details on why this one specific stored procedure didn’t use the older cached plans as expected. I will, however, tell you that like most things with SQL Server, there is more than one way to solve a problem.

This method is something I have personally wanted to blog about, because it’s something I have only used a handful of times, when I just couldn’t get the execution plan to work the way I wanted it to. Note that with this hint we are forcing the SQL Server optimizer to use the statistics for the specific variable value that we provide. However, if the table were to grow significantly in the future, we may end up hurting performance by forcing a bad execution plan. That is the drawback to using this hint, so now you know!

Take a look at the two screenshots below. The first is the estimated rows from the Fact Internet Sales table and the second is the estimated execution plan.

image

image

What I actually want to see in this execution plan is a Hash Match join, which will perform significantly better for the number of records that I will have. Unfortunately, due to out-of-date statistics I’m getting a bad plan.

So let’s note two things.

  1. First, in most situations the best solution here is simply to update statistics. This should be part of ANY database maintenance plan.
  2. Second, the example I am using here is not great. I am simply forcing the plan to do what I want for demo purposes.
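For reference, refreshing statistics on the two tables in this example could look like the sketch below. FULLSCAN is my choice here; the default sampled update is often enough:

```sql
-- Refresh statistics on the two tables from the example query.
-- FULLSCAN reads every row; on very large tables the default sampled
-- update may be a more practical trade-off.
UPDATE STATISTICS dbo.FactInternetSales_Backup WITH FULLSCAN;
UPDATE STATISTICS dbo.DimProduct WITH FULLSCAN;

-- Check when each statistics object on the fact table was last updated.
SELECT s.name AS StatName,
       STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.FactInternetSales_Backup');

-- Alternatively, update every statistics object in the current database
-- that has had row modifications (this uses sampling):
-- EXEC sp_updatestats;
```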

Let’s take a look at the original query:

DECLARE @ShipDate DATE = '1/1/2008'

SELECT 
       [EnglishProductName] AS Product
      ,[SalesOrderNumber]
      ,[OrderDate]
      ,[DueDate]
      ,[ShipDate]
  FROM 
    [dbo].[FactInternetSales_Backup] FIS
  JOIN
    [dbo].[DimProduct] DP
  ON
    DP.ProductKey = FIS.ProductKey
  WHERE ShipDate > @ShipDate

Now we are going to quickly modify this query to use the OPTIMIZE FOR hint. This hint tells SQL Server to build the execution plan as if the variable had the value we specify. In my case this is going to be a previous date where I know the statistics reflect what I want to see in my execution plan.

Here is the modified query:

DECLARE @ShipDate DATE = '1/1/2008'

SELECT 
       [EnglishProductName] AS Product
      ,[SalesOrderNumber]
      ,[OrderDate]
      ,[DueDate]
      ,[ShipDate]
  FROM 
    [dbo].[FactInternetSales_Backup] FIS
  JOIN
    [dbo].[DimProduct] DP
  ON
    DP.ProductKey = FIS.ProductKey
  WHERE ShipDate > @ShipDate

  OPTION (OPTIMIZE FOR (@ShipDate = '1/1/2005'))
GO

In this query the result set returned will still be for the variable’s actual value, '1/1/2008'. However, the SQL Server optimizer is going to generate the plan using the value supplied in the OPTIMIZE FOR hint (the OPTION clause at the end of the query).

Now let’s take a look at our new Estimated Execution plan:

image

This time we are getting a Hash Match which is much more applicable for our table and the number of records that will be queried.

As always, thanks!

The transaction log for database is full due to ‘REPLICATION’. “Replication not enabled.” CDC

Welcome to Monday Morning Madness. What do you do when your Transaction Log runs out of space? Let me share my most recent experience.

My transaction log had a hard limit of 2TB in size. There are different opinions out there among the DBA elite on whether you should have a limit on the growth of your transaction log or if you should leave it as unlimited. As a BI Developer I do not have an opinion here, just note that I am happy that in this instance I had a hard limit.

The Error Message:

The transaction log for database <Database Name> is full due to ‘REPLICATION’.

Oh that’s it?? Easy fix right. Let’s walk through troubleshooting this problem.

  • My database is in the SIMPLE recovery model, so the log should be truncated at every checkpoint. Why isn’t it clearing?
  • I ran the following command to find out why the log is waiting to clear:

USE master;
GO
SELECT name, log_reuse_wait_desc, * FROM sys.databases
WHERE name = '<Database Name>';

image

Replication, really? Who turned on REPLICATION!!?? I had no idea replication was turned on. Let’s find out. The following code will show whether replication is turned on for any of my databases:

SELECT name, is_published, is_subscribed, is_merge_published, is_distributor
FROM sys.databases
WHERE is_published = 1 OR is_subscribed = 1 OR
      is_merge_published = 1 OR is_distributor = 1;

OK, so here is my result set from the above query:

image

  • So to recap: the transaction log is not clearing because it is waiting on replication, but replication is not turned on?? (DBCC OPENTRAN will show the oldest active transaction in a database, along with the oldest replicated transaction information if there is any.)

Well, after some further research I discovered that my Change Data Capture (CDC) job had failed; more specifically, the CDC capture job. We had turned on CDC to capture some otherwise hard-to-capture changes. CDC reads the transaction log using the same log reader technology as transactional replication, which is why log_reuse_wait_desc reported REPLICATION. The capture job crawls the transaction log, and once it has finished with a section of the log, that part of the log can be cleared and reused. In my case the capture job had failed, and as a result none of the transaction log was being allowed to clear.
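If you find yourself in the same spot, these queries are a reasonable starting point for confirming that CDC is the culprit. This is a sketch: <Database Name> is a placeholder, and the job-history query assumes the default CDC job names, which start with cdc.

```sql
-- 1. Which databases have CDC enabled, and what is each one's log waiting on?
SELECT name, is_cdc_enabled, log_reuse_wait_desc
FROM sys.databases
WHERE is_cdc_enabled = 1;

-- 2. Recent history of the CDC SQL Agent jobs.
--    run_status 0 = failed, 1 = succeeded.
SELECT j.name, h.step_id, h.run_date, h.run_time, h.run_status, h.message
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.sysjobhistory AS h
    ON h.job_id = j.job_id
WHERE j.name LIKE 'cdc.%'
ORDER BY h.run_date DESC, h.run_time DESC;

-- 3. From inside the CDC-enabled database: the capture and cleanup job
--    settings, and any errors the CDC log scan has hit.
USE <Database Name>;
GO
EXEC sys.sp_cdc_help_jobs;
SELECT * FROM sys.dm_cdc_errors;
```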

Note: By default, enabling CDC creates two SQL Agent jobs: one that captures the changes and one that cleans up the tables where the changes are stored. The capture job is scheduled to start only when SQL Agent starts and then runs continuously, so if it fails, nothing restarts it until the next Agent restart (the cleanup job runs on a daily schedule). I would recommend modifying the capture job’s schedule.

So finally we discovered the problem. Now how do we fix this?

  1. First, try starting the CDC capture job. (This doesn’t work because the transaction log is full.)
  2. Second, try shrinking the database. (This won’t work for two reasons: the transaction log is full, and there is no free space to shrink.)
  3. Third, you can try changing the max size limit on your transaction log. Bet you can’t guess what happens when you do this? It doesn’t work because the transaction log is full.

None of the above situations will work here because they all require at least some space on the transaction log to complete.

  1. Ultimately I ended up adding a second transaction log file to my database.
    1. This is why I was glad my original transaction log had a “hard” limit. If there had been no hard limit, the log would eventually have filled the entire disk, and I would not have been able to simply create a second transaction log.
  2. Once the new transaction log file was in place, I disabled CDC on the database. I disabled it because the client was no longer using it. Alternatively, I could have started the capture job and just let it crawl the entire transaction log, all two TB of it.
  3. Once CDC was disabled, I ran the checkpoint commands below to checkpoint and clear the transaction log.
  4. Once the transaction log was cleared, I ran DBCC SHRINKFILE (DatabaseName_LOG) to reclaim the empty space.
  5. Finally, I removed the temporary second log file I had added in step 1.
    1. In general there is no benefit to having more than one transaction log file, because SQL Server writes to the log sequentially, one file at a time. I added the additional log file temporarily and removed it once it was no longer needed.

USE <DatabaseName>;
GO
CHECKPOINT;
GO
CHECKPOINT; -- run twice to ensure file wrap-around
GO
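The whole recovery sequence above can be sketched in T-SQL as follows. The logical file names, path, sizes, and <DatabaseName> are placeholders rather than the values from my environment, so adjust them before running anything:

```sql
-- All names, paths, and sizes below are placeholders.
-- 1. Add a temporary second log file on a drive that still has free space.
ALTER DATABASE <DatabaseName>
ADD LOG FILE
(
    NAME = TempLog2,
    FILENAME = 'E:\SQLLogs\TempLog2.ldf',
    SIZE = 10GB,
    FILEGROWTH = 1GB
);
GO

-- 2. Disable CDC for the database (requires membership in sysadmin).
USE <DatabaseName>;
GO
EXEC sys.sp_cdc_disable_db;
GO

-- 3. Checkpoint so the inactive log can be truncated (SIMPLE recovery model).
CHECKPOINT;
GO

-- 4. Shrink the original log by its logical name; 1024 is a target size in MB.
DBCC SHRINKFILE (DatabaseName_LOG, 1024);
GO

-- 5. Remove the temporary log file. This only succeeds once the file holds
--    no active log; another CHECKPOINT may be needed first.
ALTER DATABASE <DatabaseName> REMOVE FILE TempLog2;
GO
```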

As Always thanks for looking!

SQL Tips! Generate DDL Scripts with the data from the table. (Installment 2)

Welcome back to this series on SQL Tips. Sometimes I find myself needing to generate not only the DDL for a table but also the data that is stored in it. For example, imagine I have created and populated a few tables for a blog post. It would be much easier if I could share the table definition and the SQL statements to populate that table. Agree? Great!

Scenario

In this scenario I am going to show you how to generate the DDL to create the DimCurrency table from the AdventureWorksDW2012 database. Along with the DDL script I will also show you how you can generate the script to populate the table.

Walkthrough

    1. Open SSMS and connect to the server where your table is located.
    2. Right-click the database where your table is stored; I’m using AdventureWorksDW2012.
    3. Next, click Tasks.
    4. Next, click Generate Scripts.
       image
    5. Inside the Generate Scripts wizard, click Next on the introduction screen.
    6. On the Choose Objects screen, click “Select specific database objects”.
    7. Expand the tables list.
    8. Select the DimCurrency table.
       image
    9. On the “Scripting Options” screen I’m just going to load the script into a new query window, so select “Save to new query window”. This option is found at the bottom of the screen.
    10. On the same screen, click the “Advanced” button found on the right side.
    11. Now scroll down to “Types of data to script” and change it from “Schema only” to “Schema and data”.
       image
    12. Click OK, then click Next and Next again. The script will now be generated in a new query window. Click Finish once it has completed. Below is a partial screenshot of the completed product.
       image

 

As always, thanks for looking!

SQL Tips! Set your default database in SSMS (Installment 1)

In my time as a Business Intelligence consultant and trainer I have picked up some pretty cool tricks and thought I would start blogging on these tips two or three times a week as I catch myself using them.

In this blog I am going to show you how you can set your default database when you connect to a server. I find this to be an extremely helpful little tip. Please see below.

Walkthrough:

    1. Open SSMS and from the object explorer click “Connect”.
    2. From the Connect to Server prompt: Enter your server and Authentication information.
    3. Click “Options”, located at the bottom right.
    4. Next click the “Connection Properties” tab found at the top.
    5. Locate “Connect to Database” and then click the drop down menu.
    6. From the dropdown menu, select <Browse> and then select the database that you want to be your default. Click Connect.
    7. Voila! Now every time you connect to that server it will default to that database instead of master!

image

As always, thanks for looking!