Power BI–Problem, Design, Solution (Text Search)

 

There is a video that walks you through the solution in depth, please find the video at the end of this post along with the completed solution.

In this installment of the Problem, Design, Solution series we are going to show you how to perform a text search using slicers in Power BI, this simulates a “LIKE” type search. In the following screenshot you can see that when “Tax” is selected all records in the table that have “Tax” anywhere in the record are returned, likewise whenever “IT” is selected from the slicer all records in the table that have IT in them are returned. Hope you enjoy this post!

Solution_LikeSearch

Problem

Normally slicers (visual filters) only filter records that are an exact match, for example, if you select “IT” from the slicer then you would only expect to see records that have IT and only IT in the record, if it does not match IT exactly then those records are not returned. This is the intended behavior of a slicer and in general, this is exactly what we want. However, there are times when you might want to do a “LIKE” type search similar to the like function in T-SQL. In this scenario, we had a client who needed to do this exact thing. The screenshot below is our simulated problem for this example.

In order to bring back all records that contain “IT” you are required to click each occurrence of “IT” in your slicer, of course, this isn’t a big problem if you have a small number of possible combinations like our sample size. In larger environments, this may be an unreasonable expectation!

Problem_LikeSearch

Design

Now that we have discussed both the problem and what we want the ultimate solution to look like, it is time to lay out the design. The design includes a four step process.

Let’s break this down a little bit:

  • The first step in the process is to identify all the unique values that we need to filter on and put those values into their own distinct table, in this case, we are looking at departments (HR, IT, FIN, Audit, and Tax).
  • The second step is to create a measure that uses the distinct departments in our new table as a filter. We will call this measure IsFiltered.
  • The third step is to create a new slicer using the values listed in our disconnected table.
  • The final step is to tie our new measure (IsFiltered) to the specific visual that we want to be affected by our disconnected table of distinct values.

Step 1 – Loading unique values in a distinct table

I would recommend storing this list of values in a file or in a table in a database, for this example I will quickly put these values into a table in my PBIX file. Navigate to the home ribbon and select “Enter Data”, I will name the column Departments and the table Slicer, next I will add the five distinct departments as records for the departments’ column (IT, HR, Tax, FIN, Audit).

Note: This is using the disconnected table design, we are not going to set up a relationship between this new table and any other table in our model.

Step 1_LikeSearch

Step 2 – Creating the IsFiltered measure

This IsFiltered measure is going to give us the “LIKE” type search that we are looking for! Excited yet?? Well, you should be! In this example I am going to use the FIND function, the find function is exactly like the SEARCH function with the key difference being that the FIND function is case-sensitive and the SEARCH function is case-insensitive. The code below will complete step 2 in our design.

IsFiltered =
IF(
SUMX(‘Slicer’,
FIND(
‘Slicer'[Departments],
MAX(‘Department Goals'[Affected Departments]),,0)) > 0,
True“,
False
)

What is this measure doing? The FIND function returns the starting position of the text string we pass in, if the value is greater than 0 then that means that the text exists and this is how we generate a LIKE type search. If the text is not found then nothing is returned and we give that record a default value of 0.

Step 3 – Create a slicer from the disconnected table

Now we can create a slicer from our distinct list of values. At this point the slicer does not filter anything, this is because the filtering occurs through the measure we created in step 2 and needs to be added as a filter to a visual or report in our Power BI Report.

image

Step 4 – Use the IsFiltered measure to filter a visual in the report

The last step in this process is about as easy as step 3, first we need to figure out which visuals we want to be filtered with the “LIKE” type operation and then we need to use the IsFiltered measure as a filter on those visuals. The criteria here will be to only show the records where the result is “True”. Remember the DAX calculation from above? If a value was found, then a number higher than 0 is returned and we flag the record as “True”!

Finally, I am going to add the IsFiltered measure as a filter to my table visualization, this is done in the Fields section of the visualizations pane. See the below image:

Step 2_LikeSearch

 

Solution

With the 4 step design completed, the solution is also complete! Now we can apply a filter that does a text search rather than an exact match, see the image below:

Solution_LikeSearch

Summary

By leveraging the DAX expression language we are able to override the default behavior of slicers and filters in Power BI. Please note that this solution would also work in an SSAS Tabular model, the implementation would be slightly altered as you may be consuming that model from excel rather than Power BI but it would work nonetheless!

Resources

Video of the solution:

Completed Example:

http://tinyurl.com/yd2qon7x

Advertisements

SSIS Performance Tuning

This is my much overdue follow-up blog from the presentation I gave to the New York SQL Server User Group. In this post, I am going to provide some additional resources to supplement the presentation, check out the following blog post on different performance tuning techniques that can be used for SSIS.

SSIS Performance Tuning the For Each Loop. Retain Same Connection.

https://mitchellpearson.com/2015/01/12/ssis-performance-tuning-the-for-each-loop-retain-same-connection/

Use File Cache and the Cache Transform to help mitigate network issues.

https://mitchellpearson.com/2014/06/30/use-file-cache-and-the-cache-transform-to-help-mitigate-network-issues/

Create efficient Range Lookups with the Task Factory Advanced Lookup Transform

https://mitchellpearson.com/2014/06/27/create-efficient-range-lookups-with-the-task-factory-advanced-lookup-transform/

Load thousands of files in SSIS lighting fast! MultiFlatFile Connection Manager.

https://mitchellpearson.com/2014/06/27/load-thousands-of-files-in-ssis-lighting-fast-multiflatfile-connection-manager/

Why are Asynchronous components so bad in SSIS?

http://tinyurl.com/ycnvr5ef

Using cache transforms in SSIS to improve Performance (Part 1)

https://mitchellpearson.com/2014/06/20/using-cache-transforms-in-ssis-to-improve-performance-part-1/

Use the aggregate transform in SSIS to out perform T-SQL!

https://mitchellpearson.com/2014/06/22/use-the-aggregate-transform-in-ssis-to-improve-performance-of-t-sql/

Advanced DAX: Problem, Design, Solution

Thanks for watching my latest webinar on DAX. If you missed the webinar you can watch the recording here:

http://pragmaticworks.com/Training/Details/Advanced-DAX-Problem-Design-Solution

I didn’t have a lot of questions from this webinar but I will answer a the ones I did have.

Question: What is the difference between the Find function and the Search function.

Answer: The FIND function is case-sensitive and the search function is case-insensitive. In the webinar I used the FIND function in my first demo but really this could have been replaced with the SEARCH function since I was not concerned with case sensitivity.

Question: Where did you find the School Grade demo that you used in your webinar?

Answer: The data I used in the webinar can be found at the link below:

http://schoolgrades.fldoe.org/reports/#2014

Question: Might there be an upcoming webinar regarding DAX as it pertains to SSAS cubes rather than Power BI?

Answer: I am not aware of any webinars upcoming that will be specific to Tabular Cubes. It is important to point out that the examples used in this webinar are also applicable with Tabular SSAS Cubes since it uses the same backend engine as Power BI Desktop (xVelocity). If you implement these solutions in SSAS Tabular, the consumption of the data would be achieved through excel and then the necessary filters and slicers would be applied there instead of the report view provided in Power BI Desktop.

Thanks again!

Advanced DAX Webinar

Hey Everyone,

Thanks for attending the free 1 hour DAX Webinar by Pragmatic Works. In this blog I am going to focus on the primary question that I saw come up over and over again in the chat logs. The webinar can be found here:

  • How come the totals for Forecast and Dynamic are the same?
  • The Grand Total is different as well.
  • Are we sure all those totals are right? for The MLS data?

The totals that are generated in DAX are usually an area that cause confusion, especially if you are still learning about filter context. Totals do NOT simply add up the values that appear in the rows on the report. This is confusing because most of the time it looks like this is exactly what is occurring in the background.

If the Totals cell in your table does not sum the rows then how does it come up with a total?? Great question! The total cell also executes the DAX formula defined in your measure. Let’s start with a visual, one from the webinar that prompted the questions leading to this post. Notice in this picture you can clearly see that the “TOTAL” is not the sum of the rows. The values represented in the total row look completely wrong, however they are not.

SNAGHTML1a38ee9e

Remember this question? How come the totals for Forecast and Dynamic are the same? Let’s try to answer this question now. We can begin with Forecast YTD Sales.

Forecast YTD Sales is a simple calculation in the model. It is simply Prior YTD Sales * 2. In the screenshot below I have added the calculated measure Prior YTD Sales to the table so we can visualize exactly how the total cell is being calculated.

The total value is $19,582,120 which is $9,791,060 * 2.

image

Remember that in our original table there was only data up until June of 2008. Therefore, why is the total row taking Prior YTD Sales from December? Why is the total calculation taking the value of $9,791,060 from the Prior YTD Sales measure instead of $3,037,501 (June)? The table visual only displayed data up until June but this was because we don’t have any sales past June of 2008. The total row is looking at the entire 2008 year and therefore gets the Prior YTD Sales value as of the last date in 2008 ($9.791,060). Although this may not be the desired behavior it is the correct behavior.

How come the totals for Forecast and Dynamic are the same?

Well all of that is great, but that still doesn’t explain why the Forecast YTD Sales has the same value as our Dynamic Measure at the total row. The Dynamic Measure calculation always takes YTD Sales unless there are no sales on the last day of the current month in which case it will take the Forecast YTD Sales. This was explained in the webinar so please revisit that if you don’t remember the why behind this calculation.

IF(
    LastSaleDate <> LastDayOfMonth,
    [Forecast YTD Sales],
    [YTD Sales])

At the total row the calculation would read like this:

IF(
    “6/20/2008” <> “12/31/2008”,
    [Forecast YTD Sales],
    [YTD Sales])

Based on this interpretation of the measure we can see that the last day a sale took place in our model does not equal the last date in our model for the current year. That means that the total ROW is going to display the measure [Forecast YTD Sales]. This explains why we see the following values in our total row:

image

Once again understanding the total row is not easy, especially if you are transitioning to DAX from working with excel. If the total row seems incorrect it’s important to validate the calculated measure at the total level.

Thanks!

Advanced TSQL Takeover

Thanks for joining the 3 hour webinar I did for Pragmatic Works on Advanced TSQL. There were, as could be expected, many questions regarding the webinar so in this blog post I will try to answer as many of those questions as I can.

The TSQL Scripts used in the class can be found HERE: Please note that for one of the examples you will need to create and load the MicrosoftStockHistory example and I have provided a script to do just that.

Common Table Expressions

Question: Why does a common table expression begin with a semi-colon?

Answer: The WITH keyword is used for other purposes in TSQL other than just CTEs so to avoid confusion any statements prior to the CTE must be terminated using a semi-colon.

Question: What are the performance differences between CTEs and Temp tables?

Answer: Unfortunately CTEs and Temp Tables have entirely different use case scenarios so this is like comparing apples to oranges. CTEs are logical tables used to simplify code, essentially they are like views. Temp tables on the other hand are physically materialized on disk in the temp db. They also have column statistics and can be indexed for optimizing your TSQL Code. The answer to this question is really much more in depth then what I have provided here but this is the really really short answer.

Question: Is there a performance difference with CTE and Derived tables?

Answer: Rarely will you see a performance difference here, I have always seen identical plans personally when comparing performance. However, I called and talked to the smartest DBA I know, Bradley Ball. There are some situations with self-referencing CTEs where the SQL Server optimizer can generate a bad SQL Plan due to the Cardinality Estimator not being able to read the statistics of the underlying table. This is the same behavior that you would see if you were nesting views on top of views. In those situations a Derived table or a Temp Table would be a better option.

Window Functions: Framing

Question: When using Range for the frame could the duplicates values be eliminated with the Partition By Clause?

Answer: Range can generate unusual results when used to generate running totals. The unusual results occur when the Order By clause within a partition can contain duplicate values therefore duplicating the “running total” for multiple cells. The fix here is to either use the ROWS framing option or to choose a column for the Order BY clause that is unique.

Question: Can you call LAG(LAG(ClosePrice)) to get the close price from 2 days previous?

Answer: The default behavior of lag is to go back one record. The default behavior can be overridden by using the optional parameter [offset] when writing your lag function. For example the code here would look like: LAG(ClosePrice, 2) OVER (ORDER BY DATE)

Question: Is the performance gain of framing window functions only applicable for the aggregate window functions, i.e. not important for the other window functions?

Answer: The performance gain can be achieved also with ranking and analytical window functions. Anytime an order by clause is required we can specify ROWS instead of the default behavior of RANGE. We showed a demo of using ROWS with the Last_Value analytical window function in the webinar.

Question: Will there be a bathroom break?

Answer: I don’t plan on making it three hours without taking a break! Smile

Working with XML in SQL Server

Question: Can you query a XML File?

Answer: Absolutely. Querying an XML file is very easy to do and this is also referred to as Shredding XML. I will try to write a specific blog walking through how to do this in the coming weeks. Please note that I didn’t have enough time to show this in our extended 3 hour presentation but in the Pragmatic Works Advanced TSQL course this is covered.

Question: Can you create element centric XML but include some columns as Attributes?

Answer: This can be done easily by using the XML Path mode. XML Path generates element centric XML by default, this is in contrast to XML Auto and Raw which generates attribute centric XML. With XML Path you can convert any of the columns into attributes by simply giving it an alias name preceded by the @ symbol. For example FirstName as [@FirstName].

On-Demand Training

Question: Can we see a syllabus of the on demand courses and what material is presented in each class/section without signing up for a trial?

Answer: Please visit http://pragmaticworks.com/Training/On-Demand-Training, once there scroll down to the bottom of the page and then click on one of the individual class names to get the class outline and other information.

Question: Does the on-demand training include the ability to ask questions when something isn’t clear?

Answer: Yes, you can send questions to our training director directly. Another great feature of our On-Demand training platform is that it comes with two free hours of virtual mentoring. This is one on one time with a Pragmatic Works consultant.

How to get Line Item Detail information with Totals in T-SQL (Percent of Parent).

Hi All,

This blog post is a follow up to a question I received when I gave my Advanced TSQL Webinar for Pragmatic Works. If you haven’t seen that yet and would like to see it you can view the webinar here: 

Question: How can we get SubTotals in TSQL using the CTE method you mentioned?

In my webinar I showed how to get totals and sub totals in the same result set as line item detailed information (See screenshot below). The method I used involved using the OVER clause and it kept the SQL very clean and easy to read. Unfortunately this method is not the best performing option available and that is because the over clause without framing uses disk. (I have pasted the TSQL example with the over clause at the bottom of this blog post for comparison and reference purposes.)

image

Sub Totals in SQL with CTE method:

First of all can I just preface this by saying I love CTEs? (Common Table Expressions). Let’s jump right in and write some code! For this example I’m going to be using the AdventureWorks2012 database but this should work with all versions of Adventure Works.

Step 1) – Create CTE with total information: 

WITH TOTALS AS
(
    SELECT CustomerID, SUM(TotalDue) AS Total
    FROM Sales.SalesOrderHeader
    GROUP BY CustomerID
)
SELECT * FROM Totals ORDER BY CustomerID

 

Results:

image

Step 2: Create an SQL query with line item detail information.

SELECT CustomerID, SalesOrderID, OrderDate, TotalDue
FROM Sales.SalesOrderHeader

image

Step 3: Join them together!

WITH TOTALS AS
(
    SELECT CustomerID, SUM(TotalDue) AS Total
    FROM Sales.SalesOrderHeader
    GROUP BY CustomerID
)
SELECT soh.CustomerID, SalesOrderID, OrderDate, TotalDue, Total
FROM Sales.SalesOrderHeader soh
JOIN Totals t
    ON t.CustomerID = soh.CustomerID

Final Result:

image

As I mentioned above you can get the same results using the OVER Clause in TSQL. I have pasted the code below for that example:

SELECT
    CustomerID, 
    SalesOrderID, 
    OrderDate, 
    TotalDue, 
    SUM(TotalDue) OVER(Partition By CustomerID) AS CustomerTotal
FROM    
    Sales.SalesOrderHeader

Final Thoughts: This method will generally perform better than simply using the over clause method, but it takes a more code and work. If the over clause function gets the job done and performance is not an issue I would recommend using that method to keep the code simpler and easier to read!

Thanks for looking!

Advanced T-SQL Webinar / Free Training

Thank You!

First of all thank you to everyone who attended my T-SQL presentation on 5/10/2016. We had over 1400 people registered for the training and 871 people joined and listened in, that’s a lot of people! I also received a lot of good and encouraging feedback so thank you for that as well.

Resources

The number one question I received is will the T-SQL scripts be available for download. Of course they will. If you want to download the T-SQL scripts that I used for my presentation you can find those here:

Free Recording:

The full one hour webinar that I did for Pragmatic Works on 5/11/2016 can be found here:

Questions and Answers:

I just received the list of questions from the webinar so I will get this section updated in the next couple days. I wanted to go ahead and post this blog so you could have access to the SQL Scripts.

Question:

Les Said: BTW, best presentation EVER!!!! Very clear and straight to the point in each case. Congrats!!!

Answer:

Thank you!, Best question ever!

Question:Recursive CTEs in SQL

Ken Asked: Should anchor be unique? What happens if there are two rows returned in the Anchor member?

Answer:

Hey Ken, the anchor member does not need to be unique here. In our example we were specifying that the CEO is at the first level or Level 0. If there are multiple members in the Anchor then multiple members would show up at Level 0.

Question: Pivot in SQL

Travis asked: Can you do multiple columns, such as minutes and cost?

Answer:

Travis I believe you are asking about the pivot example that I showed in my webinar. The answer is yes, you can definitely do a double pivot or pivot on multiple columns. I will write a blog on how to do this in the next week, so please check back!

Question: Merge in SQL vs. SSIS Update

Vineet asked: How does the merge statement compare to updates done in SSIS.

Answer:

The only native built in update capability that we have in SSIS is the OLE DB Command so I assume that this question is in regards to comparing the merge with the OLE DB Command. The merge pattern will perform light years better than doing updates in SSIS using the OLE DB Command. The merge pattern I showed in the webinar is a very popular design pattern used for loading data warehouses.

Question: SubTotals with CTEs in SQL Server

Vineet asked: Can you share CTE Examples on your blog.

Answer:

Hi again, you were not the only one to ask this question as I have also received a few emails requesting this example. I will write a blog post with an example on this in the next week, so please check back!

Question: What SQL Books would you recommend?

Answer:

There are a lot of great SQL Books out there. I will stick with one author here. I have three books by Itzik Ben-Gan.

  • T-SQL Fundamentals
  • T-SQL Querying
  • T-SQL Programming