A Day in the Life of Performance Tuning

“Can I have all the things in instantly”?

“Make this slow thing fast”

“I want more, faster!”

Okay, yes, I am heinously boiling down the various questions we get asked for on a daily basis for comic effect, but in reality this is not far off the truth. You are about to get an insight into the journey I took to reduce a query from taking minutes to taking seconds.

Let’s start with the culprit;

DECLARE @startDate DATETIME = '2008-01-01',

        @endDate DATETIME = '2016-01-01'

BEGIN

       SELECT tdd.[Date Key],

              dbo.getDate(tdd.[Date Key])

       FROM [Date Dimension] tdd

       WHERE tdd.[Date] BETWEEN @startDate AND @endDate

END

It’s a simple snippet, it selects some keys from a table and passes one of those keys into a function to get another key between two dates.

It looks very innocent, nothing wrong here. One could ask a broader question about its usage in the scope on entire query but we are just going to take it in isolation here.

It looks innocent until you run it and find out that it takes 36 seconds!

Naturally my eye was drawn to the scalar function as at the moment it’s a black box and could be doing anything. I looked at it and determined that it was doing something reasonable but I could do it quicker.

DECLARE @startDate DATETIME = '2008-01-01',

        @endDate DATETIME = '2016-01-01'

BEGIN

       SELECT tdd.[Date Key],

       STUFF([Date Key], LEN([Date Key]) - 1, 2, '24')

       FROM [Date Dimension] tdd

       WHERE tdd.[Date] BETWEEN @startDate AND @endDate

END

Low and behold this returned its values 163 milliseconds! I went ahead and compared all the data and it matches as I expected it to. I wasn’t satisfied yet as while this is a great improvement, it is not entirely correct from a data point of view because I have assumptions about what is happening and the structure of data. I also needed to understand why this is quicker, without the understanding of why, you leave your code open to bugs and unforeseen interactions.

So next I asked the question; “Is it the scalar function call itself that’s the problem or is it the work the function is doing?” This directed me to create a function with the STUFF code within it. Creatively I called this “getDate2”.

DECLARE @startDate DATETIME = '2008-01-01',

        @endDate DATETIME = '2016-01-01'

BEGIN

       SELECT tdd.[Date Key],

              dbo.getDate2(tdd.[Date Key])

       FROM [Date Dimension] tdd

       WHERE tdd.[Date] BETWEEN @startDate AND @endDate

END

No joy. I had hoped it would be as simple as calling a scalar function multiple times was causing the problem but the execution plan and statistics tell me is isn’t with it executing in a measly 267ms. So we have to dig deeper. Let’s try in-lining the original function call statement with the method.

DECLARE @startDate DATETIME = '2008-01-01',

        @endDate DATETIME = '2016-01-01'

BEGIN

       SELECT tdd.[Date Key],

              (     

              SELECT tdd_day.[Date Key]

              FROM [Date Dimension] tdd_day

              WHERE tdd_day.[Date Key] = tdd.[Date Key]

              ) as Bleh

       FROM [Date Dimension] tdd

       WHERE tdd.[Date] BETWEEN @startDate AND @endDate

END

I was convinced that it would be this. In my mind doing this select for each result should be really slow but SQL server is smarter than I gave it credit for, it has looked ahead and created a plan that means it doesn’t have to do this. With this query returning its data in 163ms.

I was scratching my head at this point as the direct execution of a scalar function wasn’t slow and the direct execution of the work the scalar function was doing wasn’t slow. Maybe the scalar function has to interact with a table in some manner? So I created another function to do the same select as shown above but within a scalar function.

DECLARE @startDate DATETIME = '2008-01-01',

        @endDate DATETIME = '2016-01-01'

BEGIN

       SELECT tdd.[Date Key],

dbo.getDate4(tdd.[Date Key])

       FROM [Date Dimension] tdd

       WHERE tdd.[Date] BETWEEN @startDate AND @endDate

END

(With hindsight I didn’t need to do this because it’s the same as the original, but it was part of the process.)

At last! A result that shows the problem! This query took 22 seconds to return the results. So at this point it appears the SQL server truly treats scalar value functions as black boxes and this shows in the execution plan. The function call is in fact not shown.

It shows the estimated cost as 0 if you check the estimated plan but it is lying. The cost cannot and is not 0 as shown by all the above examples.

So what’s the answer? It’s not exactly good practise to just use the STUFF method. It makes various assumptions about how the data structure will continue to work and completely destroys any reusability we had before. Cue the entry of table valued functions!

Using a simple table valued function SQL can incorporate it into its plan. We can mimic the scalar function with a table valued function very simply by aggregating a result set with only one result in it.

SELECT MAX(tdd_day.[Date Key]) as [Date Key]

FROM   [Date Dimension] tdd_day

WHERE  tdd_day.[Date Key] = @transactionDateKey

We can then use the table valued function in place of our scalar function call;

DECLARE @startDate DATETIME = '2008-01-01',

        @endDate DATETIME = '2016-01-01'

BEGIN

       SELECT tdd.[Date Key], tdd2.[Date Key]

       FROM [Date Dimension] tdd

       CROSS APPLY dbo.GetDateForKey(tdd.[Date Key]) as tdd2

       WHERE tdd.[Date] BETWEEN @startDate AND @endDate

END

Looking at the execution plan we can now have visibility of the previous behaviour and how it’s been considered into the plan and it executes and returns its results in a cool 323ms down from the original 36 seconds, while maintaining a modicum of good coding principles as we are no longer forced to code in assumptions about structure or reusability failings.

The lessons to be learned here is never make assumptions, check everything. Although sometimes, when you don’t know what you don’t know, this can be hard…

Matt Bird, Developer

Leave a Reply