In this post, I show you how to determine the statistics objects used by the query optimizer in producing an execution plan.
Note: This technique only applies to queries compiled using the original (70) cardinality estimation model.
The SQL Server documentation has this to say about page splits:
When a new row is added to a full index page, the Database Engine moves approximately half the rows to a new page to make room for the new row. This reorganization is known as a page split. A page split makes room for new records, but can take time to perform and is a resource intensive operation. Also, it can cause fragmentation that causes increased I/O operations.
Given that, how can a SELECT statement be responsible for page splits?
Well, I suppose we could SELECT from a function that adds rows to a table variable as part of its internal implementation, but that would clearly be cheating, and no fun at all from a blogging point of view.
The following table summarizes the results from my last two articles, Enforcing Uniqueness for Performance and Avoiding Uniqueness for Performance. It shows the CPU time used when performing 5 million clustered index seeks into a unique or non-unique index:
In test 1, making the clustered index unique improved performance by around 40%.
In test 2, making the same change reduced performance by around 70% (on 64-bit systems – more on that later).
In my last post, Enforcing Uniqueness for Performance, I showed how using a unique index could speed up equality seeks by around 40%.
A little while back, I posted a short series on seeks and scans:
One of the things I highlighted in the middle post was the difference between a singleton seek and a range scan:
A singleton equality seek always retrieves exactly one row, and is guaranteed to do so because a unique index exists to enforce it.
A range scan seeks down the B-tree to a starting (or ending) point, and scans forward (or backward) from that point using the next or previous page pointers.
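The difference can be sketched in T-SQL. This is a minimal illustration using a hypothetical table and index names (not from the original post): the same equality predicate produces a singleton seek when the index is unique, and a range scan when it is not.

```sql
-- Hypothetical example table with a single integer column
CREATE TABLE dbo.Example (val integer NOT NULL);

-- With a unique index, WHERE val = 123 is a singleton seek:
-- at most one row can match, so no leaf-level scan is needed.
CREATE UNIQUE INDEX u ON dbo.Example (val);

-- With a non-unique index instead, the same predicate becomes a
-- range scan: the engine seeks to the first 123 and reads forward
-- until it finds a value greater than 123.
-- CREATE INDEX nu ON dbo.Example (val);

SELECT e.val
FROM dbo.Example AS e
WHERE e.val = 123;
```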
Today’s short post shows how much faster a singleton seek is, compared with a range scan, even when both return exactly the same number of records.
You probably already know that it’s important to be aware of data types when writing queries, and that implicit conversions between types can lead to poor query performance.
Some people have gone so far as to write scripts to search the plan cache for CONVERT_IMPLICIT elements, and others routinely inspect plans for that type of thing when tuning.
Now, that’s all good, as far as it goes. It may surprise you to learn that not all implicit conversions are visible in query plans, and there are other important factors to consider too.
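As a quick refresher, here is a minimal sketch (table and index names are my own, not from the original post) of the kind of implicit conversion that does show up in a plan:

```sql
-- Hypothetical table with a varchar column and a supporting index
CREATE TABLE dbo.People (name varchar(50) NOT NULL);
CREATE INDEX n ON dbo.People (name);

-- The N'' literal is nvarchar, which has higher type precedence, so
-- the varchar column is implicitly converted to nvarchar. Look for
-- CONVERT_IMPLICIT in the plan; depending on the collation, this can
-- prevent an efficient index seek.
SELECT p.name
FROM dbo.People AS p
WHERE p.name = N'Paul';

-- Matching the column's type avoids the conversion entirely:
SELECT p.name
FROM dbo.People AS p
WHERE p.name = 'Paul';
```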
Can a parallel query use less CPU than the same serial query, while executing faster?
The answer is yes! To demonstrate, I’ll use the following two (heap) tables, each containing a single column typed as integer:
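The original definitions are not reproduced here; a minimal sketch matching that description, with assumed table and column names, would be:

```sql
-- Two heap tables (no clustered index), each with one integer column.
-- Table and column names are assumptions, not the originals.
CREATE TABLE dbo.T1 (col1 integer NOT NULL);
CREATE TABLE dbo.T2 (col1 integer NOT NULL);
```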
As usual, here’s a sample table:
CREATE TABLE #Example
(
    pk numeric IDENTITY PRIMARY KEY NONCLUSTERED,
    col1 sql_variant NULL,
    col2 sql_variant NULL,
    thing sql_variant NOT NULL
);
Some sample data:
And an index that will be useful shortly:
CREATE INDEX nc1
ON #Example
(col1, col2, thing);
There’s a complete script to create the table and add the data at the end of this post. There’s nothing special about the table or the data (except that I wanted to have some fun with values and data types).
The diagram below shows two data sets, with differences highlighted:
To find changed rows using T-SQL, we might write a query like this:
The logic is clear: Join rows from the two sets together on the primary key column, and return rows where a change has occurred in one or more data columns.
Unfortunately, this query only finds one of the expected four rows:
The problem is that our query does not correctly handle NULLs.
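One NULL-safe way to express “one or more column values differ” in T-SQL uses INTERSECT, whose set semantics treat two NULLs as equal. A sketch, assuming hypothetical table names SetA and SetB joined on a primary key column pk:

```sql
-- Hypothetical tables SetA and SetB, each keyed on pk.
-- INTERSECT compares rows using set semantics, where NULL equals NULL,
-- so NOT EXISTS is true only when at least one column truly differs.
SELECT A.pk
FROM SetA AS A
JOIN SetB AS B
    ON B.pk = A.pk
WHERE NOT EXISTS
(
    SELECT A.col1, A.col2
    INTERSECT
    SELECT B.col1, B.col2
);
```

Unlike a chain of `A.col1 <> B.col1 OR ...` predicates, this form correctly reports a change when a value goes from NULL to non-NULL or vice versa.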
You might have noticed that January was a quiet blogging month for me.
Part of the reason was that I was working on an article for Simple Talk, looking at how parallel query execution really works. The first part is published today at:
Understanding and Using Parallelism in SQL Server.
This introductory piece is not quite as technical as normal, but I hope there is enough interesting material there to make it worth a read.
© Paul White
email: SQLkiwi@gmail.com
twitter: @SQL_Kiwi