There are interesting things to be learned from even the simplest queries.
For example, imagine you are asked to write a query that lists AdventureWorks product names, where the product has at least one entry in the transaction history table, but fewer than ten.
This article is for SQL Server developers who have experienced the special kind of frustration that only comes from spending hours trying to convince the query optimizer to generate a parallel execution plan.
This situation often occurs when making an apparently innocuous change to the text of a moderately complex query — a change which somehow manages to turn a parallel plan that executes in ten seconds, into a five-minute serially-executing monster.
I came across a SQL Server optimizer bug recently that made me wonder how on earth I never noticed it before.
As the title of this post suggests, the bug occurs in common JOIN and GROUP BY queries. While it does not cause incorrect results to be returned, it will often cause a poor query plan to be selected by the optimizer.
If you are just interested in the bug itself, you will find a description in the section headed “the bug revealed”. It relates to cardinality estimation for serial partial aggregates.
As the regular reader will be expecting though, I am going to work up to it with a bit of background. The lasting value of this post (once the bug is fixed) is in the background details anyway.
Marc shows a way to work around the poor performance that often results simply from adding the keyword DISTINCT to an otherwise perfectly reasonable aggregate function in a query.
This post is an update to that work, presenting a query optimizer enhancement in SQL Server 2012 that reduces the need to perform the suggested rewrite manually.
The SQL Server documentation has this to say about page splits:
When a new row is added to a full index page, the Database Engine moves approximately half the rows to a new page to make room for the new row. This reorganization is known as a page split. A page split makes room for new records, but can take time to perform and is a resource intensive operation. Also, it can cause fragmentation that causes increased I/O operations.
Given that, how can a SELECT statement be responsible for page splits?
Well, I suppose we could SELECT from a function that adds rows to a table variable as part of its internal implementation, but that would clearly be cheating, and no fun at all from a blogging point of view.
One of the things I highlighted in the middle post was the difference between a singleton seek and a range scan:
A singleton equality seek always retrieves exactly one row, and is guaranteed to do so because a unique index exists to enforce it.
A range scan seeks down the B-tree to a starting (or ending) point, and scans forward (or backward) from that point using the next or previous page pointers.
Today’s short post shows how much faster a singleton seek is, compared with a range scan, even when both return exactly the same number of records.