This article describes the structure of a sql_handle
and shows how the batch text hash component is calculated.
Sunday, 11 October 2020
sql_handle and the SQL Server batch text hash
Tuesday, 4 August 2020
SQL Server 2019 Aggregate Splitting
The SQL Server 2019 query optimizer has a new trick available to improve the performance of large aggregations. The new exploration abilities are encoded in two new closely-related optimizer rules:
GbAggSplitToRanges
SelOnGbAggSplitToRanges
The extended event query_optimizer_batch_mode_agg_split
is provided to track when this new optimization is considered. The description of this event is:
Occurs when the query optimizer detects batch mode aggregation is likely to spill and tries to split it into multiple smaller aggregations.
Other than that, this new feature hasn’t been documented yet. This article is intended to help fill that gap.
Sunday, 26 July 2020
A bug with Halloween Protection and the OUTPUT Clause
Background
The OUTPUT
clause can be used to return results from an INSERT
, UPDATE
, DELETE
, or MERGE
statement. The data can be returned to the client, inserted to a table, or both.
There are two ways to add OUTPUT
data to a table:
- Using
OUTPUT INTO
- With an outer
INSERT
statement.
For example:
-- Test table
DECLARE @Target table
(
id integer IDENTITY (1, 1) NOT NULL,
c1 integer NULL
);
-- Holds rows from the OUTPUT clause
DECLARE @Output table
(
id integer NOT NULL,
c1 integer NULL
);
Sunday, 5 July 2020
How MAXDOP Really Works
A few days ago I ran a Twitter poll:
The most popular answer gets highlighted by Twitter at the end of the poll, but as with many things on social media, that doesn’t mean it is correct:
Sunday, 31 May 2020
Pulling Group By Above a Join
One of the transformations available to the SQL Server query optimizer is pulling a logical Group By (and any associated aggregates) above a Join.
Visually, this means transforming a tree of logical operations from:
…to this:
The above diagrams are logical representations. They need to be implemented as physical operators to appear in an execution plan. The options are:
- Group By
- Hash Match Aggregate
- Stream Aggregate
- Distinct Sort
- Join
- Nested Loops Join
- Nested Loops Apply
- Hash Match Join
- Merge Join
When the optimizer moves a Group By above a Join it has to preserve the semantics. The new sequence of operations must be guaranteed to return the same results as the original in all possible circumstances.
One cannot just pick up a Group By and arbitrarily move it around the query tree without risking incorrect results.
Saturday, 31 August 2013
Nested Loops Prefetching
Nested loops join query plans can be a lot more interesting (and complicated) than is commonly realized.
One query plan area I get asked about a lot is prefetching. It is not documented in full detail anywhere, so this seems like a good topic to address in a blog post.
The examples used in this article are based on questions asked by Adam Machanic.
Wednesday, 24 July 2013
Two Partitioning Peculiarities
Table partitioning in SQL Server is essentially a way of making multiple physical tables (row sets) look like a single table. This abstraction is performed entirely by the query processor, a design that makes things simpler for users, but which makes complex demands of the query optimizer.
This post looks at two examples which exceed the optimizer’s abilities in SQL Server 2008 onward.
Monday, 8 July 2013
Working Around Missed Optimizations
In my last post, we saw how a query featuring a scalar aggregate could be transformed by the optimizer to a more efficient form. As a reminder, here’s the schema again:
Friday, 8 March 2013
Execution Plan Analysis: The Mystery Work Table
I love SQL Server execution plans. It is often easy to spot the cause of a performance problem just by looking at one closely. That task is considerably easier if the plan includes run-time information (a so-called ‘actual’ execution plan), but even a compiled plan can be very useful.
Nevertheless, there are still times when the execution plan does not tell the whole story, and we need to think more deeply about query execution to really understand a problem. This post looks at one such example, based on a question I answered.
Wednesday, 6 February 2013
Incorrect Results with Indexed Views
If you use MERGE
, indexed views and foreign keys, your queries might return incorrect results. Microsoft have released a fix for incorrect results returned when querying an indexed view. The problem applies to:
- SQL Server 2012
- SQL Server 2008 R2
- SQL Server 2008
The Knowledge Base article does not go into detail, or provide a reproduction script, but this blog post does.
Saturday, 26 January 2013
Optimizing T-SQL queries that change data
Most tuning efforts for data-changing operations concentrate on the SELECT
side of the query plan. Sometimes people will also look at storage engine considerations (like locking or transaction log throughput) that can have dramatic effects. A number of common practices have emerged, such as avoiding large numbers of row locks and lock escalation, splitting large changes into smaller batches of a few thousand rows, and combining a number of small changes into a single transaction in order to optimize log flushes.
This is all good, but what about the data-changing side of the query plan — the INSERT
, UPDATE
, DELETE
, or MERGE
operation itself — are there any query processor considerations we should take into account? The short answer is yes.
The query optimizer considers different plan options for the write-side of an execution plan, though there isn’t a huge amount of T-SQL language support that allows us to affect these choices directly. Nevertheless, there are things to be aware of, and things we can look to change.
Friday, 31 August 2012
Deletes that Split Pages and Forwarded Ghosts
Can DELETE
operations cause pages to split?
Yes. It sounds counter-intuitive on the face of it. Deleting rows frees up space on a page, and page splitting occurs when a page needs additional space. Nevertheless, there are circumstances when deleting rows causes them to expand before they can be deleted.
Friday, 17 August 2012
Temporary Table Caching Explained
SQL Server (since 2005) caches temporary tables and table variables referenced in stored procedures for reuse, reducing contention on tempdb allocation structures and catalogue tables.
A number of things can prevent this caching (none of which are allowed when working with table variables):
- Named constraints (bad idea anyway, since concurrent executions can cause a name collision)
- DDL after creation (though what is considered DDL is interesting)
- Creation using dynamic SQL
- Table created in a different scope
- Procedure executed using
WITH RECOMPILE
Temporary objects are often created and destroyed at a high rate in production systems, so caching can be an important optimization.
Wednesday, 15 August 2012
Temporary Table Caching in Stored Procedures
Introduction
Ask anyone what the primary advantage of temporary tables over table variables is, and the chances are they will say that temporary tables support statistics and table variables do not.
This is true, of course. The indexes that enforce PRIMARY KEY
and UNIQUE
constraints on table variables do not have populated statistics associated with them. Neither do any non-constraint table variable indexes (using inline index definitions, available starting with SQL Server 2014). Finally, it is not possible to manually create statistics on table variables.
Intuitively, then, any query that has alternative execution plans to choose from ought to benefit from using a temporary table rather than a table variable. This is also true, up to a point.
Tuesday, 1 May 2012
Query Optimizer Deep Dive - Part 4
This is the final part in a series of posts based on the content of the Query Optimizer Deep Dive presentations I have given over the last month or so at the Auckland SQL Users’ Group, and SQL Saturday events in Wellington, New Zealand and Adelaide, Australia.
Links to other parts of this series: Part 1 Part 2 Part 3
Beating the Optimizer
Our AdventureWorks test query produces an optimized physical execution plan that is quite different from the logical form of the query.
The estimated cost of the execution plan shown below is 0.0295 units.
Since we know the database schema very well, we might wonder why the optimizer did not choose to use the unique nonclustered index on Name
in the Product
table to filter rows based on the LIKE
predicate.
Sunday, 29 April 2012
Query Optimizer Deep Dive – Part 3
This is the third in a series of posts based on the content of the Query Optimizer Deep Dive presentations I have given over the last month or so at the Auckland SQL Users’ Group, and SQL Saturday events in Wellington, New Zealand and Adelaide, Australia.
Links to other parts of this series: Part 1 Part 2 Part 4
Storage of Alternative Plans
We saw in part 2 how optimizer rules are used to explore logical alternatives for parts of the query tree, and how implementation rules are used to find physical operations to perform each logical steps.
To keep track of all these options, the cost-based part of the SQL Server query optimizer uses a structure called the Memo. This structure is part of the Cascades general optimization framework developed by Goetz Graefe.
Saturday, 28 April 2012
Query Optimizer Deep Dive – Part 2
This is the second in a series of posts based on the content of the Query Optimizer Deep Dive presentations I have given over the last month or so at the Auckland SQL Users’ Group, and SQL Saturday events in Wellington, New Zealand and Adelaide, Australia.
Links to other parts of this series: Part 1 Part 3 Part 4
Cost-Based Optimization Overview
The input to cost-based optimization is a tree of logical operations produced by the previous optimization stages discussed in part one.
Cost-based optimization takes this logical tree, explores logical alternatives (different logical tree shapes that will always produce the same results), generates physical implementations, assigns an estimated cost to each, and finally chooses the cheapest physical option overall.
The goal of cost-based optimization is not to find the best possible physical execution plan by exploring every possible alternative. Rather, the goal is to find a good plan quickly.
Query Optimizer Deep Dive - Part 1
This is the first in a series of posts based on the content of the Query Optimizer Deep Dive presentations I have given over the last month or so at the Auckland SQL Users’ Group, and SQL Saturday events in Wellington, New Zealand and Adelaide, Australia.
Links to other parts of this series: Part 2 Part 3 Part 4
Introduction
The motivation behind writing these sessions is finding that relatively few people have a good intuition for the way the optimizer works. This is partly because the official documentation is rather sparse, and partly because what information is available is dispersed across many books and blog posts.
The content presented here is very much geared to my preferred way of learning. It shows the concepts in what seems to me to be a reasonably logical sequence, and then provides tools to enable the interested reader to explore further, if desired.
Friday, 23 December 2011
Forcing a Parallel Query Execution Plan
This article is for SQL Server developers who have experienced the special kind of frustration that only comes from spending hours trying to convince the query optimizer to generate a parallel execution plan.
This situation often occurs when making an apparently innocuous change to the text of a moderately complex query — a change which somehow manages to turn a parallel plan that executes in ten seconds, into a five-minute serially-executing monster.
Wednesday, 21 September 2011
Finding the Statistics Used to Compile an Execution Plan
In this post, I show you how to determine the statistics objects used by the query optimizer in producing an execution plan.
Note: This technique only applies to queries compiled using the original (70) cardinality estimation model.