About This Blog

SQL Server internals information including my content from SQLBlog.com and SQLPerformance.com.
Showing posts with label Query Optimizer. Show all posts
Showing posts with label Query Optimizer. Show all posts

Tuesday, 4 August 2020

SQL Server 2019 Aggregate Splitting

SQL Server 2019 Aggregate Splitting

The SQL Server 2019 query optimizer has a new trick available to improve the performance of large aggregations. The new exploration abilities are encoded in two new closely-related optimizer rules:

  • GbAggSplitToRanges
  • SelOnGbAggSplitToRanges

The extended event query_optimizer_batch_mode_agg_split is provided to track when this new optimization is considered. The description of this event is:

Occurs when the query optimizer detects batch mode aggregation is likely to spill and tries to split it into multiple smaller aggregations.

Other than that, this new feature hasn’t been documented yet. This article is intended to help fill that gap.

Sunday, 26 July 2020

A bug with Halloween Protection and the OUTPUT Clause

A bug with Halloween Protection and the OUTPUT Clause

Background

The OUTPUT clause can be used to return results from an INSERT, UPDATE, DELETE, or MERGE statement. The data can be returned to the client, inserted to a table, or both.

There are two ways to add OUTPUT data to a table:

  1. Using OUTPUT INTO
  2. With an outer INSERT statement.

For example:

-- Test table
DECLARE @Target table
(
    id integer IDENTITY (1, 1) NOT NULL, 
    c1 integer NULL
);

-- Holds rows from the OUTPUT clause
DECLARE @Output table 
(
    id integer NOT NULL, 
    c1 integer NULL
);

Sunday, 31 May 2020

Pulling Group By Above a Join

Pulling Group By Above a Join

One of the transformations available to the SQL Server query optimizer is pulling a logical Group By (and any associated aggregates) above a Join.

Visually, this means transforming a tree of logical operations from:

Group By Below Join

…to this:

Group By Above Join

The above diagrams are logical representations. They need to be implemented as physical operators to appear in an execution plan. The options are:

  • Group By
    • Hash Match Aggregate
    • Stream Aggregate
    • Distinct Sort
  • Join
    • Nested Loops Join
    • Nested Loops Apply
    • Hash Match Join
    • Merge Join

When the optimizer moves a Group By above a Join it has to preserve the semantics. The new sequence of operations must be guaranteed to return the same results as the original in all possible circumstances.

One cannot just pick up a Group By and arbitrarily move it around the query tree without risking incorrect results.

Sunday, 9 June 2019

Apply versus Nested Loops Join

Apply versus Nested Loops Join

SQL is a declarative language. We use SQL to write a logical query specification that defines the results we want. For example, we might write a query using either APPLY or JOIN that logically describes exactly the same results.

It is up to the query optimizer to find an efficient physical implementation of that logical requirement. SQL Server is free to choose any plan it likes, so long as the results are guaranteed to be the same as specified in the original SQL.

The optimizer is capable of transforming an apply to a join and vice versa. It generally tries to rewrite apply to join during initial compilation to maximize the searchable plan space during cost-based optimization. Having transformed an apply to a join early on, it may also consider a transformation back to an apply shape later on to assess the merits of e.g. an index loops join.

Tuesday, 15 April 2014

Cardinality Estimation for Disjunctive (OR) Predicates in SQL Server 2014 Onward

Cardinality Estimation for Disjunctive Predicates in SQL Server 2014

Introduction

Back in January 2014, I wrote an article called Cardinality Estimation for Multiple Predicates that described the cardinality estimation process for queries with multiple predicates, from the point of view of the old and new cardinality estimators.

The article describes the various behaviours and formulas involved, along with the usual sprinkling of documented and undocumented trace flags. I described the formula SQL Server 2014 uses to calculate a cardinality estimate for multiple predicates connected with AND (conjunctive predicates), which was already relatively well-known.

Despite some fairly energetic research, and basic-to-intermediate skills with Excel, I was unable to deduce a similar formula for the disjunctive case, where predicates are connected by OR. The trace flag output I describe in the other article clearly showed that exponential backoff was used in the new 2014 cardinality estimator, but the precise formula eluded me.

Wednesday, 21 August 2013

Incorrect Results Caused By Adding an Index

Incorrect Results Caused By Adding an Index

Say you have the following two tables, one partitioned and one not:

CREATE PARTITION FUNCTION PF (integer)
AS RANGE RIGHT
FOR VALUES (1000, 2000, 3000, 4000, 5000);

CREATE PARTITION SCHEME PS
AS PARTITION PF
ALL TO ([PRIMARY]);

-- Partitioned
CREATE TABLE dbo.T1
(
    T1ID    integer NOT NULL,
    SomeID  integer NOT NULL,

    CONSTRAINT [PK dbo.T1 T1ID]
        PRIMARY KEY CLUSTERED (T1ID)
        ON PS (T1ID)
);

-- Not partitioned
CREATE TABLE dbo.T2
(
    T2ID    integer IDENTITY (1,1) NOT NULL,
    T1ID    integer NOT NULL,

    CONSTRAINT [PK dbo.T2 T2ID]
        PRIMARY KEY CLUSTERED (T2ID)
        ON [PRIMARY]
);

Wednesday, 24 July 2013

Two Partitioning Peculiarities

Two Partitioning Peculiarities

Table partitioning in SQL Server is essentially a way of making multiple physical tables (row sets) look like a single table. This abstraction is performed entirely by the query processor, a design that makes things simpler for users, but which makes complex demands of the query optimizer.

This post looks at two examples which exceed the optimizer’s abilities in SQL Server 2008 onward.

Thursday, 18 July 2013

Aggregates and Partitioning

Aggregates and Partitioning

The changes in the internal representation of partitioned tables between SQL Server 2005 and SQL Server 2008 resulted in improved query plans and performance in the majority of cases (especially when parallel execution is involved).

Unfortunately, the same changes caused some things that worked well in SQL Server 2005 to suddenly not work so well in SQL Server 2008 and later.

This post looks at a one example where the SQL Server 2005 query optimizer produced a superior execution plan compared with later versions.

Monday, 8 July 2013

Working Around Missed Optimizations

Working Around Missed Optimizations

In my last post, we saw how a query featuring a scalar aggregate could be transformed by the optimizer to a more efficient form. As a reminder, here’s the schema again:

Wednesday, 26 June 2013

Optimization Phases and Missed Opportunities

Optimization Phases and Missed Opportunities

There are two complementary skills that are very useful in query tuning. One is the ability to read and interpret execution plans. The second is knowing a bit about how the query optimizer works to translate SQL text into an execution plan.

Putting the two things together can help us spot times when an expected optimization was not applied, resulting in an execution plan that is not as efficient as it could be.

The lack of documentation around exactly which optimizations SQL Server can apply (and in what circumstances) means that a lot of this comes down to experience, however.

Tuesday, 11 June 2013

Hello Operator, My Switch Is Bored

Hello Operator, My Switch Is Bored

This post is in two parts. The first part looks at the Switch execution plan operator. The second part is about an invisible plan operator and cardinality estimates on filtered indexes.

Thursday, 4 April 2013

Optimizer Limitations with Filtered Indexes

Optimizer Limitations with Filtered Indexes

One of the filtered index use cases mentioned in the product documentation concerns a column that contains mostly NULL values. The idea is to create a filtered index that excludes the NULLs, resulting in a smaller nonclustered index that requires less maintenance than the equivalent unfiltered index.

Another popular use of filtered indexes is to filter NULLs from a UNIQUE index, giving the behaviour users of other database engines might expect from a default UNIQUE index or constraint: Uniqueness enforced only for non-NULL values.

Unfortunately, the query optimizer has limitations where filtered indexes are concerned. This post looks at a couple of less well-known examples.

Wednesday, 20 March 2013

The Problem with Window Functions and Views

The Problem with Window Functions and Views

Introduction

Since their introduction in SQL Server 2005, window functions like ROW_NUMBER and RANK have proven to be extremely useful in solving a wide variety of common T-SQL problems. In an attempt to generalize such solutions, database designers often look to incorporate them into views to promote code encapsulation and reuse.

Unfortunately, a limitation in the SQL Server query optimizer often means that views1 containing window functions do not perform as well as expected. This post works through an illustrative example of the problem, details the reasons, and provides a number of workarounds.

Wednesday, 5 September 2012

Compute Scalars, Expressions and Execution Plan Performance

Compute Scalars, Expressions and Execution Plan Performance

The humble Compute Scalar is one of the least well-understood of the execution plan operators, and usually the last place people look for query performance problems. It often appears in execution plans with a very low (or even zero) cost, which goes some way to explaining why people ignore it.

Compute Scalar

Some readers will already know that a Compute Scalar can contain a call to a user-defined function, and that any T-SQL function with a BEGIN…END block in its definition can have truly disastrous consequences for performance (see When is a SQL function not a function? by Rob Farley for details).

This post is not about those sorts of concerns.

Tuesday, 1 May 2012

Query Optimizer Deep Dive - Part 4

Query Optimizer Deep Dive - Part 4

This is the final part in a series of posts based on the content of the Query Optimizer Deep Dive presentations I have given over the last month or so at the Auckland SQL Users’ Group, and SQL Saturday events in Wellington, New Zealand and Adelaide, Australia.

Links to other parts of this series: Part 1 Part 2 Part 3

Beating the Optimizer

Our AdventureWorks test query produces an optimized physical execution plan that is quite different from the logical form of the query.

The estimated cost of the execution plan shown below is 0.0295 units.

Optimizer plan

Since we know the database schema very well, we might wonder why the optimizer did not choose to use the unique nonclustered index on Name in the Product table to filter rows based on the LIKE predicate.

Sunday, 29 April 2012

Query Optimizer Deep Dive – Part 3

Query Optimizer Deep Dive – Part 3

This is the third in a series of posts based on the content of the Query Optimizer Deep Dive presentations I have given over the last month or so at the Auckland SQL Users’ Group, and SQL Saturday events in Wellington, New Zealand and Adelaide, Australia.

Links to other parts of this series: Part 1 Part 2 Part 4

Storage of Alternative Plans

We saw in part 2 how optimizer rules are used to explore logical alternatives for parts of the query tree, and how implementation rules are used to find physical operations to perform each logical steps.

To keep track of all these options, the cost-based part of the SQL Server query optimizer uses a structure called the Memo. This structure is part of the Cascades general optimization framework developed by Goetz Graefe.

Saturday, 28 April 2012

Query Optimizer Deep Dive – Part 2

Query Optimizer Deep Dive – Part 2

This is the second in a series of posts based on the content of the Query Optimizer Deep Dive presentations I have given over the last month or so at the Auckland SQL Users’ Group, and SQL Saturday events in Wellington, New Zealand and Adelaide, Australia.

Links to other parts of this series: Part 1 Part 3 Part 4

Cost-Based Optimization Overview

The input to cost-based optimization is a tree of logical operations produced by the previous optimization stages discussed in part one.

Cost-based optimization takes this logical tree, explores logical alternatives (different logical tree shapes that will always produce the same results), generates physical implementations, assigns an estimated cost to each, and finally chooses the cheapest physical option overall.

The goal of cost-based optimization is not to find the best possible physical execution plan by exploring every possible alternative. Rather, the goal is to find a good plan quickly.

Query Optimizer Deep Dive - Part 1

Query Optimizer Deep Dive - Part 1

This is the first in a series of posts based on the content of the Query Optimizer Deep Dive presentations I have given over the last month or so at the Auckland SQL Users’ Group, and SQL Saturday events in Wellington, New Zealand and Adelaide, Australia.

Links to other parts of this series: Part 2 Part 3 Part 4

Introduction

The motivation behind writing these sessions is finding that relatively few people have a good intuition for the way the optimizer works. This is partly because the official documentation is rather sparse, and partly because what information is available is dispersed across many books and blog posts.

The content presented here is very much geared to my preferred way of learning. It shows the concepts in what seems to me to be a reasonably logical sequence, and then provides tools to enable the interested reader to explore further, if desired.

Monday, 12 March 2012

Fun with Scalar and Vector Aggregates

Fun with Scalar and Vector Aggregates

There are interesting things to be learned from even the simplest queries.

For example, imagine you are asked to write a query that lists AdventureWorks product names, where the product has at least one entry in the transaction history table, but fewer than ten.

Wednesday, 18 January 2012

Dynamic Seeks and Hidden Implicit Conversions

Dynamic Seeks and Hidden Implicit Conversions

A LIKE predicate with only a trailing wildcard can usually use an index seek, as the following AdventureWorks sample database query shows:

SELECT 
    P.[Name]
FROM Production.Product AS P
WHERE 
    P.[Name] LIKE N'D%';