Thursday, 04 April 2013

Optimizer Limitations with Filtered Indexes

One of the filtered index use cases mentioned in the product documentation concerns a column that contains mostly NULL values. The idea is to create a filtered index that excludes the NULLs, resulting in a smaller nonclustered index that requires less maintenance than the equivalent unfiltered index.

Another popular use of filtered indexes is to filter NULLs from a UNIQUE index, giving the behaviour users of other database engines might expect from a default UNIQUE index or constraint: Uniqueness enforced only for non-NULL values.

Unfortunately, the query optimizer has limitations where filtered indexes are concerned. This post looks at a couple of less well-known examples.

Sample Tables

We will use two tables (A & B) that have the same structure:

A surrogate clustered primary key;
A mostly-NULL column that is unique (disregarding NULLs); and
A padding column that represents the all other columns that might be in a real table.

The column of interest is the mostly-NULL one, which I have declared as SPARSE. The sparse option is not required, I just include it because I don’t get much chance to use it. In any case, SPARSE probably makes sense in a lot of scenarios where the column data is expected to be mostly NULL. Feel free to remove the sparse attribute from the examples if you prefer.

CREATE TABLE dbo.TableA
(
    pk integer IDENTITY PRIMARY KEY CLUSTERED,
    [data] bigint SPARSE NULL,
    padding binary(250) NOT NULL DEFAULT 0x
);

CREATE TABLE dbo.TableB
(
    pk integer IDENTITY PRIMARY KEY CLUSTERED,
    [data] bigint SPARSE NULL,
    padding binary(250) NOT NULL DEFAULT 0x
);

Each table contains the numbers from 1 to 2,000 in the data column with an additional 40,000 rows where the data column is NULL:

-- Numbers 1 - 2,000
INSERT dbo.TableA
    WITH (TABLOCKX)
    ([data])
SELECT TOP (2000)
    [data] = ROW_NUMBER() OVER (ORDER BY @@SPID)
FROM sys.columns AS C1
CROSS JOIN sys.columns AS C2
ORDER BY [data];

-- 4,000 NULLs
INSERT TOP (40000)
    dbo.TableA 
    WITH (TABLOCKX)
    ([data])
SELECT
    [data] = CONVERT(bigint, NULL)
FROM sys.columns AS C1
CROSS JOIN sys.columns AS C2;

-- Copy into TableB
INSERT dbo.TableB 
    WITH (TABLOCKX)
    ([data])
SELECT
    TA.[data]
FROM dbo.TableA AS TA;

Both tables get a UNIQUE filtered index for the 2,000 non-NULL data values:

CREATE UNIQUE NONCLUSTERED INDEX uqA
ON dbo.TableA ([data])
WHERE [data] IS NOT NULL;

CREATE UNIQUE NONCLUSTERED INDEX uqB
ON dbo.TableB ([data])
WHERE [data] IS NOT NULL;

The output of DBCC SHOW_STATISTICS summarizes the situation:

DBCC SHOW_STATISTICS ('dbo.TableA', uqA) WITH STAT_HEADER;
DBCC SHOW_STATISTICS ('dbo.TableB', uqB) WITH STAT_HEADER;

Sample Data Statistics Header

Example Query

The query below joins the two tables. Imagine the tables are in some sort of parent-child relationship and many of the foreign keys are NULL. Something along those lines anyway:

SELECT
    TA.[data],
    TB.[data]
FROM dbo.TableA AS TA
JOIN dbo.TableB AS TB
    ON TA.[data] = TB.[data];

Default execution plan

With SQL Server in its default configuration (and the original cardinality estimation model), the optimizer chooses an execution plan featuring a parallel nested loops join:

Parallel Nested Loops

This plan has an estimated cost of 7.7768 magic optimizer units™.

Weirdness

There are some strange things about this plan. The Index Seek uses our filtered index on table B, but the query is driven by a Clustered Index Scan of table A. The join predicate is an equality test on the data columns, which will reject NULLs (regardless of the ANSI_NULLS setting).

We might have hoped the optimizer would perform some advanced reasoning based on that observation, but no. This plan reads every row from table A (including the 40,000 NULLs), performs a seek into the filtered index on table B for each one, relying on the fact that NULL will not match NULL in that seek. This is a tremendous waste of effort.

The odd thing is that the optimizer must have realised the join rejects NULLs in order to choose the filtered index for the table B seek, but it did not think to filter NULLs from table A first. Or better still, to simply scan the NULL-free filtered index on table A.

You might wonder if this is a cost-based decision, maybe the statistics are not very good? Perhaps we should force the use of the filtered index with a hint? Hinting the filtered index on table A just results in the same plan with the roles reversed, scanning table B and seeking into table A.

Forcing the filtered index for both tables produces an error:

Msg 8622, Level 16, State 1
Query processor could not produce a query plan because of the hints defined in this query. 
Resubmit the query without specifying any hints and without using SET FORCEPLAN.

Adding a NOT NULL predicate

Suspecting the cause to be related to the implied null-rejection of the join predicate, we add an explicit NOT NULL predicate to the ON clause (or the WHERE clause if you prefer, it comes to the same thing here):

SELECT
    TA.[data],
    TB.[data]
FROM dbo.TableA AS TA
JOIN dbo.TableB AS TB
    ON TA.[data] = TB.[data]
    AND TA.[data] IS NOT NULL;

We added the NOT NULL check to the table A column because the original plan scanned that table’s clustered index rather than using our filtered index (the seek into table B was fine – it did use the filtered index). The new query is semantically exactly the same as the previous one, but the execution plan is different:

Serial Nested Loops

Now we have the hoped-for scan of the filtered index on table A, producing 2,000 non-null rows to drive the nested loop seeks into table B. Both tables are using our filtered indexes apparently optimally now. The new plan costs 0.362835 units (down from 7.7768).

Adding two NOT NULL predicates

The redundant NOT NULL predicate for table A worked well. What happens if we add one for table B as well?

SELECT
    TA.[data],
    TB.[data]
FROM dbo.TableA AS TA
JOIN dbo.TableB AS TB
    ON TA.[data] = TB.[data]
    AND TA.[data] IS NOT NULL
    AND TB.[data] IS NOT NULL;

This query is still logically the same as the two previous efforts, but the execution plan is different again:

Hash Match Plan

This plan builds a hash table for the 2,000 rows from table A, then probes for matches using the 2,000 rows from table B. The estimated number of rows returned is much better than the previous plan (did you notice the 7,619 estimate there?) and the estimated execution cost has dropped again, from 0.362835 to 0.0772056.

You could try forcing a hash join using a hint on the original or single extra predicate queries, but you won’t get the low-cost plan shown above. The optimizer just does not have the ability to fully reason about the null-rejecting behaviour of the join as it applies to our filtered indexes without both redundant predicates.

You are allowed to be surprised by this — even if it’s just the idea that one redundant predicate was not enough (surely if TA.data is NOT NULL and TA.data = TB.data, it follows that TB.data is also NOT NULL, right?)

Still not perfect

It is a little surprising to see a hash join there. If you are familiar with the main differences between the three physical join operators, you probably know that hash join is a top candidate where:

Pre-sorted input is not available.
The hash build input is smaller than the probe input.
The probe input is quite large.

None of these things is true here. The best physical join for this query and data set would be a merge join, exploiting the ordered input available from our two filtered indexes. We can try hinting a merge join, retaining the two extra ON clause predicates:

SELECT
    TA.[data],
    TB.[data]
FROM dbo.TableA AS TA
JOIN dbo.TableB AS TB
    ON TA.[data] = TB.[data]
    AND TA.[data] IS NOT NULL
    AND TB.[data] IS NOT NULL
OPTION (MERGE JOIN);

The plan shape is as we hoped:

Merge Join

An ordered scan of both filtered indexes, and great cardinality estimates. Just one small problem: The estimated cost has jumped from 0.0772056 to 0.741527. The reason for the increase in cost is revealed by checking the properties of the merge join operator:

Merge Join Properties

This is a many-to-many merge join, where the execution engine must keep track of duplicates from the outer input in a worktable, and rewind as necessary. Duplicates? We are scanning a unique index!

It turns out the optimizer does not know that a filtered unique index produces unique values. This is a one-to-one join, but the optimizer costs it as if it were many-to-many. The higher estimated cost of a many-to-many merge join explains why a hash join plan is chosen.

An Alternative Strategy

It seems we keep coming up against optimizer limitations when using filtered indexes here (despite it being a highlighted use case in the documentation). What happens if we try using views instead?

Using Views

The following two views filter each base table to return only the rows where the data column is not null:

CREATE VIEW dbo.VA
WITH SCHEMABINDING AS
SELECT
    TA.pk,
    TA.[data],
    TA.padding
FROM dbo.TableA AS TA
WHERE TA.[data] IS NOT NULL;
GO
CREATE VIEW dbo.VB
WITH SCHEMABINDING AS
SELECT
    TB.pk,
    TB.[data],
    TB.padding
FROM dbo.TableB AS TB
WHERE TB.[data] IS NOT NULL;

Rewriting the original query to use the views:

SELECT 
    V1.[data],
    V2.[data]
FROM dbo.VA AS V1
JOIN dbo.VB AS V2
    ON V2.[data] = V1.[data];

Remember this query originally produced a parallel nested loops plan costed at 7.7768 units. With the view references, we get this execution plan:

View default plan

This is the same hash join plan we had to add redundant NOT NULL predicates to get with the filtered indexes. The cost is 0.0772056 units as before. This is expected, because all we have essentially done here is to push the extra NOT NULL predicates from the query into the views.

Indexing the views

We can materialize the views by creating a unique clustered index on the pk column:

CREATE UNIQUE CLUSTERED INDEX cuq ON dbo.VA (pk);
CREATE UNIQUE CLUSTERED INDEX cuq ON dbo.VB (pk);

Now add unique nonclustered indexes on the filtered data column in the indexed view:

CREATE UNIQUE NONCLUSTERED INDEX ix ON dbo.VA ([data]);
CREATE UNIQUE NONCLUSTERED INDEX ix ON dbo.VB ([data]);

Note: The filtering is performed in the view, these nonclustered indexes are not themselves filtered.

The perfect plan

We are now ready to run our query against the materialized views, using the NOEXPAND hint:

SELECT 
    V1.[data],
    V2.[data]
FROM dbo.VA AS V1 WITH (NOEXPAND)
JOIN dbo.VB AS V2 WITH (NOEXPAND)
    ON V2.[data] = V1.[data];

The execution plan is:

Indexed views merge join plan

Merge join properties

The optimizer can see the unfiltered nonclustered view indexes are unique, so a many-to-many merge join is avoided.

This final execution plan has an estimated cost of 0.0310929 units, which is even lower than the hash join plan (0.0772056 units). This validates the expectation that a merge join ought to have the lowest estimated cost for this query, sample data set, and schema.

The NOEXPAND hints are needed even in Enterprise Edition to ensure the uniqueness guarantee provided by the view indexes is utilized by the optimizer.

Summary

This post highlights two important optimizer limitations with filtered indexes:

Redundant join predicates can be necessary to match filtered indexes.
Filtered unique indexes do not provide uniqueness information to the optimizer.

In some cases it may be practical to add the redundant predicates to every query. The alternative is to encapsulate the desired implied predicates in an unindexed view.

The hash match plan in this post was much better than the default plan, even though the optimizer ought to be able to find the slightly better merge join plan.

Sometimes, you may need to index the view and use NOEXPAND hints (required anyway for Standard Edition).

Thanks for reading.