Thursday, 17 February 2011

Is it a Seek or a Scan?

Introduction

You might be most familiar with the terms ‘Seek’ and ‘Scan’ from the graphical plans produced by SQL Server Management Studio (SSMS). You might look to the SSMS tool-tip descriptions to explain the differences between them:

Scan and Seek tooltips

Both mention scans and ranges (nothing about seeks) and the Index Seek description maybe implies that it will not scan the index entirely (which isn’t necessarily true). Not massively helpful.

In my last post, When is a Seek not a Seek? we saw two Clustered Index Seek operations doing very different things. The first Seek operator actually performed 63 single-row seeking operations. The second Seek performed a Range Scan (more on that later on).

I hope you agree that those were two very different operations, and perhaps you are wondering why there aren’t different graphical plan icons for Range Scans and Seeks?

I have often wondered about that myself. The first person to mention it after yesterday’s post was Erin Stellato:

Tweet by Erin Stellato

Seeks and Scans

Before we go on to make sense of all this, let’s look at another example of how SQL Server confusingly mixes the terms ‘Scan’ and ‘Seek’ in different contexts.

The diagram below shows a heap table with two columns, one of which is the non-clustered primary key. The other column has a non-unique non-clustered index defined on it.

The right hand side of the diagram shows a query, its associated execution plan, and a couple of extracts from the SSMS tool-tip and Properties windows.

Heap table, data, query, and execution plan

Notice the Scan Direction entry in the Properties window snippet.

Is this a seek or a scan? The different references to Scans and Seeks are even more pronounced in the XML plan output that the graphical plan is based on.

The following fragment is what lies behind the single Index Seek icon shown above:

Seek XML show plan fragment

You can find the same confusing references to Seeks and Scans throughout the product and its documentation.

Making Sense of Seeks

Let’s forget all about scans for a moment, and think purely about seeks.

Loosely speaking, a seek is the process of navigating an index B-tree to find a particular index record, most often at the leaf level. A seek starts at the root and navigates down through the levels of the index to find the point of interest:

Seek diagram

Singleton Lookups

The simplest sort of seek predicate performs that b-tree traversal to find (at most) a single record.

This is the case when we search for a single value using a unique index and an equality predicate.

It should be readily apparent that this type of search will either find one record, or none at all. This operation is known as a singleton lookup.

Given the example table from before, the following query is an example of a singleton lookup seek:

Singleton lookup

Sadly, there’s nothing in the graphical plan or XML output to show that this is a singleton lookup. You have to infer it from the fact that this is a single-value equality seek on a unique index.

The other common examples of a singleton lookup are bookmark lookups. Both the RID and Key Lookup forms are singleton lookups.

An RID Lookup finds a single record in a heap from the unique row locator. A Key Lookup does much the same thing on a clustered table).

If you happen to run your query with STATISTICS IO ON, you will notice that Scan Count is always zero for a singleton lookup.

Range Scans

The other type of seek predicate is a ‘seek plus range scan’, which I will refer to simply as a range scan.

The seek operation makes an initial descent into the index structure to find the first leaf row that qualifies, and then performs a range scan (either backwards or forwards in the index) until it reaches the end of the scan range.

The ability of a range scan to proceed in either direction comes about because index pages at the same level are connected by a doubly-linked list. Each page has a pointer to the previous page (in logical key order) as well as a pointer to the following page. The doubly-linked list is represented by the green and red dotted arrows in the index diagram presented earlier.

One subtle (but important) point is that the notion of a ‘forward’ or ‘backward’ scan applies to the logical key order defined when the index was built.

In the present case, the non-clustered primary key index was created as follows:

CREATE TABLE dbo.Example
(
    key_col integer NOT NULL,
    [data] integer NOT NULL,

    CONSTRAINT [PK dbo.Example key_col]
        PRIMARY KEY NONCLUSTERED (key_col ASC)
);

Notice that the primary key index specifies an ascending sort order for the single key column.

This means that a forward scan of the index will retrieve keys in ascending order, while a backward scan would retrieve keys in descending key order.

If the index had been created instead on key_col DESC, a forward scan would retrieve keys in descending order, and a backward scan would return keys in ascending order.

A range scan seek predicate may have a Start condition, an End condition, or both. Where one is missing, the scan starts (or ends) at one extreme end of the index, depending on the scan direction.

Some examples might help clarify all that. The following diagram shows four queries, each of which performs a single seek against a column holding every integer from 1 to 100 inclusive.

The results from each query are shown in the blue columns, and relevant attributes from the Properties window appear on the right:

Forward and backward scans

Query 1 specifies that all key_col values less than 5 should be returned in ascending order.

The execution plan achieves this by seeking to the start of the index leaf (there is no explicit starting value) and scanning forward until the End condition key_col_ < 5)is no longer satisfied (SQL Server knows it can stop looking as soon as it finds a key_col value that isn’t less than 5 because all later index entries are guaranteed to sort higher).

Query 2 asks for key_col values greater than 95, in descending order.

SQL Server returns these results by seeking to the end of the index, and scanning backwards (in descending key order) until it comes across a row that isn’t greater than 95.

Sharp-eyed readers may notice that the end-of-scan condition is shown as a Start range value. This is a bug in the XML show plan which bubbles up to the Properties window. When a backward scan is performed, the roles of the Start and End values are reversed, but the plan does not reflect that. Oh well.

Query 3 looks for key_col values that are greater than or equal to 10, and less than 15, in ascending order.

This time, SQL Server seeks to the first index record that matches the Start condition key_col_ >= 10 and then scans forward through the leaf pages until the End condition key_col_ < 15 is no longer met.

Query 4 performs much the same sort of operation as Query 3, but requests the output in descending order.

Again, we have to mentally reverse the Start and End conditions because of the bug, but otherwise the process is the same as always. SQL Server finds the highest-sorting record that meets the condition key_col_ < 25 and scans backward until key_col_ >= 20 is no longer true.

One final point to note: seek operations always have the Ordered: True attribute. This means that the operator always produces rows in a sorted order, either ascending or descending depending on how the index was defined, and whether the scan part of the operation is forward or backward.

You cannot rely on this sort order in your queries of course (you must always specify an ORDER BY clause if order is important) but SQL Server can make use of the sort order internally.

In the four queries above, the query optimizer was able to avoid an explicit Sort operator to honour the ORDER BY clause, for example.

Multiple Seek Predicates

As we saw yesterday, a single Index Seek plan operator can contain one or more seek predicates. These seek predicates can either be all singleton seeks or all range scans—SQL Server does not mix them.

For example, you might expect the following query to contain two seek predicates, a singleton seek to find the single record in the unique index where key_col = 10, and a range scan to find the key_col values between 15 and 20:

SELECT
    E.key_col
FROM dbo.Example AS E
WHERE
    E.key_col = 10
    OR E.key_col BETWEEN 15 AND 20
ORDER BY
    E.key_col ASC;

In fact, SQL Server transforms the singleton seek key_col = 10 to the equivalent range scan:

Start:[key_col >= 10], End:[key_col <= 10]

This allows both range scans to be evaluated by a single seek operator. To be clear, this query results in two range scans — one from 10 to 10, and one from 15 to 20.

Final Thoughts

That is it for today. The next post in this mini-series will look at monitoring singleton lookups and range scans, and I’ll show you a seek on a heap table.

Yes, a seek. On a heap. No index!

If you would like to run the queries in this post for yourself, there is a script below.

Complete test script

IF OBJECT_ID(N'dbo.Example', N'U') IS NOT NULL
BEGIN
    DROP TABLE dbo.Example;
END;

-- Test table is a heap
-- Non-clustered primary key on 'key_col'
CREATE TABLE dbo.Example
(
    key_col integer NOT NULL,
    [data] integer NOT NULL,

    CONSTRAINT [PK dbo.Example key_col]
        PRIMARY KEY NONCLUSTERED (key_col ASC)
);

-- Non-unique non-clustered index on the 'data' column
CREATE NONCLUSTERED INDEX 
    [IX dbo.Example data]
ON dbo.Example ([data]);

-- Add 100 rows
INSERT dbo.Example 
    WITH (TABLOCKX)
    (
        key_col, 
        [data]
    )
SELECT
    key_col = V.number, 
    [data] = V.number
FROM master.dbo.spt_values AS V
WHERE
    V.[type] = N'P'
    AND V.number BETWEEN 1 AND 100;

-- ================
-- Singleton lookup
-- ================

-- Single value equality seek in a unique index
-- Scan count = 0 when STATISTIS IO is ON
-- Check the XML SHOWPLAN

SELECT E.key_col 
FROM dbo.Example AS E
WHERE E.key_col = 32;

-- ===========
-- Range Scans
-- ===========

-- Query 1
SELECT
    E.key_col 
FROM dbo.Example AS E
WHERE
    E.key_col <= 5 
ORDER BY
    E.key_col ASC;

-- Query 2
SELECT
    E.key_col 
FROM dbo.Example AS E
WHERE
    E.key_col > 95 
ORDER BY
    E.key_col DESC;

-- Query 3
SELECT
    E.key_col 
FROM dbo.Example AS E
WHERE
    E.key_col >= 10 
    AND E.key_col < 15 
ORDER BY
    E.key_col ASC;

-- Query 4
SELECT
    E.key_col 
FROM dbo.Example AS E
WHERE
    E.key_col >= 20 
    AND E.key_col < 25 
ORDER BY
    E.key_col DESC;

-- Final query (singleton + range = 2 range scans)
SELECT
    E.key_col
FROM dbo.Example AS E
WHERE
    E.key_col = 10
    OR E.key_col BETWEEN 15 AND 20
ORDER BY
    E.key_col ASC;

-- === TIDY UP ===
DROP TABLE dbo.Example;