Wednesday, 28 July 2010

The Segment and Sequence Project Iterators

In my last post I promised to cover the Segment iterator in more detail, so here we go.

Segment

The Segment iterator partitions rows into groups as they flow through a query plan, checking whether the current row belongs in the same group as the previous row. For this to work, the incoming rows must be presented in an order which guarantees that all members of a group are received sequentially.

Segment has a “Group By” argument to specify how it should partition its input. The iterator adds an additional column to the rows that flow through it. This new column is used to communicate with its parent iterator, and is named something like [Segment1003].

The new column is visible in the graphical query plan properties window, or by hovering your mouse cursor over the Segment operator. The attribute name in both cases is Segment Column).

Segment Top

Like almost all iterators, Segment processes rows one at a time, and signals the start of a new group in the Segment Column. Let’s look again at the query plan from my last post showing Segment followed by Top:

Segment Top Plan

The Index Scan produces rows in an order suitable for the Segment to split into groups. The Top iterator passes on some fraction of the rows it receives as normal, and uses the information in the Segment Column to decide when it should reset its count for the next group.

The effect is as if the Segment passed a group of rows (all at once), and the Top operation occurred once for each group. You can think of the iterator logically passing a group at a time, whereas physically it processes row-by-row, communicating the state of play with the following Top via the Segment Column.

In case you are wondering if it wouldn’t be more efficient for the Segment to pass a group a time to the Top, consider what the Segment would do with the rows that were piling up, waiting for the group to change. We could imagine a situation where a very large number of wide rows occur for a particular group. In that case, Segment would require either a very large memory grant, or a worktable in tempdb to buffer the rows.

We know that Segment does not request a memory grant, because it would show up in the query plan after execution, and in the sys.dm_exec_query_memory_grants dynamic management view.

We can see that a tempdb worktable is not used by examining the output of the query with SET STATISTICS IO ON. With this option enabled, any worktables used would be reported—but nothing is shown. With nowhere to store a potentially huge number of pending group rows, the physical implementation of Segment must be row-by-row.

An interesting consequence of this physical processing is that the Top iterator is ‘Segment aware’. The Segment Column is bound to the Top at compilation time, so that the Top knows which Segment Column it should use, in a plan with multiple such iterators.

Segment and Sequence Project

In the ranking function example in my last post, I used DENSE_RANK to determine which rows to output. The query plan also included the Segment iterator, along with a Sequence Project.

In this next section, we’ll look at the interaction between Segment and Sequence Project, and discover why DENSE_RANK requires two Segment iterators:

Plan with two Segment iterators

The two Segments and the Sequence Project work together to produce the result of the expression DENSE_RANK() OVER (PARTITION BY Shelf, Bin ORDER BY Quantity).

The first Segment has a GROUP BY argument of Shelf and Bin—mirroring the PARTITION BY clause. It uses a Segment Column labelled [Segment1003] to indicate whether the current row is the first of a new group or not.

The second Segment is required because the value of DENSE_RANK changes when the current value is different from the previous row, as well as resetting at the start of a new group. To accommodate this, the query plan needs a second flag, stored in the second Segment Column, named [Segment1004] in this case.

With these two flags (‘start of group’ and ‘value changed’) the Sequence Project can decide whether to increment, reset, or leave unchanged the current internal value returned by the DENSE_RANK function.

The Sequence Project adds the current internal value as a new column in the data stream. This can be seen in the Defined Values argument of the Sequence Project. In this example, the new column is called [Expr1002].

Final thoughts

In the next post in this series, I take a look at common subexpression spools—as used by windowed aggregates like COUNT...OVER, and by per-index (wide) update plans.