tag:blogger.com,1999:blog-37027029235920932882024-03-13T12:15:59.437+13:00Page Free SpaceSQL Server internals by Paul WhitePaul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.comBlogger94125tag:blogger.com,1999:blog-3702702923592093288.post-56735446892774128492023-11-17T22:18:00.011+13:002023-11-18T19:33:03.926+13:00Setting a Fixed Size for Transaction Log Virtual Log Files (VLFs)<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Setting a Fixed Size for Transaction Log VLFs</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>The <a href="https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-transaction-log-architecture-and-management-guide#virtual-log-files-vlfs">documentation</a> has this to say about virtual log file (VLF) sizes:</p>
<blockquote>
<p>The SQL Server Database Engine divides each physical log file internally into several virtual log files (VLFs). Virtual log files have no fixed size, and there’s no fixed number of virtual log files for a physical log file. The Database Engine chooses the size of the virtual log files dynamically while it’s creating or extending log files. The Database Engine tries to maintain a few virtual files. The size of the virtual files after a log file has been extended is the sum of the size of the existing log and the size of the new file increment. The size or number of virtual log files can’t be configured or set by administrators.</p>
</blockquote>
<p>It then goes on to describe the problems having too many VLFs can cause, and how the database owner can arrange things so a reasonable number of VLFs are created. There’s even a (mostly accurate) formula for the number and size of VLFs SQL Server will create when asked to extend a transaction log file.</p>
<p>This is all very familiar, of course, but it is also dumb. Why on earth should we have to worry about internal formulas? It seems ridiculous to have to provision or grow a transaction log in pieces just to get a reasonable VLF outcome.</p>
<p>Wouldn’t it be better to be able to specify a fixed size for VLFs instead?</p>
<p>Starting with <strong>SQL Server 2022</strong>, there is now a way, though it is <strong>undocumented and unsupported</strong> for the time being at least.</p>
<p>You can’t use it in a production database and there’s a real risk of it damaging your database beyond repair. Aside from those warnings, there’s no reason not to play around with it in a development environment. Or, if you’re simply curious to know more, read on.</p>
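<p>Whether or not you choose to experiment, it is useful to be able to see the VLF layout you currently have. The documented <code>sys.dm_db_log_info</code> function returns one row per VLF:</p>
<pre class=" language-sql"><code class="prism language-sql">-- One row per virtual log file in the current database
SELECT
    LI.vlf_sequence_number,
    LI.vlf_size_mb,
    LI.vlf_active,
    LI.vlf_status
FROM sys.dm_db_log_info(DB_ID()) AS LI
ORDER BY
    LI.vlf_sequence_number;
</code></pre>
<p>Running this before and after a log file growth makes it easy to check the number and size of the VLFs produced.</p>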
</div></body></html><a href="https://www.sql.kiwi/2023/11/fixed-size-vlfs.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-56755904924284932062023-11-13T18:53:00.003+13:002023-11-13T19:09:16.143+13:00Why Batch Mode Sort Spills Are So Slow<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Why Batch Mode Sort Spills Are So Slow</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
<meta name="twitter:image" content="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhywIa-CpfRpAIwCsDSNl9XqXdx6iNTeAFir9sjUy1fz0sIdFkgHa7ShkGIJ2vqxVWsA8oWIOLU2B1ybfzIpYt9_Zk6a8visGnmG2OVtmbTSBwL72raI5TtdSFEUScbV6shwYpzz1iSJjJx16PJfZxDSHUYoDk3kaaI4j8nX6T5Fiq7KfDCQAnsGT3jHAI/s1600/01.png">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>Batch mode sorting was added to SQL Server in the 2016 release under compatibility level 130. Most of the time, a batch mode sort will be much faster than the row mode equivalent.</p>
<p>This post is about an important exception to this rule, as recently <a href="https://feedback.azure.com/d365community/idea/159757c3-8751-ee11-a81c-002248544521">reported</a> by Erik Darling (<a href="https://erikdarling.com/performance-regression-with-batch-mode-sorts-when-spilling-to-disk/">video</a>).</p>
<p>No doubt you’ll visit both links before reading on, but to summarize, the issue is that batch mode sorts are <strong>very slow when they spill</strong>—much slower than an equivalent row mode sort.</p>
<p>This also seems like a good opportunity to write down some sorting details I haven’t really covered before. If you’re not interested in those details or the background to the current issue, you can skip down to the section titled “Erik’s Demo”.</p>
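<p>If you want to see the effect for yourself before reading the demo section, the general shape of a repro is to run the same sorting query twice with a deliberately small memory grant, once in batch mode and once in row mode. The following is only a sketch against a placeholder table, not Erik’s exact repro:</p>
<pre class=" language-sql"><code class="prism language-sql">-- Placeholder table dbo.Votes; cap the grant so the sort must spill
SELECT V.*
FROM dbo.Votes AS V
ORDER BY
    V.CreationDate
OPTION (MAX_GRANT_PERCENT = 0.001, USE HINT ('ALLOW_BATCH_MODE'));

-- Same query and grant cap, but restricted to row mode
SELECT V.*
FROM dbo.Votes AS V
ORDER BY
    V.CreationDate
OPTION (MAX_GRANT_PERCENT = 0.001, USE HINT ('DISALLOW_BATCH_MODE'));
</code></pre>
<p>Comparing elapsed times and the spill warnings in the actual execution plans shows the difference.</p>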
</div></body></html><a href="https://www.sql.kiwi/2023/11/batch-sort-spills-slow.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com2tag:blogger.com,1999:blog-3702702923592093288.post-21029428163996902242023-10-20T15:52:00.003+13:002023-10-20T16:21:26.891+13:00Fast Key Optimization for Row Mode Sorts<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Fast Key Optimization for Row Mode Sorts</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>SQL Server row-mode sorts generally use a custom implementation of the well-known <a href="https://en.wikipedia.org/wiki/Merge_sort">merge sort</a> algorithm to order data.</p>
<p>As a comparison-based algorithm, this performs a large number of value comparisons during sorting—usually many more than the number of items to sort.</p>
<p>Although each comparison is typically not expensive, even a moderately sized sort can involve a very large number of comparisons.</p>
<p>SQL Server can be called upon to sort a variety of data types. To facilitate this, the sorting code normally calls out to a specific comparator to determine how two compared values should sort: lower, higher, or equal.</p>
<p>Although each comparator call has a low overhead, making enough of these calls can cause noticeable performance differences.</p>
<p>To address this, SQL Server has always (since at least version 7) supported a <em>fast key</em> optimization for simple data types. This optimization performs the comparison using highly optimized inline code rather than calling out to a separate routine.</p>
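<p>The effect is easy to see indirectly. Sorting the same number of rows as an integer (a simple type) and as a string (which needs collation-aware comparisons) gives very different timings. Not all of the difference is attributable to fast keys, since string comparisons are inherently more expensive, but the gap is instructive. A throwaway sketch:</p>
<pre class=" language-sql"><code class="prism language-sql">-- One million rows with an integer and a string column
CREATE TABLE #T (i integer NOT NULL, s varchar(40) NOT NULL);

INSERT #T
    (i, s)
SELECT TOP (1000000)
    ROW_NUMBER() OVER (ORDER BY @@SPID),
    CONVERT(varchar(40), NEWID())
FROM sys.all_columns AS AC1
CROSS JOIN sys.all_columns AS AC2;

-- Variable assignment avoids result set transmission costs
DECLARE @i integer, @s varchar(40);

SET STATISTICS TIME ON;

SELECT @i = T.i FROM #T AS T ORDER BY T.i; -- integer sort
SELECT @s = T.s FROM #T AS T ORDER BY T.s; -- string sort

SET STATISTICS TIME OFF;
</code></pre>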
</div></body></html><a href="https://www.sql.kiwi/2023/10/fast-key-row-mode-sorts.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-71721964364699623002023-08-02T20:32:00.008+12:002023-08-03T18:44:58.844+12:00Importing a File in Batches<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Importing a File in Batches</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>There are a million ways to import data into SQL Server. Most of the time, we want to ingest the new data as quickly and efficiently as possible, but that’s not always the case.</p>
<p>Sometimes, we need to accept data at a rate that will not dominate resource usage on the target system or cause excessive transaction log growth. In other cases, each row from the data source needs specific server-side processing to validate and persist the data across multiple relational tables, perhaps involving foreign keys and identity columns.</p>
<p>All this can be achieved with client-side tools and programming. It can also be done server-side by importing the raw data into a staging table before processing using T-SQL procedures.</p>
<p>Other times, the need arises to ingest data <strong>without using client-side tools</strong> and <strong>without making a complete copy</strong> of the raw data on the server. This article describes one possible approach in that situation.</p>
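<p>As a taste of the general idea, the documented <code>FIRSTROW</code> and <code>LASTROW</code> arguments of <code>BULK INSERT</code> can read a flat file in fixed-size slices. This sketch uses example names and assumes a slice entirely past the end of the file imports zero rows; the article develops a more complete approach:</p>
<pre class=" language-sql"><code class="prism language-sql">DECLARE
    @FirstRow bigint = 1,
    @BatchSize bigint = 50000,
    @Rows bigint = 1,
    @SQL nvarchar(max);

WHILE @Rows > 0
BEGIN
    -- BULK INSERT does not accept variables, so build the statement
    SET @SQL =
        N'BULK INSERT dbo.Staging FROM ''C:\Data\import.csv'' WITH (' +
        N'FIRSTROW = ' + CONVERT(nvarchar(20), @FirstRow) +
        N', LASTROW = ' + CONVERT(nvarchar(20), @FirstRow + @BatchSize - 1) +
        N', FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n''); ' +
        N'SET @Rows = @@ROWCOUNT;';

    EXECUTE sys.sp_executesql
        @SQL,
        N'@Rows bigint OUTPUT',
        @Rows = @Rows OUTPUT;

    SET @FirstRow += @BatchSize;
END;
</code></pre>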
</div></body></html><a href="https://www.sql.kiwi/2023/08/importing-file-in-batches.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-85349161025920395392022-08-30T00:15:00.002+12:002023-02-01T21:22:24.393+13:00Reducing Contention on the NESTING_TRANSACTION_FULL latch<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Reducing Contention on the NESTING_TRANSACTION_FULL latch</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>Each additional worker thread in a parallel execution plan executes inside a <em>nested transaction</em> associated with the single parent transaction.</p>
<p>Parallel worker access to shared parent transaction structures is protected by a latch. A <a href="https://www.sqlskills.com/help/latches/nesting_transaction_readonly/"><code>NESTING_TRANSACTION_READONLY</code></a> latch is used for a read-only transaction. A <a href="https://www.sqlskills.com/help/latches/nesting_transaction_full/"><code>NESTING_TRANSACTION_FULL</code></a> latch is used if the transaction has modified the database.</p>
<p>This design has its roots in SQL Server 7, where read-only query parallelism was introduced. SQL Server 2000 built on this with parallel index builds, which for the first time allowed multiple threads to cooperate to change a persistent database structure. Many improvements have followed since then, but the fundamental parent-child transaction design remains today.</p>
<p>Though lightweight, a latch can become a point of contention when requested sufficiently frequently in incompatible modes by many different threads. Some contention on shared resources is to be expected; it becomes a problem when latch waits start to affect CPU utilisation and throughput.</p>
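<p>A quick way to see whether an instance has accumulated waits on these latches is to query the cumulative latch statistics:</p>
<pre class=" language-sql"><code class="prism language-sql">SELECT
    LS.latch_class,
    LS.waiting_requests_count,
    LS.wait_time_ms,
    LS.max_wait_time_ms
FROM sys.dm_os_latch_stats AS LS
WHERE LS.latch_class IN
    (N'NESTING_TRANSACTION_READONLY', N'NESTING_TRANSACTION_FULL');
</code></pre>
<p>Remember these numbers are cumulative since the last restart (or <code>DBCC SQLPERF</code> clear), so compare snapshots over an interval.</p>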
</div></body></html><a href="https://www.sql.kiwi/2022/08/reduce-contention-nesting-transaction-full.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-35100958699644598372022-07-23T23:45:00.009+12:002022-09-06T06:48:56.242+12:00More Consistent Execution Plan Timings in SQL Server 2022<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>More Consistent Execution Plan Timings in SQL Server 2022</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>The updated showplan schema shipped with <a href="https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms-19">SSMS 19 preview 2</a> contains an interesting comment:</p>
<blockquote>
<p>ExclusiveProfileTimeActive: true if the actual elapsed time (ActualElapsedms attribute) and the actual CPU time (ActualCPUms attribute) represent the time interval spent exclusively within the relational iterator.</p>
</blockquote>
<p>What does this mean?</p>
</div></body></html><a href="https://www.sql.kiwi/2022/07/consistent-plan-timings-2022.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com1tag:blogger.com,1999:blog-3702702923592093288.post-874891958648459482021-11-18T00:40:00.002+13:002021-12-03T04:52:18.329+13:00Be Careful with LOBs and OPTION (RECOMPILE)<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Be Careful with LOBs and OPTION (RECOMPILE)</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>It sometimes makes sense to add <code>OPTION (RECOMPILE)</code> to a query. Typically this will be when:</p>
<ul>
<li>A good enough plan for the query is very sensitive to one or more parameters</li>
<li>No good single value exists for the parameter to use in a hint</li>
<li>Optimize for unknown doesn’t give a good result</li>
<li>The plan might be expected to change over time</li>
<li>The cost of recompiling the statement is much less than the expected execution time</li>
<li>Recompiling every time is very likely to save more time and resources than it costs overall</li>
</ul>
<p>All that is fairly well-known. The point of this short post is to draw your attention to another side-effect of adding <code>OPTION (RECOMPILE)</code> — the parameter embedding optimization (PEO).</p>
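<p>As a reminder of what PEO does, consider a sketch like the following (the object names are illustrative). With <code>OPTION (RECOMPILE)</code>, the optimizer may treat the parameter as a literal at each compilation, so the <code>@Filter IS NULL</code> branch can be removed from the plan entirely when a value is supplied:</p>
<pre class=" language-sql"><code class="prism language-sql">CREATE OR ALTER PROCEDURE dbo.Demo
    @Filter integer = NULL
AS
BEGIN
    SELECT E.col1
    FROM dbo.Example AS E
    WHERE @Filter IS NULL OR E.col1 = @Filter
    OPTION (RECOMPILE); -- parameter embedding applies here
END;
</code></pre>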
</div></body></html><a href="https://www.sql.kiwi/2021/11/careful-lobs-recompile.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com1tag:blogger.com,1999:blog-3702702923592093288.post-79045539876294436182021-06-05T20:35:00.001+12:002021-06-05T20:35:59.689+12:00Empty Parallel Zones<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Empty Parallel Zones</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>An <strong>empty parallel zone</strong> is an area of the plan bounded by exchanges (or the leaf level) containing no operators.</p>
<p>How and why does SQL Server sometimes generate a parallel plan with an empty parallel zone?</p>
</div></body></html><a href="https://www.sql.kiwi/2021/06/empty-parallel-zones.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-90027269506849682302021-03-24T00:54:00.007+13:002023-09-09T21:08:31.339+12:00Incorrect Results with Parallel Eager Spools and Batch Mode<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Incorrect Results with Parallel Eager Spools and Batch Mode</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>You might have noticed a warning at the top of the release notes for <a href="https://support.microsoft.com/en-us/topic/kb5000645-cumulative-update-16-for-sql-server-2016-sp2-a3997fa9-ec49-4df0-bcc3-12dd58b78265">SQL Server 2016 SP2 CU 16</a>:</p>
<blockquote>
<p><strong>Note:</strong> After you apply CU 16 for SQL Server 2016 SP2, you might encounter an issue in which DML (insert/update/delete) queries that use parallel plans cannot complete any execution and encounter <strong>HP_SPOOL_BARRIER</strong> waits. You can use the trace flag 13116 or MAXDOP=1 hint to work around this issue. This issue is related to the introduction of fix for 13685819 and it will be fixed in the next Cumulative Update.</p>
</blockquote>
<p>That warning links to bug reference <a href="https://support.microsoft.com/en-us/topic/kb5000645-cumulative-update-16-for-sql-server-2016-sp2-a3997fa9-ec49-4df0-bcc3-12dd58b78265#bkmk_13685819">13685819</a> on the same page. There isn’t a separate KB article, only the description:</p>
<blockquote>
<p>Fixes an issue with insert query in SQL Server 2016 that reads the data from the same table and uses a parallel execution plan may produce duplicate rows</p>
</blockquote>
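<p>For completeness, the workarounds mentioned in the release note look like this in practice (<code>dbo.T</code> is an illustrative table, and trace flag 13116 is undocumented, so use it only as Microsoft directs):</p>
<pre class=" language-sql"><code class="prism language-sql">-- Per-statement workaround: force a serial plan
INSERT dbo.T (c1)
SELECT T.c1
FROM dbo.T AS T
OPTION (MAXDOP 1);

-- Instance-wide workaround (undocumented trace flag from the release note)
DBCC TRACEON (13116, -1);
</code></pre>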
</div></body></html><a href="https://www.sql.kiwi/2021/03/spools-batch-mode-hp.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com10tag:blogger.com,1999:blog-3702702923592093288.post-44872391823669302322020-10-11T05:42:00.011+13:002021-02-25T05:31:59.564+13:00sql_handle and the SQL Server batch text hash<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>sql_handle and the SQL Server batch text hash</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>This article describes the structure of a <code>sql_handle</code> and shows how the batch text hash component is calculated.</p>
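<p>For orientation, a <code>sql_handle</code> is exposed by several DMVs and can be exchanged for the batch text it identifies:</p>
<pre class=" language-sql"><code class="prism language-sql">SELECT
    QS.sql_handle,
    ST.[text]
FROM sys.dm_exec_query_stats AS QS
CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) AS ST;
</code></pre>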
</div></body></html><a href="https://www.sql.kiwi/2020/10/sqlhandle-and-sql-server-batch-text-hash.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com5tag:blogger.com,1999:blog-3702702923592093288.post-74463212658628463262020-10-08T03:01:00.000+13:002020-10-08T03:50:43.908+13:00Closest Match with Sort Rewinds<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Closest Match with Sort Rewinds</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>In <a href="https://sqlperformance.com/2020/10/sql-performance/when-do-sorts-rewind">When Do SQL Server Sorts Rewind?</a> I described how most sorts can only rewind when they contain at most one row. The exception is in-memory sorts, which can rewind at most 500 rows and 16KB of data.</p>
<p>These are certainly tight restrictions, but we can still make use of them on occasion.</p>
<p>To illustrate, I am going to reuse a demo Itzik Ben-Gan provided in <a href="https://sqlperformance.com/2018/12/t-sql-queries/closest-match-part-1">part one of his <em>Closest Match</em> series</a>, specifically solution 2 (modified value range and indexing).</p>
<p>As Itzik’s title suggests, the task is to find the closest match for a value in one table in a second table.</p>
<p>As Itzik describes it:</p>
<blockquote>
<p>The challenge is to match to each row from T1 the row from T2 where the absolute difference between T2.<code>val</code> and T1.<code>val</code> is the lowest. In case of ties (multiple matching rows in T2), match the top row based on <code>val</code> ascending, <code>keycol</code> ascending order.</p>
<p>That is, the row with the lowest value in the <code>val</code> column, and if you still have ties, the row with the lowest <code>keycol</code> value. The tiebreaker is used to guarantee determinism.</p>
</blockquote>
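<p>Written directly from that specification, the requirement is a <code>TOP (1)</code> per row of T1. This is the straightforward form, not necessarily the tuned solution 2:</p>
<pre class=" language-sql"><code class="prism language-sql">SELECT
    T1.keycol,
    T1.val,
    CM.keycol AS match_keycol,
    CM.val AS match_val
FROM dbo.T1 AS T1
CROSS APPLY
(
    SELECT TOP (1)
        T2.keycol,
        T2.val
    FROM dbo.T2 AS T2
    ORDER BY
        ABS(T2.val - T1.val),
        T2.val,
        T2.keycol
) AS CM;
</code></pre>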
</div></body></html><a href="https://www.sql.kiwi/2020/10/closest-match-with-sort-rewinds.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-55156334621091292392020-08-04T01:26:00.000+12:002020-10-07T23:48:07.589+13:00SQL Server 2019 Aggregate Splitting<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>SQL Server 2019 Aggregate Splitting</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>The SQL Server 2019 query optimizer has a new trick available to improve the performance of large aggregations. The new exploration abilities are encoded in two new closely-related optimizer rules:</p>
<ul>
<li><code>GbAggSplitToRanges</code></li>
<li><code>SelOnGbAggSplitToRanges</code></li>
</ul>
<p>The extended event <code>query_optimizer_batch_mode_agg_split</code> is provided to track when this new optimization is considered. The description of this event is:</p>
<blockquote>
<p>Occurs when the query optimizer detects batch mode aggregation is likely to spill and tries to split it into multiple smaller aggregations.</p>
</blockquote>
<p>Other than that, this new feature hasn’t been documented yet. This article is intended to help fill that gap.</p>
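<p>To see when the optimizer considers the split while you experiment, create a simple extended events session for the event mentioned above:</p>
<pre class=" language-sql"><code class="prism language-sql">CREATE EVENT SESSION AggSplit
ON SERVER
ADD EVENT sqlserver.query_optimizer_batch_mode_agg_split
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION AggSplit
ON SERVER
STATE = START;
</code></pre>
<p>Watch the live data feed in SSMS, or read the ring buffer target, while compiling candidate queries.</p>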
</div></body></html><a href="https://www.sql.kiwi/2020/08/sql-server-2019-aggregate-splitting.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com2tag:blogger.com,1999:blog-3702702923592093288.post-13071378708608329572020-07-26T01:22:00.002+12:002021-08-16T23:23:28.509+12:00A bug with Halloween Protection and the OUTPUT Clause<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>A bug with Halloween Protection and the OUTPUT Clause</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><h2 id="background">Background</h2>
<p>The <a href="https://docs.microsoft.com/en-us/sql/t-sql/queries/output-clause-transact-sql"><code>OUTPUT</code> clause</a> can be used to return results from an <code>INSERT</code>, <code>UPDATE</code>, <code>DELETE</code>, or <code>MERGE</code> statement. The data can be returned to the client, inserted to a table, or both.</p>
<p>There are two ways to add <code>OUTPUT</code> data to a table:</p>
<ol>
<li>Using <code>OUTPUT INTO</code></li>
<li>With an outer <code>INSERT</code> statement.</li>
</ol>
<p>For example:</p>
<pre class=" language-sql"><code class="prism language-sql"><span class="token comment">-- Test table</span>
<span class="token keyword">DECLARE</span> <span class="token variable">@Target</span> <span class="token keyword">table</span>
<span class="token punctuation">(</span>
id <span class="token keyword">integer</span> <span class="token keyword">IDENTITY</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">)</span> <span class="token operator">NOT</span> <span class="token boolean">NULL</span><span class="token punctuation">,</span>
<span class="token number">c1</span> <span class="token keyword">integer</span> <span class="token boolean">NULL</span>
<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">-- Holds rows from the OUTPUT clause</span>
<span class="token keyword">DECLARE</span> <span class="token variable">@Output</span> <span class="token keyword">table</span>
<span class="token punctuation">(</span>
id <span class="token keyword">integer</span> <span class="token operator">NOT</span> <span class="token boolean">NULL</span><span class="token punctuation">,</span>
<span class="token number">c1</span> <span class="token keyword">integer</span> <span class="token boolean">NULL</span>
<span class="token punctuation">)</span><span class="token punctuation">;</span>
</code></pre>
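<p>Using those table variables, the two forms look like this. In the second, the outer <code>INSERT</code> consumes the inner statement’s <code>OUTPUT</code> clause as a row source (composable DML):</p>
<pre class=" language-sql"><code class="prism language-sql">-- 1. OUTPUT INTO
INSERT @Target (c1)
OUTPUT
    inserted.id,
    inserted.c1
    INTO @Output (id, c1)
VALUES
    (1), (2);

-- 2. Outer INSERT over the OUTPUT clause
INSERT @Output
    (id, c1)
SELECT
    N.id,
    N.c1
FROM
(
    INSERT @Target (c1)
    OUTPUT
        inserted.id,
        inserted.c1
    VALUES
        (3), (4)
) AS N;
</code></pre>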
</div></body></html><a href="https://www.sql.kiwi/2020/07/a-bug-with-halloween-protection-and.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com5tag:blogger.com,1999:blog-3702702923592093288.post-59581021297822724492020-07-05T07:01:00.000+12:002020-10-07T23:48:44.618+13:00How MAXDOP Really Works<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>How MAXDOP Really Works</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>A few days ago I ran a Twitter poll:</p>
<p><img src="https://3.bp.blogspot.com/-apN0-wVbsCg/XwCsSmM0bMI/AAAAAAAAEbw/u5y50tXBdS4bV8CxN63KnRyb0TkrTVneQCLcBGAsYHQ/s1600/poll.png" alt="Twitter poll"></p>
<p>The <strong>most popular answer</strong> gets highlighted by Twitter at the end of the poll, but as with many things on social media, that doesn’t mean it is correct:</p>
</div></body></html><a href="https://www.sql.kiwi/2020/07/how-maxdop-really-works.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com1tag:blogger.com,1999:blog-3702702923592093288.post-65024483639230575542020-05-31T02:32:00.000+12:002020-10-07T23:48:57.010+13:00Pulling Group By Above a Join<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Pulling Group By Above a Join</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>One of the transformations available to the SQL Server query optimizer is pulling a logical <em>Group By</em> (and any associated aggregates) above a <em>Join</em>.</p>
<p>Visually, this means transforming a tree of logical operations from:</p>
<p><img src="https://1.bp.blogspot.com/-6dVyTKBeg5E/XtJKNY41gDI/AAAAAAAAEYU/WNgatqjUToMuQZcR91rPPvjZqttivEYKwCLcBGAsYHQ/s1600/01.png" alt="Group By Below Join" title="Group By Below Join"></p>
<p>…to this:</p>
<p><img src="https://2.bp.blogspot.com/-RQYQPYBQpXc/XtJKNX3SUTI/AAAAAAAAEYY/QXvpvUUMUwkyLqKTaQi0YXC7WPLrfRv4wCLcBGAsYHQ/s1600/02.png" alt="Group By Above Join" title="Group By Above Join"></p>
<p>The above diagrams are <em>logical representations</em>. They need to be implemented as <em>physical operators</em> to appear in an execution plan. The options are:</p>
<ul>
<li><strong>Group By</strong>
<ul>
<li>Hash Match Aggregate</li>
<li>Stream Aggregate</li>
<li>Distinct Sort</li>
</ul>
</li>
<li><strong>Join</strong>
<ul>
<li>Nested Loops Join</li>
<li>Nested Loops Apply</li>
<li>Hash Match Join</li>
<li>Merge Join</li>
</ul>
</li>
</ul>
<p>When the optimizer moves a <em>Group By</em> above a <em>Join</em> it has to preserve the semantics. The new sequence of operations must be <em>guaranteed</em> to return the same results as the original in all possible circumstances.</p>
<p>One cannot just pick up a <em>Group By</em> and arbitrarily move it around the query tree without risking incorrect results.</p>
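<p>In SQL terms, the transformation corresponds to rewriting between shapes like the following. The schema is hypothetical, and the rewrite is only guaranteed safe here because <code>DimKey</code> is the key of <code>dbo.Dim</code>:</p>
<pre class=" language-sql"><code class="prism language-sql">-- Group By below the join: pre-aggregate the fact rows, then join
SELECT
    D.DimKey,
    D.DimName,
    FA.Total
FROM dbo.Dim AS D
JOIN
(
    SELECT
        F.DimKey,
        SUM(F.Amount) AS Total
    FROM dbo.Fact AS F
    GROUP BY
        F.DimKey
) AS FA
    ON FA.DimKey = D.DimKey;

-- Group By above the join
SELECT
    D.DimKey,
    D.DimName,
    SUM(F.Amount) AS Total
FROM dbo.Dim AS D
JOIN dbo.Fact AS F
    ON F.DimKey = D.DimKey
GROUP BY
    D.DimKey,
    D.DimName;
</code></pre>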
</div></body></html><a href="https://www.sql.kiwi/2020/05/pulling-group-by-above-join.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-17953328471273523922019-08-24T12:56:00.001+12:002020-10-07T23:49:07.146+13:00Batch Mode Bitmap Demos<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Batch Mode Bitmap Demos</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>This is a companion post to my main article <a href="https://sqlperformance.com/2019/08/sql-performance/batch-mode-bitmaps-in-sql-server">Batch Mode Bitmaps in SQL Server</a>. This post provides demos and illustrations to supplement the technical article.</p>
<p>The scripts presented here were run on SQL Server 2017 CU 16.</p>
</div></body></html><a href="https://www.sql.kiwi/2019/08/batch-mode-bitmap-demos.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-19726292955604772702019-06-09T01:18:00.003+12:002022-09-20T05:04:25.127+12:00Apply versus Nested Loops Join<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Apply versus Nested Loops Join</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>SQL is a declarative language. We use SQL to write a <em>logical query specification</em> that defines the results we want. For example, we might write a query using either <code>APPLY</code> or <code>JOIN</code> that <em>logically</em> describes exactly the same results.</p>
<p>It is up to the query optimizer to find an efficient <em>physical</em> implementation of that <em>logical</em> requirement. SQL Server is free to choose any plan it likes, so long as the results are <em>guaranteed</em> to be the same as specified in the original SQL.</p>
<p>The optimizer is capable of transforming an <em>apply</em> to a <em>join</em> and <em>vice versa</em>. It generally tries to rewrite <em>apply</em> to <em>join</em> during initial compilation to maximize the searchable plan space during cost-based optimization. Having transformed an <em>apply</em> to a <em>join</em> early on, it may also consider a transformation back to an <em>apply</em> shape later on to assess the merits of e.g. an index loops join.</p>
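<p>For a concrete illustration (hypothetical schema), both of the following describe the same logical result, and the optimizer may implement either with any physical strategy that preserves it:</p>
<pre class=" language-sql"><code class="prism language-sql">-- Join form
SELECT
    C.CustomerID,
    O.OrderID
FROM dbo.Customers AS C
JOIN dbo.Orders AS O
    ON O.CustomerID = C.CustomerID;

-- Apply form
SELECT
    C.CustomerID,
    OA.OrderID
FROM dbo.Customers AS C
CROSS APPLY
(
    SELECT O.OrderID
    FROM dbo.Orders AS O
    WHERE O.CustomerID = C.CustomerID
) AS OA;
</code></pre>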
</div></body></html><a href="https://www.sql.kiwi/2019/06/apply-versus-nested-loops-join.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com1tag:blogger.com,1999:blog-3702702923592093288.post-25735430903579267052017-05-02T00:00:00.006+12:002023-12-19T00:13:24.175+13:00SQL Server Temporary Object Caching<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>SQL Server Temporary Object Caching</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>Creating a table is a relatively resource-intensive and time-consuming operation. The server must locate and allocate storage space for the new data and index structures and make the corresponding entries in multiple system metadata tables. All this work has to be done in ways that will always work correctly under high concurrency, and which meet all of the ACID guarantees expected of a relational database.</p>
<p>In SQL Server, this means taking the right kinds of locks and latches, in the correct sequence, while also ensuring that detailed transaction log entries are safely committed to persistent storage in advance of any physical changes to the database. These log entries ensure the system can bring the database back to a consistent state in the event of a transaction rollback or system crash.</p>
<p>Dropping a table is a similarly expensive operation. Luckily, most databases do not create or drop tables with any great frequency. The obvious exception to this is the system database <em>tempdb</em>. This single database contains the physical storage, allocation structures, system metadata, and transaction log entries for all temporary tables and table variables across the entire SQL Server instance.</p>
<p>It is in the nature of temporary tables and table variables to be created and dropped much more frequently than other database object types. When this naturally high frequency of creation and destruction is combined with the concentrating effect of all temporary tables and table variables being associated with a single database, it is hardly surprising that contention can arise in the allocation and metadata structures of the <em>tempdb</em> database.</p>
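<p>Caching itself is easy to observe: when a temporary table created inside a procedure is cached, it is renamed rather than dropped, so the same <code>object_id</code> reappears on the next execution. A minimal sketch:</p>
<pre class=" language-sql"><code class="prism language-sql">CREATE OR ALTER PROCEDURE dbo.CacheDemo
AS
BEGIN
    CREATE TABLE #T (c1 integer NULL);
    SELECT OBJECT_ID(N'tempdb..#T') AS temp_object_id;
END;
GO
EXECUTE dbo.CacheDemo; -- note the id returned
EXECUTE dbo.CacheDemo; -- same id if the object was cached
</code></pre>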
</div></body></html><a href="https://www.sql.kiwi/2017/05/temp-object-caching.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-86015665362855928062014-04-15T19:13:00.000+12:002020-10-07T23:49:38.296+13:00Cardinality Estimation for Disjunctive (OR) Predicates in SQL Server 2014 Onward<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Cardinality Estimation for Disjunctive Predicates in SQL Server 2014</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><h2 id="introduction">Introduction</h2>
<p>Back in January 2014, I wrote an article called <a href="https://sqlperformance.com/2014/01/sql-plan/cardinality-estimation-for-multiple-predicates">Cardinality Estimation for Multiple Predicates</a> that described the cardinality estimation process for queries with multiple predicates, from the point of view of the old and new cardinality estimators.</p>
<p>The article describes the various behaviours and formulas involved, along with the usual sprinkling of documented and undocumented trace flags. I described the formula SQL Server 2014 uses to calculate a cardinality estimate for multiple predicates connected with <code>AND</code> (conjunctive predicates), which was already relatively well-known.</p>
<p>Despite some fairly energetic research, and basic-to-intermediate skills with Excel, I was unable to deduce a similar formula for the disjunctive case, where predicates are connected by <code>OR</code>. The trace flag output I describe in the other article clearly showed that exponential backoff was used in the new 2014 cardinality estimator, but the precise formula eluded me.</p>
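<p>For reference, the conjunctive (<code>AND</code>) calculation mentioned above can be sketched as follows. The selectivities (0.1, 0.2, 0.3) and table cardinality are invented for illustration: the most selective predicate is applied at full weight, with square roots applied progressively to the rest:</p>
<pre class=" language-sql"><code class="prism language-sql">-- Exponential backoff for AND predicates (2014 CE), using
-- hypothetical selectivities ordered most-selective first.
DECLARE @s1 float = 0.1, @s2 float = 0.2, @s3 float = 0.3;
DECLARE @TableRows float = 100000;

SELECT Estimate =
    @TableRows * @s1 * SQRT(@s2) * SQRT(SQRT(@s3));
-- 100,000 * 0.1 * 0.2^(1/2) * 0.3^(1/4) ~= 3,310 rows
</code></pre>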
</div></body></html></!DOCTYPE><a href="https://www.sql.kiwi/2014/04/cardinality-estimation-for-disjunctive-predicates-in-2014.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-51421843414157946062013-08-31T15:24:00.000+12:002020-08-06T21:12:32.690+12:00Nested Loops Prefetching<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">

<title>Nested Loops Prefetching</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>Nested loops join query plans can be a lot more interesting (and complicated) than is commonly realized.</p>
<p>One query plan area I get asked about a lot is prefetching. It is not documented in full detail anywhere, so this seems like a good topic to address in a blog post.</p>
<p>The examples used in this article are based on questions asked by <a href="http://dataeducation.com/blog/">Adam Machanic</a>.</p>
</div></body></html></!DOCTYPE><a href="https://www.sql.kiwi/2013/08/sql-server-internals-nested-loops-prefetching.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-86131000422798002072013-08-28T09:30:00.001+12:002024-01-22T23:33:15.221+13:00Parameter Sniffing, Embedding, and the RECOMPILE Options<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Parameter Sniffing, Embedding, and the RECOMPILE Options</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><h2 id="parameter-sniffing">Parameter Sniffing</h2>
<p>Query parameterization promotes the reuse of cached execution plans, thereby avoiding unnecessary compilations, and reducing the number of ad-hoc queries in the plan cache.</p>
<p>These are all good things, <em>provided</em> the query being parameterized really ought to use the same cached execution plan for different parameter values. An execution plan that is efficient for one parameter value may <strong>not</strong> be a good choice for other possible parameter values.</p>
<p>When parameter sniffing is enabled (the default), SQL Server chooses an execution plan based on the particular parameter values that exist at compilation time. The implicit assumption is that parameterized statements are most commonly executed with the most common parameter values. This sounds reasonable enough (even obvious) and indeed it often works well.</p>
<p>A problem can arise when the cached plan is automatically recompiled. A recompilation may be triggered for all sorts of reasons: for example, because an index used by the cached plan has been dropped (a <em>correctness</em> recompilation), or because statistical information has changed (an <em>optimality</em> recompilation).</p>
<p>Whatever the exact <em>cause</em> of the plan recompilation, there is a chance that an <em>atypical</em> value is being passed as a parameter at the time the new plan is generated. This can result in a new cached plan (based on the sniffed atypical parameter value) that is not good for the majority of executions for which it will be reused.</p>
<p>It is not easy to predict when a particular execution plan will be recompiled (for example, because statistics have changed sufficiently), so a good-quality reusable plan can suddenly be replaced by a quite different plan optimized for atypical parameter values.</p>
<p>One such scenario occurs when the atypical value is highly selective, resulting in a plan optimized for a small number of rows. Such plans will often use single-threaded execution, nested loops joins, and lookups. Serious performance issues can arise when this plan is reused for different parameter values that generate a much larger number of rows.</p>
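<p>To make this concrete, here is a hypothetical procedure (the table and column names are invented for illustration) showing one of the options in this post&#8217;s title: <code>OPTION (RECOMPILE)</code> compiles a fresh statement-level plan for the sniffed parameter values on every execution, trading repeated compilations for a value-specific plan:</p>
<pre class=" language-sql"><code class="prism language-sql">CREATE PROCEDURE dbo.GetOrdersForCustomer
    @CustomerID integer
AS
BEGIN
    SELECT O.OrderID, O.OrderDate
    FROM dbo.Orders AS O
    WHERE O.CustomerID = @CustomerID
    -- Optimize for the current @CustomerID value each time,
    -- avoiding reuse of a plan sniffed for an atypical value.
    OPTION (RECOMPILE);
END;
</code></pre>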
</div></body></html></!DOCTYPE><a href="https://www.sql.kiwi/2013/08/parameter-sniffing-embedding-and.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-46397152229765389052013-08-21T05:11:00.000+12:002020-08-06T21:15:16.543+12:00Incorrect Results Caused By Adding an Index<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Incorrect Results Caused By Adding an Index</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>Say you have the following two tables, one partitioned and one not:</p>
<pre class=" language-sql"><code class="prism language-sql"><span class="token keyword">CREATE</span> <span class="token keyword">PARTITION</span> <span class="token keyword">FUNCTION</span> PF <span class="token punctuation">(</span><span class="token keyword">integer</span><span class="token punctuation">)</span>
<span class="token keyword">AS</span> RANGE <span class="token keyword">RIGHT</span>
<span class="token keyword">FOR</span> <span class="token keyword">VALUES</span> <span class="token punctuation">(</span><span class="token number">1000</span><span class="token punctuation">,</span> <span class="token number">2000</span><span class="token punctuation">,</span> <span class="token number">3000</span><span class="token punctuation">,</span> <span class="token number">4000</span><span class="token punctuation">,</span> <span class="token number">5000</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">CREATE</span> <span class="token keyword">PARTITION</span> SCHEME PS
<span class="token keyword">AS</span> <span class="token keyword">PARTITION</span> PF
<span class="token keyword">ALL</span> <span class="token keyword">TO</span> <span class="token punctuation">(</span><span class="token punctuation">[</span><span class="token keyword">PRIMARY</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">-- Partitioned</span>
<span class="token keyword">CREATE</span> <span class="token keyword">TABLE</span> dbo<span class="token punctuation">.</span>T1
<span class="token punctuation">(</span>
T1ID <span class="token keyword">integer</span> <span class="token operator">NOT</span> <span class="token boolean">NULL</span><span class="token punctuation">,</span>
SomeID <span class="token keyword">integer</span> <span class="token operator">NOT</span> <span class="token boolean">NULL</span><span class="token punctuation">,</span>
<span class="token keyword">CONSTRAINT</span> <span class="token punctuation">[</span>PK dbo<span class="token punctuation">.</span>T1 T1ID<span class="token punctuation">]</span>
<span class="token keyword">PRIMARY</span> <span class="token keyword">KEY</span> <span class="token keyword">CLUSTERED</span> <span class="token punctuation">(</span>T1ID<span class="token punctuation">)</span>
<span class="token keyword">ON</span> PS <span class="token punctuation">(</span>T1ID<span class="token punctuation">)</span>
<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">-- Not partitioned</span>
<span class="token keyword">CREATE</span> <span class="token keyword">TABLE</span> dbo<span class="token punctuation">.</span>T2
<span class="token punctuation">(</span>
T2ID <span class="token keyword">integer</span> <span class="token keyword">IDENTITY</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">)</span> <span class="token operator">NOT</span> <span class="token boolean">NULL</span><span class="token punctuation">,</span>
T1ID <span class="token keyword">integer</span> <span class="token operator">NOT</span> <span class="token boolean">NULL</span><span class="token punctuation">,</span>
<span class="token keyword">CONSTRAINT</span> <span class="token punctuation">[</span>PK dbo<span class="token punctuation">.</span>T2 T2ID<span class="token punctuation">]</span>
<span class="token keyword">PRIMARY</span> <span class="token keyword">KEY</span> <span class="token keyword">CLUSTERED</span> <span class="token punctuation">(</span>T2ID<span class="token punctuation">)</span>
<span class="token keyword">ON</span> <span class="token punctuation">[</span><span class="token keyword">PRIMARY</span><span class="token punctuation">]</span>
<span class="token punctuation">)</span><span class="token punctuation">;</span>
</code></pre>
</div></body></html></!DOCTYPE><a href="https://www.sql.kiwi/2013/08/incorrect-results-caused-by-adding-an-index.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-61283212268324632682013-07-24T09:00:00.000+12:002020-08-06T21:15:28.116+12:00Two Partitioning Peculiarities<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Two Partitioning Peculiarities</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>Table partitioning in SQL Server is essentially a way of making multiple physical tables (row sets) look like a single table. This abstraction is performed entirely by the query processor, a design that makes things simpler for users, but which makes complex demands of the query optimizer.</p>
<p>This post looks at two examples which exceed the optimizer’s abilities in SQL Server 2008 onward.</p>
</div></body></html></!DOCTYPE><a href="https://www.sql.kiwi/2013/07/two-partitioning-peculiarities.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-27029660202816153162013-07-18T10:45:00.000+12:002020-08-06T21:15:55.105+12:00Aggregates and Partitioning<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Aggregates and Partitioning</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>The changes in the internal representation of partitioned tables between SQL Server 2005 and SQL Server 2008 resulted in improved query plans and performance in the majority of cases (especially when parallel execution is involved).</p>
<p>Unfortunately, the same changes caused some things that worked well in SQL Server 2005 to suddenly not work so well in SQL Server 2008 and later.</p>
<p>This post looks at one example where the SQL Server 2005 query optimizer produced a superior execution plan compared with later versions.</p>
</div></body></html></!DOCTYPE><a href="https://www.sql.kiwi/2013/07/aggregates-and-partitioning.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-66118507578722071212013-07-08T09:00:00.000+12:002020-08-06T21:16:11.892+12:00Working Around Missed Optimizations<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Working Around Missed Optimizations</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>In <a href="https://www.sql.kiwi/2013/06/optimization-phases-and-missed.html">my last post</a>, we saw how a query featuring a scalar aggregate could be transformed by the optimizer to a more efficient form. As a reminder, here’s the schema again:</p>
</div></body></html></!DOCTYPE><a href="https://www.sql.kiwi/2013/07/working-around-missed-optimizations.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0