tag:blogger.com,1999:blog-37027029235920932882024-03-13T12:15:59.437+13:00Page Free SpaceSQL Server internals by Paul WhitePaul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.comBlogger94125tag:blogger.com,1999:blog-3702702923592093288.post-56735446892774128492023-11-17T22:18:00.011+13:002023-11-18T19:33:03.926+13:00Setting a Fixed Size for Transaction Log Virtual Log Files (VLFs)<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Setting a Fixed Size for Transaction Log VLFs</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>The <a href="https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-transaction-log-architecture-and-management-guide#virtual-log-files-vlfs">documentation</a> has this to say about virtual log file (VLF) sizes:</p>
<blockquote>
<p>The SQL Server Database Engine divides each physical log file internally into several virtual log files (VLFs). Virtual log files have no fixed size, and there’s no fixed number of virtual log files for a physical log file. The Database Engine chooses the size of the virtual log files dynamically while it’s creating or extending log files. The Database Engine tries to maintain a few virtual files. The size of the virtual files after a log file has been extended is the sum of the size of the existing log and the size of the new file increment. The size or number of virtual log files can’t be configured or set by administrators.</p>
</blockquote>
<p>It then goes on to describe the problems having too many VLFs can cause, and how the database owner can arrange things so a reasonable number of VLFs are created. There’s even a (mostly accurate) formula for the number and size of VLFs SQL Server will create when asked to extend a transaction log file.</p>
<p>This is all very familiar, of course, but it is also dumb. Why on earth should we have to worry about internal formulas? It seems ridiculous to have to provision or grow a transaction log in pieces just to get a reasonable VLF outcome.</p>
<p>Wouldn’t it be better to be able to specify a fixed size for VLFs instead?</p>
<p>Starting with <strong>SQL Server 2022</strong>, there is now a way, though it is <strong>undocumented and unsupported</strong> for the time being at least.</p>
<p>You can’t use it in a production database and there’s a real risk of it damaging your database beyond repair. Aside from those warnings, there’s no reason not to play around with it in a development environment. Or, if you’re simply curious to know more, read on.</p>
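<p>Whether or not you choose to experiment, it is useful to be able to see the VLF layout you currently have. The documented <code>sys.dm_db_log_info</code> function returns one row per VLF:</p>
<pre class=" language-sql"><code class="prism language-sql">-- One row per virtual log file in the current database
SELECT
    LI.vlf_sequence_number,
    LI.vlf_size_mb,
    LI.vlf_active,
    LI.vlf_status
FROM sys.dm_db_log_info(DB_ID()) AS LI
ORDER BY
    LI.vlf_sequence_number;
</code></pre>
<p>Running this before and after a log file growth makes it easy to check the number and size of the VLFs produced.</p>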
</div></body></html><a href="https://www.sql.kiwi/2023/11/fixed-size-vlfs.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-56755904924284932062023-11-13T18:53:00.003+13:002023-11-13T19:09:16.143+13:00Why Batch Mode Sort Spills Are So Slow<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Why Batch Mode Sort Spills Are So Slow</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
<meta name="twitter:image" content="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhywIa-CpfRpAIwCsDSNl9XqXdx6iNTeAFir9sjUy1fz0sIdFkgHa7ShkGIJ2vqxVWsA8oWIOLU2B1ybfzIpYt9_Zk6a8visGnmG2OVtmbTSBwL72raI5TtdSFEUScbV6shwYpzz1iSJjJx16PJfZxDSHUYoDk3kaaI4j8nX6T5Fiq7KfDCQAnsGT3jHAI/s1600/01.png">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>Batch mode sorting was added to SQL Server in the 2016 release under compatibility level 130. Most of the time, a batch mode sort will be much faster than the row mode equivalent.</p>
<p>This post is about an important exception to this rule, as recently <a href="https://feedback.azure.com/d365community/idea/159757c3-8751-ee11-a81c-002248544521">reported</a> by Erik Darling (<a href="https://erikdarling.com/performance-regression-with-batch-mode-sorts-when-spilling-to-disk/">video</a>).</p>
<p>No doubt you’ll visit both links before reading on, but to summarize, the issue is that batch mode sorts are <strong>very slow when they spill</strong>—much slower than an equivalent row mode sort.</p>
<p>This also seems like a good opportunity to write down some sorting details I haven’t really covered before. If you’re not interested in those details or the background to the current issue, you can skip down to the section titled “Erik’s Demo”.</p>
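<p>If you want to see the effect for yourself before reading the demo section, the general shape of a repro is to run the same sorting query twice with a deliberately small memory grant, once in batch mode and once in row mode. The following is only a sketch against a placeholder table, not Erik’s exact repro:</p>
<pre class=" language-sql"><code class="prism language-sql">-- Placeholder table dbo.Votes; cap the grant so the sort must spill
SELECT V.*
FROM dbo.Votes AS V
ORDER BY
    V.CreationDate
OPTION (MAX_GRANT_PERCENT = 0.001, USE HINT ('ALLOW_BATCH_MODE'));

-- Same query and grant cap, but restricted to row mode
SELECT V.*
FROM dbo.Votes AS V
ORDER BY
    V.CreationDate
OPTION (MAX_GRANT_PERCENT = 0.001, USE HINT ('DISALLOW_BATCH_MODE'));
</code></pre>
<p>Comparing elapsed times and the spill warnings in the actual execution plans shows the difference.</p>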
</div></body></html><a href="https://www.sql.kiwi/2023/11/batch-sort-spills-slow.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com2tag:blogger.com,1999:blog-3702702923592093288.post-21029428163996902242023-10-20T15:52:00.003+13:002023-10-20T16:21:26.891+13:00Fast Key Optimization for Row Mode Sorts<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Fast Key Optimization for Row Mode Sorts</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>SQL Server row-mode sorts generally use a custom implementation of the well-known <a href="https://en.wikipedia.org/wiki/Merge_sort">merge sort</a> algorithm to order data.</p>
<p>As a comparison-based algorithm, this performs a large number of value comparisons during sorting—usually many more than the number of items to sort.</p>
<p>Although each comparison is typically not expensive, even a moderately sized sort can involve a very large number of comparisons.</p>
<p>SQL Server can be called upon to sort a variety of data types. To facilitate this, the sorting code normally calls out to a specific comparator to determine how two compared values should sort: lower, higher, or equal.</p>
<p>Although each comparator call has a low overhead, making enough of these calls can cause noticeable performance differences.</p>
<p>To address this, SQL Server has always (since at least version 7) supported a <em>fast key</em> optimization for simple data types. This optimization performs the comparison using highly optimized inline code rather than calling out to a separate routine.</p>
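<p>The effect is easy to see indirectly. Sorting the same number of rows as an integer (a simple type) and as a string (which needs collation-aware comparisons) gives very different timings. Not all of the difference is attributable to fast keys, since string comparisons are inherently more expensive, but the gap is instructive. A throwaway sketch:</p>
<pre class=" language-sql"><code class="prism language-sql">-- One million rows with an integer and a string column
CREATE TABLE #T (i integer NOT NULL, s varchar(40) NOT NULL);

INSERT #T
    (i, s)
SELECT TOP (1000000)
    ROW_NUMBER() OVER (ORDER BY @@SPID),
    CONVERT(varchar(40), NEWID())
FROM sys.all_columns AS AC1
CROSS JOIN sys.all_columns AS AC2;

-- Variable assignment avoids result set transmission costs
DECLARE @i integer, @s varchar(40);

SET STATISTICS TIME ON;

SELECT @i = T.i FROM #T AS T ORDER BY T.i; -- integer sort
SELECT @s = T.s FROM #T AS T ORDER BY T.s; -- string sort

SET STATISTICS TIME OFF;
</code></pre>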
</div></body></html><a href="https://www.sql.kiwi/2023/10/fast-key-row-mode-sorts.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-71721964364699623002023-08-02T20:32:00.008+12:002023-08-03T18:44:58.844+12:00Importing a File in Batches<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Importing a File in Batches</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>There are a million ways to import data into SQL Server. Most of the time, we want to ingest the new data as quickly and efficiently as possible, but that’s not always the case.</p>
<p>Sometimes, we need to accept data at a rate that will not dominate resource usage on the target system or cause excessive transaction log growth. In other cases, each row from the data source needs specific server-side processing to validate and persist the data across multiple relational tables, perhaps involving foreign keys and identity columns.</p>
<p>All this can be achieved with client-side tools and programming. It can also be done server-side by importing the raw data into a staging table before processing using T-SQL procedures.</p>
<p>Other times, the need arises to ingest data <strong>without using client-side tools</strong> and <strong>without making a complete copy</strong> of the raw data on the server. This article describes one possible approach in that situation.</p>
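<p>As a taste of the general idea, the documented <code>FIRSTROW</code> and <code>LASTROW</code> arguments of <code>BULK INSERT</code> can read a flat file in fixed-size slices. This sketch uses example names and assumes a slice entirely past the end of the file imports zero rows; the article develops a more complete approach:</p>
<pre class=" language-sql"><code class="prism language-sql">DECLARE
    @FirstRow bigint = 1,
    @BatchSize bigint = 50000,
    @Rows bigint = 1,
    @SQL nvarchar(max);

WHILE @Rows > 0
BEGIN
    -- BULK INSERT does not accept variables, so build the statement
    SET @SQL =
        N'BULK INSERT dbo.Staging FROM ''C:\Data\import.csv'' WITH (' +
        N'FIRSTROW = ' + CONVERT(nvarchar(20), @FirstRow) +
        N', LASTROW = ' + CONVERT(nvarchar(20), @FirstRow + @BatchSize - 1) +
        N', FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n''); ' +
        N'SET @Rows = @@ROWCOUNT;';

    EXECUTE sys.sp_executesql
        @SQL,
        N'@Rows bigint OUTPUT',
        @Rows = @Rows OUTPUT;

    SET @FirstRow += @BatchSize;
END;
</code></pre>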
</div></body></html><a href="https://www.sql.kiwi/2023/08/importing-file-in-batches.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-85349161025920395392022-08-30T00:15:00.002+12:002023-02-01T21:22:24.393+13:00Reducing Contention on the NESTING_TRANSACTION_FULL latch<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Reducing Contention on the NESTING_TRANSACTION_FULL latch</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>Each additional worker thread in a parallel execution plan executes inside a <em>nested transaction</em> associated with the single parent transaction.</p>
<p>Parallel worker access to shared parent transaction structures is protected by a latch. A <a href="https://www.sqlskills.com/help/latches/nesting_transaction_readonly/"><code>NESTING_TRANSACTION_READONLY</code></a> latch is used for a read-only transaction. A <a href="https://www.sqlskills.com/help/latches/nesting_transaction_full/"><code>NESTING_TRANSACTION_FULL</code></a> latch is used if the transaction has modified the database.</p>
<p>This design has its roots in SQL Server 7, where read-only query parallelism was introduced. SQL Server 2000 built on this with parallel index builds, which for the first time allowed multiple threads to cooperate to change a persistent database structure. Many improvements have followed since then, but the fundamental parent-child transaction design remains today.</p>
<p>Though lightweight, a latch can become a point of contention when requested sufficiently frequently in incompatible modes by many different threads. Some contention on shared resources is to be expected; it becomes a problem when latch waits start to affect CPU utilisation and throughput.</p>
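<p>A quick way to see whether an instance has accumulated waits on these latches is to query the cumulative latch statistics:</p>
<pre class=" language-sql"><code class="prism language-sql">SELECT
    LS.latch_class,
    LS.waiting_requests_count,
    LS.wait_time_ms,
    LS.max_wait_time_ms
FROM sys.dm_os_latch_stats AS LS
WHERE LS.latch_class IN
    (N'NESTING_TRANSACTION_READONLY', N'NESTING_TRANSACTION_FULL');
</code></pre>
<p>Remember these numbers are cumulative since the last restart (or <code>DBCC SQLPERF</code> clear), so compare snapshots over an interval.</p>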
</div></body></html><a href="https://www.sql.kiwi/2022/08/reduce-contention-nesting-transaction-full.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-35100958699644598372022-07-23T23:45:00.009+12:002022-09-06T06:48:56.242+12:00More Consistent Execution Plan Timings in SQL Server 2022<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>More Consistent Execution Plan Timings in SQL Server 2022</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>The updated showplan schema shipped with <a href="https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms-19">SSMS 19 preview 2</a> contains an interesting comment:</p>
<blockquote>
<p>ExclusiveProfileTimeActive: true if the actual elapsed time (ActualElapsedms attribute) and the actual CPU time (ActualCPUms attribute) represent the time interval spent exclusively within the relational iterator.</p>
</blockquote>
<p>What does this mean?</p>
</div></body></html><a href="https://www.sql.kiwi/2022/07/consistent-plan-timings-2022.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com1tag:blogger.com,1999:blog-3702702923592093288.post-874891958648459482021-11-18T00:40:00.002+13:002021-12-03T04:52:18.329+13:00Be Careful with LOBs and OPTION (RECOMPILE)<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Be Careful with LOBs and OPTION (RECOMPILE)</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>It sometimes makes sense to add <code>OPTION (RECOMPILE)</code> to a query. Typically this will be when:</p>
<ul>
<li>A good enough plan for the query is very sensitive to one or more parameters</li>
<li>No good single value exists for the parameter to use in a hint</li>
<li>Optimize for unknown doesn’t give a good result</li>
<li>The plan might be expected to change over time</li>
<li>The cost of recompiling the statement is much less than the expected execution time</li>
<li>Recompiling every time is very likely to save more time and resources than it costs overall</li>
</ul>
<p>All that is fairly well-known. The point of this short post is to draw your attention to another side-effect of adding <code>OPTION (RECOMPILE)</code> — the parameter embedding optimization (PEO).</p>
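<p>As a reminder of what PEO does, consider a sketch like the following (the object names are illustrative). With <code>OPTION (RECOMPILE)</code>, the optimizer may treat the parameter as a literal at each compilation, so the <code>@Filter IS NULL</code> branch can be removed from the plan entirely when a value is supplied:</p>
<pre class=" language-sql"><code class="prism language-sql">CREATE OR ALTER PROCEDURE dbo.Demo
    @Filter integer = NULL
AS
BEGIN
    SELECT E.col1
    FROM dbo.Example AS E
    WHERE @Filter IS NULL OR E.col1 = @Filter
    OPTION (RECOMPILE); -- parameter embedding applies here
END;
</code></pre>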
</div></body></html><a href="https://www.sql.kiwi/2021/11/careful-lobs-recompile.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com1tag:blogger.com,1999:blog-3702702923592093288.post-79045539876294436182021-06-05T20:35:00.001+12:002021-06-05T20:35:59.689+12:00Empty Parallel Zones<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Empty Parallel Zones</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>An <strong>empty parallel zone</strong> is an area of the plan bounded by exchanges (or the leaf level) containing no operators.</p>
<p>How and why does SQL Server sometimes generate a parallel plan with an empty parallel zone?</p>
</div></body></html><a href="https://www.sql.kiwi/2021/06/empty-parallel-zones.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-90027269506849682302021-03-24T00:54:00.007+13:002023-09-09T21:08:31.339+12:00Incorrect Results with Parallel Eager Spools and Batch Mode<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Incorrect Results with Parallel Eager Spools and Batch Mode</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>You might have noticed a warning at the top of the release notes for <a href="https://support.microsoft.com/en-us/topic/kb5000645-cumulative-update-16-for-sql-server-2016-sp2-a3997fa9-ec49-4df0-bcc3-12dd58b78265">SQL Server 2016 SP2 CU 16</a>:</p>
<blockquote>
<p><strong>Note:</strong> After you apply CU 16 for SQL Server 2016 SP2, you might encounter an issue in which DML (insert/update/delete) queries that use parallel plans cannot complete any execution and encounter <strong>HP_SPOOL_BARRIER</strong> waits. You can use the trace flag 13116 or MAXDOP=1 hint to work around this issue. This issue is related to the introduction of fix for 13685819 and it will be fixed in the next Cumulative Update.</p>
</blockquote>
<p>That warning links to bug reference <a href="https://support.microsoft.com/en-us/topic/kb5000645-cumulative-update-16-for-sql-server-2016-sp2-a3997fa9-ec49-4df0-bcc3-12dd58b78265#bkmk_13685819">13685819</a> on the same page. There isn’t a separate KB article, only the description:</p>
<blockquote>
<p>Fixes an issue with insert query in SQL Server 2016 that reads the data from the same table and uses a parallel execution plan may produce duplicate rows</p>
</blockquote>
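<p>For completeness, the workarounds mentioned in the release note look like this in practice (<code>dbo.T</code> is an illustrative table, and trace flag 13116 is undocumented, so use it only as Microsoft directs):</p>
<pre class=" language-sql"><code class="prism language-sql">-- Per-statement workaround: force a serial plan
INSERT dbo.T (c1)
SELECT T.c1
FROM dbo.T AS T
OPTION (MAXDOP 1);

-- Instance-wide workaround (undocumented trace flag from the release note)
DBCC TRACEON (13116, -1);
</code></pre>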
</div></body></html><a href="https://www.sql.kiwi/2021/03/spools-batch-mode-hp.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com10tag:blogger.com,1999:blog-3702702923592093288.post-44872391823669302322020-10-11T05:42:00.011+13:002021-02-25T05:31:59.564+13:00sql_handle and the SQL Server batch text hash<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>sql_handle and the SQL Server batch text hash</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>This article describes the structure of a <code>sql_handle</code> and shows how the batch text hash component is calculated.</p>
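<p>For orientation, a <code>sql_handle</code> is exposed by several DMVs and can be exchanged for the batch text it identifies:</p>
<pre class=" language-sql"><code class="prism language-sql">SELECT
    QS.sql_handle,
    ST.[text]
FROM sys.dm_exec_query_stats AS QS
CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) AS ST;
</code></pre>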
</div></body></html><a href="https://www.sql.kiwi/2020/10/sqlhandle-and-sql-server-batch-text-hash.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com5tag:blogger.com,1999:blog-3702702923592093288.post-74463212658628463262020-10-08T03:01:00.000+13:002020-10-08T03:50:43.908+13:00Closest Match with Sort Rewinds<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Closest Match with Sort Rewinds</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>In <a href="https://sqlperformance.com/2020/10/sql-performance/when-do-sorts-rewind">When Do SQL Server Sorts Rewind?</a> I described how most sorts can only rewind when they contain at most one row. The exception is in-memory sorts, which can rewind at most 500 rows and 16KB of data.</p>
<p>These are certainly tight restrictions, but we can still make use of them on occasion.</p>
<p>To illustrate, I am going to reuse a demo Itzik Ben-Gan provided in <a href="https://sqlperformance.com/2018/12/t-sql-queries/closest-match-part-1">part one of his <em>Closest Match</em> series</a>, specifically solution 2 (modified value range and indexing).</p>
<p>As Itzik’s title suggests, the task is to find the closest match for a value in one table in a second table.</p>
<p>As Itzik describes it:</p>
<blockquote>
<p>The challenge is to match to each row from T1 the row from T2 where the absolute difference between T2.<code>val</code> and T1.<code>val</code> is the lowest. In case of ties (multiple matching rows in T2), match the top row based on <code>val</code> ascending, <code>keycol</code> ascending order.</p>
<p>That is, the row with the lowest value in the <code>val</code> column, and if you still have ties, the row with the lowest <code>keycol</code> value. The tiebreaker is used to guarantee determinism.</p>
</blockquote>
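<p>Written directly from that specification, the requirement is a <code>TOP (1)</code> per row of T1. This is the straightforward form, not necessarily the tuned solution 2:</p>
<pre class=" language-sql"><code class="prism language-sql">SELECT
    T1.keycol,
    T1.val,
    CM.keycol AS match_keycol,
    CM.val AS match_val
FROM dbo.T1 AS T1
CROSS APPLY
(
    SELECT TOP (1)
        T2.keycol,
        T2.val
    FROM dbo.T2 AS T2
    ORDER BY
        ABS(T2.val - T1.val),
        T2.val,
        T2.keycol
) AS CM;
</code></pre>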
</div></body></html><a href="https://www.sql.kiwi/2020/10/closest-match-with-sort-rewinds.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-55156334621091292392020-08-04T01:26:00.000+12:002020-10-07T23:48:07.589+13:00SQL Server 2019 Aggregate Splitting<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>SQL Server 2019 Aggregate Splitting</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>The SQL Server 2019 query optimizer has a new trick available to improve the performance of large aggregations. The new exploration abilities are encoded in two new closely-related optimizer rules:</p>
<ul>
<li><code>GbAggSplitToRanges</code></li>
<li><code>SelOnGbAggSplitToRanges</code></li>
</ul>
<p>The extended event <code>query_optimizer_batch_mode_agg_split</code> is provided to track when this new optimization is considered. The description of this event is:</p>
<blockquote>
<p>Occurs when the query optimizer detects batch mode aggregation is likely to spill and tries to split it into multiple smaller aggregations.</p>
</blockquote>
<p>Other than that, this new feature hasn’t been documented yet. This article is intended to help fill that gap.</p>
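<p>To see when the optimizer considers the split while you experiment, create a simple extended events session for the event mentioned above:</p>
<pre class=" language-sql"><code class="prism language-sql">CREATE EVENT SESSION AggSplit
ON SERVER
ADD EVENT sqlserver.query_optimizer_batch_mode_agg_split
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION AggSplit
ON SERVER
STATE = START;
</code></pre>
<p>Watch the live data feed in SSMS, or read the ring buffer target, while compiling candidate queries.</p>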
</div></body></html><a href="https://www.sql.kiwi/2020/08/sql-server-2019-aggregate-splitting.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com2tag:blogger.com,1999:blog-3702702923592093288.post-13071378708608329572020-07-26T01:22:00.002+12:002021-08-16T23:23:28.509+12:00A bug with Halloween Protection and the OUTPUT Clause<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>A bug with Halloween Protection and the OUTPUT Clause</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><h2 id="background">Background</h2>
<p>The <a href="https://docs.microsoft.com/en-us/sql/t-sql/queries/output-clause-transact-sql"><code>OUTPUT</code> clause</a> can be used to return results from an <code>INSERT</code>, <code>UPDATE</code>, <code>DELETE</code>, or <code>MERGE</code> statement. The data can be returned to the client, inserted to a table, or both.</p>
<p>There are two ways to add <code>OUTPUT</code> data to a table:</p>
<ol>
<li>Using <code>OUTPUT INTO</code></li>
<li>With an outer <code>INSERT</code> statement.</li>
</ol>
<p>For example:</p>
<pre class=" language-sql"><code class="prism language-sql"><span class="token comment">-- Test table</span>
<span class="token keyword">DECLARE</span> <span class="token variable">@Target</span> <span class="token keyword">table</span>
<span class="token punctuation">(</span>
id <span class="token keyword">integer</span> <span class="token keyword">IDENTITY</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">)</span> <span class="token operator">NOT</span> <span class="token boolean">NULL</span><span class="token punctuation">,</span>
<span class="token number">c1</span> <span class="token keyword">integer</span> <span class="token boolean">NULL</span>
<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">-- Holds rows from the OUTPUT clause</span>
<span class="token keyword">DECLARE</span> <span class="token variable">@Output</span> <span class="token keyword">table</span>
<span class="token punctuation">(</span>
id <span class="token keyword">integer</span> <span class="token operator">NOT</span> <span class="token boolean">NULL</span><span class="token punctuation">,</span>
<span class="token number">c1</span> <span class="token keyword">integer</span> <span class="token boolean">NULL</span>
<span class="token punctuation">)</span><span class="token punctuation">;</span>
</code></pre>
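<p>Using those table variables, the two forms look like this. In the second, the outer <code>INSERT</code> consumes the inner statement’s <code>OUTPUT</code> clause as a row source (composable DML):</p>
<pre class=" language-sql"><code class="prism language-sql">-- 1. OUTPUT INTO
INSERT @Target (c1)
OUTPUT
    inserted.id,
    inserted.c1
    INTO @Output (id, c1)
VALUES
    (1), (2);

-- 2. Outer INSERT over the OUTPUT clause
INSERT @Output
    (id, c1)
SELECT
    N.id,
    N.c1
FROM
(
    INSERT @Target (c1)
    OUTPUT
        inserted.id,
        inserted.c1
    VALUES
        (3), (4)
) AS N;
</code></pre>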
</div></body></html><a href="https://www.sql.kiwi/2020/07/a-bug-with-halloween-protection-and.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com5tag:blogger.com,1999:blog-3702702923592093288.post-59581021297822724492020-07-05T07:01:00.000+12:002020-10-07T23:48:44.618+13:00How MAXDOP Really Works<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>How MAXDOP Really Works</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>A few days ago I ran a Twitter poll:</p>
<p><img src="https://3.bp.blogspot.com/-apN0-wVbsCg/XwCsSmM0bMI/AAAAAAAAEbw/u5y50tXBdS4bV8CxN63KnRyb0TkrTVneQCLcBGAsYHQ/s1600/poll.png" alt="Twitter poll"></p>
<p>The <strong>most popular answer</strong> gets highlighted by Twitter at the end of the poll, but as with many things on social media, that doesn’t mean it is correct:</p>
</div></body></html><a href="https://www.sql.kiwi/2020/07/how-maxdop-really-works.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com1tag:blogger.com,1999:blog-3702702923592093288.post-65024483639230575542020-05-31T02:32:00.000+12:002020-10-07T23:48:57.010+13:00Pulling Group By Above a Join<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Pulling Group By Above a Join</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>One of the transformations available to the SQL Server query optimizer is pulling a logical <em>Group By</em> (and any associated aggregates) above a <em>Join</em>.</p>
<p>Visually, this means transforming a tree of logical operations from:</p>
<p><img src="https://1.bp.blogspot.com/-6dVyTKBeg5E/XtJKNY41gDI/AAAAAAAAEYU/WNgatqjUToMuQZcR91rPPvjZqttivEYKwCLcBGAsYHQ/s1600/01.png" alt="Group By Below Join" title="Group By Below Join"></p>
<p>…to this:</p>
<p><img src="https://2.bp.blogspot.com/-RQYQPYBQpXc/XtJKNX3SUTI/AAAAAAAAEYY/QXvpvUUMUwkyLqKTaQi0YXC7WPLrfRv4wCLcBGAsYHQ/s1600/02.png" alt="Group By Above Join" title="Group By Above Join"></p>
<p>The above diagrams are <em>logical representations</em>. They need to be implemented as <em>physical operators</em> to appear in an execution plan. The options are:</p>
<ul>
<li><strong>Group By</strong>
<ul>
<li>Hash Match Aggregate</li>
<li>Stream Aggregate</li>
<li>Distinct Sort</li>
</ul>
</li>
<li><strong>Join</strong>
<ul>
<li>Nested Loops Join</li>
<li>Nested Loops Apply</li>
<li>Hash Match Join</li>
<li>Merge Join</li>
</ul>
</li>
</ul>
<p>When the optimizer moves a <em>Group By</em> above a <em>Join</em> it has to preserve the semantics. The new sequence of operations must be <em>guaranteed</em> to return the same results as the original in all possible circumstances.</p>
<p>One cannot just pick up a <em>Group By</em> and arbitrarily move it around the query tree without risking incorrect results.</p>
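<p>In SQL terms, the transformation corresponds to rewriting between shapes like the following. The schema is hypothetical, and the rewrite is only guaranteed safe here because <code>DimKey</code> is the key of <code>dbo.Dim</code>:</p>
<pre class=" language-sql"><code class="prism language-sql">-- Group By below the join: pre-aggregate the fact rows, then join
SELECT
    D.DimKey,
    D.DimName,
    FA.Total
FROM dbo.Dim AS D
JOIN
(
    SELECT
        F.DimKey,
        SUM(F.Amount) AS Total
    FROM dbo.Fact AS F
    GROUP BY
        F.DimKey
) AS FA
    ON FA.DimKey = D.DimKey;

-- Group By above the join
SELECT
    D.DimKey,
    D.DimName,
    SUM(F.Amount) AS Total
FROM dbo.Dim AS D
JOIN dbo.Fact AS F
    ON F.DimKey = D.DimKey
GROUP BY
    D.DimKey,
    D.DimName;
</code></pre>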
</div></body></html><a href="https://www.sql.kiwi/2020/05/pulling-group-by-above-join.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-17953328471273523922019-08-24T12:56:00.001+12:002020-10-07T23:49:07.146+13:00Batch Mode Bitmap Demos<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Batch Mode Bitmap Demos</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>This is a companion post to my main article <a href="https://sqlperformance.com/2019/08/sql-performance/batch-mode-bitmaps-in-sql-server">Batch Mode Bitmaps in SQL Server</a>. This post provides demos and illustrations to supplement the technical article.</p>
<p>The scripts presented here were run on SQL Server 2017 CU 16.</p>
</div></body></html><a href="https://www.sql.kiwi/2019/08/batch-mode-bitmap-demos.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-19726292955604772702019-06-09T01:18:00.003+12:002022-09-20T05:04:25.127+12:00Apply versus Nested Loops Join<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Apply versus Nested Loops Join</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>SQL is a declarative language. We use SQL to write a <em>logical query specification</em> that defines the results we want. For example, we might write a query using either <code>APPLY</code> or <code>JOIN</code> that <em>logically</em> describes exactly the same results.</p>
<p>It is up to the query optimizer to find an efficient <em>physical</em> implementation of that <em>logical</em> requirement. SQL Server is free to choose any plan it likes, so long as the results are <em>guaranteed</em> to be the same as specified in the original SQL.</p>
<p>The optimizer is capable of transforming an <em>apply</em> to a <em>join</em> and <em>vice versa</em>. It generally tries to rewrite <em>apply</em> to <em>join</em> during initial compilation to maximize the searchable plan space during cost-based optimization. Having transformed an <em>apply</em> to a <em>join</em> early on, it may also consider a transformation back to an <em>apply</em> shape later on to assess the merits of e.g. an index loops join.</p>
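<p>For a concrete illustration (hypothetical schema), both of the following describe the same logical result, and the optimizer may implement either with any physical strategy that preserves it:</p>
<pre class=" language-sql"><code class="prism language-sql">-- Join form
SELECT
    C.CustomerID,
    O.OrderID
FROM dbo.Customers AS C
JOIN dbo.Orders AS O
    ON O.CustomerID = C.CustomerID;

-- Apply form
SELECT
    C.CustomerID,
    OA.OrderID
FROM dbo.Customers AS C
CROSS APPLY
(
    SELECT O.OrderID
    FROM dbo.Orders AS O
    WHERE O.CustomerID = C.CustomerID
) AS OA;
</code></pre>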
</div></body></html><a href="https://www.sql.kiwi/2019/06/apply-versus-nested-loops-join.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com1tag:blogger.com,1999:blog-3702702923592093288.post-25735430903579267052017-05-02T00:00:00.006+12:002023-12-19T00:13:24.175+13:00SQL Server Temporary Object Caching<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>SQL Server Temporary Object Caching</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>Creating a table is a relatively resource-intensive and time-consuming operation. The server must locate and allocate storage space for the new data and index structures and make the corresponding entries in multiple system metadata tables. All this work has to be done in ways that will always work correctly under high concurrency, and which meet all of the ACID guarantees expected of a relational database.</p>
<p>In SQL Server, this means taking the right kinds of locks and latches, in the correct sequence, while also ensuring that detailed transaction log entries are safely committed to persistent storage in advance of any physical changes to the database. These log entries ensure the system can bring the database back to a consistent state in the event of a transaction rollback or system crash.</p>
<p>Dropping a table is a similarly expensive operation. Luckily, most databases do not create or drop tables with any great frequency. The obvious exception to this is the system database <em>tempdb</em>. This single database contains the physical storage, allocation structures, system metadata, and transaction log entries for all temporary tables and table variables across the entire SQL Server instance.</p>
<p>It is in the nature of temporary tables and table variables to be created and dropped much more frequently than other database object types. When this naturally high frequency of creation and destruction is combined with the concentrating effect of all temporary tables and table variables being associated with a single database, it is hardly surprising that contention can arise in the allocation and metadata structures of the <em>tempdb</em> database.</p>
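<p>Caching itself is easy to observe: when a temporary table created inside a procedure is cached, it is renamed rather than dropped, so the same <code>object_id</code> reappears on the next execution. A minimal sketch:</p>
<pre class=" language-sql"><code class="prism language-sql">CREATE OR ALTER PROCEDURE dbo.CacheDemo
AS
BEGIN
    CREATE TABLE #T (c1 integer NULL);
    SELECT OBJECT_ID(N'tempdb..#T') AS temp_object_id;
END;
GO
EXECUTE dbo.CacheDemo; -- note the id returned
EXECUTE dbo.CacheDemo; -- same id if the object was cached
</code></pre>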
</div></body></html><a href="https://www.sql.kiwi/2017/05/temp-object-caching.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-86015665362855928062014-04-15T19:13:00.000+12:002020-10-07T23:49:38.296+13:00Cardinality Estimation for Disjunctive (OR) Predicates in SQL Server 2014 Onward<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content=" width=device-width, initial-scale=1.0">
<title>Cardinality Estimation for Disjunctive Predicates in SQL Server 2014</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><h2 id="introduction">Introduction</h2>
<p>Back in January 2014, I wrote an article called <a href="https://sqlperformance.com/2014/01/sql-plan/cardinality-estimation-for-multiple-predicates">Cardinality Estimation for Multiple Predicates</a> that described the cardinality estimation process for queries with multiple predicates, from the point of view of the old and new cardinality estimators.</p>
<p>The article describes the various behaviours and formulas involved, along with the usual sprinkling of documented and undocumented trace flags. I described the formula SQL Server 2014 uses to calculate a cardinality estimate for multiple predicates connected with <code>AND</code> (conjunctive predicates), which was already relatively well-known.</p>
<p>Despite some fairly energetic research, and basic-to-intermediate skills with Excel, I was unable to deduce a similar formula for the disjunctive case, where predicates are connected by <code>OR</code>. The trace flag output I describe in the other article clearly showed that exponential backoff was used in the new 2014 cardinality estimator, but the precise formula eluded me.</p>
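<p>For reference, the conjunctive (<code>AND</code>) calculation mentioned above can be sketched as follows. The selectivities (0.1, 0.2, 0.3) and table cardinality are invented for illustration: the most selective predicate is applied at full weight, with square roots applied progressively to the rest:</p>
<pre class=" language-sql"><code class="prism language-sql">-- Exponential backoff for AND predicates (2014 CE), using
-- hypothetical selectivities ordered most-selective first.
DECLARE @s1 float = 0.1, @s2 float = 0.2, @s3 float = 0.3;
DECLARE @TableRows float = 100000;

SELECT Estimate =
    @TableRows * @s1 * SQRT(@s2) * SQRT(SQRT(@s3));
-- 100,000 * 0.1 * 0.2^(1/2) * 0.3^(1/4) ~= 3,310 rows
</code></pre>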
</div></body></html></!DOCTYPE><a href="https://www.sql.kiwi/2014/04/cardinality-estimation-for-disjunctive-predicates-in-2014.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-51421843414157946062013-08-31T15:24:00.000+12:002020-08-06T21:12:32.690+12:00Nested Loops Prefetching<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">

<title>Nested Loops Prefetching</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>Nested loops join query plans can be a lot more interesting (and complicated) than is commonly realized.</p>
<p>One query plan area I get asked about a lot is prefetching. It is not documented in full detail anywhere, so this seems like a good topic to address in a blog post.</p>
<p>The examples used in this article are based on questions asked by <a href="http://dataeducation.com/blog/">Adam Machanic</a>.</p>
</div></body></html></!DOCTYPE><a href="https://www.sql.kiwi/2013/08/sql-server-internals-nested-loops-prefetching.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-86131000422798002072013-08-28T09:30:00.001+12:002024-01-22T23:33:15.221+13:00Parameter Sniffing, Embedding, and the RECOMPILE Options<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Parameter Sniffing, Embedding, and the RECOMPILE Options</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><h2 id="parameter-sniffing">Parameter Sniffing</h2>
<p>Query parameterization promotes the reuse of cached execution plans, thereby avoiding unnecessary compilations, and reducing the number of ad-hoc queries in the plan cache.</p>
<p>These are all good things, <em>provided</em> the query being parameterized really ought to use the same cached execution plan for different parameter values. An execution plan that is efficient for one parameter value may <strong>not</strong> be a good choice for other possible parameter values.</p>
<p>When parameter sniffing is enabled (the default), SQL Server chooses an execution plan based on the particular parameter values that exist at compilation time. The implicit assumption is that parameterized statements are most commonly executed with the most common parameter values. This sounds reasonable enough (even obvious) and indeed it often works well.</p>
<p>A problem can arise when the cached plan is automatically recompiled. A recompilation may be triggered for all sorts of reasons: for example, because an index used by the cached plan has been dropped (a <em>correctness</em> recompilation), or because statistical information has changed (an <em>optimality</em> recompilation).</p>
<p>Whatever the exact <em>cause</em> of the plan recompilation, there is a chance that an <em>atypical</em> value is being passed as a parameter at the time the new plan is generated. This can result in a new cached plan (based on the sniffed atypical parameter value) that is not good for the majority of executions for which it will be reused.</p>
<p>It is not easy to predict when a particular execution plan will be recompiled (for example, because statistics have changed sufficiently), so a good-quality reusable plan can suddenly be replaced by a quite different plan optimized for atypical parameter values.</p>
<p>One such scenario occurs when the atypical value is highly selective, resulting in a plan optimized for a small number of rows. Such plans will often use single-threaded execution, nested loops joins, and lookups. Serious performance issues can arise when this plan is reused for different parameter values that generate a much larger number of rows.</p>
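<p>To make this concrete, here is a hypothetical procedure (the table and column names are invented for illustration) showing one of the options in this post&#8217;s title: <code>OPTION (RECOMPILE)</code> compiles a fresh statement-level plan for the sniffed parameter values on every execution, trading repeated compilations for a value-specific plan:</p>
<pre class=" language-sql"><code class="prism language-sql">CREATE PROCEDURE dbo.GetOrdersForCustomer
    @CustomerID integer
AS
BEGIN
    SELECT O.OrderID, O.OrderDate
    FROM dbo.Orders AS O
    WHERE O.CustomerID = @CustomerID
    -- Optimize for the current @CustomerID value each time,
    -- avoiding reuse of a plan sniffed for an atypical value.
    OPTION (RECOMPILE);
END;
</code></pre>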
</div></body></html></!DOCTYPE><a href="https://www.sql.kiwi/2013/08/parameter-sniffing-embedding-and.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-46397152229765389052013-08-21T05:11:00.000+12:002020-08-06T21:15:16.543+12:00Incorrect Results Caused By Adding an Index<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Incorrect Results Caused By Adding an Index</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>Say you have the following two tables, one partitioned and one not:</p>
<pre class=" language-sql"><code class="prism language-sql"><span class="token keyword">CREATE</span> <span class="token keyword">PARTITION</span> <span class="token keyword">FUNCTION</span> PF <span class="token punctuation">(</span><span class="token keyword">integer</span><span class="token punctuation">)</span>
<span class="token keyword">AS</span> RANGE <span class="token keyword">RIGHT</span>
<span class="token keyword">FOR</span> <span class="token keyword">VALUES</span> <span class="token punctuation">(</span><span class="token number">1000</span><span class="token punctuation">,</span> <span class="token number">2000</span><span class="token punctuation">,</span> <span class="token number">3000</span><span class="token punctuation">,</span> <span class="token number">4000</span><span class="token punctuation">,</span> <span class="token number">5000</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">CREATE</span> <span class="token keyword">PARTITION</span> SCHEME PS
<span class="token keyword">AS</span> <span class="token keyword">PARTITION</span> PF
<span class="token keyword">ALL</span> <span class="token keyword">TO</span> <span class="token punctuation">(</span><span class="token punctuation">[</span><span class="token keyword">PRIMARY</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">-- Partitioned</span>
<span class="token keyword">CREATE</span> <span class="token keyword">TABLE</span> dbo<span class="token punctuation">.</span>T1
<span class="token punctuation">(</span>
T1ID <span class="token keyword">integer</span> <span class="token operator">NOT</span> <span class="token boolean">NULL</span><span class="token punctuation">,</span>
SomeID <span class="token keyword">integer</span> <span class="token operator">NOT</span> <span class="token boolean">NULL</span><span class="token punctuation">,</span>
<span class="token keyword">CONSTRAINT</span> <span class="token punctuation">[</span>PK dbo<span class="token punctuation">.</span>T1 T1ID<span class="token punctuation">]</span>
<span class="token keyword">PRIMARY</span> <span class="token keyword">KEY</span> <span class="token keyword">CLUSTERED</span> <span class="token punctuation">(</span>T1ID<span class="token punctuation">)</span>
<span class="token keyword">ON</span> PS <span class="token punctuation">(</span>T1ID<span class="token punctuation">)</span>
<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token comment">-- Not partitioned</span>
<span class="token keyword">CREATE</span> <span class="token keyword">TABLE</span> dbo<span class="token punctuation">.</span>T2
<span class="token punctuation">(</span>
T2ID <span class="token keyword">integer</span> <span class="token keyword">IDENTITY</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">)</span> <span class="token operator">NOT</span> <span class="token boolean">NULL</span><span class="token punctuation">,</span>
T1ID <span class="token keyword">integer</span> <span class="token operator">NOT</span> <span class="token boolean">NULL</span><span class="token punctuation">,</span>
<span class="token keyword">CONSTRAINT</span> <span class="token punctuation">[</span>PK dbo<span class="token punctuation">.</span>T2 T2ID<span class="token punctuation">]</span>
<span class="token keyword">PRIMARY</span> <span class="token keyword">KEY</span> <span class="token keyword">CLUSTERED</span> <span class="token punctuation">(</span>T2ID<span class="token punctuation">)</span>
<span class="token keyword">ON</span> <span class="token punctuation">[</span><span class="token keyword">PRIMARY</span><span class="token punctuation">]</span>
<span class="token punctuation">)</span><span class="token punctuation">;</span>
</code></pre>
</div></body></html></!DOCTYPE><a href="https://www.sql.kiwi/2013/08/incorrect-results-caused-by-adding-an-index.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-61283212268324632682013-07-24T09:00:00.000+12:002020-08-06T21:15:28.116+12:00Two Partitioning Peculiarities<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Two Partitioning Peculiarities</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>Table partitioning in SQL Server is essentially a way of making multiple physical tables (row sets) look like a single table. This abstraction is performed entirely by the query processor, a design that makes things simpler for users, but which makes complex demands of the query optimizer.</p>
<p>This post looks at two examples which exceed the optimizer’s abilities in SQL Server 2008 onward.</p>
</div></body></html></!DOCTYPE><a href="https://www.sql.kiwi/2013/07/two-partitioning-peculiarities.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-27029660202816153162013-07-18T10:45:00.000+12:002020-08-06T21:15:55.105+12:00Aggregates and Partitioning<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Aggregates and Partitioning</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>The changes in the internal representation of partitioned tables between SQL Server 2005 and SQL Server 2008 resulted in improved query plans and performance in the majority of cases (especially when parallel execution is involved).</p>
<p>Unfortunately, the same changes caused some things that worked well in SQL Server 2005 to suddenly not work so well in SQL Server 2008 and later.</p>
<p>This post looks at one example where the SQL Server 2005 query optimizer produced a superior execution plan compared with later versions.</p>
</div></body></html></!DOCTYPE><a href="https://www.sql.kiwi/2013/07/aggregates-and-partitioning.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0tag:blogger.com,1999:blog-3702702923592093288.post-66118507578722071212013-07-08T09:00:00.000+12:002020-08-06T21:16:11.892+12:00Working Around Missed Optimizations<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Working Around Missed Optimizations</title>
<link rel="stylesheet" href="https://stackedit.io/style.css">
</head>
<body class="stackedit">
<div class="stackedit__html"><p>In <a href="https://www.sql.kiwi/2013/06/optimization-phases-and-missed.html">my last post</a>, we saw how a query featuring a scalar aggregate could be transformed by the optimizer to a more efficient form. As a reminder, here’s the schema again:</p>
</div></body></html></!DOCTYPE><a href="https://www.sql.kiwi/2013/07/working-around-missed-optimizations.html#more">Read the full article »</a>Paul Whitehttp://www.blogger.com/profile/04690243284528295117noreply@blogger.com0