About This Blog

Including my content from SQLBlog.com and some from SQLPerformance.com

Sunday 11 October 2020

sql_handle and the SQL Server batch text hash

sql_handle and the SQL Server batch text hash

This article describes the structure of a sql_handle and shows how the batch text hash component is calculated.

SQL handle structure

The 44-byte sql_handle has the following components (in order):

  1. 4 bytes: Byte-reversed integer code for the target cache e.g.
    • 0x02000000 (2) for SQL Plans (CACHESTORE_SQLCP)
    • 0x03000000 (3) for Object Plans (CACHESTORE_OBJCP)
  2. 4 bytes: Byte-reversed integer object id
    • For adhoc and prepared SQL this is an internal hash of the batch text as returned by @@PROCID or objectid in sys.dm_exec_plan_attributes.
  3. 16 bytes: MD5 hash of the batch text including parameter definitions as a prefix (if present).
  4. 20 bytes 0000000000000000000000000000000000000000

Batch Text Hash

The second component (object id) of the SQL handle is described in the documentation as:

This is the object ID stored in sys.objects for database objects (procedures, views, triggers, and so on). For plans of type “Adhoc” or “Prepared”, it is an internal hash of the batch text.

Modules are easy; it is the ad-hoc SQL and prepared statements we are interested in. What exactly is the “internal hash of the batch text”?

The first question to ask is, what is the batch text? For batches without parameters, it is exactly the text of the whole batch. For batches with parameters, the whole batch text is prefixed with the parameter definitions enclosed in parentheses.

In both cases, every character (including spaces, carriage returns and whatnot) is significant.

Computing the Hash

T-SQL lacks several functions commonly used by hashing routines, like bit shifting and integer arithmetic that ignores overflow. To make the hash computation somewhat easier to read and understand, I will start by creating several helper scalar functions. Performance is not the goal here, but each of these functions is capable of being in-lined on SQL Server 2019+.

-- Shift arithmetic right 2 places
CREATE FUNCTION dbo.SAR2 (@i integer)
RETURNS integer
WITH RETURNS NULL ON NULL INPUT, SCHEMABINDING
AS
BEGIN
    RETURN IIF(@i >= 0, @i / 4, ~(~@i / 4));
END;
-- Shift left 5 places
CREATE FUNCTION dbo.SHL5 (@i integer)
RETURNS integer
WITH RETURNS NULL ON NULL INPUT, SCHEMABINDING
AS
BEGIN
    RETURN
        CONVERT(integer, 
            CONVERT(binary(4), 
                CONVERT(bigint, @i) * 32));
END;
-- Integer add with wraparound
CREATE FUNCTION dbo.IntegerAdd (@x integer, @y integer)
RETURNS integer
WITH RETURNS NULL ON NULL INPUT, SCHEMABINDING
AS
BEGIN
    RETURN
        CONVERT(integer,
            CONVERT(binary(4),
                CONVERT(bigint, @x) + CONVERT(bigint, @y)));
END;
-- Integer substraction with wraparound
CREATE FUNCTION dbo.IntegerSub (@x integer, @y integer)
RETURNS integer
WITH RETURNS NULL ON NULL INPUT, SCHEMABINDING
AS
BEGIN
    RETURN
        CONVERT(integer,
            CONVERT(binary(4),
                CONVERT(bigint, @x) - CONVERT(bigint, @y)));
END;
-- Integer multiply with wraparound
CREATE FUNCTION dbo.IntegerMultiply (@x integer, @y integer)
RETURNS integer
WITH RETURNS NULL ON NULL INPUT, SCHEMABINDING
AS
BEGIN
    RETURN
        CONVERT(integer,
            CONVERT(binary(4),
                CONVERT(bigint, @x) * CONVERT(bigint, @y)));
END;
GO

The hash computation itself cannot be in-lined, and contains several magic numbers:

CREATE OR ALTER FUNCTION dbo.BatchTextHash
(
    @BatchText nvarchar(max)
)
RETURNS integer
WITH RETURNS NULL ON NULL INPUT, SCHEMABINDING
AS
BEGIN
    DECLARE 
        @bin varbinary(max) = CONVERT(varbinary(max), @BatchText, 0),
        @b integer = 0,
        @c integer = 0,
        @d integer = 0,
        @r integer = 0,
        @x integer = 1;

    DECLARE
        @8 integer = DATALENGTH(@bin) - 2,
        @9 integer = DATALENGTH(@bin);

    -- Step 1
    IF @8 <= 0 GOTO J2;

J1:
    SET @x += 4;

    SET @c = dbo.IntegerAdd(dbo.SHL5(@b), dbo.SAR2(@b));
    SET @c = dbo.IntegerAdd(@c, CONVERT(integer, SUBSTRING(@bin, @x - 3, 1) + SUBSTRING(@bin, @x - 4, 1)));

    SET @b ^= @c;

    SET @c = dbo.IntegerAdd(CONVERT(integer, SUBSTRING(@bin, @x - 1, 1) + SUBSTRING(@bin, @x - 2, 1)), dbo.SHL5(@d));

    SET @c = dbo.IntegerAdd(@c, dbo.SAR2(@d));

    SET @d ^= @c;

    IF @x < @8 GOTO J1;

J2:
    IF @x < @9
    BEGIN
        SET @c = dbo.IntegerAdd(dbo.SHL5(@b), dbo.SAR2(@b));
        SET @c = dbo.IntegerAdd(@c, CONVERT(integer, SUBSTRING(@bin, @x + 1, 1) + SUBSTRING(@bin, @x, 1)));

        SET @b ^= @c;
    END;

    -- Step 2
    SELECT 
        @d = dbo.IntegerMultiply(@d, 314159269),
        @b = dbo.IntegerMultiply(@b, 1179605760);

    SET @d = dbo.IntegerSub(@d, @b);

    SET @x = IIF(@d >= 0, 0, -1);
    SET @r = @d;
    SET @r ^= @x;
    SET @r = dbo.IntegerSub(@r, @x);

    SET @x = 
        CONVERT(integer, 
            SUBSTRING(
                CONVERT(binary(8), 
                    CONVERT(bigint, 1152921497) * 
                    CONVERT(bigint, @r)),
                1, 4));

    -- SAR (28) inlined
    SET @x = IIF(@x >= 0, @x / 268435456, ~(~@x / 268435456));

    SET @c = IIF(@x >= 0, 0, 1);
    SET @x = dbo.IntegerAdd(@x, @c);

    SET @x = dbo.IntegerMultiply(@x, 1000000007);
    SET @r = dbo.IntegerSub(@r, @x);
    IF @r = 0 SET @r = 1;

    RETURN @r;
END;

Test 1 — No parameters

This shows the @@PROCID for an ad-hoc SQL batch and the computed hash for the same text (including the carriage return and line feed):

GO
SELECT @@PROCID AS objectid;
GO
DECLARE @BatchText nvarchar(max) = 
N'SELECT @@PROCID AS objectid;
';

SELECT dbo.BatchTextHash(@BatchText);

Both return the value 836550104.

Test 2 — Prepared statement

This shows the @@PROCID for a parameterized batch executed with sp_executesql, and the same value computed using the hash function. Note the @Params text sent to sp_executesql is reused, after wrapping in parentheses:

DECLARE @SQL nvarchar(max) = 
    N'
    SELECT 
        cnt = COUNT_BIG(*), 
        objectid = @@PROCID 
    FROM master..spt_values AS N 
    WHERE 
        N.number < @n;';
DECLARE @Params nvarchar(max) = N'@n integer';

EXECUTE sys.sp_executesql 
    @SQL,
    @Params,
    @n = 100;

SELECT dbo.BatchTextHash
    (
        N'(' + 
        @Params + 
        N')' + 
        @SQL
    ) AS objectid;

Again, the @@PROCID and computes values are the same: 456334234.

Final Thoughts

The code above is provided as-is. I believe it is a faithful T-SQL translation of the SQL Server routine in sqllang!CSQLStringsStream::ComputeObjidAndLength, but it comes with no guarantees or support. I present for educational value, not for use in real applications.


This work is licensed under CC BY-NC-SA 4.0

5 comments:

  1. Possibly saving time to others.
    When creating the batch text to generate the hashes from, all char parameters must have also the size specified, ie. char(1), varchar(15) etc. varchar(max) also has to have a space after the closing bracket. I shit you not.

    example:
    (@Name varchar(max) )
    select * from Orders where Name = @Name

    ReplyDelete
  2. It looks like that it doesn't work when the query has some unicode characters such as € then the function generates different hash :(

    ReplyDelete
    Replies
    1. Despite the declaimer at the end of the text, I have found the problem and fixed it in the code above. There was an off-by-one error with SUBSTRING, and a failure to byte-swap for endianness. Works with double-byte characters now. Thank you for the feedback.

      Delete

All comments are reviewed before publication.