Search code examples
.netsql-serverperformancesearchlarge-data

Improve performance when Searching SQL by Large Data Type?


SCENARIO:

I'm using a fingerprint identification system on .Net. Each fingerprint template is captured as bytes array type with a length of 1632, but most of them doesn't use all of the length. I insert them to SQL database as VARBINARY data type "Template" column.

Now to perform authentication, I need to compare my input fingerprint sample with the "Template" column (which is VARBINARY data type) in each SQL records.

PROBLEM:

I know that there will be a performance issue, as the "Template" column contains a large data type. My question is: what is the best way to search for a matching record based on a large VARBINARY data type column in my scenario? Any help is appreciated.


Solution

  • The idea - fast search only use 16 bytes (stored computed column), use the index

    MSDN HashBytes

    MSDN Computed persisted column

    CREATE TABLE [Fingerprint] (
        [ID] int NOT NULL IDENTITY 
            CONSTRAINT PK_Fingerprint PRIMARY KEY CLUSTERED
    --  ,[UserID] int NOT NULL
        ,[FingerprintBin]  VARBINARY(1632) NULL
        ,[SearchCriteria] as CAST(HASHBYTES('MD5',[FingerprintBin]) as BINARY(16)) PERSISTED
    )
    
    CREATE NONCLUSTERED INDEX [IX_Fingerprint_SearchCriteria] ON [Fingerprint] ([SearchCriteria]);
    GO
    

    Test data

    INSERT INTO [Fingerprint] ([FingerprintBin])
        VALUES (0x010203040506)
    GO
    -- Insert 1000 random value
    INSERT INTO [Fingerprint] ([FingerprintBin])
        VALUES (NEWID())
    GO 1000
    

    Fast search

    DECLARE @FingerprintBin  VARBINARY(1632) = 0x010203040506
    
    -- Index seek in execution plan
    SELECT [ID]
    FROM [Fingerprint]
    WHERE [SearchCriteria] = CAST(HASHBYTES('MD5',@FingerprintBin) as BINARY(16))
        AND [FingerprintBin] = @FingerprintBin;
    GO