Search code examples
sqlt-sqlsql-server-2012hashbytes

Storing HASHBYTES output in NVARCHAR vs BYTES


I am going to create:

  • a table for storing IDs and unique text values (which are expected to be large)
  • a stored procedure which will have a text value as input parameter (it will check if the value exists in the above table and return the corresponding ID if it exists, or inserted a new record if not and return the new ID as well)

I want to optimize the search of text values using hash value of the text and created index on it. So, during the search I expect a non-clustered index to be used (not the clustered index).

I decided to use the HASHBYTES with SHA2_256 and I am wondering are there any differences/benefits if I am storing the hash value as BINARY(32) or NVARCHAR(16)?


Solution

  • You can't reasonably store a hash value as chars because binary data is not text. Various text processing and comparison functions interpret those chars. For example trailing whitespace is sometimes ignored leading to incorrect results.

    Since you've got 32 totally random unstructured bytes to store a binary(32) is the most natural format and it is the fastest one.