Search code examples
sql-serverprimary-keymd5collationcase-sensitive

SQL Server / hexadecimal string as primary key, which collation to use?


I am working on a database where MD5 hashes are used as primary keys on some tables, with the type char(32).

In the past, I only worked with databases using int for primary key but on this project, I have no choice. I wonder, what collation(s) should be used to deal with a char(32) primary key (which contains an hexadecimal string) ?

EDIT :

Currently, the collation of the database is French_CI_AS. I would like to know if it's OK for such kind of primary key or if I should switch to a CS or BIN2 collation to have better performance ?


Solution

  • If you haven't changed the collation during setup, according to this Microsoft Article "When SQL Server Setup detects that the computer is running the U.S. English system locale, Setup automatically selects the Dictionary order, case-insensitive, for use with 1252 character set SQL Server 2008 collation ..."

    There are many Sort order for a collation:

    • Binary (BIN): Sorts and compares data in SQL Server tables based on the bit patterns defined for each character. Binary sort order is case-sensitive and accent-sensitive. Binary is also the fastest sorting order
    • Binary-Point (BIN2): Sorts and compares data in SQL Server tables based on Unicode code points for Unicode data. For non-Unicode data, Binary-code point will use comparisons identical to binary sorts
    • Case Sensitive (CS): Distinguishes between uppercase and lowercase letters. If selected, lowercase letters sort ahead of their uppercase versions.
    • Accent Sensitive (AS): Distinguishes between accented and unaccented characters. For example, 'a' is not equal to 'ấ'.
    • Kana-sensitive (KS): Distinguishes between the two types of Japanese kana characters: Hiragana and Katakana.
    • Width-Sensitive (WS): Distinguishes between a single-byte character and the same character when represented as a double-byte character.

    And Hexadecimal values are not case sensitive nor width,kana sensitive. And as you provided in comments the default collation is French_CI_AS (which means French - case insensitive - Accent sensitive) so it is case insensitive (which is good) and it is accent sensitive (hexadecimal doesn't contains accented characters so this feature is not necessary, but if chosen it will not affect anything).

    So the French_CI_AS collation is good for your case

    References


    Side Note: selecting a HASH code as primary key is not recommended at all!! It is better to choose an integer as primary key