Search code examples
t-sqlunicodebytenvarchar

Why do commas take up two bytes?


select datalength(N','), unicode(N',') returns 2 in the first column and 44 in the second. This means that commas take up two bytes, but can be stored using just the number 44. , is in the first 128 characters of unicode (44 is less than 128), so it should only take up one byte, just as 0x2C (the encoding of 44) does. Why does T-SQL report that it takes up two bytes?


Solution

  • Because it does take up 2 bytes. As an nvarchar the character , is stored as 0x2C00, not 0x2C, and 0x2C00 is a 2 byte value. nvarchar stores all characters as USC-2 or UTF-16 (depends on if you are using a supplementary characters collation) in a 2 byte pair that can be (some will require more, especially if you aren't in a supplementary characters collation).

    If you want it to be stored using 1 byte, you could need to use a UTF-8 collation, and varchar, and then characters that need more bytes would use more bytes (rather than being lost).