select datalength(N','), unicode(N',')
returns 2
in the first column and 44
in the second. This means that commas take up two bytes, but can be stored using just the number 44. ,
is in the first 128 characters of unicode (44 is less than 128), so it should only take up one byte, just as 0x2C (the encoding of 44) does. Why does T-SQL report that it takes up two bytes?
Because it does take up 2 bytes. As an nvarchar
the character ,
is stored as 0x2C00
, not 0x2C
, and 0x2C00
is a 2 byte value. nvarchar
stores all characters as USC-2 or UTF-16 (depends on if you are using a supplementary characters collation) in a 2 byte pair that can be (some will require more, especially if you aren't in a supplementary characters collation).
If you want it to be stored using 1 byte, you could need to use a UTF-8 collation, and varchar
, and then characters that need more bytes would use more bytes (rather than being lost).