Encoding issue on SQL Server


I'm working with a SQL Server database; the instance collation is "SQL_Latin1_General_CP1_CI_AS".

The following code:

UPDATE ...
SET field = CHAR(136)
WHERE...

puts in field the following symbol: ˆ

But! In Latin1 code table 127-159 codes are just not defined! How come it inserts this symbol?

And what's more confusing, when I read this field value into a string variable in C# and convert it to char, I get the code 710 instead of 136.

I tried to use encoding conversion:

var latin1Encoding = Encoding.GetEncoding("iso-8859-1");
var test = latin1Encoding.GetBytes(field); // field is a string read from db

But in this case I get code 94 which is ^ (looks similar, but it's not the same, and I need exactly the same).


Solution

  • But! In Latin1 code table 127-159 codes are just not defined!

    In ISO-8859-1, character 136 is defined, but it's a seldom-used and largely pointless control character.

    But SQL_Latin1_General_CP1_CI_AS, despite the name “Latin1”, is not ISO-8859-1. It's the Western European ANSI code page, 1252, which is similar to ISO-8859-1, but has a bunch of different symbols in the range 128–159.

    Character 136 in code page 1252 is U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT, ˆ; decimal code point number 710.

  • in this case I get code 94 which is ^

    Yes, you're asking for a conversion to ISO-8859-1, which doesn't include the character U+02C6, so you get the “best fit fallback”, which is a character that looks a bit like the one you wanted. This is usually a bad thing; many of the fallbacks chosen are highly questionable. You can change this behaviour using an EncoderFallback, for example to throw an exception instead. To get the original byte 136 back, encode with code page 1252 rather than ISO-8859-1.
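
A minimal sketch of both points, assuming .NET Core or .NET 5+, where code page 1252 only becomes available after registering CodePagesEncodingProvider (on .NET Framework that call is typically unnecessary). The class and method names (Cp1252Demo, ToCp1252, FitsLatin1) and the hard-coded field value are illustrative stand-ins, not part of the original code:

```csharp
using System;
using System.Text;

static class Cp1252Demo
{
    // Encode with the code page the collation actually uses (1252),
    // so U+02C6 maps back to the single byte 136.
    public static byte[] ToCp1252(string s)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code
        // pages must be registered before GetEncoding(1252) will work.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252).GetBytes(s);
    }

    // Returns false instead of silently substituting a look-alike
    // character when s cannot be represented in ISO-8859-1.
    public static bool FitsLatin1(string s)
    {
        Encoding strictLatin1 = Encoding.GetEncoding(
            "iso-8859-1",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strictLatin1.GetBytes(s);
            return true;
        }
        catch (EncoderFallbackException ex)
        {
            Console.WriteLine($"U+{(int)ex.CharUnknown:X4} is not in ISO-8859-1");
            return false;
        }
    }

    static void Main()
    {
        string field = "\u02C6"; // stands in for the string read from the db

        Console.WriteLine((int)field[0]);      // 710, the Unicode code point
        Console.WriteLine(ToCp1252(field)[0]); // 136, the code page 1252 byte
        Console.WriteLine(FitsLatin1(field));  // False, after the U+02C6 message
    }
}
```

The string in C# always holds Unicode code points, so 710 is what you see there; 136 only exists as a byte once the string is encoded with code page 1252.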