I'm working with a SQL Server database; the database instance collation is "SQL_Latin1_General_CP1_CI_AS".
The following code:
UPDATE ...
SET field = CHAR(136)
WHERE...
puts the following symbol in the field: ˆ
But! In Latin1 code table 127-159 codes are just not defined! How come it inserts this symbol?
What's more confusing: when I read this field value into a string variable in C# and convert it to a char, I get code 710 instead of 136.
I tried to use encoding conversion:
var latin1Encoding = Encoding.GetEncoding("iso-8859-1");
var test = latin1Encoding.GetBytes(field); // field is a string read from db
But in this case I get code 94 which is ^ (it looks similar, but it's not the same character, and I need exactly the same one).
But! In Latin1 code table 127-159 codes are just not defined!
In ISO-8859-1, character 136 is defined, but it's a seldom-used and largely pointless control character.
But SQL_Latin1_General_CP1_CI_AS, despite the name “Latin1”, is not ISO-8859-1. It's the Western European ANSI code page, 1252, which is similar to ISO-8859-1, but has a bunch of different symbols in the range 128–159.
Character 136 in code page 1252 is U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT, ˆ; decimal code point number 710.
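To see the difference for yourself, here is a minimal sketch that decodes the single byte 136 under both encodings. The class and method names are made up for illustration. The RegisterProvider call assumes .NET Core / .NET 5+, where code page 1252 is not available until the code pages provider is registered; on .NET Framework you should be able to drop that line.

```csharp
using System;
using System.Text;

static class Byte136Demo
{
    // Decodes one byte with the given encoding and returns the code of the resulting char.
    public static int Decode(byte value, Encoding encoding)
    {
        return encoding.GetString(new[] { value })[0];
    }

    public static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where 1252 needs this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding cp1252 = Encoding.GetEncoding(1252);
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        Console.WriteLine(Decode(136, cp1252)); // 710 (U+02C6)
        Console.WriteLine(Decode(136, latin1)); // 136 (U+0088, a control character)
    }
}
```

So the 710 you see in C# is simply what byte 136 means in code page 1252 once it has been turned into a Unicode string.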
in this case I get code 94 which is ^
Yes, you're asking for a conversion to ISO-8859-1, which doesn't include the character U+02C6, so you get the “best fit fallback”, which is a character that looks a bit like the one you wanted. This is usually a bad thing; many of the fallbacks chosen are highly questionable. You can change this behaviour using an EncoderFallback, for example to throw an exception instead. If you want byte 136 back, encode with code page 1252 rather than ISO-8859-1.
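As a rough sketch of the fallback point, the helper below (a hypothetical name, not part of any library) asks for an encoding with EncoderFallback.ExceptionFallback, so an unrepresentable character fails loudly instead of being silently swapped for a lookalike. Code page 28591 is ISO-8859-1. As before, the RegisterProvider call is an assumption for .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

static class FallbackDemo
{
    // Encodes text with the given code page, refusing any best-fit substitution.
    // Returns null when the code page cannot represent some character.
    public static byte[] TryEncodeStrict(string text, int codePage)
    {
        Encoding strict = Encoding.GetEncoding(
            codePage,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            return strict.GetBytes(text);
        }
        catch (EncoderFallbackException)
        {
            return null;
        }
    }

    public static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where 1252 needs this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string field = "\u02C6"; // stands in for the value read from the database

        byte[] asLatin1 = TryEncodeStrict(field, 28591); // ISO-8859-1
        byte[] asCp1252 = TryEncodeStrict(field, 1252);  // Windows-1252

        Console.WriteLine(asLatin1 == null ? "not representable" : asLatin1[0].ToString());
        Console.WriteLine(asCp1252 == null ? "not representable" : asCp1252[0].ToString());
    }
}
```

With the strict fallback, ISO-8859-1 reports the character as not representable rather than handing back 94, while code page 1252 gives the single byte 136.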