c# xml localization culture

Why does "i" get replaced with "ı"?


I received a crash report from an application which was trying to read XML from a file it had previously written. After requesting the user send me the file, I compared it with what should have been written and found a really odd problem I haven't come across before.

Some (but not all) of the i characters had been replaced with ı - a dotless i. For example, a node named "title" was fine, but a node named "initialdirectory" had only its first i replaced while the others were left alone, i.e. ınitialdirectory.

Until today I wasn't even aware there was such a character. Now I am, but I don't know how it came to be written like that - the XML was written using an XmlWriter with UTF8 encoding. Just a normal everyday write, nothing complicated.

I normally use StringComparison.OrdinalIgnoreCase when doing IndexOf etc. (well, ever since I got Resharper, which yells at me for skipping the parameter), but I'm at a loss as to how I'm supposed to do this when writing data, unless I'm supposed to start changing thread cultures.

Has anyone experienced a similar issue before, and if so, what's the best way to deal with it?


Solution

  • In Turkish there are two i's: one with a dot, i, and one without a dot, ı. In upper case the first one has a dot, İ, and the second one hasn't, I.

    At some point your program is converting InitialDirectory to lower case according to the current culture, which is Turkish on machines configured for that locale. Under Turkish casing rules an upper-case I lower-cases to ı, which is why only the first letter changed. To fix the problem you can convert cases using a fixed, known locale, such as American English.

    Update: Even better, use the ToLowerInvariant() method which converts a string to lower case in the "invariant culture".
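
The difference between culture-sensitive and invariant lower-casing can be sketched as follows. This is only an illustration, not the asker's actual code (which isn't shown): the class name CasingDemo and the helper ToNodeName are hypothetical, and the example assumes the runtime has Turkish culture data available (i.e. it is not running in globalization-invariant mode).

```csharp
using System;
using System.Globalization;

public static class CasingDemo
{
    // Hypothetical helper: lower-cases a node name without consulting
    // the current culture, so the result is the same on every machine.
    public static string ToNodeName(string name) => name.ToLowerInvariant();

    public static void Main()
    {
        const string name = "InitialDirectory";

        // Culture-sensitive: under Turkish casing rules 'I' maps to 'ı'.
        // Calling name.ToLower() with no argument uses the current culture,
        // so it behaves the same way on a machine set to Turkish.
        CultureInfo turkish = CultureInfo.GetCultureInfo("tr-TR");
        Console.WriteLine(name.ToLower(turkish));

        // Culture-insensitive: 'I' always maps to 'i'.
        Console.WriteLine(ToNodeName(name));
    }
}
```

Using the invariant form for anything that is persisted or parsed later (node names, keys, file formats) avoids having to change the thread culture around the write.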