c# .net unicode utf-16 iso-8859-1

Using StreamWriter to write to file a C# string with accented letters using ISO-8859-1 encoding


I have been stumped by this problem: taking a C# string (which I assume is UTF-16/Unicode) and writing it to a file using ISO-8859-1 encoding.

string s = "Gibt es ein Restaurant in der Nähe";
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding unicode = Encoding.Unicode;
byte[] unicodeBytes = Encoding.Unicode.GetBytes(s);
byte[] isoBytes = Encoding.Convert(unicode, iso, unicodeBytes);

// convert the new byte[] to char[]
char[] isoChars = new char[iso.GetCharCount(isoBytes, 0, isoBytes.Length)];
iso.GetChars(isoBytes, 0, isoBytes.Length, isoChars, 0);

StreamWriter sw = new StreamWriter(output, iso);
sw.Write(isoChars, 0, isoChars.Length);
sw.Write(Environment.NewLine);

My output text file shows the text with a question mark in place of the ä:

Gibt es ein Restaurant in der N?he


Solution

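  • One thing to understand here is that .NET strings and chars are ALWAYS UTF-16 in memory, on both .NET Framework and .NET Core. (What differs between the two is Encoding.Default: the system ANSI code page on Framework, UTF-8 on Core.) Therefore converting an exported byte array to a new encoding and loading it back into a char[] will not help you if you need a specific encoding; you just end up with the same UTF-16 text again. The target encoding only matters at the point where characters become raw bytes, that is, when they are written to or read from the file.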

    However, using the correct encoding with the StreamWriter itself should handle everything you need, meaning you should be able to simplify the code like this:

    string s = "Gibt es ein Restaurant in der Nähe";
    var iso = Encoding.GetEncoding("iso-8859-1");
    using (var sw = new StreamWriter(output, iso))
    {
        sw.WriteLine(s);
    }
    

    Finally, in observing the result, make sure to use a text editor that will understand the chosen encoding. It's possible to do everything right in your code and still see the bad character if you check it in an editor or font that doesn't know how to display that glyph.
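
To take the text editor out of the picture, one option is to look at the raw bytes directly. The following is a minimal sketch, not part of the original answer: the class name Latin1Check and the use of a MemoryStream in place of the real output file are assumptions made for illustration, and the ä is written as the escape \u00E4 so that the encoding of the source file itself cannot affect the result.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class; the name is an assumption for illustration.
class Latin1Check
{
    static void Main()
    {
        // \u00E4 is 'ä'. Escaping it keeps the source file's own encoding out of the test.
        string s = "Gibt es ein Restaurant in der N\u00E4he";
        Encoding iso = Encoding.GetEncoding("iso-8859-1");

        // Encoding to ISO-8859-1 bytes and decoding again yields the same UTF-16 string,
        // which is why the char[] round trip in the question changes nothing.
        byte[] isoBytes = iso.GetBytes(s);
        string roundTripped = iso.GetString(isoBytes);
        Console.WriteLine(roundTripped == s); // True

        // Write through a StreamWriter into memory (standing in for the output file)
        // and capture the bytes it produced.
        byte[] written;
        using (var ms = new MemoryStream())
        {
            using (var sw = new StreamWriter(ms, iso))
            {
                sw.Write(s);
            }
            written = ms.ToArray(); // ToArray still works after the stream is closed
        }

        // The last four bytes are "N", "ä", "h", "e" in ISO-8859-1.
        Console.WriteLine(BitConverter.ToString(written, written.Length - 4)); // 4E-E4-68-65
    }
}
```

If the byte for ä shows up as E4, the file is correct ISO-8859-1 and any remaining question mark comes from how the file is being viewed. A 3F in that position (the code for "?") would instead mean the character was already lost before or during encoding.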