c# csvhelper

CSVHelper - Trouble with special Characters


For our current project, I am using the CsvHelper NuGet package, and everything works perfectly except when a field contains special characters (ä, ü, ...). How can I make it read those characters correctly instead of showing ? in their place? (I tried both the current and the invariant culture, but it made no difference.)

I tried changing the Culture when reading the byte stream from the file and I tried using different Cultures when parsing the CSV.


Solution

  • I often run into this when someone saves an Excel file as CSV (Comma delimited) (*.csv) rather than as CSV UTF-8 (Comma delimited) (*.csv). Depending on the country it was saved in, this often means the file uses Windows-1252 encoding. The culture passed to CsvHelper only affects how values such as numbers and dates are parsed, not how the bytes of the file are decoded into characters, which is why changing it made no difference. In most cases, you can get away with using ISO-8859-1 encoding, also known as Latin-1, when reading the file with StreamReader. If some characters are still not read correctly, you may have to use the exact encoding that was used to save the file.
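
    To see where the ? comes from, here is a minimal sketch (top-level statements, so C# 9 or later is assumed; the sample name is made up for illustration). It decodes the same single-byte data once as UTF-8, which is what StreamReader assumes by default, and once as ISO-8859-1:

    ```csharp
    using System;
    using System.Text;

    // "Müller" stored one byte per character, with ü as the single byte 0xFC.
    byte[] bytes = { 0x4D, 0xFC, 0x6C, 0x6C, 0x65, 0x72 };

    // In UTF-8 a lone 0xFC is invalid, so the decoder substitutes U+FFFD,
    // the replacement character that many consoles and fonts render as '?'.
    string asUtf8 = Encoding.UTF8.GetString(bytes);

    // ISO-8859-1 maps each byte to the code point with the same value.
    string asLatin1 = Encoding.GetEncoding("iso-8859-1").GetString(bytes);

    Console.WriteLine(asUtf8 == "M\uFFFDller");    // True
    Console.WriteLine(asLatin1 == "M\u00FCller");  // True
    ```

    The bytes are identical in both calls; only the encoding handed to the decoder changes the result.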

    ISO-8859-1 (also called Latin-1) is identical to Windows-1252 (also called CP1252) except for the code points 128-159 (0x80-0x9F). ISO-8859-1 assigns several control codes in this range. Windows-1252 has several characters, punctuation, arithmetic and business symbols assigned to these code points. https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html
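
    The difference in that range can be checked directly. This is a rough sketch, assuming a runtime where code page 1252 only becomes available after registering CodePagesEncodingProvider (older targets may also need the System.Text.Encoding.CodePages package):

    ```csharp
    using System;
    using System.Text;

    // Assumption: running on .NET Core / .NET 5+, where code page 1252
    // is not available until the code-pages provider is registered.
    Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
    Encoding cp1252 = Encoding.GetEncoding(1252);

    // 0xE4 lies outside 0x80-0x9F, so both encodings decode it to the same 'ä'.
    byte[] umlaut = { 0xE4 };
    Console.WriteLine(latin1.GetString(umlaut) == cp1252.GetString(umlaut)); // True

    // 0x80 lies inside the range: a control code in ISO-8859-1,
    // the euro sign (U+20AC) in Windows-1252.
    byte[] euro = { 0x80 };
    Console.WriteLine((int)latin1.GetString(euro)[0] == 0x80); // True
    Console.WriteLine(cp1252.GetString(euro) == "\u20AC");     // True
    ```

    So umlauts survive either way, while characters such as the euro sign or curly quotes only come through correctly with Windows-1252.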

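    On .NET Core, only a limited set of encodings is available by default. Others, including Windows-1252, become available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).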

    Enumerating Encoding.GetEncodings() on .NET Core produces the following output:

    Info.CodePage  Info.Name   Info.DisplayName
    1200           utf-16      Unicode
    1201           utf-16BE    Unicode (Big-Endian)
    12000          utf-32      Unicode (UTF-32)
    12001          utf-32BE    Unicode (UTF-32 Big-Endian)
    20127          us-ascii    US-ASCII
    28591          iso-8859-1  Western European (ISO)
    65000          utf-7       Unicode (UTF-7)
    65001          utf-8       Unicode (UTF-8)
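
    A listing like this can be reproduced on your own runtime with a short loop. This is only a sketch; the column widths are arbitrary and the exact entries will vary with the .NET version:

    ```csharp
    using System;
    using System.Text;

    // Print every encoding the current runtime reports by default.
    Console.WriteLine($"{"Info.CodePage",-15}{"Info.Name",-12}Info.DisplayName");
    foreach (EncodingInfo info in Encoding.GetEncodings())
    {
        Console.WriteLine($"{info.CodePage,-15}{info.Name,-12}{info.DisplayName}");
    }
    ```

    If iso-8859-1 appears in the output, Latin-1 can be used without any extra setup.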
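
    Reading the file as Latin-1 then looks like this (LINQPad-style script):

    using System.Globalization;
    using System.IO;
    using System.Linq;
    using System.Text;
    using CsvHelper;

    void Main()
    {
        // Encoding.Latin1 requires .NET 5 or later; on older runtimes
        // use Encoding.GetEncoding("iso-8859-1") instead.
        using var reader = new StreamReader(@"C:\Users\myName\Documents\TestUmlauts.csv",
            Encoding.Latin1);
        using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);

        // GetRecords is lazy, so materialize the rows while the reader is still open.
        var records = csv.GetRecords<Foo>().ToList();
    }

    public class Foo
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }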
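
    If Latin-1 still mangles a few characters, the exact Windows-1252 code page can be passed to StreamReader instead. Below is a minimal sketch under a few assumptions: top-level statements (C# 9 or later), the code-pages provider being available on your target, and an in-memory byte array standing in for the real file (the sample row is made up):

    ```csharp
    using System;
    using System.Globalization;
    using System.IO;
    using System.Linq;
    using System.Text;
    using CsvHelper;

    // Make code page 1252 available on .NET Core / .NET 5+.
    Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    Encoding cp1252 = Encoding.GetEncoding(1252);

    // Hypothetical stand-in for a file Excel saved as "CSV (Comma delimited)".
    byte[] fileBytes = cp1252.GetBytes("Id,Name\r\n1,J\u00FCrgen \u20AC\r\n");

    using var stream = new MemoryStream(fileBytes);
    using var reader = new StreamReader(stream, cp1252);
    using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);

    // Materialize the records while the reader is still open.
    var records = csv.GetRecords<Foo>().ToList();
    Console.WriteLine(records[0].Name == "J\u00FCrgen \u20AC"); // True

    public class Foo
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }
    ```

    For a real file, replace the MemoryStream with the file path: new StreamReader(path, cp1252).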