How to remove non-ASCII characters in CsvHelper


Is there a way to remove non-ASCII characters through CsvHelper configuration, instead of writing the conversion in application code?

I saved an Excel file as CSV and found that some values look like AbsMarketValue�������������. I would like to get rid of the non-ASCII characters.

Setting csv.Configuration.Encoding = Encoding.ASCII did not work.

Following the answer to "How can you strip non-ASCII characters from a string? (in C#)", I can strip them with a regular expression:

string s = "søme string";
s = Regex.Replace(s, @"[^\u0000-\u007F]+", string.Empty);

This approach works, but I want to avoid it because it means adding the same code to the application for every text field.

I tried to do this in the conversion map but that did not work.


Solution

  • You can register a custom type converter for string, so that every string property has its non-ASCII characters stripped as it is read.

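    using System.Globalization;
    using System.IO;
    using System.Linq;
    using System.Text.RegularExpressions;
    using CsvHelper;
    using CsvHelper.Configuration;
    using CsvHelper.TypeConversion;

    void Main()
    {
        using (var reader = new StringReader("Id,Name\n1,AbsMarketValue�������������"))
        using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
        {
            // Replace the default string converter for every string member.
            csv.Context.TypeConverterCache.AddConverter<string>(new AsciiOnlyConverter());

            // GetRecords is lazy, so materialize the records while the reader is still open.
            var records = csv.GetRecords<Foo>().ToList();
        }
    }

    public class Foo
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public class AsciiOnlyConverter : StringConverter
    {
        public override object ConvertFromString(string text, IReaderRow row, MemberMapData memberMapData)
        {
            // Regex.Replace throws on null input, so pass null through unchanged.
            if (text == null)
            {
                return base.ConvertFromString(text, row, memberMapData);
            }

            // Drop every character outside the ASCII range U+0000 to U+007F.
            var ascii = Regex.Replace(text, @"[^\u0000-\u007F]+", string.Empty);

            return base.ConvertFromString(ascii, row, memberMapData);
        }
    }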
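
  • If only some string columns should be cleaned, the same converter can be attached to individual members through a class map instead of being registered for every string. The following is a minimal sketch, assuming a CsvHelper version that exposes csv.Context (as the code above does). The Comment property, the FooMap class, and the Demo.ReadFirst helper are hypothetical names added for illustration.

```csharp
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using CsvHelper;
using CsvHelper.Configuration;
using CsvHelper.TypeConversion;

public class Foo
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Comment { get; set; } // hypothetical extra column, left untouched
}

// Same idea as the converter above: strip everything outside U+0000 to U+007F.
public class AsciiOnlyConverter : StringConverter
{
    public override object ConvertFromString(string text, IReaderRow row, MemberMapData memberMapData)
    {
        var ascii = text == null
            ? null
            : Regex.Replace(text, @"[^\u0000-\u007F]+", string.Empty);

        return base.ConvertFromString(ascii, row, memberMapData);
    }
}

// Hypothetical map: only Name uses the ASCII-only converter.
public sealed class FooMap : ClassMap<Foo>
{
    public FooMap()
    {
        Map(m => m.Id);
        Map(m => m.Name).TypeConverter<AsciiOnlyConverter>();
        Map(m => m.Comment);
    }
}

public static class Demo
{
    // Reads the first record of the given CSV text using FooMap.
    public static Foo ReadFirst(string csvText)
    {
        using (var reader = new StringReader(csvText))
        using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
        {
            csv.Context.RegisterClassMap<FooMap>();
            return csv.GetRecords<Foo>().First();
        }
    }
}
```

    With this map, Demo.ReadFirst("Id,Name,Comment\n1,Abc\uFFFD\uFFFD,caf\u00E9") should return a record whose Name has the replacement characters removed while Comment keeps its accented character. Registering the converter globally, as in the code above, is simpler when every text field needs the same treatment.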