c#  .net  encoding  non-ascii-characters  file-encodings

Remove '�' when reading files with different encodings in C#


I can't control which encoding our clients use when they save a file. When a file is saved as ASCII, some characters may be lost and then show up as '�'. How can I remove these '�' characters after the file is read?

I am reading the file with the line below, and for each column I would like to replace that character with a space (C#, .NET).

   using (var parser = new TextFieldParser("", Encoding.UTF8))

Solution

  • It looks like you can create a UTF-8 Encoding with a custom decoder fallback, which controls what undecodable bytes are replaced with:

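    // The decoder fallback is applied while reading: every byte sequence that
    // is not valid UTF-8 is replaced with the given string instead of '�'.
    // Pass " " instead of string.Empty to get a space rather than nothing.
    var encoding = Encoding.GetEncoding(
        "UTF-8",
        new EncoderReplacementFallback(string.Empty),
        new DecoderReplacementFallback(string.Empty));

    using (var parser = new TextFieldParser("", encoding)) {
        ⋮
    }


    The encoder fallback is only used when writing, so its value doesn't matter for reading. I wouldn't pass null for it, though: as far as I know, the fallback properties on Encoding throw an ArgumentNullException when set to null, so new EncoderReplacementFallback(string.Empty) is the safer choice.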
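
To see what the decoder fallback does without involving a file, here is a minimal sketch that decodes a small byte array directly. The class name `ReplacementFallbackDemo`, the helper `Utf8WithReplacement`, and the sample bytes are made up for illustration; only `Encoding.GetEncoding`, `EncoderReplacementFallback` and `DecoderReplacementFallback` come from the answer above.

```csharp
using System;
using System.Text;

public static class ReplacementFallbackDemo
{
    // Hypothetical helper: a UTF-8 encoding that substitutes `replacement`
    // for every byte sequence that cannot be decoded as UTF-8.
    public static Encoding Utf8WithReplacement(string replacement)
    {
        return Encoding.GetEncoding(
            "UTF-8",
            new EncoderReplacementFallback(replacement),
            new DecoderReplacementFallback(replacement));
    }

    public static void Main()
    {
        // "caf" followed by 0xE9, which is "é" in Windows-1252 but is not
        // a complete UTF-8 sequence on its own.
        byte[] bytes = { 0x63, 0x61, 0x66, 0xE9 };

        // Default UTF-8 decoding inserts U+FFFD ('�') for the bad byte.
        Console.WriteLine(Encoding.UTF8.GetString(bytes));

        // Replace the bad byte with a space, as the question asks.
        Console.WriteLine("[" + Utf8WithReplacement(" ").GetString(bytes) + "]");  // [caf ]

        // Or drop it entirely.
        Console.WriteLine("[" + Utf8WithReplacement("").GetString(bytes) + "]");   // [caf]
    }
}
```

The same encoding object can be passed to TextFieldParser, so the replacement happens while the file is read and no per-column cleanup is needed afterwards. Note that this only hides the undecodable bytes; the original characters are still lost unless the file is read with the encoding it was actually saved in.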