Tags: c#, encoding, download, filestream, streamwriter

Downloading a simple space delimited file with C# yields garbage characters


When I download a fixed-width file in C# and open the result in Notepad, the content is complete gibberish. Here is an example:

????????\@@@@@@@@@@@@@@@@@@@@@@@@???????????????????@?????????@????????????@@???????@???????????????????@@@@@@@@@@@??@@@@??@?????????????@@@@@@@@@@@@@@@@?????@@@@@@@@@@@@@@??@@???????@??????????k?????@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@?????????????????????

Here is the code I use to perform the download.

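char[] buffer = new char[2048];
// No encoding argument: StreamReader decodes the response as UTF-8 unless a byte-order mark says otherwise
using (var reader = new StreamReader(responseStream))
{
    // Encoding.ASCII writes every non-ASCII character as '?'
    using (var tw = new StreamWriter(DESTINATION + subFolder + files[files.Count - 1] + ".txt", false, Encoding.ASCII))
    {
        // Copy the response to the file in chunks of up to 2048 characters
        while (true)
        {
            int readCount = reader.Read(buffer, 0, buffer.Length);
            if (readCount == 0) break;
            tw.Write(buffer, 0, readCount);
        }
        responseStream.Close();
        tw.Close();
    }
}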

I suspect it has something to do with the file encoding.

I've written the same downloader in Python, and that version downloads the file as expected. I just can't figure it out in C#.

Update

The downloaded text is still garbage, but if I import the data in Microsoft Excel using From Text and set the file origin to 20924: IBM Latin-1, the text is readable. Is there a way to do this programmatically while the file is being written during the download?

Update

Any of the IBM file origin types decodes the downloaded data correctly.
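The updates suggest the server is sending EBCDIC, the IBM mainframe character set that Excel's "20924: IBM Latin-1" origin (code page IBM00924) decodes. The sketch below is a hypothetical reproduction, assuming a .NET Core 3.0+ / .NET 5+ runtime and a made-up sample string: it pushes EBCDIC bytes through the same chain as the question's code (decode as UTF-8, then encode as ASCII) to show where the '?' and '@' characters come from.

```csharp
using System.Text;

// Hypothetical helper: runs EBCDIC bytes through the same decode/encode chain
// as the question's StreamReader/StreamWriter code.
public static class GarbleDemo
{
    static GarbleDemo()
    {
        // IBM00924 is not built into .NET Core/.NET 5+; this registers the
        // Windows code pages (including the EBCDIC ones) with Encoding.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string Garble(string text)
    {
        // Assumed server behaviour: the text arrives encoded as IBM00924 (EBCDIC).
        byte[] wireBytes = Encoding.GetEncoding("IBM00924").GetBytes(text);

        // StreamReader's default: decode as UTF-8. Byte sequences that are not
        // valid UTF-8 (every EBCDIC letter is >= 0x80) become U+FFFD.
        string decoded = Encoding.UTF8.GetString(wireBytes);

        // StreamWriter with Encoding.ASCII: every non-ASCII char is written as '?'.
        // The EBCDIC space, 0x40, is ASCII '@' and passes through unchanged.
        byte[] fileBytes = Encoding.ASCII.GetBytes(decoded);
        return Encoding.ASCII.GetString(fileBytes);
    }
}
```

Calling GarbleDemo.Garble("HELLO WORLD") returns a string made up only of '?' and '@', the same mix as the sample in the question; the long runs of '@' there are likely the space padding of the fixed-width fields.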


Solution

  • Try this:

    // On .NET Core 3.0+/.NET 5+, first call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    using (var reader = new StreamReader(inputFilePath, Encoding.GetEncoding("IBM00924")))
    { ... }
    

    If you want to try every available encoding to see which one yields readable data, use the Encoding.GetEncodings() method to iterate over them, like so:

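    // Read the raw bytes once, then convert them under every candidate encoding
    var bytes = File.ReadAllBytes(inputFilePath);
    foreach (var info in Encoding.GetEncodings())
    {
        // Decode the bytes as this candidate encoding and re-encode them as UTF-8
        var converted = Encoding.Convert(info.GetEncoding(), Encoding.UTF8, bytes);
        File.WriteAllBytes(Path.Combine(Path.GetDirectoryName(outputFilePath), info.Name + ".txt"), converted);
    }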
    

    Hope this helps!
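To decode during the download itself, as the first update asks, the answer's fix can be applied to the question's read/write loop. This is a minimal sketch, assuming the data really is IBM00924 and a .NET Core 3.0+ / .NET 5+ runtime; the class and method names are hypothetical, and writing UTF-8 instead of ASCII is an assumption about what should consume the file.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper; names and the UTF-8 output choice are assumptions.
public static class EbcdicDownload
{
    static EbcdicDownload()
    {
        // Needed on .NET Core 3.0+/.NET 5+ for IBM00924; omit on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Copies an EBCDIC (IBM00924) stream into a UTF-8 text file.
    public static void SaveAsUtf8(Stream responseStream, string destinationPath)
    {
        char[] buffer = new char[2048];

        // Decode the incoming bytes as IBM00924 instead of the UTF-8 default;
        // 'false' keeps byte-order-mark detection from overriding that choice.
        using (var reader = new StreamReader(responseStream, Encoding.GetEncoding("IBM00924"), false))
        // Write UTF-8 rather than ASCII so non-ASCII characters survive;
        // Encoding.UTF8 also emits a byte-order mark, which Notepad recognises.
        using (var writer = new StreamWriter(destinationPath, false, Encoding.UTF8))
        {
            int readCount;
            while ((readCount = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                writer.Write(buffer, 0, readCount);
            }
        }
        // Disposing the reader also closes responseStream, so no explicit Close() is needed.
    }
}
```

In the question's code this would be called as EbcdicDownload.SaveAsUtf8(responseStream, DESTINATION + subFolder + files[files.Count - 1] + ".txt"). Decoding with the right code page on the way in is what the Excel import does manually; the choice of output encoding is separate and only needs to be something Notepad can read.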