c# http httpclient

HTTP GET returning binary instead of text


I want to get a CSV file from an HTTP application, MicroStrategy, but instead I get a binary file that can be opened in Excel but not in a text editor.

When I use the same URL in a browser, the file downloads as text and can be opened in a text editor.

These are the relevant (I think) lines:

HttpClient cliente;
cliente = new HttpClient(handler) { BaseAddress = uri, Timeout = new TimeSpan(0, 30, 0) };
...
string csv;
responseMessage = await cliente.GetAsync(uri);
HttpContentHeaders contentHeaders = responseMessage.Content.Headers;
csv = await responseMessage.Content.ReadAsStringAsync();
File.WriteAllText(caminhoArquivo, csv, Encoding.UTF8);

The headers show the correct Content-Type:

responseMessage.Content.Headers = {Content-Length: 6188
Content-Disposition: attachment;filename=Grupo Cont%C3%A1bil.txt;
Content-Type: text/plain
}

When I just read the bytes and write them to a file, this is what the file looks like in a hex viewer:

byte[] bytes;
bytes = await responseMessage.Content.ReadAsByteArrayAsync();
File.WriteAllBytes(caminhoArquivo, bytes);

  Offset: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F   
00000000: 47 00 72 00 75 00 70 00 6F 00 20 00 43 00 6F 00    G.r.u.p.o...C.o.
00000010: 6E 00 74 00 EF BF BD 00 62 00 69 00 6C 00 0D 00    n.t.o?=.b.i.l...
00000020: 0A 00 0D 00 0A 00 47 00 72 00 75 00 70 00 6F 00    ......G.r.u.p.o.

Solution

  • As was discovered through the comments, the issue was with encoding.

    Even though the client code read the response as a string, the server apparently did not specify the encoding (either at all or correctly), so text that turned out to be UTF-16 encoded was decoded as something else, most likely UTF-8.

    The end result was that the file was slightly mangled and was misinterpreted when opened later.

    Changing the code to download the raw bytes and write them straight to a file, without decoding or re-encoding them, made it possible to open the file as UTF-16 and get the text content.

    Ideally, the server should specify the encoding when delivering the content. Sometimes, though, the server code delivering a file does not know the encoding of its content, typically because the file was produced by third-party code or was already on disk in an unknown encoding.
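
If the goal is a UTF-8 text file rather than a byte-for-byte copy, one possible approach is to read the raw bytes, pick the encoding yourself, and only then re-encode. The sketch below is an illustration, not the code from the question: the class name CsvDownload, its method names, and the zero-byte heuristic for spotting UTF-16 LE are all assumptions made for this example. It trusts the charset from the Content-Type header when the server sends one, lets StreamReader honor a byte order mark if present, and otherwise guesses from the bytes.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original question or answer.
public static class CsvDownload
{
    // Chooses a fallback encoding: the declared charset if it is usable,
    // otherwise a guess based on the bytes.
    public static Encoding ChooseEncoding(byte[] body, string charset)
    {
        if (!string.IsNullOrWhiteSpace(charset))
        {
            try { return Encoding.GetEncoding(charset.Trim('"')); }
            catch (ArgumentException) { /* unknown charset name: fall through to the guess */ }
        }

        // Heuristic (an assumption): mostly-Latin text in UTF-16 LE has a zero
        // byte at most odd offsets, as in the hex dump in the question.
        int pairs = body.Length / 2;
        int zeroHighBytes = 0;
        for (int i = 1; i < body.Length; i += 2)
        {
            if (body[i] == 0) zeroHighBytes++;
        }
        return pairs > 0 && zeroHighBytes * 2 > pairs ? Encoding.Unicode : Encoding.UTF8;
    }

    // Decodes the body; a byte order mark, if present, overrides the fallback
    // and is stripped from the result by StreamReader.
    public static string Decode(byte[] body, string charset)
    {
        Encoding fallback = ChooseEncoding(body, charset);
        using (var stream = new MemoryStream(body))
        using (var reader = new StreamReader(stream, fallback, true))
        {
            return reader.ReadToEnd();
        }
    }

    // Downloads the resource as raw bytes and saves it as UTF-8 text.
    public static async Task SaveAsUtf8Async(HttpClient client, Uri uri, string path)
    {
        using (HttpResponseMessage response = await client.GetAsync(uri))
        {
            response.EnsureSuccessStatusCode();
            byte[] body = await response.Content.ReadAsByteArrayAsync();
            string charset = response.Content.Headers.ContentType?.CharSet;
            File.WriteAllText(path, Decode(body, charset), Encoding.UTF8);
        }
    }
}
```

With the variables from the question, the call would look like `await CsvDownload.SaveAsUtf8Async(cliente, uri, caminhoArquivo);`. The heuristic can misfire on text that is not mostly Latin, so when the encoding is known in advance it is safer to pass it explicitly instead of guessing.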