How to get HttpClient to correctly decode this site


I am trying to read the text content of some URIs; the basic

httpClient.GetStringAsync(uri);

works fine for other sites, but not for https://abcapplepieoptiontrades.com (the response begins with \u001f and there seem to be some binary characters mixed in). The web site displays fine in web browsers and in Fiddler.

I then tried

using (HttpResponseMessage response = httpClient.GetAsync(uri).Result)
{
  // Read the raw response bytes, then decode them as UTF-8
  byte[] byteArray = response.Content.ReadAsByteArrayAsync().Result;
  string text = Encoding.UTF8.GetString(byteArray, 0, byteArray.Length);
}

But that doesn't work either; the result is the same as with httpClient.GetStringAsync(). I tried every encoding listed in Encoding.*, and none of them worked. How do I get the properly decoded text content of this URI?


Solution

  • At first I didn't think @Nkosi's suggestion would work, since Fiddler showed the response to my own HttpClient request decoded just fine. I assumed it was purely a decoding issue and wanted to figure out how Fiddler managed to decode that response. But after trying it, adding those headers did make httpClient.GetStringAsync() work. Besides User-Agent, I had to add Accept-Language, Accept, and Accept-Encoding.

    Edit: I spoke too soon. It seems there were several confounding issues. I noticed the problem happening again after adding the headers, but only for some web sites (all running IIS, I think). What really confused me was that everything worked fine while Fiddler was capturing traffic; when it wasn't, the issue described in the question came back.

    Then I figured out that the web sites causing issues were compressing their responses, and httpClient was not automatically decompressing them. I modified the creation of httpClient as follows:

    HttpClient httpClient = new HttpClient(new HttpClientHandler() { AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate });
    

    That seemed to solve the issue with some web sites but not all. Then I remembered that, when trying to mimic a web browser's headers as described above, I used:

    httpClient.DefaultRequestHeaders.AcceptEncoding.ParseAdd("gzip, deflate, br");
    

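    Commenting that out seemed to resolve all my issues. Most likely, advertising br let some servers reply with Brotli-compressed content, which a handler configured only for GZip and Deflate cannot decompress.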
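    Putting the pieces together, here is a minimal sketch of the setup that ended up working, assuming a runtime with top-level statements (C# 9 or later). The helper names CreateClient, LooksLikeGzip and FetchTextAsync are made up for illustration and are not part of any library. The gzip check is only a diagnostic: gzip streams start with the bytes 0x1F 0x8B, which matches the \u001f seen at the start of the undecoded response in the question.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: build a client whose handler decompresses gzip and
// deflate responses itself. No Accept-Encoding header is added by hand,
// so "br" is never advertised to the server.
static HttpClient CreateClient()
{
    var handler = new HttpClientHandler
    {
        AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
    };
    return new HttpClient(handler);
}

// Hypothetical diagnostic: true when the payload starts with the gzip magic
// bytes 0x1F 0x8B, i.e. the body is probably still compressed.
static bool LooksLikeGzip(byte[] body) =>
    body.Length >= 2 && body[0] == 0x1F && body[1] == 0x8B;

// Hypothetical helper: fetch a URI and return its body as text.
static async Task<string> FetchTextAsync(HttpClient client, Uri uri)
{
    using HttpResponseMessage response = await client.GetAsync(uri);
    response.EnsureSuccessStatusCode();
    return await response.Content.ReadAsStringAsync();
}

using HttpClient client = CreateClient();

// The diagnostic needs no network access:
Console.WriteLine(LooksLikeGzip(new byte[] { 0x1F, 0x8B, 0x08 })); // prints "True"

// Fetching a page would then look like this (left commented out so the
// sketch runs offline):
// string text = await FetchTextAsync(client, new Uri("https://example.com"));
```

    If LooksLikeGzip returns true for the bytes from ReadAsByteArrayAsync(), the body was never decompressed, and no choice of Encoding will fix that. On runtimes that expose DecompressionMethods.Brotli (.NET Core 3.0 and later, as far as I know), adding that flag to the handler is probably a cleaner way to accept br than setting the header by hand.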