Tags: c#, html, download, httprequest, webclient

Quick Download HTML Source in C#


I am trying to download the HTML source of a single web page (https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/) in C#.

The problem is that downloading this 30 KB page takes about 10 seconds. My internet connection is not the issue: the same program downloads 10 MB files almost instantly.

I have tried each of the following approaches, both on a separate thread and on the main thread, and every one of them still takes 10-12 seconds.


1)

using (var httpClient = new HttpClient())
{
    using (var request = new HttpRequestMessage(HttpMethod.Get, url))
    {
        var response = await httpClient.SendAsync(request);
        // Read the HTML body from the response
        var html = await response.Content.ReadAsStringAsync();
    }
}

2)

using (var client = new System.Net.WebClient())
{
    client.Proxy = null; // do not use a proxy (skips proxy auto-detection)
    response = client.DownloadString(url);
}

3)

using (var client = new System.Net.WebClient())
{
    // GlobalProxySelection is obsolete; this effectively disables the proxy
    client.Proxy = GlobalProxySelection.GetEmptyWebProxy();
    response = client.DownloadString(url);
}

4)

WebRequest.DefaultWebProxy = null;

using (var client = new System.Net.WebClient())
{
    response = client.DownloadString(url);
}

5)

var client = new WebClient();
response = client.DownloadString(url);

6)

var client = new WebClient();
client.DownloadFile(url, filepath);

7)

System.Net.WebClient myWebClient = new System.Net.WebClient();
WebProxy myProxy = new WebProxy();
// Note: IsBypassed only returns a bool; it does not change the proxy settings
myProxy.IsBypassed(new Uri(url));
myWebClient.Proxy = myProxy;
response = myWebClient.DownloadString(url);

8)

using var client = new HttpClient();
var content = await client.GetStringAsync(url);

9)

HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(url);
myRequest.Method = "GET";
WebResponse myResponse = myRequest.GetResponse();
StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
string result = sr.ReadToEnd();
sr.Close();
myResponse.Close();

I want a faster way to do this in C#.

Any information or help you can provide is much appreciated.


Solution

  • I know this question is old, but I think I found the cause, because I have run into the same thing on other sites. If you look at the response cookies, you will find one named ak_bmsc. That cookie indicates that the site is running Akamai Bot Manager, which provides bot protection and therefore blocks requests that "look" suspicious.

    In order to get a quick response from the host, you need the right request settings. In this case:

    • Headers:
      • Host: (their host data) www.faa.gov
      • Accept: something like */*
    • Cookies:
      • AkamaiEdge=true

    Example:

    using System;
    using System.Diagnostics;
    using System.Net.Http;
    using System.Threading.Tasks;

    class Program
    {
        // One shared HttpClient for the whole program
        private static readonly HttpClient _client = new HttpClient();
        private static readonly string _url = "https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/";

        static async Task Main(string[] args)
        {
            var sw = Stopwatch.StartNew();
            using (var request = new HttpRequestMessage(HttpMethod.Get, _url))
            {
                // Request settings described above
                request.Headers.Add("Host", "www.faa.gov");
                request.Headers.Add("Accept", "*/*");
                request.Headers.Add("Cookie", "AkamaiEdge=true");
                // Prints the status code and headers of the response
                Console.WriteLine(await _client.SendAsync(request));
            }
            Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
        }
    }


    This takes 896 ms for me.

    By the way, you shouldn't put HttpClient in a using block. It is disposable, but it is meant to be created once and reused; creating and disposing a new one for every request can exhaust the available sockets.
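A minimal sketch, assuming .NET Core 3.0 or later (it uses C# 8 `using` declarations), that combines the answer's points: one shared HttpClient that is never disposed, requests carrying the Accept header and AkamaiEdge cookie listed above, and a check of the response's Set-Cookie headers for ak_bmsc. The class name `FaaPageFetcher` and its methods are hypothetical. The Host header is left out because HttpClient derives it from the request URL, and finding ak_bmsc only shows that Bot Manager is present; it does not by itself prove that it caused the delay.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper class, not part of any library.
public static class FaaPageFetcher
{
    // Created once and reused for every request; never wrapped in a using block.
    private static readonly HttpClient SharedClient = new HttpClient();

    public static HttpClient Client => SharedClient;

    // Builds a GET request with the Accept header and cookie suggested in the answer.
    // Host is not set explicitly: HttpClient fills it in from the URL.
    public static HttpRequestMessage BuildRequest(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Add("Accept", "*/*");
        request.Headers.Add("Cookie", "AkamaiEdge=true");
        return request;
    }

    // Returns true if any Set-Cookie value sets a cookie named ak_bmsc
    // (the Akamai Bot Manager cookie mentioned in the answer).
    public static bool HasAkamaiBotManagerCookie(IEnumerable<string> setCookieValues) =>
        setCookieValues.Any(v => v.TrimStart().StartsWith("ak_bmsc=", StringComparison.Ordinal));

    // Downloads the page as a string using the shared client.
    public static async Task<string> DownloadAsync(string url)
    {
        using var request = BuildRequest(url);
        using var response = await SharedClient.SendAsync(request);
        response.EnsureSuccessStatusCode();

        // Report whether the server set the Bot Manager cookie on this response.
        if (response.Headers.TryGetValues("Set-Cookie", out var cookies)
            && HasAkamaiBotManagerCookie(cookies))
        {
            Console.WriteLine("Response set ak_bmsc (Akamai Bot Manager is active).");
        }

        return await response.Content.ReadAsStringAsync();
    }
}
```

Usage, from any async method: `string html = await FaaPageFetcher.DownloadAsync("https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/NASR_Subscription/");`. Keeping the client in a static field means every call reuses the same connection pool instead of opening a new one.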