c# html http parsing httpclient

Can't get HTML code of a page by URL address


I need to get the HTML of the page at https://bakerhughesrigcount.gcs-web.com/intl-rig-count. I tried using HttpClient, but the request times out. Maybe the site has anti-bot protection? I tried adding User-Agent and Accept headers so the request looks like a normal browser request, but that didn't help.

string url = "https://bakerhughesrigcount.gcs-web.com/intl-rig-count";

using (HttpClient client = new HttpClient())
{
    try
    {
        client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36");
        client.DefaultRequestHeaders.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8");

        HttpResponseMessage response = await client.GetAsync(url);
        response.EnsureSuccessStatusCode();

        string htmlContent = await response.Content.ReadAsStringAsync();
        Console.WriteLine(htmlContent);
    }
    catch (HttpRequestException e)
    {
        Console.WriteLine($"Error: {e.Message}");
    }
}

I also tried Selenium and was able to get the HTML with it, but how can I do this without Selenium or similar tools?


Solution

  • I would suggest passing these headers as well:

    client.DefaultRequestHeaders.Add("Accept-Language", "en-US,en;q=0.5");
    client.DefaultRequestHeaders.Add("Accept-Encoding", "deflate,br");
    

    Here is sample code that works:

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;
                        
    public class Program
    {   
        public static async Task Main()
        {
            string url = "https://bakerhughesrigcount.gcs-web.com/intl-rig-count/";
    
            using (HttpClient client = new HttpClient())
            {
                try {
                    client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36");
                    client.DefaultRequestHeaders.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8");
                    client.DefaultRequestHeaders.Add("Accept-Language", "en-US,en;q=0.5");
                    client.DefaultRequestHeaders.Add("Accept-Encoding", "deflate,br");
                    var content = await client.GetStringAsync(url);
                    Console.WriteLine(content);
                } catch (HttpRequestException e) {
                    Console.WriteLine($"Error: {e.Message}");
                }
            }
        }
    }
    

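A possible variation on the sample above, offered as a sketch rather than something verified against this site: once a request advertises Accept-Encoding, the server may reply with a compressed body, and GetStringAsync does not decompress it unless the handler is told to. The version below lets HttpClientHandler handle gzip and deflate instead of setting the header by hand, and it also catches TaskCanceledException, which is what HttpClient throws when its timeout elapses. The PageFetcher class, the CreateClient helper, and the 30-second timeout are hypothetical names and values chosen for illustration.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class PageFetcher
{
    // Hypothetical helper: builds a client whose handler decompresses responses.
    public static HttpClient CreateClient(TimeSpan timeout)
    {
        var handler = new HttpClientHandler
        {
            // Decode gzip/deflate bodies before they are read as a string.
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };

        var client = new HttpClient(handler) { Timeout = timeout };
        client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36");
        client.DefaultRequestHeaders.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8");
        client.DefaultRequestHeaders.Add("Accept-Language", "en-US,en;q=0.5");
        return client;
    }

    public static async Task Main()
    {
        string url = "https://bakerhughesrigcount.gcs-web.com/intl-rig-count/";

        // Assumed timeout of 30 seconds; adjust as needed.
        using (HttpClient client = CreateClient(TimeSpan.FromSeconds(30)))
        {
            try
            {
                string content = await client.GetStringAsync(url);
                Console.WriteLine(content);
            }
            catch (HttpRequestException e)
            {
                // Network-level failures and non-success status codes.
                Console.WriteLine($"Error: {e.Message}");
            }
            catch (TaskCanceledException)
            {
                // Thrown when the client's Timeout elapses.
                Console.WriteLine("Error: the request timed out.");
            }
        }
    }
}
```

If the output of the original sample looks like binary garbage rather than HTML, a compressed response is one likely cause, and this handler setting is the usual way to address it.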