c#, web-scraping, html-agility-pack

Scrape Dynamic Data from a Website Using C# and HtmlAgilityPack


I am scraping data with HtmlAgilityPack, but the page doesn't load properly.

I need my code to wait until the page is fully loaded.

There is a workaround that uses a browser control in a form, but I don't want to use a form for this.

Here is the link I need to scrape, followed by my code.

    HtmlWeb web = new HtmlWeb();
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
    HtmlAgilityPack.HtmlDocument doc = web.Load(website);
    var goldTypes = doc.DocumentNode.SelectNodes("//h2[@class='gold-box-title']").ToList();
    var goldPrices = doc.DocumentNode.SelectNodes("//span[@class='gold-box-price--sale']").ToList();

    for (int i = 0; i < 2; i++)
    {
        string goldPrice = goldPrices[i].InnerText;
        string goldType = goldTypes[i].InnerText;
    }

Solution

  • You were correct: all the data is available as structured JSON in the ":buyable" attribute of the "buyable-gold" elements.

    I did a quick test, and this should be what you want. It gives you a list of structured objects with the data you need.

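    // Requires: using System.Linq; using System.Net; using System.Web;
    //           using HtmlAgilityPack; using Newtonsoft.Json;
    HtmlWeb web = new HtmlWeb();
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
    HtmlAgilityPack.HtmlDocument doc = web.Load("https://www.ezrsgold.com/buy-runescape-gold");

    // Each <buyable-gold> element carries its data as HTML-encoded JSON in the ":buyable" attribute.
    var buyGoldNodes = doc.DocumentNode.SelectNodes("//buyable-gold");

    // Decode the HTML entities to get the raw JSON strings.
    var buyableJsons = buyGoldNodes.Select(x => HttpUtility.HtmlDecode(x.Attributes[":buyable"].Value)).ToList();

    // Deserialize each JSON string into a Buyable object.
    var buyables = buyableJsons.Select(x => JsonConvert.DeserializeObject<Buyable>(x)).ToList();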
    

    Then your Buyable class would look something like this.

    public class Buyable
    {
        public int id { get; set; }
        public string sku { get; set; }
        public int game_id { get; set; }
        public string title { get; set; }
        public int min_qty { get; set; }
        public int max_qty { get; set; }
        public string base_price { get; set; }
        public string sale_price { get; set; }
        public Bulk_Price[] bulk_price { get; set; }
        public string delivery_time { get; set; }
        public string description { get; set; }
        public object sort_order { get; set; }
        public string created_at { get; set; }
        public string updated_at { get; set; }
        public string price { get; set; }
        public bool on_sale { get; set; }
        public int discount_from { get; set; }
    }
    
    public class Bulk_Price
    {
        public string qty { get; set; }
        public string price { get; set; }
    }
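
    As a rough offline sketch of the same approach, the snippet below parses a hard-coded HTML string instead of the live page. The sample markup, its values, the BuyableSummary class, and the BuyableParser helper are all made up for illustration; the real page's payload has more fields. It uses WebUtility.HtmlDecode from System.Net in place of HttpUtility.HtmlDecode so that no System.Web reference is needed, and it guards against SelectNodes returning null when nothing matches.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Net;
    using HtmlAgilityPack;
    using Newtonsoft.Json;

    // Trimmed-down, hypothetical stand-in for the Buyable class.
    public class BuyableSummary
    {
        public string title { get; set; }
        public string sale_price { get; set; }
    }

    public static class BuyableParser
    {
        // Made-up markup that mimics HTML-encoded JSON in a ":buyable" attribute.
        public const string SampleHtml =
            "<div>" +
            "<buyable-gold :buyable='{&quot;title&quot;:&quot;Sample Gold A&quot;,&quot;sale_price&quot;:&quot;1.23&quot;}'></buyable-gold>" +
            "<buyable-gold :buyable='{&quot;title&quot;:&quot;Sample Gold B&quot;,&quot;sale_price&quot;:&quot;4.56&quot;}'></buyable-gold>" +
            "</div>";

        public static List<BuyableSummary> Parse(string html)
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // SelectNodes returns null, not an empty collection, when nothing matches.
            var nodes = doc.DocumentNode.SelectNodes("//buyable-gold");
            if (nodes == null)
            {
                return new List<BuyableSummary>();
            }

            return nodes
                .Select(n => n.GetAttributeValue(":buyable", string.Empty))
                .Where(raw => !string.IsNullOrEmpty(raw))
                .Select(raw => WebUtility.HtmlDecode(raw)) // turns &quot; back into "
                .Select(json => JsonConvert.DeserializeObject<BuyableSummary>(json))
                .ToList();
        }

        public static void Main()
        {
            // Print the title and sale price of each parsed item.
            foreach (var item in Parse(SampleHtml))
            {
                Console.WriteLine($"{item.title}: {item.sale_price}");
            }
        }
    }
    ```

    If no buyable-gold elements are found (for example, because the site changes its markup), Parse returns an empty list rather than throwing a NullReferenceException.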