c# html-agility-pack overwrite

Problem loading a page with httpClient.GetStringAsync(URI)


I have a loading problem with this method. I want to load a web page to get its HTML, but the page doesn't have time to load completely. So I want to add a Thread.Sleep() to this method. Do you know how I can do that?

            var html = await httpClient.GetStringAsync(url); 
            HtmlAgilityPack.HtmlDocument htmlDocument = new HtmlAgilityPack.HtmlDocument();
            htmlDocument.LoadHtml(html);

Solution

  • My boss and I found a solution. Selenium can return the full HTML of a page. By default, Selenium waits for the page's load event before it interacts with the page, so the HTML it returns includes content that a plain HTTP request misses. Here is the code:

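    // Configure the browser before navigating.
    driver.Manage().Window.Size = new System.Drawing.Size(1936, 1056);
    // ImplicitWait makes FindElement retry for up to 10 seconds until the element exists.
    driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(10);
    driver.Navigate().GoToUrl(url);
    // innerHTML of <body> is the DOM as rendered by the browser.
    var result = driver.FindElement(By.TagName("body")).GetAttribute("innerHTML");
    await StartCrawlerasync(result);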
    
    // Nothing in this method is asynchronous. It still returns a Task so that
    // existing "await StartCrawlerasync(...)" call sites keep compiling.
    public static Task StartCrawlerasync(string html)
    {
        // Save the rendered HTML. WriteAllText overwrites an existing file,
        // so there is no need to empty the file first.
        string htmlpath = @"path\Test.html";
        File.WriteAllText(htmlpath, html);
        string csvpath = @"path\Tous_les_Liens.csv";

        // The HTML is already a complete string here, so no delay is needed.
        var htmlDocument = new HtmlAgilityPack.HtmlDocument();
        htmlDocument.LoadHtml(html);

        // Select only anchors that have an href, so Attributes["href"] is never null.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = htmlDocument.DocumentNode.SelectNodes("//a[@href]");
        var csvcontent = new StringBuilder();

        if (anchors != null)
        {
            foreach (HtmlNode link in anchors)
            {
                string href = link.Attributes["href"].Value;
                csvcontent.AppendLine(href);
                Console.WriteLine(href);
            }
        }
        else
        {
            Console.WriteLine("It is empty");
        }

        // Write all collected links, one per line.
        File.WriteAllText(csvpath, csvcontent.ToString());
        return Task.CompletedTask;
    }
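
If the page fills in its links with JavaScript after the load event, the implicit wait above may still return too early. One possible variation is an explicit wait: poll until the element you care about exists, then read the DOM. The sketch below is only an illustration under assumptions, not part of the code we actually ran: the `RenderedPage` class, the `GetHtml` helper, the choice of `ChromeDriver`, the `https://example.com` URL and the "at least one `<a>` element" condition are all hypothetical, and it assumes the Selenium WebDriver and Selenium.Support packages plus a local Chrome installation.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

public static class RenderedPage
{
    // Hypothetical helper: navigates to url and returns the rendered HTML
    // once at least one <a> element exists in the DOM.
    // WebDriverWait.Until throws WebDriverTimeoutException if the condition
    // is still false when the timeout expires.
    public static string GetHtml(IWebDriver driver, string url, TimeSpan timeout)
    {
        driver.Navigate().GoToUrl(url);

        var wait = new WebDriverWait(driver, timeout);
        // Until() re-evaluates the lambda until it returns true.
        wait.Until(d => d.FindElements(By.TagName("a")).Count > 0);

        // Ask the browser to serialize the current DOM, including nodes
        // that scripts added after the initial response.
        var js = (IJavaScriptExecutor)driver;
        return (string)js.ExecuteScript("return document.documentElement.outerHTML;");
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumption: Chrome and a matching driver are available locally.
        // Disposing the driver closes the browser.
        using (IWebDriver driver = new ChromeDriver())
        {
            string html = RenderedPage.GetHtml(
                driver, "https://example.com", TimeSpan.FromSeconds(10));
            Console.WriteLine(html.Length);
        }
    }
}
```

The returned string can be passed to `StartCrawlerasync` in the same way as `result` above. Waiting on a condition instead of sleeping for a fixed time means fast pages are not slowed down, and slow pages fail with a clear timeout instead of silently producing incomplete HTML.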