I have a loading problem with this method. I want to load a web page to get its HTML, but the page doesn't have time to load completely before I read it. So I want to add a Thread.Sleep() to this method. Do you know how I can do that?
var html = await httpClient.GetStringAsync(url);
HtmlAgilityPack.HtmlDocument htmlDocument = new HtmlAgilityPack.HtmlDocument();
htmlDocument.LoadHtml(html);
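On the literal question: inside an async method, the non-blocking counterpart of Thread.Sleep() is await Task.Delay(...). Below is a minimal sketch; the class name, the helper GetHtmlWithDelayAsync and the URL https://example.com are placeholders, not part of the original code. Note that a delay is unlikely to help in this case: GetStringAsync only completes once the whole response body has been received, and HttpClient never runs the page's JavaScript, so content added by scripts will not appear however long you wait.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class DelayExample
{
    // Hypothetical helper: downloads the raw HTML, then pauses without blocking the thread.
    public static async Task<string> GetHtmlWithDelayAsync(HttpClient httpClient, string url, TimeSpan delay)
    {
        // Completes when the full response body has been downloaded.
        string html = await httpClient.GetStringAsync(url);

        // Non-blocking pause; Thread.Sleep(delay) would block the calling thread instead.
        await Task.Delay(delay);

        return html;
    }

    public static async Task Main()
    {
        using (var httpClient = new HttpClient())
        {
            // Placeholder URL, replace with the page you want to fetch.
            string html = await GetHtmlWithDelayAsync(httpClient, "https://example.com", TimeSpan.FromSeconds(2));
            Console.WriteLine(html.Length);
        }
    }
}
```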
My boss and I found a solution. Selenium can return the full HTML of a page, and since by default it waits for the page to finish loading before running the next command, the HTML it returns is complete. Here is the code:
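// Configure the browser before navigating.
driver.Manage().Window.Size = new System.Drawing.Size(1936, 1056);
// The implicit wait applies to element lookups such as FindElement below.
driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(10);
driver.Navigate().GoToUrl(url);
// innerHTML of <body> reflects the DOM as rendered by the browser.
var result = driver.FindElement(By.TagName("body")).GetAttribute("innerHTML");
await StartCrawlerasync(result);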
// Requires: System, System.Collections.Generic, System.IO, System.Text,
// System.Threading.Tasks and HtmlAgilityPack.
public static async Task StartCrawlerasync(string html)
{
    var links = new List<string>();
    var csvcontent = new StringBuilder();

    // Save the rendered HTML (WriteAllText overwrites any existing file).
    string htmlpath = @"path\Test.html";
    File.WriteAllText(htmlpath, html);
    string csvpath = @"path\Tous_les_Liens.csv";

    // The HTML is already complete here, so no extra delay is needed before parsing.
    var htmlDocument = new HtmlAgilityPack.HtmlDocument();
    htmlDocument.LoadHtml(html);

    // SelectNodes returns null when nothing matches; "//a[@href]" skips anchors without an href.
    var anchors = htmlDocument.DocumentNode.SelectNodes("//a[@href]");
    if (anchors != null)
    {
        foreach (HtmlNode link in anchors)
        {
            string href = link.GetAttributeValue("href", string.Empty);
            links.Add(href);
            csvcontent.AppendLine(href);
            Console.WriteLine(href);
        }
    }
    else
    {
        Console.WriteLine("No links found");
    }

    // Write the CSV asynchronously, overwriting any previous file.
    using (var writer = new StreamWriter(csvpath))
    {
        await writer.WriteAsync(csvcontent.ToString());
    }
}
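If the page adds its links with JavaScript after the initial load, a fixed delay or an implicit wait may still not be enough. One option is an explicit wait, sketched below. This is only an illustration under some assumptions: it needs the Selenium.Support package for WebDriverWait, the class and method names are made up for the example, and waiting for at least one a element is just a guess at a suitable "page is ready" condition for your site.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI; // WebDriverWait (Selenium.Support package)

public static class RenderedHtmlExample
{
    // Hypothetical helper: navigates to url and returns the HTML once the page looks ready.
    public static string GetRenderedHtml(IWebDriver driver, string url, TimeSpan timeout)
    {
        driver.Navigate().GoToUrl(url);

        var wait = new WebDriverWait(driver, timeout);

        // Wait until the browser reports that the document has finished loading.
        wait.Until(d => "complete".Equals(
            ((IJavaScriptExecutor)d).ExecuteScript("return document.readyState")));

        // Assumed readiness condition: at least one link is present in the DOM.
        // Throws WebDriverTimeoutException if none appears before the timeout.
        wait.Until(d => d.FindElements(By.TagName("a")).Count > 0);

        // PageSource returns the HTML of the current page as the browser sees it.
        return driver.PageSource;
    }
}
```

It could be called as var result = RenderedHtmlExample.GetRenderedHtml(driver, url, TimeSpan.FromSeconds(10)); and the result passed to StartCrawlerasync as before.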