I am trying to scrape some entries from a table on a website, but when I make the GET request, the response string does not include the entries that are shown on the page.
Here is the website: https://www.services-rte.com/en/view-data-published-by-rte/downtime-of-generation-resources.html
My guess is that I need to make a POST request to load the table, but I can't work out exactly what to post. Correct me if I am wrong.
Here is my code:
    using System;
    using System.Net.Http;
    using AngleSharp.Html.Parser;
    using Microsoft.Extensions.DependencyInjection;

    static async void GetEntries()
    {
        // Build an HttpClient through IHttpClientFactory.
        var services = new ServiceCollection();
        services.AddHttpClient();
        var serviceProvider = services.BuildServiceProvider();
        var httpClientFactory = serviceProvider.GetService<IHttpClientFactory>();
        var client = httpClientFactory.CreateClient();

        // Download the page's HTML as a string.
        string response = string.Empty;
        try
        {
            response = await client.GetStringAsync("https://www.services-rte.com/en/view-data-published-by-rte/downtime-of-generation-resources.html");
        }
        catch
        {
            Console.WriteLine("Site not found.");
            return;
        }

        // Parse the HTML with AngleSharp and try to read the first 20 table rows.
        var parser = new HtmlParser();
        var document = parser.ParseDocument(response);
        string content = string.Empty;
        for (int i = 1; i <= 20; i++)
        {
            try
            {
                Console.WriteLine(i);
                content = document.QuerySelector($"#wrapper > div > div > div.c-editorial-page__container > div.c-editorial-page__content > ctx-remit-generation-unavailability > cortex-remit-generation-unavailability-table > cortex-table > div > div.ctx__table_content > cortex-table-row:nth-child({i})").TextContent;
            }
            catch
            {
                Console.WriteLine($"CSS selector not found for {i}.");
                continue;
            }
            Console.WriteLine(content);
            Console.WriteLine("NEW");
        }
    }
The error occurs on this line:
    content = document.QuerySelector($"#wrapper > div > div > div.c-editorial-page__container > div.c-editorial-page__content > ctx-remit-generation-unavailability > cortex-remit-generation-unavailability-table > cortex-table > div > div.ctx__table_content > cortex-table-row:nth-child({i})").TextContent;
The exception is a NullReferenceException: Object reference not set to an instance of an object.
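I think the table data is loaded asynchronously: the page's JavaScript fills in the table after the initial HTML has arrived. GetStringAsync only downloads that initial HTML and never runs the scripts, so the cortex-table-row elements are probably not in the response at all. In that case QuerySelector returns null, and reading .TextContent on null throws the exception you see. I ran into the same problem once: I could see the HTML in the browser, but it was missing from the response when I made the request from C#.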
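One way to check this is to count the row elements in whatever HTML you actually received. The following is only a minimal sketch using AngleSharp, the parser from the question; the class name RowReader and the method ReadRows are made up for illustration, and the tag name cortex-table-row is taken from the selector in the question. QuerySelectorAll returns an empty collection instead of null when nothing matches, so no exception is thrown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using AngleSharp.Html.Parser;

// Hypothetical helper, not part of AngleSharp or of the original code.
public static class RowReader
{
    // Returns the trimmed text of every <cortex-table-row> element in the HTML.
    // An empty list means the rows are not present in the HTML that was passed in.
    public static List<string> ReadRows(string html)
    {
        var document = new HtmlParser().ParseDocument(html);
        return document.QuerySelectorAll("cortex-table-row")
                       .Select(row => row.TextContent.Trim())
                       .ToList();
    }
}
```

If ReadRows(response) comes back empty for the string returned by GetStringAsync, the rows are being added by JavaScript later and plain HTTP plus an HTML parser will not see them. For a single element, document.QuerySelector(selector)?.TextContent avoids the exception in the same way, because it yields null instead of throwing.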
What you can do is use something like Selenium, which drives a real browser and therefore also works with websites that load their data asynchronously. I know this might not be the best answer because I cannot really show you how to use it, but Selenium has official C# bindings (the Selenium.WebDriver NuGet package).
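For what it's worth, a rough and untested sketch of the Selenium approach could look like the code below. It assumes the Selenium.WebDriver NuGet package (version 4), a local Chrome installation, and that the rows are rendered as cortex-table-row elements in the regular DOM, as the selector in the question suggests. If the site renders them inside a shadow root, or shows a cookie dialog first, the selector or the flow would need adjusting. The class name RenderedTableScraper and the 30-second timeout are arbitrary choices.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

// Hypothetical example class; the name is not from the original post.
public static class RenderedTableScraper
{
    public static void Main()
    {
        const string url = "https://www.services-rte.com/en/view-data-published-by-rte/downtime-of-generation-resources.html";

        // Run Chrome without a visible window.
        var options = new ChromeOptions();
        options.AddArgument("--headless=new");

        using var driver = new ChromeDriver(options);
        driver.Navigate().GoToUrl(url);

        // Poll until at least one row exists; Until throws
        // WebDriverTimeoutException if none appears within the timeout.
        var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(30));
        var rows = wait.Until(d =>
        {
            var found = d.FindElements(By.CssSelector("cortex-table-row"));
            return found.Count > 0 ? found : null;
        });

        // Print each rendered row, mirroring the output of the original loop.
        foreach (var row in rows)
        {
            Console.WriteLine(row.Text);
            Console.WriteLine("NEW");
        }
    }
}
```

The explicit wait is the important part: the browser executes the page's JavaScript, and the code only reads the rows once they exist, which is exactly what the plain GET request cannot do.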
Maybe this article can help you: https://www.scrapingdog.com/blog/web-scraping-with-csharp/ (not mine, but it looks promising).