I have two problems that I would appreciate some advice on. I have used Puppeteer in Node before, but for some reason I'm running into a problem with the C# port, PuppeteerSharp.
I'm crawling a web page with WaitUntil set to WaitUntilNavigation.Networkidle0, which treats navigation as finished only once there have been no network connections for at least 500 ms. In my Node code this loads my website correctly, but in the C# version I get the page without Angular loaded. As far as I can tell, it isn't waiting and is returning the page in its initial load state. Here is my code:
if (BROWSER == null)
{
    await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);
    BROWSER = await Puppeteer.LaunchAsync(new LaunchOptions
    {
        Headless = true,
        Args = new string[] { "--no-sandbox", "--disable-accelerated-2d-canvas", "--disable-gpu",
                              "--proxy-server='direct://'", "--proxy-bypass-list=*" }
    });
}
if (page == null)
{
    page = await BROWSER.NewPageAsync();
    await page.SetUserAgentAsync("PScraper-SiteCrawler");
    await page.SetViewportAsync(new ViewPortOptions() { Width = 1024, Height = 842 });
    var response = await page.GoToAsync(url, new NavigationOptions()
    {
        Referer = "PScraper-SiteCrawler",
        Timeout = timeoutMilliseconds,
        WaitUntil = new[] { WaitUntilNavigation.Networkidle0 }
    });
}
timeoutMilliseconds is set to 30,000 (30 seconds). I then get the HTML of the page with:
await response.TextAsync();
My second question is unrelated but probably simpler to solve. One route I considered was the page.WaitForSelectorAsync() method. It appears to wait until the content I'm looking for has loaded, but I haven't been able to figure out how to get the HTML of the entire page from the ElementHandle it returns.
I'd appreciate some help here. I've tried a couple of routes and haven't been able to figure out what's causing the difference between the Node and C# code.
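For the second question, a minimal sketch, assuming a recent PuppeteerSharp release (the IPage interface and the parameterless BrowserFetcher.DownloadAsync() come from newer versions than the DefaultRevision call above). The ElementHandle returned by WaitForSelectorAsync refers only to the matched element, so it shouldn't be needed to read the whole page: once the wait completes, page.GetContentAsync() on the page itself returns the HTML of the entire current document. The SelectorWait class, the GetHtmlAfterSelectorAsync helper and the "app-root *" selector are hypothetical.

```csharp
using System;
using System.Threading.Tasks;
using PuppeteerSharp;

public static class SelectorWait
{
    // Hypothetical helper: navigate, wait for a selector that only appears once
    // the client-side app has rendered, then return the HTML of the whole page.
    public static async Task<string> GetHtmlAfterSelectorAsync(
        IPage page, string url, string selector, int timeoutMilliseconds = 30000)
    {
        await page.GoToAsync(url, new NavigationOptions { Timeout = timeoutMilliseconds });

        // The returned handle points only at the matched element; it is not
        // needed to read the full document, so it is discarded here.
        await page.WaitForSelectorAsync(selector,
            new WaitForSelectorOptions { Timeout = timeoutMilliseconds });

        // Serialize the page's entire current DOM, not just the matched element.
        return await page.GetContentAsync();
    }
}

public static class Program
{
    public static async Task Main(string[] args)
    {
        // Hypothetical URL; pass the real site as the first argument.
        string url = args.Length > 0 ? args[0] : "https://example.com";

        // Downloads a compatible browser build on first run.
        await new BrowserFetcher().DownloadAsync();
        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
        await using var page = await browser.NewPageAsync();

        // "app-root *" (any child of Angular's default root element) is an assumption;
        // use a selector for the content you actually expect to be rendered.
        string html = await SelectorWait.GetHtmlAfterSelectorAsync(page, url, "app-root *");
        Console.WriteLine(html.Length);
    }
}
```

Waiting on a selector is more targeted than Networkidle0: it returns as soon as the content you need exists, rather than when network traffic happens to go quiet.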
I solved my problem. The issue was how I was getting the HTML of the page.
I was using:
await response.TextAsync();
This returns only the body of the initial HTTP response, i.e. the HTML the server sent before any JavaScript (here, Angular) ran. When I switched to the following, which serializes the page's current DOM, everything worked as expected:
await page.GetContentAsync();
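A minimal end-to-end sketch of the difference, assuming a recent PuppeteerSharp release (IPage, IResponse and the parameterless BrowserFetcher.DownloadAsync() are from newer versions than the DefaultRevision call in the question). The Crawler class and FetchAsync helper are hypothetical. It navigates with Networkidle0 as in the question and returns both strings, so the raw server response can be compared with the rendered DOM.

```csharp
using System;
using System.Threading.Tasks;
using PuppeteerSharp;

public static class Crawler
{
    // Hypothetical helper: navigate, then return both the raw response body
    // and the serialized DOM so the two can be compared.
    public static async Task<(string InitialHtml, string RenderedHtml)> FetchAsync(
        IPage page, string url, int timeoutMilliseconds = 30000)
    {
        var response = await page.GoToAsync(url, new NavigationOptions
        {
            Timeout = timeoutMilliseconds,
            WaitUntil = new[] { WaitUntilNavigation.Networkidle0 }
        });

        // Body of the initial HTTP response: what the server sent, before any
        // client-side JavaScript (such as Angular) modified the DOM.
        // GoToAsync can return null for some navigations, hence the check.
        string initialHtml = response != null ? await response.TextAsync() : string.Empty;

        // Serialization of the page's current DOM, after scripts have run.
        string renderedHtml = await page.GetContentAsync();

        return (initialHtml, renderedHtml);
    }
}

public static class Program
{
    public static async Task Main(string[] args)
    {
        // Hypothetical URL; pass the Angular site to crawl as the first argument.
        string url = args.Length > 0 ? args[0] : "https://example.com";

        // Downloads a compatible browser build on first run.
        await new BrowserFetcher().DownloadAsync();
        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
        await using var page = await browser.NewPageAsync();

        var (initialHtml, renderedHtml) = await Crawler.FetchAsync(page, url);
        Console.WriteLine($"Initial response body: {initialHtml.Length} chars");
        Console.WriteLine($"Rendered DOM:          {renderedHtml.Length} chars");
    }
}
```

Networkidle0 only controls when GoToAsync returns; which of the two strings you read afterwards is what decides whether the Angular-rendered markup is included.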