puppeteer-sharp

WaitUntil not waiting / Get HTML on WaitForSelectorAsync


I'm having two problems that I would appreciate some advice on. I have used Puppeteer in Node in the past, but for some reason I'm running into a problem with the Sharp version.

Basically, I'm crawling a webpage with WaitUntil set to WaitUntilNavigation.Networkidle0, the longest wait period. In my Node code this runs and loads my website correctly, but in the C# version I get the page without Angular loaded. As best I can tell, it is not waiting and is returning the initial Load state. My code is below.

        if (BROWSER == null)
        {
            await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);

            BROWSER = await Puppeteer.LaunchAsync(new LaunchOptions
            {
                Headless = true,
                Args = new string[] { "--no-sandbox", "--disable-accelerated-2d-canvas", "--disable-gpu", "--proxy-server='direct://'", "--proxy-bypass-list=*" }
            });
        }

        if (page == null)
        {
            page = await BROWSER.NewPageAsync();
            await page.SetUserAgentAsync("PScraper-SiteCrawler");
            await page.SetViewportAsync(new ViewPortOptions() { Width = 1024, Height = 842 });

            var response = await page.GoToAsync(url, new NavigationOptions() { Referer = "PScraper-SiteCrawler", Timeout = timeoutMilliseconds, WaitUntil = new[] { WaitUntilNavigation.Networkidle0 } });
        }

Timeout is set to 30 seconds (30,000 milliseconds). I then get the HTML of the page with

await response.TextAsync()

My second question is unrelated, but likely simpler to solve. One route I was considering was the page.WaitForSelectorAsync() method. This appears to wait until the content I'm looking for is loaded, but I haven't been able to figure out how to get the entire HTML of the page from the ElementHandle it returns once that is done.

I would appreciate some help here. I have tried a couple of routes and haven't been able to figure out what's causing the difference between the Node and C# code.


Solution

  • Solved my problem. The issue was how I was getting the html of the page.

    I was using...

    await response.TextAsync()
    

    Apparently, this gets me only the initial response, that is, the HTML as the server sent it, before Angular rendered anything. When I changed the way I get the HTML to the following line of code, everything worked as expected.

    await page.GetContentAsync()
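
For anyone landing here with the same two questions, the fix above can be sketched as a small helper. This is only a sketch under assumptions: the class name `RenderedHtml`, the method name `GetAsync`, and the `readySelector` parameter are hypothetical and not part of PuppeteerSharp, and the `Page` parameter type matches the PuppeteerSharp version used in the question (newer releases expose the same methods on `IPage`). It navigates with Networkidle0, optionally waits for a selector, and then reads the HTML from the page rather than from the navigation response.

```csharp
using System.Threading.Tasks;
using PuppeteerSharp;

// Hypothetical helper, not part of PuppeteerSharp.
public static class RenderedHtml
{
    // Returns the HTML of the page as it currently stands in the browser,
    // i.e. after client-side scripts (such as Angular) have modified the DOM.
    // "page" is assumed to be an already created Page, as in the question.
    public static async Task<string> GetAsync(Page page, string url, string readySelector = null)
    {
        // The returned Response is deliberately ignored: its TextAsync()
        // would give the body of the initial HTTP response only.
        await page.GoToAsync(url, new NavigationOptions
        {
            Timeout = 30000, // 30 seconds, as in the question
            WaitUntil = new[] { WaitUntilNavigation.Networkidle0 }
        });

        // Optional second wait: block until a specific element exists.
        // The ElementHandle that WaitForSelectorAsync returns is not needed
        // to read the whole page, so it is discarded here.
        if (readySelector != null)
        {
            await page.WaitForSelectorAsync(readySelector);
        }

        // Serialize the current DOM of the page.
        return await page.GetContentAsync();
    }
}
```

A call might look like `var html = await RenderedHtml.GetAsync(page, url, "app-root");`, where `app-root` is only an assumed selector for the Angular root element. This also covers the second question: after WaitForSelectorAsync completes, the full HTML comes from the page via GetContentAsync, not from the ElementHandle.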