javascript web-crawler infinite-scroll apify

How can I make the Apify Crawler scroll the full page when the web page uses infinite scrolling?


I'm unable to get all the product data because the website lazy-loads its product catalog page, meaning the page has to be scrolled until everything has loaded.

I'm getting only the first page of product data.


Solution

  • First, keep in mind that infinite scroll can be implemented in countless ways. Sometimes you have to click buttons along the way or trigger some other transition. I will cover only the simplest use case here: scrolling down at a fixed interval and finishing when no new products are loaded.

    1. If you build your own actor using the Apify SDK, you can use the infiniteScroll helper utility function. If it doesn't cover your use case, please give us feedback on GitHub.

    2. If you are using the generic Scrapers (Web Scraper or Puppeteer Scraper), infinite scroll functionality is not currently built in (though it may be by the time you read this). On the other hand, it is not that complicated to implement yourself. Here is a simple solution for Web Scraper's pageFunction.

    async function pageFunction(context) {
        // A few utilities taken from the context object
        const { request, log, jQuery } = context;
        const $ = jQuery;
    
        // Here we define the infinite scroll function, it has to be defined inside pageFunction
        const infiniteScroll = async (maxTime) => {
            const startedAt = Date.now();
            let itemCount = $('.my-class').length; // Update the selector
            while (true) {
                log.info(`INFINITE SCROLL --- ${itemCount} items loaded --- ${request.url}`);
                // Time limit so the loop cannot run forever
                if (Date.now() - startedAt > maxTime) {
                    return;
                }
                scrollBy(0, 9999);
                await context.waitFor(5000); // This can be any number that works for your website
                const currentItemCount = $('.my-class').length; // Update the selector
    
                // We check if the number of items changed after the scroll, if not we finish
                if (itemCount === currentItemCount) {
                    return;
                }
                itemCount = currentItemCount;
            }
        };
    
        // Generally, you want to do the scrolling only on the category type page
        if (request.userData.label === 'CATEGORY') {
            await infiniteScroll(60000); // Let's try 60 seconds max
    
            // ... Add your logic for categories
        } else {
            // Any logic for other types of pages
        }
    }
    

    Of course, this is a really trivial example. Sometimes it can get much more complicated. I once even used Puppeteer to control the mouse directly and drag a scroll bar that was accessible programmatically.
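
The stop logic of the infiniteScroll function above (give up after a time limit, or finish once a scroll adds no new items) can also be tried outside a browser. The following is only a minimal sketch under stated assumptions: `scrollUntilStable`, `createFakePage`, and the injected `scroll`, `countItems`, `wait` and `now` callbacks are hypothetical names made up for illustration and are not part of Web Scraper or the Apify SDK.

```javascript
// Sketch of the stop logic only. All callbacks are hypothetical and injected
// by the caller; nothing here is an Apify or Web Scraper API.
async function scrollUntilStable({ scroll, countItems, wait, maxTimeMs, now = Date.now }) {
    const startedAt = now();
    let itemCount = await countItems();
    let rounds = 0;

    // Same time limit as in the pageFunction: stop once maxTimeMs has passed
    while (now() - startedAt <= maxTimeMs) {
        await scroll();
        await wait();
        rounds += 1;
        const currentItemCount = await countItems();

        // No new items after a scroll means we assume the list is complete
        if (currentItemCount === itemCount) {
            return { itemCount, rounds, reason: 'no-new-items' };
        }
        itemCount = currentItemCount;
    }
    return { itemCount, rounds, reason: 'timeout' };
}

// Fake "page" for experimenting: each scroll loads `perScroll` more items
// until `total` is reached. Purely illustrative.
function createFakePage(total = 30, perScroll = 10) {
    let loaded = Math.min(perScroll, total);
    return {
        scroll: async () => { loaded = Math.min(loaded + perScroll, total); },
        countItems: async () => loaded,
        wait: async () => {}, // a real page would wait a few seconds here
    };
}

scrollUntilStable({ ...createFakePage(), maxTimeMs: 60000 })
    .then((result) => console.log(result));
// prints { itemCount: 30, rounds: 3, reason: 'no-new-items' }
```

In a real pageFunction, `scroll` would presumably wrap `scrollBy`, `countItems` would wrap the jQuery selector, and `wait` would wrap `context.waitFor`. Returning a `reason` makes it easier to tell a finished list from one that merely ran out of time.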