
How to use the Apify SDK to automate selecting a JavaScript option to crawl a website


I have read the excellent documentation on using the Apify SDK to crawl websites, but I need a little help, as the guide for PuppeteerCrawler is not complete.

The part of the site I would like to crawl is a members' directory. The main page (which I believe I will need to provide as a RequestList) contains links to the first 50 members of the directory. To get to the next 50 members, there is an option box which looks like this:

<span id="foo">Show:<select onchange="bar.pagerChanged(this);">
<option value="0">1-50</option>
<option value="50">51-100</option>
<option value="100">101-150</option>
...
<option value="2400">2401-2450</option>
</select>
</span>

I'm not sure how I would approach this, except that I think I will need PuppeteerCrawler, given that user input (selecting an option inside the element with id="foo") is required. What I need to do is start with the top page, add all 50 links to the RequestQueue, then select the next batch of 50 members, and rinse and repeat.
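
Based on the docs, the skeleton I have in mind looks roughly like the sketch below (the start URL is just a placeholder for the directory's main page); what I don't know is how to drive the option box from inside handlePageFunction:

const Apify = require('apify');

Apify.main(async () => {
    // Queue that will receive the individual member links
    const requestQueue = await Apify.openRequestQueue();

    // Placeholder URL for the directory's main page
    await requestQueue.addRequest({ url: 'https://example.com/members' });

    const crawler = new Apify.PuppeteerCrawler({
        requestQueue,
        handlePageFunction: async ({ request, page }) => {
            // On the directory page: page through the <select> and enqueue
            // the member links; on member pages: extract the data
        },
    });

    await crawler.run();
});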


Solution

  • I'm not sure about the exact details of the page, but you can select any option with Puppeteer easily, like this:

    // The first argument is the selector for the <select> element,
    // the second is the value of the option to pick
    await page.select('#foo select', '50');
    

    In some rare cases this doesn't work. It can then be solved by clicking directly on the select and then on one of the displayed options:

    await page.click('#foo select');
    await page.waitFor(200); // give the dropdown time to open
    await page.click('selector-for-one-of-the-options-that-popped-up');
    

    If there are links to collect for every option, you can do a simple loop:

    const batchSize = 50;
    const totalMembers = 2450; // the last option in the example covers 2401-2450
    for (let i = 0; i < totalMembers; i += batchSize) {
        await page.select('#foo select', `${i}`); // the value must be passed as a string
        // you may need to wait here until the new batch has rendered
        const links = await extractLinks(page); // implement: collect the 50 member links currently shown
        for (const url of links) {
            await requestQueue.addRequest({ url });
        }
    }
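
    As for extractLinks, a minimal sketch could look like the following; the '#members a.member-link' selector is only an assumption and has to be adapted to the directory's real markup. Depending on how bar.pagerChanged works, you may also need to wait (for example with page.waitForFunction) until the new batch of links has replaced the old one before extracting.

    // Hypothetical helper – the selector below is an assumption,
    // adjust it to match the real member links on the page
    const extractLinks = async (page) => {
        return page.$$eval('#members a.member-link', (anchors) =>
            anchors.map((a) => a.href)
        );
    };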