
Puppeteer: How to get the contents of each element of a nodelist?


I'm trying to achieve something very trivial: Get a list of elements, and then do something with the innerText of each element.

const tweets = await page.$$('.tweet');

From what I can tell, this returns a NodeList, just like the document.querySelectorAll() method in the browser.

How do I just loop over it and get what I need? I tried various things, like:

[...tweets].forEach(tweet => {
  console.log(tweet.innerText)
});

Solution

  • page.$$():

    page.$$() resolves to an array of ElementHandle objects, not a NodeList, so tweet.innerText is undefined on the Node.js side. Use elementHandle.getProperty() together with jsHandle.jsonValue() to read the innerText of each ElementHandle:

    const tweets = await page.$$('.tweet');
    
    for (let i = 0; i < tweets.length; i++) {
      const tweet = await (await tweets[i].getProperty('innerText')).jsonValue();
      console.log(tweet);
    }
    

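    If you are set on using the forEach() method, you can wrap the loop in a promise. forEach() does not wait for async callbacks, so the promise should resolve once every callback has finished, not when the callback for the last element happens to finish:

    const tweets = await page.$$('.tweet');
    
    await new Promise((resolve, reject) => {
      let pending = tweets.length; // callbacks still running
      if (pending === 0) resolve(); // nothing to wait for
      tweets.forEach(async (tweet) => {
        try {
          const text = await (await tweet.getProperty('innerText')).jsonValue();
          console.log(text);
          if (--pending === 0) resolve(); // last callback to finish
        } catch (err) {
          reject(err);
        }
      });
    });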
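
Waiting for several asynchronous reads is more commonly written with map() and Promise.all() than with a hand-built promise. The following is a minimal sketch, not part of the original answer: getInnerTexts is a hypothetical helper name, and it assumes each handle exposes getProperty() and jsonValue() as used above.

```javascript
// Hypothetical helper (not part of Puppeteer): reads innerText from each
// ElementHandle concurrently and resolves to the texts in handle order.
async function getInnerTexts(handles) {
  return Promise.all(
    handles.map(async (handle) => {
      const property = await handle.getProperty('innerText');
      return property.jsonValue();
    })
  );
}

// Usage inside an async function, assuming `page` is an open Puppeteer page:
//   const tweets = await page.$$('.tweet');
//   const texts = await getInnerTexts(tweets);
//   texts.forEach((text) => console.log(text));
```

Promise.all() keeps the results in the same order as the input array and rejects as soon as any single read fails, so no manual counting is needed.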
    

  • page.evaluate():

    Alternatively, you can skip using page.$$() entirely, and use page.evaluate():

    const tweets = await page.evaluate(() => Array.from(document.getElementsByClassName('tweet'), e => e.innerText));
    
    tweets.forEach(tweet => {
      console.log(tweet);
    });
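
A closely related option is page.$$eval(), which runs a callback in the browser with an array of the elements matching a selector. The sketch below is an illustration rather than part of the original answer: getTexts is a hypothetical helper name, and `page` is assumed to be an already opened Puppeteer page.

```javascript
// Hypothetical helper: `page` is assumed to be a Puppeteer Page.
// page.$$eval() runs the callback in the browser with an array of the
// matched elements and returns its serializable result to Node.js.
async function getTexts(page, selector) {
  return page.$$eval(selector, (elements) =>
    elements.map((element) => element.innerText)
  );
}

// Usage inside an async function:
//   const tweets = await getTexts(page, '.tweet');
//   tweets.forEach((tweet) => console.log(tweet));
```

As with page.evaluate(), only plain strings cross from the browser to Node.js, so no ElementHandle objects have to be unwrapped afterwards.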