Search code examples
javascriptweb-scrapingxpathpuppeteer

Wait for an xpath in Puppeteer


On a page I'm scraping with Puppeteer, I have a list with the same id for every li. I am trying to find and click on an element with specific text within this list. I have the following code:

await page.waitFor(5000)

const linkEx = await page.$x("//a[contains(text(), 'Shop')]")

if (linkEx.length > 0) {
  await linkEx[0].click()
}

Do you have any idea how I could replace the first line with waiting for the actual text 'Shop'?

I tried await page.waitFor(linkEx), waitForSelector(linkEx) but it's not working.

Also, I would like to replace that a in the second line of code with the actual id (#activities) or something like that but I couldn't find a proper example.

Could you please help me with this issue?


Solution

  • Update: My answer is obsolete by 2024. See "waitForSelector" options in ggorlen's comment or weeix's answer. They are correct.


    page.waitForXPath what you need here.

    Example:

    const puppeteer = require('puppeteer')
    
    async function fn() {
      const browser = await puppeteer.launch()
      const page = await browser.newPage()
      await page.goto('https://example.com')
    
      // await page.waitForSelector('//a[contains(text(), "More information...")]') // ❌
      await page.waitForXPath('//a[contains(text(), "More information...")]') // ✅
      const linkEx = await page.$x('//a[contains(text(), "More information...")]')
      if (linkEx.length > 0) {
        await linkEx[0].click()
      }
    
      await browser.close()
    }
    fn()
    

    Try this for id-based XPath:

    "//*[@id='activities' and contains(text(), 'Shop')]"
    

    Did you know? If you right-click on an element in Chrome DevTools "Elements" tab and you select "Copy": there you are able to copy the exact selector or XPath of an element. After that, you can switch to the "Console" tab and with the Chrome API you are able to test the selector's content, so you can prepare it for your puppeteer script. E.g.: $x("//*[@id='activities' and contains(text(), 'Shop')]")[0].href should show the link what you expected to click on, otherwise you need to change on the access, or you need to check if there are more elements with the same selector etc. This may help to find more appropriate selectors.