javascript google-chrome xpath puppeteer

How to use xpath in chrome headless+puppeteer evaluate()?


How can I use $x() to evaluate an XPath expression inside page.evaluate()?

Since page is not available in that context, I tried calling $x() directly (as I would in Chrome DevTools), but no cigar.

The script times out.


Solution

  • $x() is not a standard JavaScript method for selecting elements by XPath; it is only a helper available in Chrome DevTools. The documentation says so explicitly:

    Key point: These functions only work when you call them from the Chrome DevTools Console. They won't work if you try to call them in your scripts.

    Code run through page.evaluate() counts as a script on the page in this sense, so $x() is not available there.

    You have two options:

    1. Use document.evaluate

    Here is an example of selecting an element (the featured article) inside page.evaluate():

    const puppeteer = require('puppeteer');
    
    (async () => {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();
        await page.goto('https://en.wikipedia.org', { waitUntil: 'networkidle2' });
    
        const text = await page.evaluate(() => {
            // $x() is not part of standard JavaScript;
            // it is only syntactic sugar in Chrome DevTools,
            // so use document.evaluate() instead
            const featureArticle = document
                .evaluate(
                    '//*[@id="mp-tfa"]',
                    document,
                    null,
                    XPathResult.FIRST_ORDERED_NODE_TYPE,
                    null
                )
                .singleNodeValue;
    
            // singleNodeValue is null when nothing matches the expression
            return featureArticle ? featureArticle.textContent : null;
        });
    
        console.log(text);
        await browser.close();
    })();
    
    2. Select the element with Puppeteer's page.$x() and pass it to page.evaluate()

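    Note that page.$x() exists only in older Puppeteer releases; newer ones removed it in favor of page.$$() with an xpath/ selector prefix.

    This example achieves the same result as the first one: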

    const puppeteer = require('puppeteer');
    
    (async () => {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();
        await page.goto('https://en.wikipedia.org', { waitUntil: 'networkidle2' });
    
        // page.$x() resolves to an array of ElementHandle objects;
        // only the first match is needed here
        const featureArticle = (await page.$x('//*[@id="mp-tfa"]'))[0];
        // equivalent lookup with a CSS selector:
        // const featureArticle = await page.$('#mp-tfa');
    
        const text = await page.evaluate(el => {
            // el is the featured-article element inside the page; use it as needed
            return el.textContent;
        }, featureArticle);
    
        console.log(text);
        await browser.close();
    })();
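
Both options above pick only the first match. When every match is needed, document.evaluate() can return a snapshot of all matching nodes instead. The following is a minimal sketch, not a definitive implementation: the name xpathTexts is made up for illustration, and the function assumes it runs in the page context, where the browser provides document and XPathResult. It returns plain strings because page.evaluate() can only hand serializable values back to Node.js.

```javascript
// Hypothetical helper (the name xpathTexts is an assumption of this sketch).
// Intended to run in the page context, where `document` and `XPathResult`
// are provided by the browser.
function xpathTexts(expression) {
    // A snapshot result holds all matches at once and, unlike an iterator
    // result, is not invalidated by later DOM changes.
    const snapshot = document.evaluate(
        expression,
        document,
        null,
        XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
        null
    );

    const texts = [];
    for (let i = 0; i < snapshot.snapshotLength; i++) {
        texts.push(snapshot.snapshotItem(i).textContent);
    }
    // Strings are serializable, so page.evaluate() can return them;
    // raw DOM nodes cannot cross over to the Node.js side.
    return texts;
}
```

Because the function does not close over any outer variables, it can be passed to Puppeteer directly, for example `const texts = await page.evaluate(xpathTexts, '//a');`, assuming page is an already opened page.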
    

    See also the related question on how to inject a $x() helper function into your scripts.
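
One way such an injection could look is sketched below. This is an illustration under stated assumptions, not the approach from the related question: installXPathHelper is a hypothetical name, and the target parameter exists only so the function can be exercised outside a browser. In the page it defaults to the global object and defines a DevTools-like $x() that returns an array of matching nodes.

```javascript
// Hypothetical installer (name and `target` parameter are assumptions).
// Defines a DevTools-like $x() on the given global object; in the page
// context `target` defaults to the page's global object.
function installXPathHelper(target = globalThis) {
    target.$x = (expression, contextNode = target.document) => {
        const iterator = target.document.evaluate(
            expression,
            contextNode,
            null,
            target.XPathResult.ORDERED_NODE_ITERATOR_TYPE,
            null
        );

        // Walk the iterator until it is exhausted and collect the nodes.
        const nodes = [];
        for (let node = iterator.iterateNext(); node; node = iterator.iterateNext()) {
            nodes.push(node);
        }
        return nodes;
    };
}
```

With Puppeteer, the installer could be registered before navigation via `await page.evaluateOnNewDocument(installXPathHelper);`, after which code inside page.evaluate() can call `$x('//*[@id="mp-tfa"]')` much like in the DevTools console. The returned nodes still live in the page, so anything sent back to Node.js has to be reduced to serializable values such as textContent.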