Tags: html, image, parsing, web-scraping, xpath

Is it possible to find the current source from srcset using XPath?


An example:

<img class="lazyautosizes lazyloaded" src="//cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_150x150.png?v=1583128930"
data-srcset="//cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_180x.png?v=1583128930 180w, //cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_240x.png?v=1583128930 240w, //cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_360x.png?v=1583128930 360w" 
srcset="//cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_180x.png?v=1583128930 180w, //cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_240x.png?v=1583128930 240w, //cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_360x.png?v=1583128930 360w">

I want to get the URL from the srcset that the browser actually rendered. Is there a way to write an XPath that points at it, say the 240w candidate? The tag also has a src attribute, but that is not the image rendered in the browser.

This is how I use the XPath in Puppeteer. I do not want to write special-case logic for particular kinds of XPath expressions:

const getXpathElement = await page.$x(xpath);
const promises = getXpathElement.map((element) => page.evaluate(el => {
    return el.textContent;
}, element));

Solution

  • It looks like the currentSrc property solved this for me. It holds the URL the browser actually picked from src/srcset. Here is the complete working solution:

    const getXpathElement = await page.$x(xpath);
    const promises = getXpathElement.map((element) => page.evaluate(el => {
        // When the XPath ends with a tag name, e.g. //div//img
        if (el.currentSrc && (el.tagName && el.tagName.toLowerCase() === "img"))
            return el.currentSrc;
        // When the XPath ends with an attribute name, e.g. //div//div/@src
        else
            return el.textContent;
    }, element));
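
    The snippet above only builds an array of promises, so the values still have to be awaited. Below is a minimal sketch of one way to wrap that up, not a definitive implementation. The names readNodeValue and getXpathValues are hypothetical, and it assumes `page` is a Puppeteer Page that still exposes page.$x, as in the code above. The callback is pulled out into a standalone function so it can be checked without a browser.

    ```javascript
    // Hypothetical helper: Puppeteer serializes this function and runs it in the page,
    // so it must not reference any variables from the Node.js scope.
    function readNodeValue(node) {
        const isImg = typeof node.tagName === "string" &&
            node.tagName.toLowerCase() === "img";
        // currentSrc is the URL the browser chose from src/srcset;
        // it is an empty string while no candidate has been selected yet.
        if (isImg && node.currentSrc) {
            return node.currentSrc;
        }
        // Attribute nodes (XPath ending in /@src) and other nodes fall back to textContent.
        return node.textContent;
    }

    // Hypothetical wrapper: resolves every match of one XPath to a string value.
    // Assumes page.$x and page.evaluate behave as in the snippet above.
    async function getXpathValues(page, xpath) {
        const handles = await page.$x(xpath);
        return Promise.all(
            handles.map((handle) => page.evaluate(readNodeValue, handle))
        );
    }
    ```

    Usage would then be `const values = await getXpathValues(page, xpath);`. Note that for a lazy-loaded image, currentSrc can still be empty if the image has not been selected yet, in which case this sketch falls back to textContent just like the original.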