
Chromeless - get all images src from a webpage


I'm trying to get the src values for all img tags in an HTML page using Chromeless. My current implementation is something like this:

const { Chromeless } = require('chromeless');

async function run() {
    const chromeless = new Chromeless();
    let url = 'http://someurl/somepath.html';

    var allImgUrls = await chromeless
        .goto(url)
        .evaluate(() => document.getElementsByTagName('img'));

    var htmlContent = await chromeless
        .goto(url)
        .evaluate(() => document.documentElement.outerHTML );

    console.log(allImgUrls);

    await chromeless.end()
}

The problem is that allImgUrls doesn't contain any usable values for the img elements.


Solution

  • After some research, I found that this approach works:

    var imgSrcs = await chromeless
            .goto(url)
            .evaluate(() => {
                // document.querySelectorAll returns a NodeList (array-like), not a real Array,
                // so we borrow map from Array.prototype via [].map.call()
                const srcs = [].map.call(document.querySelectorAll('img'), img => img.src);
                return JSON.stringify(srcs);
            });
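
    A likely reason the original attempt came back empty is that the result of evaluate() has to be sent from the browser back to Node, and DOM nodes don't survive that trip, while plain strings do. Note also that with JSON.stringify the value you get back is a string, so you'd need JSON.parse(imgSrcs) to use it as an array. Here is a minimal sketch of the same idea using Array.from instead of [].map.call. The names collectImgSrcs and getImgSrcs are just illustrative helpers of mine, not part of Chromeless, and I'm assuming evaluate() hands back a plain array of strings as-is:

```javascript
// Illustrative helper (not part of Chromeless): given a document-like object,
// return the src of every <img> as a plain array of strings.
function collectImgSrcs(doc) {
    // Array.from accepts array-likes such as NodeList and takes an optional map function.
    return Array.from(doc.querySelectorAll('img'), img => img.src);
}

// Hypothetical wrapper: `chromeless` is an already-constructed Chromeless instance.
// The callback is inlined because it runs in the page, where helpers defined
// on the Node side (like collectImgSrcs) are not in scope.
async function getImgSrcs(chromeless, url) {
    return chromeless
        .goto(url)
        .evaluate(() => Array.from(document.querySelectorAll('img'), img => img.src));
}
```

    If the array doesn't come back the way you expect, falling back to JSON.stringify in the page and JSON.parse on the Node side (as above) should still work.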