Tags: node.js, dom, puppeteer, queryselector

Using a POST variable with querySelector


I'm having trouble scraping data from the web with Puppeteer and querySelector.

I have a Node.js web server that handles a POST request and then calls a function to scrape the data. I'm sending two parameters (postBlogUrl and postDomValue).

postDomValue contains, as a string, the selector I'm trying to fetch data from, for example: [itemprop='articleBody'].

If I hard-code the selector ([itemprop='articleBody']), everything works and I can retrieve the data, but if I use the postDomValue variable, nothing is returned.

I already tried escaping the variable with CSS.escape(postDomValue), with no luck.

fetchBlogContent: async function (postBlogUrl, postDomValue) {
  const puppeteer = require('puppeteer');
  let browser;
  try {
    browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(postBlogUrl, { waitUntil: 'load' });
    const description = await page.evaluate(() => {
      // This works:
      // return document.querySelector("[itemprop='articleBody']").innerHTML;
      // This doesn't return anything:
      return document.querySelector(postDomValue).innerHTML;
    });
    return description;
  } catch (err) {
    // handle err
    return err;
  } finally {
    // Always release the browser, even when scraping fails
    if (browser) await browser.close();
  }
}

Solution

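The function passed to page.evaluate() runs inside the browser page, not in Node.js, so it cannot see postDomValue through a closure. Pass the value as an extra argument to page.evaluate() instead; Puppeteer serializes it and hands it to the function. Pass the string itself rather than JSON.stringify(postDomValue), which would wrap the selector in literal double quotes and make it invalid:

    const description = await page.evaluate(
      // `selector` is received as an argument, not captured from Node.js scope
      (selector) => document.querySelector(selector).innerHTML,
      postDomValue
    );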

See the Puppeteer documentation for page.evaluate() on passing arguments to the page function.
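Why the closure fails, and why wrapping the selector in JSON.stringify() also causes trouble, can be reproduced without a browser. The sketch below uses Node's built-in vm module as a stand-in for the page: the hypothetical evaluateElsewhere() helper (not a Puppeteer API) sends the function across as source text and copies the arguments through JSON, which roughly approximates how page.evaluate() behaves. Treat it as an illustration under those assumptions, not as Puppeteer's actual implementation.

    const vm = require('node:vm');

    // Hypothetical helper (not part of Puppeteer): mimics how page.evaluate()
    // ships a function to another JavaScript realm as source text, together with
    // arguments that are copied by serialization rather than shared by reference.
    function evaluateElsewhere(fn, ...args) {
      const context = { args: JSON.parse(JSON.stringify(args)) };
      return vm.runInNewContext(`(${fn.toString()})(...args)`, context);
    }

    const postDomValue = "[itemprop='articleBody']";

    // 1. Closing over a Node.js variable: the other realm has no `postDomValue`.
    let viaClosure;
    try {
      viaClosure = evaluateElsewhere(() => postDomValue);
    } catch (err) {
      viaClosure = err.name; // the error object is created in the other realm
    }

    // 2. Passing the value as an argument: it arrives unchanged.
    const viaArgument = evaluateElsewhere((selector) => selector, postDomValue);

    // 3. Passing JSON.stringify(...) adds literal double quotes to the selector.
    const viaJson = evaluateElsewhere((selector) => selector, JSON.stringify(postDomValue));

    console.log(viaClosure);   // ReferenceError
    console.log(viaArgument);  // [itemprop='articleBody']
    console.log(viaJson);      // "[itemprop='articleBody']"

In a real page, the third case would most likely make querySelector() throw a SyntaxError, since a CSS selector cannot begin with a quoted string.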