I'm trying to scrape data from a website and getting back basic HTML with JS functions in the body

Hi everyone,

I'm playing around with Node.js and the cheerio package as part of my Node.js learning, and I'm trying to build a web scraper that gets the title and price of an item from a shopping site. But when I console.log the html variable, it contains only a basic HTML structure with some JS functions that seem to be trying to prevent scraping.

My code:

const needle = require('needle')
const http = require('http')
const cheerio = require("cheerio")

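needle.get('', (error, response, html) => {
    if (!error && response.statusCode === 200) {
        const $ = cheerio.load(html)
        console.log(html) // prints the bare HTML shown below

        http.createServer(function (req, res) {
            res.writeHead(200, {'Content-Type': 'text/html'});
            // the snippet was cut off here; the lines below are an assumed ending
            res.end(html);
        }).listen(8080); // port number is a placeholder
    }
})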

I guess it's some kind of protection against scrapers, but this is what I get as a result:

<html lang="he">

    <meta charset="utf-8" />
    <link rel="icon" id="header-icon" href="/web/favicon.ico">
    <link rel="canonical" id="header-canonical">
    <meta name="viewport" content="width=device-width,initial-scale=1" />
    <meta name="description" content="מעל 38,000 מוצרים: מחשבים סלולר, בשמים, למטבח, למשרד טיפוח, פארם, צעצועים, נעלים ומיזוג" />
    <link rel="manifest" href="/web/manifest.json" />
Any idea how I can overcome this? Thanks, everyone.


  • This is likely not scraper protection. The site is probably built with a front-end framework that renders the visible data and DOM elements only after its JavaScript has run, so a plain HTTP request (which never executes that JavaScript) gets just the empty shell. The easiest way around this is to use a library like puppeteer, which loads the site and processes it the way a real browser would. Here is a basic example of what you might want:

    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('[the full URL you want to scrape]');
      // once the page has loaded, you can find data in a few ways:
      // 1: querying (returns an array of element handles)
      const elements = await page.$$('[any CSS selector]');
      // 2: evaluate
      const elements1 = await page.evaluate(() => {
        // run any code in the page and have its (serializable) result returned to you
        return document.title; // for example
      });
      // 3: HTML
      const wholePage = await page.evaluate(() => document.querySelector('*').outerHTML);
      // this gives you the rendered HTML of the whole page,
      // which you can then pass to cheerio or any other parser
      // and use the way you were before
      await browser.close();
    })();

    You can read more in the puppeteer documentation, both about the library in general and about method 1 (page.$$), method 2 and method 3 (page.evaluate).
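
To tie method 3 back to the cheerio code in the question, here is a minimal sketch of rendering the page with puppeteer and then parsing the result with cheerio. The two selectors (`h1` and `.price`) and the helper names `parsePrice`, `extractItem` and `scrapeItem` are placeholders made up for illustration, since I don't know the real site's markup; inspect the rendered page in your browser's dev tools and substitute the real selectors. It also assumes prices use a comma as the thousands separator and a dot for decimals.

```javascript
// Sketch only: both selectors are hypothetical placeholders, not taken from the real site.
const TITLE_SELECTOR = 'h1';
const PRICE_SELECTOR = '.price';

// Turn a price string such as "₪1,299.90" into a number, or null if no number is found.
// Assumes "," is a thousands separator and "." is the decimal point.
function parsePrice(text) {
  const match = String(text).replace(/,/g, '').match(/\d+(\.\d+)?/);
  return match ? Number(match[0]) : null;
}

// Pull the title and price out of already-rendered HTML with cheerio.
function extractItem(html) {
  const cheerio = require('cheerio');
  const $ = cheerio.load(html);
  return {
    title: $(TITLE_SELECTOR).first().text().trim(),
    price: parsePrice($(PRICE_SELECTOR).first().text()),
  };
}

// Render the page in a headless browser, then hand the resulting HTML to cheerio.
async function scrapeItem(url) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // 'networkidle2' waits until network activity has mostly settled
    await page.goto(url, { waitUntil: 'networkidle2' });
    // wait for the framework to actually render the price element
    await page.waitForSelector(PRICE_SELECTOR);
    // page.content() returns the full HTML of the rendered page
    const html = await page.content();
    return extractItem(html);
  } finally {
    // close the browser even if navigation or parsing throws
    await browser.close();
  }
}
```

You would call it as `scrapeItem('[the full URL you want to scrape]').then(console.log)`. Waiting for the price selector matters here: the framework may insert the data some time after the initial load, and grabbing the HTML too early would give you the same empty shell that needle returned.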