web-scraping ascii chromium headless google-chrome-headless

Headless Chrome: website Div content to Text, toString or ASCII

I want to scrape text from a website whose content is loaded dynamically. Because of the dynamic loading, static tools such as $ lynx --dump google.com do not seem to work. So far I have used headless Chrome like this:

$ Chrome --headless --disable-gpu --no-sandbox --run-all-compositor-stages-before-draw --virtual-time-budget=1000 --window-size=1200,3000 --screenshot http://mtv.com

but I cannot find an option that extracts the text of the page rather than a screenshot. I am open to any dynamic scraping approach that returns the text, for instance the text of a specific div with a given class.

How can I scrape text from a dynamically loaded website?

[Screenshot: example result of the dynamic loading using headless Chrome]

Solution

If you can write JavaScript for Node.js, you can try Puppeteer, a Node.js library for controlling headless Chrome:

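'use strict';

const puppeteer = require('puppeteer');

(async function main() {
  let browser;
  try {
    browser = await puppeteer.launch({ headless: true });
    // Reuse the blank tab that Chrome opens on launch.
    const [page] = await browser.pages();

    await page.goto('http://www.mtv.com/');

    // Runs inside the page: read the rendered text of the first matching div.
    const data = await page.evaluate(() => {
      const element = document.querySelector('div.header');
      return element ? element.innerText : null;
    });

    console.log(data);
  } catch (err) {
    console.error(err);
  } finally {
    // Close the browser even if navigation or evaluation fails.
    if (browser) await browser.close();
  }
})();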

Output:

teen mom 2
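If the div you need only appears after the page's scripts have run, or if you want every div with a given class, the same idea can be extended as in the sketch below. It assumes Puppeteer is installed. The names scrapeDivText and toPlainAscii are hypothetical helpers introduced only for this illustration; the ASCII cleanup is one possible reading of the "ASCII" part of the question, not something Puppeteer does for you.

```javascript
'use strict';

// Hypothetical helper: reduce scraped text to plain printable ASCII,
// one non-empty trimmed line per row.
function toPlainAscii(text) {
  return text
    .normalize('NFKD')                // split accented letters into base + mark
    .replace(/[\t\r ]+/g, ' ')        // collapse runs of spaces, tabs, CRs
    .replace(/[^\x20-\x7E\n]/g, '')   // drop anything that is not printable ASCII
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .join('\n');
}

// Hypothetical helper: return the cleaned text of every element matching
// `selector`, after waiting for at least one such element to appear.
async function scrapeDivText(url, selector) {
  // Loaded lazily so toPlainAscii can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // 'networkidle2' waits until network activity has mostly settled.
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Wait for the dynamically inserted element before reading it.
    await page.waitForSelector(selector);
    const texts = await page.$$eval(selector, (nodes) =>
      nodes.map((node) => node.innerText)
    );
    return texts.map(toPlainAscii);
  } finally {
    await browser.close();
  }
}
```

For the page in the question, a call might look like scrapeDivText('http://www.mtv.com/', 'div.header').then(console.log). The selector is the one used above and may need adjusting to the site's current markup.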