Command line headless browser

I am looking for a command line option to get a webpage, and execute the associated JavaScript code. In other words, call a headless browser via command line.

I can't use wget because it only downloads the page and does not execute the associated JavaScript:

wget --load-cookies cookies.txt -O /dev/null

Use case: we have web pages that read Elasticsearch indexes, do some data manipulation, and update Elasticsearch indexes. We'd like to run the update hourly via a cron job. We don't need to capture anything, e.g. no PNG capture, no HTML capture. We simply need to load the webpage and execute its JavaScript via a cron job, ideally with a single command, something like run-headless. The OS is CentOS 7.

I searched Stack Overflow and did not find an answer that meets my needs. Selenium and similar tools seem like overkill.


  • After some research I found a solution using Puppeteer to drive a headless Chrome browser. Ideally I wanted a single command like run-headless, but the pages require a login, so the headless browser is driven by a Puppeteer script instead.

    Installation steps for CentOS 7.6:

    1. Install Chrome

    # cd
    # mkdir install
    # cd install/
    # wget
    # yum localinstall vulkan-filesystem-
    # wget
    # yum localinstall vulkan-
    # wget
    # yum localinstall liberation-fonts-1.07.2-16.el7.noarch.rpm
    # vi /etc/yum.repos.d/google-chrome.repo
    # cat /etc/yum.repos.d/google-chrome.repo
    # yum install google-chrome-stable

    2. Install Node.js

    # curl -sL | sudo bash -
    # yum install nodejs

    3. Patch /etc/sysctl.conf

    This was needed to run puppeteer without disabling the sandbox:

    # echo "user.max_user_namespaces=15000" >> /etc/sysctl.conf
    # reboot
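
    After the reboot, the setting can be checked from Node before launching the browser. This is a minimal sketch, assuming the kernel exposes the value at /proc/sys/user/max_user_namespaces and that this limit is the only thing blocking the sandbox; the helper names parseMaxUserNamespaces and sandboxLooksUsable are hypothetical.

    ```javascript
    const fs = require('fs');

    // Parse the text read from /proc/sys/user/max_user_namespaces.
    // Returns the limit as an integer, or 0 if the text is not a number.
    function parseMaxUserNamespaces(text) {
        const value = parseInt(String(text).trim(), 10);
        return Number.isNaN(value) ? 0 : value;
    }

    // Hypothetical helper: true when the user namespace limit is above zero.
    function sandboxLooksUsable(procFile = '/proc/sys/user/max_user_namespaces') {
        try {
            return parseMaxUserNamespaces(fs.readFileSync(procFile, 'utf8')) > 0;
        } catch (err) {
            return false; // file missing or unreadable, e.g. an older kernel
        }
    }

    console.log('user namespaces available:', sandboxLooksUsable());
    ```

    If this prints false, the sysctl change has probably not taken effect yet.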

    4. Create run-hourly.js puppeteer script

    This node script has to run as a regular user, not root:

    $ cd /path/to/script
    $ npm install --save puppeteer
    $ npm install --save pending-xhr-puppeteer
    $ mkdir userDataDir
    $ vi run-hourly.js # (content below)
    $ node run-hourly.js

    File content of run-hourly.js script:

    const config = {
        userDataDir: __dirname + '/userDataDir',
        login: {
            url:        '',
            username:   'foobar',
            password:   'secret',
        },
        pages: [{
            url:        '',
            pdfFile:    __dirname + '/page.pdf'
        }]
    };

    const puppeteer = require('puppeteer');
    const { PendingXHR } = require('pending-xhr-puppeteer');

    (async() => {
        // initialize headless browser
        const browser = await puppeteer.launch({
            headless:       true,   // run headless
            dumpio:         true,   // pipe browser stdout/stderr (incl. console log) to this process
            userDataDir:    config.userDataDir // custom user data
        });
        try {
            const page = await browser.newPage();
            const pendingXHR = new PendingXHR(page);

            // login
            await page.goto(config.login.url, {waitUntil: 'load'});
            await page.type('#loginusername', config.login.username);
            await page.type('#password', config.login.password);
            // Submit the form and wait for the resulting navigation. Pressing Enter is an
            // assumption; click the form's submit button instead if the page needs that.
            await Promise.all([
                page.waitForNavigation(),
                page.keyboard.press('Enter'),
            ]);

            // load pages of interest one at a time, since they share a single tab
            for (const pageCfg of config.pages) {
                await page.goto(pageCfg.url, {waitUntil: 'networkidle0'}); // wait for page load
                await pendingXHR.waitForAllXhrFinished(); // wait for all requests to finish
                await page.pdf({path: pageCfg.pdfFile});  // generate PDF from rendered page
            }
        } finally {
            await browser.close(); // always close the browser, even after an error
        }
    })().catch((err) => {
        console.error(err);
        process.exit(1); // non-zero exit so a failed run is visible in the log
    });
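
    Because the script runs unattended, a page that never finishes loading could leave the job hanging until the next run. One possible safeguard is an overall time limit around the async body. This is a minimal sketch using only standard Promise and timer functions; withTimeout and main are hypothetical names, and main merely stands in for the Puppeteer work above.

    ```javascript
    // Reject if `promise` does not settle within `ms` milliseconds.
    function withTimeout(promise, ms, label = 'operation') {
        let timer;
        const timeout = new Promise((_, reject) => {
            timer = setTimeout(
                () => reject(new Error(`${label} timed out after ${ms} ms`)),
                ms
            );
        });
        // Whichever settles first wins; the timer is cleared either way.
        return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
    }

    // Placeholder for the async body of run-hourly.js.
    async function main() {
        await new Promise((resolve) => setTimeout(resolve, 10));
        return 'done';
    }

    withTimeout(main(), 5 * 60 * 1000, 'hourly run')
        .then((result) => console.log(result))
        .catch((err) => {
            console.error(err.message);
            process.exitCode = 1; // signal the failure to whatever inspects the job
        });
    ```

    Note that the timeout only rejects the promise; it does not close the browser by itself, so the real script would still need to call browser.close() or exit the process in the error path.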

    5. Add hourly job to cron

    Install the cron job as the same user that owns the script:

    $ crontab -l
    $ crontab -e
    25 * * * * cd /path/to/script && node run-hourly.js > hourly.log 2>&1
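
    If a run can take longer than an hour, two instances could overlap and update the same indexes at once. One way to guard against that is a lock file created with the exclusive 'wx' flag of Node's fs module. This is a sketch under assumptions: the lock path and the names acquireLock, releaseLock, and runExclusive are hypothetical, and a run that is killed hard leaves a stale lock that has to be removed by hand.

    ```javascript
    const fs = require('fs');
    const os = require('os');
    const path = require('path');

    // Try to create the lock file exclusively; returns true if this process got the lock.
    function acquireLock(lockFile) {
        try {
            const fd = fs.openSync(lockFile, 'wx'); // 'wx' fails if the file already exists
            fs.writeSync(fd, String(process.pid));  // record the owner for debugging
            fs.closeSync(fd);
            return true;
        } catch (err) {
            if (err.code === 'EEXIST') return false; // another run holds the lock
            throw err;
        }
    }

    function releaseLock(lockFile) {
        fs.unlinkSync(lockFile);
    }

    // Run `job` only if no other run holds the lock; always release afterwards.
    async function runExclusive(lockFile, job) {
        if (!acquireLock(lockFile)) {
            console.error('previous run still active, skipping');
            return false;
        }
        try {
            await job();
        } finally {
            releaseLock(lockFile);
        }
        return true;
    }

    // Hypothetical lock location; any path writable by the cron user would do.
    const lockFile = path.join(os.tmpdir(), 'run-hourly.lock');

    runExclusive(lockFile, async () => {
        // the Puppeteer work from run-hourly.js would go here
    });
    ```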