node.js pdf phantomjs html-to-pdf

Generating PDF of a Web Page


I'm trying to generate a PDF file of a web page and save it to local disk so I can email it later.

I tried this approach, but it does not work for pages like this. I'm able to generate the PDF, but it does not match the web page content.

It's very clear that the PDF is generated before document.ready, or it might be something else. I can't figure out the exact issue. I'm just looking for an approach that lets me save a web page's output as a PDF.

I assume generating a PDF of a web page is better suited to Node than PHP? If a PHP solution is available, that would be a big help, but a Node implementation is fine too.


Solution

  • It's very clear that the PDF is generated before document ready

    Very true, so it is necessary to wait until after scripts are loaded and executed.


    You linked to an answer that uses the phantom node module.

    The module has been upgraded since then and now supports async/await functions, which make the script much more readable.

    May I suggest a solution that uses the async/await version (version 4.x, requires Node 8+):

    const phantom = require('phantom');
    
    const timeout = ms => new Promise(resolve => setTimeout(resolve, ms));
    
    (async function() {
      const instance = await phantom.create();
      const page = await instance.createPage();
    
      await page.property('viewportSize', { width: 1920, height: 1024 });
    
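      const status = await page.open('http://www.chartjs.org/samples/latest/charts/pie.html');
      if (status !== 'success') {
          // Stop here rather than rendering a page that failed to load
          console.error('Failed to load page, status: ' + status);
          await instance.exit();
          return;
      }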
    
      // If a page has no set background color, it will have gray bg in PhantomJS
      // so we'll set white background ourselves
      await page.evaluate(function(){
          document.querySelector('body').style.background = '#fff';
      });
    
      // Let's benchmark
      console.time('wait');
    
      // Wait until the script creates the canvas with the charts
      while (0 === await page.evaluate(function() { return document.querySelectorAll('canvas').length; })) {
          await timeout(250);
      }
    
      // Make sure animation of the chart has played
      await timeout(500);
    
      console.timeEnd('wait');
    
      await page.render('screen.pdf');
    
      await instance.exit();
    })();
    

    On my dev machine, the wait for the chart to be ready takes 600ms. That is much better than an await timeout(3000) or any other arbitrary number of seconds.
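
One caveat with the polling loop above: if the canvas is never created, it waits forever. Below is a minimal sketch of the same polling idea with an upper bound on the wait. The names waitFor, interval and maxWait are hypothetical and introduced here for illustration; they are not part of the phantom module.

```javascript
// Same small sleep helper as in the script above
const timeout = ms => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical helper (not part of the phantom module): polls an async
// predicate until it returns a truthy value, or throws once maxWait
// milliseconds have elapsed without success.
async function waitFor(predicate, { interval = 250, maxWait = 10000 } = {}) {
  const start = Date.now();
  while (true) {
    const result = await predicate();
    if (result) {
      return result;
    }
    if (Date.now() - start >= maxWait) {
      throw new Error('waitFor: condition not met within ' + maxWait + 'ms');
    }
    await timeout(interval);
  }
}

// Assumed usage inside the async function above, replacing the while loop:
//   await waitFor(() => page.evaluate(function() {
//     return document.querySelectorAll('canvas').length;
//   }));
```

Because a canvas count of 0 is falsy, the predicate keeps polling until at least one canvas exists. If the helper throws, the caller should still call instance.exit() so the PhantomJS process does not linger.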