Tags: javascript, node.js, javascript-objects, puppeteer

How to find the number of pages in a single PDF created with Puppeteer


I am trying to find the number of pages (in other words, the total length) of a single PDF file generated with Puppeteer's page.pdf(), because my requirements depend on that number.

Here's what I did:

    const fs = require('fs');
    const path = require('path');
    const handlebars = require('handlebars');
    const { minify } = require('html-minifier'); // assumed source of `minify`
    const puppeteer = require('puppeteer');

    // Runs inside an async Promise executor that provides `resolve` and `reject`.
    try {
      const generatedPdfFilePath = `${directory}/feedback-${requestId}.pdf`;
      const htmlFilePath = `${directory}/report-${requestId}.html`;
      const htmlTemplate =
        fs.readFileSync(path.join(process.cwd(), '/data/feedback-template.hbs'), 'utf-8');
      const template = handlebars.compile(htmlTemplate);
      const htmlFile = minify(template(data), {
        collapseWhitespace: true,
      });
      fs.writeFileSync(htmlFilePath, htmlFile);
      const options = {
        format: 'A4',
        printBackground: true,
        path: generatedPdfFilePath,
      };
      const browser = await puppeteer.launch({
        args: ['--no-sandbox'],
        headless: true,
      });
      const page = await browser.newPage();
      await page.goto(`file://${htmlFilePath}`, {
        waitUntil: 'networkidle0',
        timeout: 300000,
      });
      await page.pdf(options);
      // Do something here to find number of pages in this pdf
      await browser.close();
      resolve({ file: generatedPdfFilePath });
    } catch (error) {
      console.log(error);
      reject(error);
    }

So far, I have created an HTML template for the PDF and then used Puppeteer (headless Chrome for Node.js) to generate the required PDF from that page. Now I'm stuck: I need to know how many pages this PDF contains, in other words how long it is, because I need that number in further calculations. I have only included the relevant code here for brevity.

Also, I'm pretty new to Puppeteer. Can someone explain how I can get details about this PDF? I have been searching for quite some time with no luck. Puppeteer's documentation doesn't help here: it doesn't explain why things are done the way they are, and all I can find in the docs are the PDF options.

Any help would be much appreciated.


Solution

  • You can use the pdf-parse node module, like this:

    const fs = require('fs');
    const pdf = require('pdf-parse');
    
    const dataBuffer = fs.readFileSync('path to PDF file...');
    
    pdf(dataBuffer)
      .then(function (data) {
        // number of pages
        console.log(data.numpages);
      })
      .catch(console.error);
    

    Your code would become something like:

    const fs = require('fs');
    const path = require('path');
    const handlebars = require('handlebars');
    const { minify } = require('html-minifier'); // assumed, as in the question
    const puppeteer = require('puppeteer');
    const pdf = require('pdf-parse');

    try {
      const generatedPdfFilePath = `${directory}/feedback-${requestId}.pdf`;
      const htmlFilePath = `${directory}/report-${requestId}.html`;
      const htmlTemplate =
        fs.readFileSync(path.join(process.cwd(), '/data/feedback-template.hbs'), 'utf-8');
      const template = handlebars.compile(htmlTemplate);
      const htmlFile = minify(template(data), {
        collapseWhitespace: true,
      });
      fs.writeFileSync(htmlFilePath, htmlFile);
      const options = {
        format: 'A4',
        printBackground: true,
        path: generatedPdfFilePath,
      };
      const browser = await puppeteer.launch({
        args: ['--no-sandbox'],
        headless: true,
      });
      const page = await browser.newPage();
      await page.goto(`file://${htmlFilePath}`, {
        waitUntil: 'networkidle0',
        timeout: 300000,
      });
      await page.pdf(options);

      // Read the generated PDF (not the HTML file) and let pdf-parse count its pages.
      const dataBuffer = fs.readFileSync(generatedPdfFilePath);
      const pdfInfo = await pdf(dataBuffer);
      const numPages = pdfInfo.numpages;

      await browser.close();
      resolve({ file: generatedPdfFilePath, numPages });
    } catch (error) {
      console.log(error);
      reject(error);
    }
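If you would rather not add a dependency just to get the page count, a rough alternative is to scan the PDF bytes for page objects. The helper below, countPdfPagesHeuristic, is a hypothetical name and is not part of Puppeteer or pdf-parse. It is only a sketch that counts leaf page dictionaries (/Type /Page) while skipping page-tree nodes (/Type /Pages). It assumes the page dictionaries are stored uncompressed (not packed into compressed object streams) and that the file has no incremental updates; that is an assumption about how the PDF was written, not a guarantee, so pdf-parse remains the safer choice for arbitrary files.

```javascript
'use strict';

/**
 * Hypothetical helper: heuristic page count for a PDF held in a Buffer.
 * Assumption: page dictionaries are not inside compressed object streams
 * and no incremental update redefines pages. Use a real parser otherwise.
 */
function countPdfPagesHeuristic(buffer) {
  // 'latin1' maps every byte to exactly one character, so binary stream
  // data is decoded without errors or multi-byte merging.
  const text = buffer.toString('latin1');
  // Match "/Type /Page" (whitespace optional) but not "/Pages" or
  // "/PageLabel": the name must not continue with another letter.
  const matches = text.match(/\/Type\s*\/Page(?![A-Za-z])/g);
  return matches ? matches.length : 0;
}

// Tiny hand-written fragment: one catalog, one page-tree node, two pages.
const sample = Buffer.from(
  '1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj\n' +
  '2 0 obj << /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >> endobj\n' +
  '3 0 obj << /Type /Page /Parent 2 0 R >> endobj\n' +
  '4 0 obj <</Type/Page/Parent 2 0 R>> endobj\n',
  'latin1'
);

console.log(countPdfPagesHeuristic(sample)); // 2
```

With the question's code, you would call it as countPdfPagesHeuristic(fs.readFileSync(generatedPdfFilePath)) once page.pdf(options) has resolved. The regex approach trades robustness for zero dependencies; if the count ever looks wrong, fall back to pdf-parse.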