Tags: javascript, node.js, pdf, file-io, node-pdfkit

Why is Node.js not reading the entire binary file from disk?


I have a PDF file that I want to read into memory using Node.js. Ideally I'd like to encode it as base64 for transfer. But somehow the read function does not seem to read the full PDF file, which makes no sense to me. The original PDF was generated using PDFKit, and it is fine and viewable in a PDF reader.

The original file test.pdf is 90 kB on disk. But if I read it and write it back to disk, the result is only 82 kB and the new PDF test-out.pdf is broken. The PDF viewer says:

Unable to open document. The pdf document is damaged.

The base64 encoding therefore does not work correctly either. I tested it using this webservice. Does anyone know what is happening here, and how to resolve it?

I found this post already.

const fs = require('fs');
let buf = fs.readFileSync('test.pdf'); // returns raw buffer binary data
// buf = fs.readFileSync('test.pdf', {encoding:'base64'}); // for the base64 encoded data
// ...transfer the base64 data...
fs.writeFileSync('test-out.pdf', buf); // should be pdf again
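
For reference, here is a minimal sketch of the base64 round trip described above, run on a small temporary file instead of test.pdf. The helper names encodeFileToBase64 and decodeBase64ToFile and the sample bytes are made up for illustration. If the read and the encoding were at fault, the final comparison would be expected to fail even for a file that is already complete on disk.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: read a file as raw bytes and return it as base64 text.
function encodeFileToBase64(filePath) {
  // Without an encoding argument, readFileSync returns a Buffer of raw bytes.
  return fs.readFileSync(filePath).toString('base64');
}

// Hypothetical helper: decode base64 text and write the raw bytes to disk.
function decodeBase64ToFile(base64Text, filePath) {
  fs.writeFileSync(filePath, Buffer.from(base64Text, 'base64'));
}

// Demo on a temporary file holding arbitrary binary bytes (not a real PDF).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'b64-demo-'));
const inPath = path.join(dir, 'in.bin');
const outPath = path.join(dir, 'out.bin');
const sample = Buffer.from([0x25, 0x50, 0x44, 0x46, 0x00, 0xff, 0x80, 0x0a]);
fs.writeFileSync(inPath, sample);

decodeBase64ToFile(encodeFileToBase64(inPath), outPath);

// True when the decoded file is byte-for-byte identical to the input.
const roundTripOk = fs.readFileSync(outPath).equals(sample);
console.log(roundTripOk);
```

If this holds for a file that is fully written, a size mismatch like 90 kB versus 82 kB points at the file not being complete at the moment it is read, rather than at readFileSync or the base64 step.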

EDIT MCVE:

const fs = require('fs');
const PDFDocument = require('pdfkit');

let filepath = 'output.pdf';

class PDF {
  constructor() {
    this.doc = new PDFDocument();
    this.setupdocument();
    this.doc.pipe(fs.createWriteStream(filepath));
  }

  setupdocument() {
    var pageNumber = 1;
    this.doc.on('pageAdded', () => {
        this.doc.text(++pageNumber, 0.5 * (this.doc.page.width - 100), 40, {width: 100, align: 'center'});
      }
    );

    this.doc.moveDown();
    // draw some headline text
    this.doc.fontSize(25).text('Some Headline');
    this.doc.fontSize(15).text('Generated: ' + new Date().toUTCString());
    this.doc.moveDown();
    this.doc.font('Times-Roman', 11);
  }

  report(object) {

    this.doc.moveDown();
    this.doc
      .text(object.location+' '+object.table+' '+Date.now())
      .font('Times-Roman', 11)
      .moveDown()
      .text(object.name)
      .font('Times-Roman', 11);

    this.doc.end();
    let report = fs.readFileSync(filepath);
    return report;
  }
}

let pdf = new PDF();
let buf = pdf.report({location: 'athome', table:'wood', name:'Bob'});
fs.writeFileSync('outfile1.pdf', buf);

Solution

  • After a lot of searching I found this GitHub issue. The problem in my question is the call to doc.end(): it returns before the piped write stream has flushed everything to disk (which is signalled by the write stream's finish event), so reading the file right afterwards yields a truncated PDF. As suggested in the GitHub issue, the following approaches work:

    • callback based:
    const doc = new PDFDocument();
    const writeStream = fs.createWriteStream('filename.pdf');
    doc.pipe(writeStream);
    doc.end();
    // 'finish' fires once all data has been flushed to the file
    writeStream.on('finish', function () {
        // do stuff with the PDF file
    });
    
    • or promise based:
    const stream = fs.createWriteStream(localFilePath);
    doc.pipe(stream);
    .....
    doc.end();
    await new Promise<void>(resolve => {
      stream.on("finish", function() {
        resolve();
      });
    });
    
    • or even nicer, instead of calling doc.end() directly, call the function savePdfToFile below (TypeScript):
    function savePdfToFile(pdf : PDFKit.PDFDocument, fileName : string) : Promise<void> {
      return new Promise<void>((resolve, reject) => {
    
        //  To determine when the PDF has finished being written successfully
        //  we need to confirm the following 2 conditions:
        //
        //  1. The write stream has been closed
        //  2. PDFDocument.end() was called synchronously without an error being thrown
    
        let pendingStepCount = 2;
    
        const stepFinished = () => {
          if (--pendingStepCount === 0) {
            resolve();
          }
        };
    
        const writeStream = fs.createWriteStream(fileName);
        writeStream.on('close', stepFinished);
        pdf.pipe(writeStream);
    
        pdf.end();
    
        stepFinished();
      }); 
    }
    

    This function should correctly handle the following situations:

    • PDF generated successfully
    • Error is thrown inside pdf.end() before write stream is closed
    • Error is thrown inside pdf.end() after write stream has been closed
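
Applied to the report() method from the question, the same idea can be sketched in plain JavaScript as follows. This is only an illustration under some assumptions: endAndRead is a hypothetical helper name, and a PassThrough stream stands in for the PDFKit document so that the snippet runs without pdfkit installed. A PDFKit document is piped and ended the same way in the snippets above, so it should be usable in place of the stand-in.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { once } = require('events');
const { PassThrough } = require('stream');

// Hypothetical helper: end the document, wait until the write stream has
// flushed everything, and only then read the file back.
async function endAndRead(doc, writeStream, filePath) {
  // Subscribe before ending; once() also rejects if the stream emits 'error'.
  const finished = once(writeStream, 'finish');
  doc.end();
  await finished;
  return fs.readFileSync(filePath);
}

// Stand-in for a PDFKit document (assumption: any readable stream that is
// piped into the file and then ended behaves the same way here).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'pdf-demo-'));
const filePath = path.join(dir, 'output.bin');
const doc = new PassThrough();
const writeStream = fs.createWriteStream(filePath);
doc.pipe(writeStream);

// Write enough data that it cannot all reach the disk synchronously.
const payload = Buffer.alloc(256 * 1024, 0x41);
doc.write(payload);

// Resolves with the complete file contents.
const result = endAndRead(doc, writeStream, filePath);
result.then((buf) => console.log(buf.length === payload.length));
```

The consequence for the class in the question is that report() can no longer return the buffer synchronously. It has to return a promise (or accept a callback), and the caller has to wait for it before writing outfile1.pdf.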