
Signing PDF - memory consumption


I tried some utilities for digitally signing PDFs based on iText v1 or v2 and found that the whole PDF seems to be loaded into memory (for a 60 MB PDF, the process can take up to 300-400 MB of memory).

Can recent iText versions sign a PDF without loading it into memory?

Updates

I tested Bruno's example with itextpdf 5.5.6:

  • The PdfReader constructor doesn't matter: it can be (src), (src, null, true), or (src, null, false), and the result is the same.
  • What matters is new File(tmp) in createSignature.

But memory consumption is still too high. I tried to sign a 100 MB file (a PDF with an embedded attachment), and peak memory is about 325 MB. Sure, that's better than the 540 MB without a temporary file, but not good enough.

With a 32 KB file, maximum memory was 65 MB (that's the JVM and the Java code itself, I guess).

Memory was measured with /usr/bin/time -v java ...
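/usr/bin/time -v reports the maximum resident set size of the whole process, which includes non-heap memory such as the JVM itself. As a complementary measurement, the Java heap alone can be sampled from inside the program with the standard java.lang.management API. This is a minimal sketch; the names PeakHeap, resetPeaks and peakHeapBytes are hypothetical. Because each pool reaches its own peak at a different moment, the summed value can overstate the true combined peak, so treat it as an approximate upper bound.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public final class PeakHeap {
    private PeakHeap() {}

    /** Resets the peak counters of all heap pools so a later reading covers only the code under test. */
    public static void resetPeaks() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isValid()) {
                pool.resetPeakUsage();
            }
        }
    }

    /** Sums the peak "used" bytes of all heap pools (approximate upper bound of the real heap peak). */
    public static long peakHeapBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // getPeakUsage() returns null for an invalid pool, hence the isValid() check
            if (pool.getType() == MemoryType.HEAP && pool.isValid()) {
                total += pool.getPeakUsage().getUsed();
            }
        }
        return total;
    }
}
```

Usage: call PeakHeap.resetPeaks() right before the signing call and print PeakHeap.peakHeapBytes() right after it.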

I limited the Java heap with -Xmx100m, but it crashed with an out-of-memory error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at com.itextpdf.text.pdf.PdfReader.getStreamBytesRaw(PdfReader.java:2576)
    at com.itextpdf.text.pdf.PdfReader.getStreamBytesRaw(PdfReader.java:2615)
    at com.itextpdf.text.pdf.PRStream.toPdf(PRStream.java:230)
    at com.itextpdf.text.pdf.PdfIndirectObject.writeTo(PdfIndirectObject.java:158)
    at com.itextpdf.text.pdf.PdfWriter$PdfBody.write(PdfWriter.java:420)
    at com.itextpdf.text.pdf.PdfWriter$PdfBody.add(PdfWriter.java:398)
    at com.itextpdf.text.pdf.PdfWriter.addToBody(PdfWriter.java:887)
    at com.itextpdf.text.pdf.PdfStamperImp.close(PdfStamperImp.java:412)
    at com.itextpdf.text.pdf.PdfStamperImp.close(PdfStamperImp.java:386)
    at com.itextpdf.text.pdf.PdfSignatureAppearance.preClose(PdfSignatureAppearance.java:1316)
    at com.itextpdf.text.pdf.security.MakeSignature.signDetached(MakeSignature.java:140)

The code at that point in PdfReader is:

public static byte[] getStreamBytesRaw(final PRStream stream, final RandomAccessFileOrArray file) throws IOException {
    PdfReader reader = stream.getReader();
    byte b[];
    if (stream.getOffset() < 0)
        b = stream.getBytes();
    else {
        b = new byte[stream.getLength()]; // <-- PdfReader.java:2576, where the OutOfMemoryError is thrown
        file.readFully(b);
        // ... (rest of the method omitted)

In the debugger I see that the stream type is EmbeddedFile and its length is 100 MB, so the whole embedded file is read into memory.
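Why a single huge stream exhausts the heap can be illustrated with plain JDK streams: the quoted code allocates one array as long as the stream, while a copy through a fixed-size buffer needs only the buffer, regardless of the stream length. This is a minimal sketch for comparison, not iText code; StreamCopy, copyWhole, copyChunked and CHUNK_SIZE are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class StreamCopy {
    /** Buffer size for the chunked copy; the copy's heap cost is bounded by this, not by the stream length. */
    public static final int CHUNK_SIZE = 8192;

    private StreamCopy() {}

    /** Whole-object approach: a 100 MB stream needs a 100 MB byte[] before anything is written. */
    public static void copyWhole(InputStream in, int length, OutputStream out) throws IOException {
        byte[] b = new byte[length];
        int off = 0;
        while (off < length) {
            int n = in.read(b, off, length - off);
            if (n < 0) throw new IOException("Unexpected end of stream after " + off + " bytes");
            off += n;
        }
        out.write(b);
    }

    /** Chunked approach: copies exactly 'length' bytes through a CHUNK_SIZE buffer. */
    public static void copyChunked(InputStream in, long length, OutputStream out) throws IOException {
        byte[] buf = new byte[CHUNK_SIZE];
        long remaining = length;
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) throw new IOException("Unexpected end of stream, " + remaining + " bytes missing");
            out.write(buf, 0, n);
            remaining -= n;
        }
    }
}
```

Both methods produce identical output; only the peak allocation differs.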

Update - creating a big PDF

It's difficult to share a 100 MB file, but here is how to create one:

  1. Run dd if=/dev/urandom of=file.bin bs=1048000 count=100
  2. Go to http://blog.didierstevens.com/programs/pdf-tools/ and download http://didierstevens.com/files/software/make-pdf_V0_1_6.zip
  3. Unzip and run python make-pdf-embedded.py file.bin file.pdf

Here you are.

I should note that it's important to use /dev/urandom: with /dev/zero the result is a compressed PDF of only about 100 KB.
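The /dev/zero effect can be reproduced with java.util.zip.Deflater, which implements the zlib/deflate compression behind PDF's FlateDecode filter: zeros compress on the order of 1000:1, while random bytes do not compress at all. This sketch assumes that the generator script compresses the embedded stream with FlateDecode, as the observed 100 KB size suggests; CompressibilityCheck and its methods are hypothetical names.

```java
import java.util.Random;
import java.util.zip.Deflater;

public final class CompressibilityCheck {
    private CompressibilityCheck() {}

    /** Returns the size in bytes of 'data' after zlib/deflate compression at the best compression level. */
    public static int deflatedSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] buf = new byte[64 * 1024];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buf); // output is discarded; only its size is counted
            }
            return total;
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    /** Equivalent of reading n bytes from /dev/zero. */
    public static byte[] zeros(int n) {
        return new byte[n];
    }

    /** Stand-in for reading n bytes from /dev/urandom (seeded, so results are repeatable). */
    public static byte[] randomBytes(int n, long seed) {
        byte[] b = new byte[n];
        new Random(seed).nextBytes(b);
        return b;
    }
}
```

Comparing deflatedSize(zeros(1 << 20)) with deflatedSize(randomBytes(1 << 20, 42)) shows why only random content yields a test PDF as large as its input.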

Anyway, if you need my file, I've put a 50 MB file on a server: http://50mpdf.tk/50m.pdf


Solution

  • While signing a PDF, iText may use significant amounts of memory by:

    • reading the whole unsigned PDF into memory unless using a PdfReader in partial mode;
    • creating the signed file in memory unless using a PdfStamper configured to use a temporary file; and
    • reading whole individual PDF objects (e.g. streams containing embedded files) into memory when copying the unsigned data to the to-be-signed file unless using a PdfStamper in append mode.

    For example, signing the 50 MB sample file supplied by the OP requires:

    • about -Xmx240m when using neither append mode, nor a temporary file, nor partial mode;
    • about -Xmx81m when using a temporary file but not append mode (partial mode makes no difference);
    • about -Xmx7m when using both append mode and a temporary file (partial mode makes no difference).

    Partial mode makes no difference in the latter cases because even in non-partial mode the PdfReader does not seem to read stream contents during initialization. As the sample file consists mostly of the contents of a single big stream, whether the few remaining objects are read during initialization hardly matters, especially since even in partial mode the PdfReader reads and keeps in memory some objects reflecting the global document structure, e.g. the page tree.

    You can find my test routines here: CreateSignature.java. I ran them on 64-bit MS Windows with Java 8, using iText 5.5.7-SNAPSHOT (which should not differ from the 5.5.6 release in this context).

    Thus, for memory-friendly signing, use this variant of @Bruno's code (partial-mode reader, temporary file, append mode):

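    import java.io.File;
    import java.io.FileOutputStream;
    import java.security.PrivateKey;
    import java.security.cert.Certificate;
    import com.itextpdf.text.Rectangle;
    import com.itextpdf.text.pdf.*;
    import com.itextpdf.text.pdf.security.*;

    public class MemoryFriendlySigning { // hypothetical class name wrapping Bruno's sign method
        public static void sign(String filepath, String dest, String tmp, Certificate[] chain,
                PrivateKey pk, String digestAlgorithm, String provider,
                MakeSignature.CryptoStandard subfilter, String reason, String location) throws Exception {
            // Creating the reader (partial mode) and the stamper (temporary file, append mode)
            PdfReader reader = new PdfReader(filepath, null, true);
            FileOutputStream os = new FileOutputStream(dest);
            PdfStamper stamper =
                PdfStamper.createSignature(reader, os, '\0', new File(tmp), true);
            // Creating the appearance
            PdfSignatureAppearance appearance = stamper.getSignatureAppearance();
            appearance.setReason(reason);
            appearance.setLocation(location);
            appearance.setVisibleSignature(new Rectangle(36, 748, 144, 780), 1, "sig");
            // Creating the signature
            ExternalSignature pks = new PrivateKeySignature(pk, digestAlgorithm, provider);
            ExternalDigest digest = new BouncyCastleDigest();
            MakeSignature.signDetached(appearance, digest, pks, chain,
                null, null, null, 0, subfilter);
        }
    }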
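The signing code expects pk and chain to be available already. One way to obtain them with the plain JDK KeyStore API is sketched below, assuming a keystore that holds a private-key entry with its certificate chain; SigningMaterial and fromKeyStore are hypothetical names. For the provider argument, the iText signing examples typically register BouncyCastle via Security.addProvider(new BouncyCastleProvider()) and pass its name; treat that as an assumption about your setup.

```java
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.security.PrivateKey;
import java.security.cert.Certificate;
import java.util.Collections;

public final class SigningMaterial {
    public final PrivateKey pk;
    public final Certificate[] chain;

    private SigningMaterial(PrivateKey pk, Certificate[] chain) {
        this.pk = pk;
        this.chain = chain;
    }

    /** Returns the first private-key entry (with its certificate chain) found in an already loaded keystore. */
    public static SigningMaterial fromKeyStore(KeyStore ks, char[] password) throws GeneralSecurityException {
        for (String alias : Collections.list(ks.aliases())) {
            if (ks.isKeyEntry(alias)) {
                Key key = ks.getKey(alias, password);
                Certificate[] chain = ks.getCertificateChain(alias);
                if (key instanceof PrivateKey && chain != null) {
                    return new SigningMaterial((PrivateKey) key, chain);
                }
            }
        }
        throw new KeyStoreException("No private key entry with a certificate chain found");
    }
}
```

Usage, assuming a PKCS#12 file at a hypothetical keystorePath: create the store with KeyStore.getInstance("PKCS12"), call ks.load(new FileInputStream(keystorePath), password), then pass fromKeyStore(ks, password).pk and .chain to the signing code.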