iTextPdf: Why do I get a misleading number of revisions when verifying a signature?

I'm using iTextPdf to sign PDFs and check their integrity in an application powered by Alfresco.

This is the signing code:

public void signItem(NodeRef itemToSign, String signer) {

    try {
        // retrieving user's public and private key
        Certificate chain[] = getCertificate(signer);
        PrivateKey pk = getPrivateKey(signer);

        String digestAlgorithm = DigestAlgorithms.SHA512;
        BouncyCastleProvider provider = new BouncyCastleProvider();
        Security.addProvider(provider);

        // Getting content of item to sign
        InputStream originalInputStream = getNodeRefInputStream(itemToSign);
        PdfReader pdfReader = new PdfReader(originalInputStream);

        // get an outputStream on the item to sign nodeRef and give to the
        // pdfStamper
        ByteArrayOutputStream outputStream = getNodeRefOutputStream(itemToSign);
        // logger.info("Before" + outputStream);

        PdfStamper pdfStamper = PdfStamper.createSignature(pdfReader, outputStream, '\0', new File("temp"), true);

        // Creating the appearance
        PdfSignatureAppearance appearance = pdfStamper.getSignatureAppearance();
        appearance.setReason("freeze");
        appearance.setLocation("koosserydesk");
        appearance.setVisibleSignature(new Rectangle(36, 748, 144, 780), 1, "signature space");

        // the sign document is subject to future approval signatures
        appearance.setCertificationLevel(PdfSignatureAppearance.CERTIFIED_FORM_FILLING);

        // Creating the signature
        ExternalDigest digest = new BouncyCastleDigest();
        ExternalSignature signature = new PrivateKeySignature(pk, digestAlgorithm, provider.getName());
        // signing...
        MakeSignature.signDetached(appearance, digest, signature, chain, null, null, null, 0, CryptoStandard.CMS);


        // get the signed input stream
        InputStream signedInputStream = new ByteArrayInputStream(outputStream.toByteArray());

        // replace the itemToSign content with the signed content
        ContentWriter writer = getWriter(itemToSign);
        writer.putContent(signedInputStream);
    } catch (Exception e) {

        // do something

    }

}

And this is the integrity check code:

public void checkDocIntegrity(NodeRef itemToSign) throws KoosseryDeskServerException {
    /** check the integrity of the document **/

    ArrayList<String> signatureNames;
    PdfPKCS7 pkcs7;
    boolean result = false;
    try {
        InputStream is = getNodeRefInputStream(itemToSign);
        PdfReader reader = new PdfReader(is);
        AcroFields fields = reader.getAcroFields();

        signatureNames = fields.getSignatureNames();
        String name = signatureNames.get(0);
        System.out.println("Siganture names = " + signatureNames);
        System.out.println("Document revision: " + fields.getRevision(name) + " of " + fields.getTotalRevisions());
        pkcs7 = fields.verifySignature(name);

        result = pkcs7.verify();
        System.out.println("Is the document integrity check OK? : "+result);
    } catch (Exception e) {

        // do something

    }

}

When I run the integrity check on a document signed with the signItem method above, I always get this output:

Siganture names = [signature space] 
Document revision: 1 of 2
Is the document integrity check OK? : false

I guess the integrity check always fails because a second revision was added after the signature was applied, but I don't know why I'm getting two document revisions, given that I didn't add any annotations or other approval signatures.

Please tell me what I am doing wrong. Thanks!

Solution

In a nutshell

It looks like either your method getNodeRefOutputStream returns a ByteArrayOutputStream that already contains a copy of the original document, or your method getWriter returns a ContentWriter that appends to the existing content instead of replacing it.

As a result, the final document is the concatenation of (A) the original document and (B) the original document plus the signature.
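The following stdlib-only Java simulation shows how a pre-filled output stream produces exactly this concatenation. No iText or Alfresco classes are involved: the class ConcatenationDemo and its method stampInto are hypothetical, and the strings ORIGINAL| and SIGNATURE are placeholders for the original PDF bytes and the incremental signing update that PdfStamper writes in append mode.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stdlib-only simulation of the suspected bug; no iText or Alfresco involved.
final class ConcatenationDemo {

    // Stand-in for PdfStamper in append mode: it first copies the original
    // bytes unchanged, then appends the incremental update (the signing additions).
    static byte[] stampInto(ByteArrayOutputStream out, byte[] original, byte[] additions) {
        out.write(original, 0, original.length);
        out.write(additions, 0, additions.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "ORIGINAL|".getBytes(StandardCharsets.US_ASCII);
        byte[] additions = "SIGNATURE".getBytes(StandardCharsets.US_ASCII);

        // Expected behavior: the stamper writes into an empty stream.
        byte[] ok = stampInto(new ByteArrayOutputStream(), original, additions);

        // Suspected bug: the stream handed to the stamper already holds the original.
        ByteArrayOutputStream prefilled = new ByteArrayOutputStream();
        prefilled.write(original, 0, original.length);
        byte[] broken = stampInto(prefilled, original, additions);

        System.out.println(new String(ok, StandardCharsets.US_ASCII));     // ORIGINAL|SIGNATURE
        System.out.println(new String(broken, StandardCharsets.US_ASCII)); // ORIGINAL|ORIGINAL|SIGNATURE
    }
}
```

The stamping step is identical in both cases; only the initial state of the stream differs, and that alone is enough to double the original.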

To solve this problem, change or replace the faulty method so that it returns an object that actually replaces the original content with the stamper output.

In detail

Analyzing your PDF, it quickly becomes clear that it is somewhat broken: instead of the expected two revisions (first the original PDF, then the additions created for signing it), it actually consists of three parts (first the original PDF, then the original PDF again, and then the additions created for signing it, with cross references written as if the original PDF preceded them only once).
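This layout can be confirmed without a PDF library by comparing raw bytes. The sketch below is stdlib-only Java with hypothetical names (DuplicateCheck, startsWithDuplicatedOriginal, countEofMarkers): it checks whether a stored file starts with two copies of the original and counts %%EOF markers. The marker count is only a heuristic: each incremental update normally ends with its own %%EOF, but linearized or previously updated originals already contain more than one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical diagnostic helpers; they inspect raw bytes and do not parse the PDF.
final class DuplicateCheck {

    // True if `result` begins with two consecutive copies of `original`,
    // i.e. the "original, original again, additions" layout.
    static boolean startsWithDuplicatedOriginal(byte[] original, byte[] result) {
        int n = original.length;
        if (n == 0 || result.length < 2 * n) {
            return false;
        }
        return Arrays.equals(Arrays.copyOfRange(result, 0, n), original)
                && Arrays.equals(Arrays.copyOfRange(result, n, 2 * n), original);
    }

    // Counts "%%EOF" markers; an original with a single %%EOF that was
    // signed once in append mode would normally end up with two.
    static int countEofMarkers(byte[] pdf) {
        byte[] marker = "%%EOF".getBytes(StandardCharsets.US_ASCII);
        int count = 0;
        for (int i = 0; i + marker.length <= pdf.length; i++) {
            int j = 0;
            while (j < marker.length && pdf[i + j] == marker[j]) {
                j++;
            }
            if (j == marker.length) {
                count++;
            }
        }
        return count;
    }
}
```

Running both checks on the bytes read via getNodeRefInputStream before signing and again after the signed result has been stored would show whether the stored file has the doubled layout.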

The effects are that

- the aggregated cross references are wrong: they point to meaningless locations in the second copy of the original document instead of to the objects actually added, and the startxref pointer also points somewhere into the second copy of the original PDF; thus, Adobe Reader silently "repairs" the PDF under the hood, which triggers the "Do you want to save changes" dialog when closing the document;
- the gap in the signed byte ranges lies somewhere inside the second copy of the original document and, in particular, does not contain the encoded signature bytes; thus, Adobe Reader, which expects the signature in that gap, reports errors in the formatting or information contained in this signature;
- the signed byte ranges end somewhere inside the second copy of the original, causing iText to report two revisions: first the data up to the end of the signed byte ranges, and then the data beyond; and
- the signed hash value is broken, as the signed byte ranges do not contain the original document plus the signing additions (except for the actual signature bytes) but instead the original document plus some arbitrary sections of its second copy; this causes iText verification to fail.
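To see why this yields "1 of 2", here is a simplified model of how iText 5 appears to derive revision numbers from a signature's /ByteRange array [offset1 length1 offset2 length2]: the signature covers the bytes up to offset2 + length2, and if the last signature does not reach the end of the file, the remaining bytes are counted as one more revision. RevisionModel is a hypothetical class written for illustration, not iText source code.

```java
// Hypothetical model for illustration; RevisionModel is not part of iText.
final class RevisionModel {

    // A /ByteRange array [offset1, length1, offset2, length2] covers every
    // byte up to offset2 + length2 (the gap in between holds the signature).
    static long coveredEnd(long[] byteRange) {
        if (byteRange.length != 4) {
            throw new IllegalArgumentException("expected 4 ByteRange entries");
        }
        return byteRange[2] + byteRange[3];
    }

    // A signature covers the whole document if its ranges end at the end of the file.
    static boolean coversWholeDocument(long[] byteRange, long fileLength) {
        return coveredEnd(byteRange) == fileLength;
    }

    // Expects byte ranges sorted by coveredEnd; bytes after the last signed
    // range are counted as one extra (unsigned) revision.
    static int totalRevisions(long[][] sortedByteRanges, long fileLength) {
        int n = sortedByteRanges.length;
        if (n == 0) {
            return 0; // no signatures: this model makes no claim about revisions
        }
        return coversWholeDocument(sortedByteRanges[n - 1], fileLength) ? n : n + 1;
    }

    public static void main(String[] args) {
        long[] byteRange = {0, 100, 300, 50}; // signed bytes: 0..99 and 300..349
        System.out.println(totalRevisions(new long[][] {byteRange}, 350)); // 1
        System.out.println(totalRevisions(new long[][] {byteRange}, 900)); // 2
    }
}
```

For a correctly stored file, the single signature covers the whole file and the model gives "1 of 1". With the duplicated layout, the covered end falls inside the second copy of the original, so the trailing bytes become a second revision, matching the "1 of 2" output in the question. In iText 5 itself, AcroFields.signatureCoversWholeDocument(name) checks the same condition directly.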

(You might want to read this answer on Information Security Stack Exchange to understand the details.)

This behavior is unheard of for the iText classes, so it appears to be caused by your code.

Looking at the code you posted, this duplication of the original document is most likely due to your code either

- stamping to the ByteArrayOutputStream returned by getNodeRefOutputStream (if that stream is initialized with a copy of the original document), or
- writing the result PDF to the ContentWriter returned by getWriter (if the putContent method of that class actually appends to the existing content).

I therefore propose that, instead of setting outputStream to the ByteArrayOutputStream returned by getNodeRefOutputStream, you set it to an empty new ByteArrayOutputStream(). If that does not help, I would look for alternatives to getWriter or ContentWriter.putContent.
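Whether the write-back replaces or appends can be tested in isolation. As an analogy only, the sketch below uses java.nio files as a stand-in for the Alfresco content store (ContentWriter is not used, and ReplaceVsAppend is a hypothetical helper). It contrasts a write that truncates the target with one that appends to it; only the appending variant reproduces the duplicated original.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; a plain file stands in for the stored document content.
final class ReplaceVsAppend {

    // Replaces the stored content: with no options, Files.write behaves as if
    // CREATE, TRUNCATE_EXISTING and WRITE were given.
    static void replaceContent(Path target, byte[] signed) {
        try {
            Files.write(target, signed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The suspected faulty behavior: appending keeps the old bytes in front,
    // so the file becomes original + (original + signing additions).
    static void appendContent(Path target, byte[] signed) {
        try {
            Files.write(target, signed, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Applied to the question's code, the analogous check would be to read the node content back right after writer.putContent(...) and compare its length with outputStream.toByteArray().length; a longer result points to an appending writer.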