Tags: java, itext, pdfbox, digital-signature

Detect Changes Made to a Signed PDF Between Signatures


I'm developing an application that should verify the signatures of PDF files. The application should detect the full history of updates made to the file content before each signature was applied. For example:

  1. Signer 1 signed the plain pdf file
  2. Signer 2 added a comment to the signed file, then signed it

How can the application detect that Signer 2 added a comment before signing?

I have tried using iText and PDFBox.


Solution

  • As already explained in a comment, neither iText nor PDFBox provides a high-level API that tells you what changed in an incremental update in terms of UI objects (comments, text content, ...).

    You can use them to render the different revisions of the PDF as bitmaps and compare those images.

    Or you can use them to tell you the changes in terms of low level COS objects (dictionaries, arrays, numbers, strings, ...).

    But analyzing the changes in those images or low-level objects and determining their meaning in terms of UI objects (e.g. that a comment, and only a comment, has been added) is highly non-trivial.
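    For the bitmap route, the image comparison step itself is simple. The following is a minimal sketch, assuming both revisions of a page have already been rendered at the same resolution (for example with PDFBox's PDFRenderer.renderImageWithDPI; the rendering is not shown). The class BitmapDiff is hypothetical. It only reports the bounding box of differing pixels, i.e. where something changed, not what the change means.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/**
 * Hypothetical helper: compares two renderings of the same page pixel by pixel.
 * Assumes both images have identical dimensions (same page, same DPI).
 */
final class BitmapDiff {
    private BitmapDiff() {}

    /** Returns the bounding box of all differing pixels, or null if the images are identical. */
    static Rectangle differenceBounds(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("Images must have the same dimensions");
        }
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE, maxX = -1, maxY = -1;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        return maxX < 0 ? null : new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }

    public static void main(String[] args) {
        BufferedImage before = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        BufferedImage after = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        // Simulate a small yellow annotation appearing in the later revision.
        for (int y = 10; y < 20; y++) {
            for (int x = 30; x < 45; x++) {
                after.setRGB(x, y, 0xFFFF00);
            }
        }
        // prints java.awt.Rectangle[x=30,y=10,width=15,height=10]
        System.out.println(differenceBounds(before, after));
    }
}
```

    A non-null result only tells you that the appearance of a page changed; deciding whether that change is "just a comment" remains the hard part.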

    In response you asked

    Can you explain more, how can I detect changes in low level COS objects.

    What to Compare And What Changes to Consider

    First of all, you have to be clear about which document states you can compare to detect changes.

    The PDF format allows appending changes to a PDF in so-called incremental updates. This allows changing signed documents without cryptographically breaking their signatures, as the originally signed bytes are left as is:

    [Figure: Structure of a signed PDF with incremental updates]

    There can also be additional, unsigned incremental updates in between, though; e.g. the "Changes for version 2" might consist of multiple incremental updates.
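    To get a feeling for this structure, one can look for the %%EOF markers that end the original revision and each incremental update. The following minimal sketch (hypothetical class RevisionOffsets) lists the offset directly after each marker, so that the bytes from 0 up to each offset form one revision. It is a heuristic only, not how PDF libraries locate revisions: the marker bytes may also occur inside stream data, and unsigned updates show up just like signed ones.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: finds candidate revision end offsets by searching for "%%EOF".
 * Heuristic: matches inside binary stream data are not filtered out.
 */
final class RevisionOffsets {
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    private RevisionOffsets() {}

    /** Returns the offset directly after each "%%EOF" marker, in file order. */
    static List<Integer> findEofOffsets(byte[] pdf) {
        List<Integer> offsets = new ArrayList<>();
        outer:
        for (int i = 0; i + EOF_MARKER.length <= pdf.length; i++) {
            for (int j = 0; j < EOF_MARKER.length; j++) {
                if (pdf[i + j] != EOF_MARKER[j]) {
                    continue outer;
                }
            }
            offsets.add(i + EOF_MARKER.length);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Toy bytes standing in for an original revision followed by one incremental update.
        byte[] toy = "%PDF-1.7\n...original...\n%%EOF\n...update...\n%%EOF\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(findEofOffsets(toy)); // prints [29, 48]
    }
}
```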

    One might consider comparing the revisions created by arbitrary incremental updates. The problem here, though, is that you cannot identify the person who applied an incremental update without a signature.

    Thus, it usually makes more sense to compare only the signed revisions and to hold each signer responsible for all changes since the previous signed revision. The only exception is the whole file, which, as the current version of the PDF, is of special interest even if no signature covers all of it.
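    For a signed revision, the signature's /ByteRange entry tells you where that revision ends. A minimal sketch, assuming the usual layout [0 length1 offset2 length2] in which the gap between the two ranges holds the /Contents value (the class SignedRevision and the sample values are hypothetical):

```java
/**
 * Hypothetical helper: the extent of a signed revision from its /ByteRange.
 * Assumes the common layout [0 length1 offset2 length2]; the signed revision
 * then consists of the first offset2 + length2 bytes of the file.
 */
final class SignedRevision {
    private SignedRevision() {}

    /** Number of leading file bytes that make up the signed revision. */
    static long signedRevisionLength(long[] byteRange) {
        if (byteRange.length != 4 || byteRange[0] != 0) {
            throw new IllegalArgumentException("Unexpected /ByteRange layout");
        }
        return byteRange[2] + byteRange[3];
    }

    /** True if no bytes at all were appended after the signed revision. */
    static boolean coversWholeFile(long[] byteRange, long fileLength) {
        return signedRevisionLength(byteRange) == fileLength;
    }

    public static void main(String[] args) {
        long[] byteRange = {0, 840, 960, 120}; // hypothetical sample values
        System.out.println(signedRevisionLength(byteRange));  // prints 1080
        System.out.println(coversWholeFile(byteRange, 1500)); // prints false: 420 bytes were appended later
    }
}
```

    The tool below delegates this bookkeeping to iText's SignatureUtil (extractRevision, signatureCoversWholeDocument). Note that a strict length comparison like the one in this sketch treats even trailing whitespace after the last %%EOF as an unsigned addition.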

    Next you have to decide what you consider a change. In particular:

    • Is every object override in an incremental update a change? Even those that override the original object with an identical copy?

    • What about changes that make a direct object indirect (or vice versa) but keep all contents and references intact?

    • What about addition of new objects that are not referred to from anywhere in the standard structure?

    • What about addition of objects that are not referenced from the cross reference streams or tables?

    • What about addition of data that does not follow PDF syntax at all?

    If you are indeed interested in such changes, too, existing PDF libraries usually don't provide the means to determine them out of the box; you will most likely have to at least change their code for traversing the chain of cross reference tables/streams, or even analyze the bytes of the update directly.
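    As an illustration of analyzing the update bytes directly, here is a minimal sketch (hypothetical class UpdateScanner) that takes the byte range of one incremental update, e.g. between two revision end offsets, and lists the "n g obj" headers it contains, i.e. the objects the update adds or overrides. It is deliberately naive: objects stored in compressed object streams are missed, and false matches inside stream data are possible, which is why real analysis needs much more than a regular expression.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: lists "objectNumber generation" pairs of the
 * "n g obj" headers found in one incremental update. Naive heuristic only.
 */
final class UpdateScanner {
    // Digits, whitespace, digits, whitespace, "obj"; not preceded by another digit.
    private static final Pattern OBJ_HEADER = Pattern.compile("(?<![0-9])(\\d+)\\s+(\\d+)\\s+obj\\b");

    private UpdateScanner() {}

    static List<String> objectHeaders(byte[] file, int updateStart, int updateEnd) {
        // ISO-8859-1 maps each byte to exactly one char, so binary data cannot break decoding.
        String update = new String(file, updateStart, updateEnd - updateStart, StandardCharsets.ISO_8859_1);
        List<String> headers = new ArrayList<>();
        Matcher m = OBJ_HEADER.matcher(update);
        while (m.find()) {
            headers.add(m.group(1) + " " + m.group(2));
        }
        return headers;
    }

    public static void main(String[] args) {
        String original = "%PDF-1.7\n1 0 obj\n<<>>\nendobj\n%%EOF\n";
        String update = "5 0 obj\n<</Type/Annot>>\nendobj\n7 0 obj\n<<>>\nendobj\n%%EOF\n";
        byte[] file = (original + update).getBytes(StandardCharsets.ISO_8859_1);
        // The update starts where the original revision ends.
        System.out.println(objectHeaders(file, original.length(), file.length)); // prints [5 0, 7 0]
    }
}
```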

    If you are not interested in such changes, though, there usually is no need to change or replace library routines.

    As the enumerated and similar changes make no difference when the PDF is processed by specification-conforming PDF processors, one can usually ignore them.

    If this is your position, too, the following example tool might give you a starting point.

    An Example Tool Based on iText 7

    With the limitations explained above you can compare signed revisions of a PDF using iText 7 without changes to the library by loading the revisions to compare into separate PdfDocument instances and recursively comparing the PDF objects starting with the trailer.
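    The core idea can be shown in miniature. The following sketch (hypothetical class ToyCompare, not iText code) models dictionaries as maps, arrays as lists and everything else as simple values, walks both object graphs in parallel from the trailer, and records differences with their path. Because PDF object graphs contain cycles (a page's /Parent points back to the /Pages node whose /Kids contain it), it remembers already compared pairs by identity, much like the alreadyCompared method of PdfCompare. Identity matters here: equals/hashCode of a self-referencing map would recurse without end.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model: Map = dictionary, List = array, anything else = simple value (hypothetical). */
final class ToyCompare {
    /** Pair compared by identity, so cyclic graphs never trigger equals/hashCode recursion. */
    private static final class IdentityPair {
        final Object a, b;
        IdentityPair(Object a, Object b) { this.a = a; this.b = b; }
        @Override public boolean equals(Object o) {
            return o instanceof IdentityPair && ((IdentityPair) o).a == a && ((IdentityPair) o).b == b;
        }
        @Override public int hashCode() {
            return 31 * System.identityHashCode(a) + System.identityHashCode(b);
        }
    }

    private final Set<IdentityPair> visited = new HashSet<>();
    private final List<String> differences = new ArrayList<>();

    List<String> getDifferences() { return differences; }

    @SuppressWarnings("unchecked")
    void compare(Object o1, Object o2, String path) {
        if (o1 == null || o2 == null) {
            if (o1 != o2) differences.add(path + " - missing in document " + (o1 == null ? 1 : 2));
            return;
        }
        boolean map1 = o1 instanceof Map, map2 = o2 instanceof Map;
        boolean list1 = o1 instanceof List, list2 = o2 instanceof List;
        if (map1 != map2 || list1 != list2) {
            differences.add(path + " - type differs");
            return;
        }
        // Containers: stop if this pair was already compared (cycle or shared object).
        if ((map1 || list1) && !visited.add(new IdentityPair(o1, o2))) return;
        if (map1) {
            Map<String, Object> d1 = (Map<String, Object>) o1, d2 = (Map<String, Object>) o2;
            Set<String> keys = new TreeSet<>(d1.keySet());
            keys.addAll(d2.keySet());
            for (String key : keys) compare(d1.get(key), d2.get(key), path + "/" + key);
        } else if (list1) {
            List<Object> a1 = (List<Object>) o1, a2 = (List<Object>) o2;
            for (int i = 0; i < Math.max(a1.size(), a2.size()); i++) {
                compare(i < a1.size() ? a1.get(i) : null, i < a2.size() ? a2.get(i) : null, path + "[" + i + "]");
            }
        } else if (!o1.equals(o2)) {
            differences.add(path + " - values differ: " + o1 + " vs " + o2);
        }
    }

    public static void main(String[] args) {
        // Revision 1 (simplified: a real trailer reaches /Pages via /Root).
        Map<String, Object> trailer1 = new HashMap<>(), pages1 = new HashMap<>(), page1 = new HashMap<>();
        page1.put("Type", "Page");
        page1.put("Parent", pages1); // cycle back to the /Pages node
        List<Object> kids1 = new ArrayList<>();
        kids1.add(page1);
        pages1.put("Kids", kids1);
        trailer1.put("Pages", pages1);

        // Revision 2: same structure, but the page gained an /Annots array (a comment).
        Map<String, Object> trailer2 = new HashMap<>(), pages2 = new HashMap<>(), page2 = new HashMap<>();
        page2.put("Type", "Page");
        page2.put("Parent", pages2);
        List<Object> annots = new ArrayList<>();
        annots.add(new HashMap<String, Object>());
        page2.put("Annots", annots);
        List<Object> kids2 = new ArrayList<>();
        kids2.add(page2);
        pages2.put("Kids", kids2);
        trailer2.put("Pages", pages2);

        ToyCompare toy = new ToyCompare();
        toy.compare(trailer1, trailer2, "trailer");
        System.out.println(toy.getDifferences()); // prints [trailer/Pages/Kids[0]/Annots - missing in document 1]
    }
}
```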

    I once implemented this as a small helper tool for personal use, so it is not completely finished; it is more of a work in progress. First, there is the base class that allows comparing two arbitrary documents:

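    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.SortedSet;
    import java.util.TreeSet;
    
    // Pair.of(...) as used below matches Apache Commons Lang 3
    import org.apache.commons.lang3.tuple.Pair;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    import com.itextpdf.kernel.pdf.PdfArray;
    import com.itextpdf.kernel.pdf.PdfDictionary;
    import com.itextpdf.kernel.pdf.PdfDocument;
    import com.itextpdf.kernel.pdf.PdfName;
    import com.itextpdf.kernel.pdf.PdfNumber;
    import com.itextpdf.kernel.pdf.PdfObject;
    import com.itextpdf.kernel.pdf.PdfReader;
    import com.itextpdf.kernel.pdf.PdfStream;
    import com.itextpdf.kernel.pdf.PdfString;
    
    public class PdfCompare {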
        public static void main(String[] args) throws IOException {
            System.out.printf("Comparing:\n* %s\n* %s\n", args[0], args[1]);
            try (   PdfDocument pdfDocument1 = new PdfDocument(new PdfReader(args[0]));
                    PdfDocument pdfDocument2 = new PdfDocument(new PdfReader(args[1]))  ) {
                PdfCompare pdfCompare = new PdfCompare(pdfDocument1, pdfDocument2);
                pdfCompare.compare();
    
                List<Difference> differences = pdfCompare.getDifferences();
                if (differences == null || differences.isEmpty()) {
                    System.out.println("No differences found.");
                } else {
                    System.out.printf("%d differences found:\n", differences.size());
                    for (Difference difference : pdfCompare.getDifferences()) {
                        for (String element : difference.getPath()) {
                            System.out.print(element);
                        }
                        System.out.printf(" - %s\n", difference.getDescription());
                    }
                }
            }
        }
    
        public interface Difference {
            List<String> getPath();
            String getDescription();
        }
    
        public PdfCompare(PdfDocument pdfDocument1, PdfDocument pdfDocument2) {
            trailer1 = pdfDocument1.getTrailer();
            trailer2 = pdfDocument2.getTrailer();
        }
    
        public void compare() {
            LOGGER.info("Starting comparison");
            try {
                compared.clear();
                differences.clear();
                LOGGER.info("START COMPARE");
                compare(trailer1, trailer2, Collections.singletonList("trailer"));
                LOGGER.info("START SHORTEN PATHS");
                shortenPaths();
            } finally {
                LOGGER.info("Finished comparison and shortening");
            }
        }
    
        public List<Difference> getDifferences() {
            return differences;
        }
    
        class DifferenceImplSimple implements Difference {
            DifferenceImplSimple(PdfObject object1, PdfObject object2, List<String> path, String description) {
                this.pair = Pair.of(object1, object2);
                this.path = path;
                this.description = description;
            }
    
            @Override
            public List<String> getPath() {
                List<String> byPair = getShortestPath(pair);
                return byPair != null ? byPair : shorten(path);
            }
            @Override public String getDescription()    { return description;           }
    
            final Pair<PdfObject, PdfObject> pair;
            final List<String> path;
            final String description;
        }
    
        void compare(PdfObject object1, PdfObject object2, List<String> path) {
            LOGGER.debug("Comparing objects at {}.", path);
            if (object1 == null && object2 == null)
            {
                LOGGER.debug("Both objects are null at {}.", path);
                return;
            }
            if (object1 == null) {
                differences.add(new DifferenceImplSimple(object1, object2, path, "Missing in document 1"));
                LOGGER.info("Object in document 1 is missing at {}.", path);
                return;
            }
            if (object2 == null) {
                differences.add(new DifferenceImplSimple(object1, object2, path, "Missing in document 2"));
                LOGGER.info("Object in document 2 is missing at {}.", path);
                return;
            }
    
            if (object1.getType() != object2.getType()) {
                differences.add(new DifferenceImplSimple(object1, object2, path,
                        String.format("Type difference, %s in document 1 and %s in document 2",
                                getTypeName(object1.getType()), getTypeName(object2.getType()))));
                LOGGER.info("Objects have different types at {}, {} and {}.", path, getTypeName(object1.getType()), getTypeName(object2.getType()));
                return;
            }
    
            switch (object1.getType()) {
            case PdfObject.ARRAY:
                compareContents((PdfArray) object1, (PdfArray) object2, path);
                break;
            case PdfObject.DICTIONARY:
                compareContents((PdfDictionary) object1, (PdfDictionary) object2, path);
                break;
            case PdfObject.STREAM:
                compareContents((PdfStream)object1, (PdfStream)object2, path);
                break;
            case PdfObject.BOOLEAN:
            case PdfObject.INDIRECT_REFERENCE:
            case PdfObject.LITERAL:
            case PdfObject.NAME:
            case PdfObject.NULL:
            case PdfObject.NUMBER:
            case PdfObject.STRING:
                compareContentsSimple(object1, object2, path);
                break;
            default:
                differences.add(new DifferenceImplSimple(object1, object2, path, "Unknown object type " + object1.getType() + "; cannot compare"));
                LOGGER.warn("Unknown object type at {}, {}.", path, object1.getType());
                break;
            }
        }
    
        void compareContents(PdfArray array1, PdfArray array2, List<String> path) {
            int count1 = array1.size();
            int count2 = array2.size();
            if (count1 < count2) {
                differences.add(new DifferenceImplSimple(array1, array2, path, "Document 1 misses " + (count2-count1) + " array entries"));
                LOGGER.info("Array in document 1 is missing {} entries at {}.", (count2-count1), path);
            }
            if (count1 > count2) {
                differences.add(new DifferenceImplSimple(array1, array2, path, "Document 2 misses " + (count1-count2) + " array entries"));
                LOGGER.info("Array in document 2 is missing {} entries at {}.", (count1-count2), path);
            }
    
            if (alreadyCompared(array1, array2, path)) {
                return;
            }
    
            int count = Math.min(count1, count2);
            for (int i = 0; i < count; i++) {
                compare(array1.get(i), array2.get(i), join(path, String.format("[%d]", i)));
            }
        }
    
        void compareContents(PdfDictionary dictionary1, PdfDictionary dictionary2, List<String> path) {
            List<PdfName> missing1 = new ArrayList<PdfName>(dictionary2.keySet());
            missing1.removeAll(dictionary1.keySet());
            if (!missing1.isEmpty()) {
                differences.add(new DifferenceImplSimple(dictionary1, dictionary2, path, "Document 1 misses dictionary entries for " + missing1));
                LOGGER.info("Dictionary in document 1 is missing entries at {} for {}.", path, missing1);
            }
    
            List<PdfName> missing2 = new ArrayList<PdfName>(dictionary1.keySet());
            missing2.removeAll(dictionary2.keySet());
            if (!missing2.isEmpty()) {
                differences.add(new DifferenceImplSimple(dictionary1, dictionary2, path, "Document 2 misses dictionary entries for " + missing2));
                LOGGER.info("Dictionary in document 2 is missing entries at {} for {}.", path, missing2);
            }
    
            if (alreadyCompared(dictionary1, dictionary2, path)) {
                return;
            }
    
            List<PdfName> common = new ArrayList<PdfName>(dictionary1.keySet());
            common.retainAll(dictionary2.keySet());
            for (PdfName name : common) {
                compare(dictionary1.get(name), dictionary2.get(name), join(path, name.toString()));
            }
        }
    
        void compareContents(PdfStream stream1, PdfStream stream2, List<String> path) {
            compareContents((PdfDictionary)stream1, (PdfDictionary)stream2, path);
    
            byte[] bytes1 = stream1.getBytes();
            byte[] bytes2 = stream2.getBytes();
            if (!Arrays.equals(bytes1, bytes2)) {
                differences.add(new DifferenceImplSimple(stream1, stream2, path, "Stream contents differ"));
                LOGGER.info("Stream contents differ at {}.", path);
            }
        }
    
        void compareContentsSimple(PdfObject object1, PdfObject object2, List<String> path) {
            // vvv--- work-around for DEVSIX-4931, likely to be fixed in 7.1.15
            if (object1 instanceof PdfNumber)
                ((PdfNumber)object1).getValue();
            if (object2 instanceof PdfNumber)
                ((PdfNumber)object2).getValue();
            // ^^^--- work-around for DEVSIX-4931, likely to be fixed in 7.1.15
            if (!object1.equals(object2)) {
                if (object1 instanceof PdfString) {
                    String string1 = object1.toString();
                    if (string1.length() > 40)
                        string1 = string1.substring(0, 40) + '\u22EF';
                    string1 = sanitize(string1);
                    String string2 = object2.toString();
                    if (string2.length() > 40)
                        string2 = string2.substring(0, 40) + '\u22EF';
                    string2 = sanitize(string2);
                    differences.add(new DifferenceImplSimple(object1, object2, path, String.format("String values differ, '%s' and '%s'", string1, string2)));
                    LOGGER.info("String values differ at {}, '{}' and '{}'.", path, string1, string2);
                } else {
                    differences.add(new DifferenceImplSimple(object1, object2, path, String.format("Object values differ, '%s' and '%s'", object1, object2)));
                    LOGGER.info("Object values differ at {}, '{}' and '{}'.", path, object1, object2);
                }
            }
        }
    
        String sanitize(CharSequence string) {
            char[] sanitized = new char[string.length()];
            for (int i = 0; i < sanitized.length; i++) {
                char c = string.charAt(i);
                if (c >= 0 && c < ' ')
                    c = '\uFFFD';
                sanitized[i] = c;
            }
            return new String(sanitized);
        }
    
        String getTypeName(byte type) {
            switch (type) {
            case PdfObject.ARRAY:               return "ARRAY";
            case PdfObject.BOOLEAN:             return "BOOLEAN";
            case PdfObject.DICTIONARY:          return "DICTIONARY";
            case PdfObject.LITERAL:             return "LITERAL";
            case PdfObject.INDIRECT_REFERENCE:  return "REFERENCE";
            case PdfObject.NAME:                return "NAME";
            case PdfObject.NULL:                return "NULL";
            case PdfObject.NUMBER:              return "NUMBER";
            case PdfObject.STREAM:              return "STREAM";
            case PdfObject.STRING:              return "STRING";
            default:
                return "UNKNOWN";
            }
        }
    
        List<String> join(List<String> path, String element) {
            String[] array = path.toArray(new String[path.size() + 1]);
            array[array.length-1] = element;
            return Arrays.asList(array);
        }
    
        boolean alreadyCompared(PdfObject object1, PdfObject object2, List<String> path) {
            Pair<PdfObject, PdfObject> pair = Pair.of(object1, object2);
            Set<List<String>> paths = compared.get(pair);
            if (paths != null) {
                // Pair seen before (cycle or shared object): only record the additional path to it.
                paths.add(path);
                return true;
            }
            compared.put(pair, new HashSet<>(Collections.singleton(path)));
            return false;
        }
    
        List<String> getShortestPath(Pair<PdfObject, PdfObject> pair) {
            Set<List<String>> paths = compared.get(pair);
            //return (paths == null) ? null : Collections.min(paths, pathComparator);
            return (paths == null || paths.isEmpty()) ? null : shortened.get(paths.stream().findFirst().get());
        }
    
        void shortenPaths() {
            List<Map<List<String>, SortedSet<List<String>>>> data = new ArrayList<>();
            for (Set<List<String>> set : compared.values()) {
                SortedSet<List<String>> sortedSet = new TreeSet<List<String>>(pathComparator);
                sortedSet.addAll(set);
                for (List<String> path : sortedSet) {
                    while (path.size() >= data.size()) {
                        data.add(new HashMap<>());
                    }
                    SortedSet<List<String>> former = data.get(path.size()).put(path, sortedSet);
                    if (former != null) {
                        LOGGER.error("Path not well-defined for {}", path);
                    }
                }
            }
            for (int pathSize = 3; pathSize < data.size(); pathSize++) {
                for (Map.Entry<List<String>, SortedSet<List<String>>> pathEntry : data.get(pathSize).entrySet()) {
                    List<String> path = pathEntry.getKey();
                    SortedSet<List<String>> equivalents = pathEntry.getValue();
                    for (int subpathSize = 2; subpathSize < pathSize; subpathSize++) {
                        List<String> subpath = path.subList(0, subpathSize);
                        List<String> remainder = path.subList(subpathSize, pathSize); 
                        SortedSet<List<String>> subequivalents = data.get(subpathSize).get(subpath);
                        if (subequivalents != null && subequivalents.size() > 1) {
                            List<String> subequivalent = subequivalents.first();
                            if (subequivalent.size() < subpathSize) {
                                List<String> replacement = join(subequivalent, remainder);
                                if (equivalents.add(replacement)) {
                                    data.get(replacement.size()).put(replacement, equivalents);
                                }
                            }
                        }
                    }
                }
            }
    
            shortened.clear();
            for (Map<List<String>, SortedSet<List<String>>> singleLengthData : data) {
                for (Map.Entry<List<String>, SortedSet<List<String>>> entry : singleLengthData.entrySet()) {
                    List<String> path = entry.getKey();
                    List<String> shortenedPath = entry.getValue().first();
                    shortened.put(path, shortenedPath);
                }
            }
        }
    
        List<String> join(List<String> path, List<String> elements) {
            String[] array = path.toArray(new String[path.size() + elements.size()]);
            for (int i = 0; i < elements.size(); i++) {
                array[path.size() + i] = elements.get(i);
            }
            return Arrays.asList(array);
        }
    
        List<String> shorten(List<String> path) {
            List<String> shortPath = path;
            for (int subpathSize = path.size(); subpathSize > 2; subpathSize--) {
                List<String> subpath = path.subList(0, subpathSize);
                List<String> shortSubpath = shortened.get(subpath);
                if (shortSubpath != null && shortSubpath.size() < subpathSize) {
                    List<String> remainder = path.subList(subpathSize, path.size());
                    List<String> replacement = join(shortSubpath, remainder);
                    if (replacement.size() < shortPath.size())
                        shortPath = replacement;
                }
            }
            return shortPath;
        }
    
        final static Logger LOGGER = LoggerFactory.getLogger(PdfCompare.class);
        final PdfDictionary trailer1;
        final PdfDictionary trailer2;
        final Map<Pair<PdfObject, PdfObject>, Set<List<String>>> compared = new HashMap<>();
        final List<Difference> differences = new ArrayList<>();
        final Map<List<String>, List<String>> shortened = new HashMap<>();
        final static Comparator<List<String>> pathComparator = new Comparator<List<String>>() {
            @Override
            public int compare(List<String> o1, List<String> o2) {
                int compare = Integer.compare(o1.size(), o2.size());
                if (compare != 0)
                    return compare;
                for (int i = 0; i < o1.size(); i++) {
                    compare = o1.get(i).compareTo(o2.get(i));
                    if (compare != 0)
                        return compare;
                }
                return 0;
            }
        };
    }
    

    (PdfCompare.java)

    The tool that uses this code for revision comparison is a subclass of it:

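    import java.io.IOException;
    import java.util.List;
    
    import com.itextpdf.kernel.pdf.PdfDocument;
    import com.itextpdf.kernel.pdf.PdfReader;
    import com.itextpdf.signatures.SignatureUtil;
    
    public class PdfRevisionCompare extends PdfCompare {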
        public static void main(String[] args) throws IOException {
            for (String arg : args) {
                System.out.printf("\nComparing revisions of: %s\n***********************\n", arg);
                try (PdfDocument pdfDocument = new PdfDocument(new PdfReader(arg))) {
                    SignatureUtil signatureUtil = new SignatureUtil(pdfDocument);
                    List<String> signatureNames = signatureUtil.getSignatureNames();
                    if (signatureNames.isEmpty()) {
                        System.out.println("No signed revisions detected.");
                        continue;
                    }
                    String previousRevision = signatureNames.get(0);
                    PdfDocument previousDocument = new PdfDocument(new PdfReader(signatureUtil.extractRevision(previousRevision)));
                    System.out.printf("* Initial signed revision: %s\n", previousRevision);
                    for (int i = 1; i < signatureNames.size(); i++) {
                        String currentRevision = signatureNames.get(i);
                        PdfDocument currentDocument = new PdfDocument(new PdfReader(signatureUtil.extractRevision(currentRevision)));
                        showDifferences(previousDocument, currentDocument);
                        System.out.printf("* Next signed revision (%d): %s\n", i+1, currentRevision);
                        previousDocument.close();
                        previousDocument = currentDocument;
                        previousRevision = currentRevision;
                    }
                    if (signatureUtil.signatureCoversWholeDocument(previousRevision)) {
                        System.out.println("No unsigned updates.");
                    } else {
                        showDifferences(previousDocument, pdfDocument);
                        System.out.println("* Final unsigned revision");
                    }
                    previousDocument.close();
                }
            }
        }
    
        static void showDifferences(PdfDocument previousDocument, PdfDocument currentDocument) {
            PdfRevisionCompare pdfRevisionCompare = new PdfRevisionCompare(previousDocument, currentDocument);
            pdfRevisionCompare.compare();
            List<Difference> differences = pdfRevisionCompare.getDifferences();
            if (differences == null || differences.isEmpty()) {
                System.out.println("No differences found.");
            } else {
                System.out.printf("%d differences found:\n", differences.size());
                for (Difference difference : differences) {
                    for (String element : difference.getPath()) {
                        System.out.print(element);
                    }
                    System.out.printf(" - %s\n", difference.getDescription());
                }
            }
        }
    
        public PdfRevisionCompare(PdfDocument pdfDocument1, PdfDocument pdfDocument2) {
            super(pdfDocument1, pdfDocument2);
        }
    }
    

    (PdfRevisionCompare.java)