
How to get correct page numbering when merging PDFs with PDFBox?


I am merging multiple PDFs with PDFBox's PDFMergerUtility:

        try (PDDocument result = new PDDocument()) {
          result.setVersion(1.5f);
          PDFMergerUtility merger = new PDFMergerUtility();

          for (PrintableDocument pd : ivDocuments) {
            if (pd.getData() == null)
              continue;
            try (PDDocument pdd = PDDocument.load(pd.getData())) {
              merger.appendDocument(result, pdd);
            }
          }

          result.save(os);
        }

This works fine except for one detail: some PDF viewers (Firefox, SumatraPDF, Chrome) show incorrect page numbers. For example, if I merge three documents with three pages each, the resulting page numbers are:

1
2
3
1
2
3
1
2
3

instead of

1
2
3
4
5
6
7
8
9

The affected viewers seem to take the page numbers from some metadata in the PDF instead of counting the pages themselves.

Is there a way to fix this with PDFBox?


Solution

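  • Thanks to Codo, I ended up with the following solution. It replaces the page labels of the merged document with a single decimal range that starts at 1 on the first page, and it has to run before result.save(os):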

              PDPageLabels pageLabels = new PDPageLabels(result);
              PDPageLabelRange pageLabelRange = new PDPageLabelRange();
              pageLabelRange.setStyle(PDPageLabelRange.STYLE_DECIMAL);
              pageLabelRange.setStart(1);
              pageLabels.setLabelItem(0, pageLabelRange);
              result.getDocumentCatalog().setPageLabels(pageLabels);
    

    Here are more examples for creating page labels: https://simplesolution.dev/creating-pdf-document-page-labels-in-java-with-apache-pdfbox/
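
To see why a single label range fixes the numbering, here is a small plain-Java sketch (no PDFBox needed) of how a label-aware viewer could resolve decimal page labels. It assumes the merged file ends up with one label range per source document, each starting at 1. That assumption matches the symptom in the question but is not verified against PDFMergerUtility's behavior. The class name PageLabelDemo and the method labelsFor are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PageLabelDemo {

    /**
     * Resolves decimal page labels from a set of label ranges.
     * Keys are zero-based page indices where a range begins; values are the
     * number shown on the first page of that range (like setStart above).
     */
    public static List<String> labelsFor(int pageCount, TreeMap<Integer, Integer> rangeStarts) {
        List<String> labels = new ArrayList<>();
        for (int page = 0; page < pageCount; page++) {
            // The range that applies is the one with the greatest start index <= page.
            Map.Entry<Integer, Integer> range = rangeStarts.floorEntry(page);
            if (range == null) {
                // Simplification for this sketch: no range means physical page number.
                labels.add(String.valueOf(page + 1));
            } else {
                labels.add(String.valueOf(range.getValue() + (page - range.getKey())));
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Assumed state after merging three 3-page documents:
        // a new range starts at page indices 0, 3 and 6, each counting from 1.
        TreeMap<Integer, Integer> merged = new TreeMap<>();
        merged.put(0, 1);
        merged.put(3, 1);
        merged.put(6, 1);
        System.out.println(labelsFor(9, merged)); // [1, 2, 3, 1, 2, 3, 1, 2, 3]

        // After the fix: one range at index 0 starting at 1 covers every page.
        TreeMap<Integer, Integer> fixed = new TreeMap<>();
        fixed.put(0, 1);
        System.out.println(labelsFor(9, fixed)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

The second map corresponds to what the PDFBox snippet above sets up with setLabelItem(0, pageLabelRange): a fresh PDPageLabels object with a single range, so any ranges inherited from the source documents no longer apply.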