
Combining PDF with GhostScript: Using Original Bookmarks with corrected page numbers


I am using

gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=book.pdf  -f front-matter.pdf fulltext-0.pdf fulltext-1.pdf back-matter.pdf

to create a single PDF document from a series of PDF documents. I was going to write a new table of contents and include it using the pdfmark mechanism. Then I noticed that the original files already have bookmarks in them. However, they refer to the original page numbers, not the ones in the combined document.

I am looking for one of two possible solutions: remove the original bookmarks, or keep the original bookmarks but somehow update their page references.


Solution

  • As is so often the case, someone has walked the same path before you...

    unfolding disasters has worked out a solution to this very problem, based on https://ubuntuforums.org/showthread.php?t=1545064. His Python script pdf-merge.py first invokes pdftk with its dump_data switch to retrieve all the bookmark information. It then keeps track of the number of pages in each merged document and offsets every page number in the pdfmark instructions by the total page count of all the PDF documents included before the current one. So it is close to, but not the same as, the 2-pass approach of KenS: it first discovers the bookmarks using pdftk and then creates a new bookmark file with the correct page numbers. It also manages to turn the original pdfmark instructions (which would normally be preserved by gs) into no-ops. I won't pretend I understand how that last part works ...

    However, the script does all I need, including the option of tweaking the bookmark file before the final writing. Very neat, and hat tip to Trevor King.

    [Edit by K J] I have updated the dead links above to web-archived sources. The code was later expanded for use in "r-XMPDF", so for those interested, see that method here: https://github.com/trevorld/r-xmpdf?tab=readme-ov-file#add-xmpdocinfo-metadata-and-bookmarks-to-a-pdf

    {xmpdf} provides functions for getting and setting Extensible Metadata Platform (XMP) metadata in a variety of media file formats as well as getting and setting PDF documentation info entries and bookmarks (aka outline aka table of contents).
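
For readers who want to see the page-offset arithmetic described above, here is a minimal sketch in Python. It is not Trevor King's pdf-merge.py; the function names (parse_dump_data, merge_bookmarks, to_pdfmark) are made up for illustration. It assumes you have already saved the text output of pdftk's dump_data for each input file, in merge order, and that the output uses the NumberOfPages, BookmarkTitle, BookmarkLevel and BookmarkPageNumber keys. It does not decode the character entities pdftk may emit for non-ASCII titles, does not treat bookmarks without a page destination specially, and does not neutralise the original bookmarks that gs carries over.

```python
import sys


def parse_dump_data(text):
    """Return (page_count, bookmarks) parsed from pdftk dump_data text."""
    pages, bookmarks = 0, []
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "NumberOfPages":
            pages = int(value)
        elif key == "BookmarkTitle":
            # A title line starts a new bookmark record.
            bookmarks.append({"title": value, "level": 1, "page": 1})
        elif key == "BookmarkLevel" and bookmarks:
            bookmarks[-1]["level"] = int(value)
        elif key == "BookmarkPageNumber" and bookmarks:
            bookmarks[-1]["page"] = int(value)
    return pages, bookmarks


def merge_bookmarks(dumps):
    """Shift each file's bookmark pages by the pages of all earlier files."""
    offset, merged = 0, []
    for text in dumps:
        pages, bookmarks = parse_dump_data(text)
        for mark in bookmarks:
            merged.append({**mark, "page": mark["page"] + offset})
        offset += pages
    return merged


def escape_ps(title):
    """Escape backslashes and parentheses for a PostScript string."""
    return title.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def to_pdfmark(bookmarks):
    """Render bookmarks as /OUT pdfmark lines, one per bookmark."""
    lines = []
    for i, mark in enumerate(bookmarks):
        # /Count is the number of direct children of this entry.
        children = 0
        for later in bookmarks[i + 1:]:
            if later["level"] <= mark["level"]:
                break
            if later["level"] == mark["level"] + 1:
                children += 1
        count = f"/Count {children} " if children else ""
        lines.append(
            f"[{count}/Title ({escape_ps(mark['title'])}) "
            f"/Page {mark['page']} /OUT pdfmark"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Arguments: dump_data text files, in the same order as the PDFs.
    dumps = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            dumps.append(handle.read())
    print(to_pdfmark(merge_bookmarks(dumps)))
```

One possible workflow, with hypothetical file names: run pdftk front-matter.pdf dump_data output front-matter.txt for each input, run the script on the resulting text files and redirect its output to bookmarks.ps, then append bookmarks.ps after the PDF files in the gs command from the question. You can edit bookmarks.ps by hand before the final gs run.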