php pdf ghostscript dompdf

Is there a way to add links to merged PDFs with dompdf & Ghostscript?


I am trying to create a contents page for several PDF documents. It should include links to the start of each PDF once they have been merged with Ghostscript (GS).

At the moment I have:

  • An HTML page that acts as a contents page, converted via dompdf (this part works)
  • Several section divider pages (PDFs converted with dompdf) that have "section X" anchors inside
  • Additional PDF documents to be merged to create one large PDF with a contents page

I am running GS on the shell to process the merge of the PDF docs:

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=save_path/final.pdf contents.pdf section1.pdf brochure1.pdf section2.pdf brochure2.pdf back.pdf

GS successfully merges the PDF docs into 1 large PDF.

However, the links do not work.

It seems that the PDF cannot link to documents where the destination is outside of its original file.

If I add a link in the contents file whose destination is in the same contents file, the final output PDF renders the link and it works as desired.

So my question is: is it possible to include a link in a merged PDF document that points to pages from the other merged PDF files?

Any pointers or suggestions would be most helpful.

Thanks

David


Solution

  • Yes, the problem here is that Ghostscript cannot know how many pages there will be in the final file when it processes the first file, and more importantly what the object numbers of those pages are going to be.

    Now a /Dest for a Link annotation might be something like [page /XYZ left top zoom]. The 'page' in this case is a page object, that is, an indirect reference to a PDF object. So a /Link on page 1 which references page 2 might look something like:

    [18 0 R /XYZ 0 792 1]

    if we assume that page 2 is object number 18 in the output PDF file.

    When handling annotations, the PDF interpreter executes them as the last thing it does with the input PDF file. This means that all the pages are complete, so the pdfwrite device knows that page 2 has (e.g.) object number 18. So there's no problem in figuring out which page is associated with which object number.

    But in your case, you are running the first file completely, then running the annotations (before executing the second file). At that time, one or more of the links is pointing to a page which doesn't yet exist. Since there's no way to know what the object number of that page is going to be when the subsequent files are executed, there's no way that the pdfwrite device can process the Link annotation.

    So I'm afraid you cannot trivially do what you want with Ghostscript. In point of fact, I can't see how you can even get your contents file to legally have Links of this kind on it.

    You can do it, after a fashion, but it's much harder than just stringing the files together. You could leave all the Link annotations off the first page, process all the PDF files together, and then send a load of pdfmark instructions after processing all the PDF files, which describe the Link annotations you want to create.

    I could be missing the point of course; you haven't supplied any example files to look at so I can't tell what kind of Link annotations and Dests your file is using at present.

    In passing, let me note that the pdfwrite device does not 'merge' PDF files; it's a much more complex process. You can find the process documented here and I think it's worth reading so you can get some idea of the abilities and limitations of the device in this case.

    Fundamentally, Ghostscript and the pdfwrite device aren't intended as PDF editing or manipulation tools.
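
As a rough illustration of the pdfmark route described above, the sketch below writes a small PostScript file containing one Link annotation per contents entry. The helper name `link_mark`, the file name `links.ps`, and every page number and rectangle are hypothetical placeholders. In practice you would need to work out the real start page of each section in the final document, and the position of each contents entry in PDF points (measured from the bottom-left corner of the page). It also assumes the contents page was generated without its own cross-file links.

```shell
#!/bin/sh
# Sketch only: page numbers and rectangles below are made-up placeholders.

# link_mark SRC_PAGE DEST_PAGE LLX LLY URX URY
# Prints one pdfmark that puts a clickable rectangle on SRC_PAGE and
# makes it jump to DEST_PAGE. Both page numbers are 1-based and refer
# to pages of the final, combined output file.
link_mark() {
    printf '[ /Rect [%s %s %s %s]\n' "$3" "$4" "$5" "$6"
    printf '  /Border [0 0 0]\n'
    printf '  /SrcPg %s\n' "$1"
    printf '  /Page %s\n' "$2"
    printf '  /View [/XYZ null null null]\n'
    printf '  /Subtype /Link\n'
    printf '  /ANN pdfmark\n'
}

# Collect the annotations in a PostScript file (hypothetical layout:
# two contents entries on page 1, sections starting on pages 2 and 9).
{
    link_mark 1 2 72 700 300 715
    link_mark 1 9 72 680 300 695
} > links.ps

# links.ps is then passed as the LAST input, so the pdfmarks are only
# seen after every page of every PDF has been processed:
#
# gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=save_path/final.pdf \
#    contents.pdf section1.pdf brochure1.pdf section2.pdf brochure2.pdf back.pdf links.ps
```

Because the destinations are given as page numbers rather than object references, nothing has to be known about object numbering in advance. The cost is that the page numbers and rectangles have to be recomputed whenever the length or layout of any input file changes.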