
How to program a text search and replace in PDF files


How can I programmatically search for and replace text in a large number of PDF files? I want to remove a URL that was added to a set of files. I was able to remove the link itself using JavaScript under Batch Processing in Adobe Acrobat Pro, but the link text remains. I have seen recommendations to use the TouchUp Text tool, which works when done manually, but I don't want to edit 1300 files by hand.


Solution

  • Finding text in a PDF is inherently hard because of the graphical nature of the document format: the letters you are searching for may not be contiguous in the file. That said, CAM::PDF, a Perl module, has some search-and-replace capabilities and heuristics. Give its changepagestring.pl script a try and see if it works on your PDFs.

    To install and run:

     $ cpan CAM::PDF
     # open a new terminal if this is the first CPAN module you have installed
     $ changepagestring.pl input.pdf oldtext newtext output.pdf
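
Since the question is about roughly 1300 files, the single-file command above could be wrapped in a loop. The following is only a minimal sketch: the function name batch_replace, its four arguments, and the idea of writing the results to a separate directory are illustrative assumptions, not part of CAM::PDF. It assumes changepagestring.pl is on your PATH and reports a non-zero exit status when it fails.

```shell
#!/bin/sh
# Sketch: run changepagestring.pl over every PDF in a directory.
# batch_replace and its arguments are hypothetical names for illustration;
# only changepagestring.pl itself comes from CAM::PDF.

batch_replace() {
    src_dir=$1   # directory holding the original PDFs
    out_dir=$2   # directory for the modified copies
    old_text=$3  # text to search for
    new_text=$4  # replacement text

    mkdir -p "$out_dir" || return 1
    failed=0

    for pdf in "$src_dir"/*.pdf; do
        # If the glob matched nothing, the pattern is left as-is; skip it.
        [ -e "$pdf" ] || continue
        out="$out_dir/$(basename "$pdf")"
        # Same argument order as the single-file command: in, old, new, out.
        if ! changepagestring.pl "$pdf" "$old_text" "$new_text" "$out"; then
            echo "failed: $pdf" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Run only when all four arguments are supplied on the command line.
if [ "$#" -eq 4 ]; then
    batch_replace "$@"
fi
```

Saved as, say, batch_replace.sh (a hypothetical file name), it would be run as `sh batch_replace.sh ./pdfs ./cleaned oldtext newtext`. Writing to a separate directory leaves the originals untouched, which is useful here because the search heuristics may not match in every file; spot-check a few outputs before trusting the whole batch.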