
Alter CropBox and text color of pdf


I have a PDF that is nearly impossible to read on my e-reader. The margins are large, which makes the text small, and the text color is light grey, which renders very faintly on my device's screen.

I think I can make the file more legible by changing:

  • CropBox from 0 0 612 792 to 60 70 525 725
  • text color from 0.314 0.314 0.314 rg to 0 0 0 rg

I can clearly see these parts in the pdfbox debugger app. Unfortunately it gives me a read-only view. Is there a way to load the pdf and copy all of its parts to a new pdf while altering only the CropBox and text color?

% curl -O https://repo1.maven.org/maven2/org/apache/pdfbox/debugger-app/2.0.25/debugger-app-2.0.25.jar
% java -jar debugger-app-2.0.25.jar

[Screenshot: PDFBox debugger-app with the CropBox and rg entries highlighted]


Solution

  • This dirty hack seems to work, but I feel like there has to be a better way.

    Decode the pdf

    % curl -O https://repo1.maven.org/maven2/org/apache/pdfbox/pdfbox-app/2.0.25/pdfbox-app-2.0.25.jar
    % java -jar pdfbox-app-2.0.25.jar WriteDecodedDoc input.pdf decoded.pdf
    

    decoded.pdf now has the same contents as the original PDF, but the FlateDecode compression is no longer applied to its streams. This means the content is readable as plain text.

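    Edit decoded.pdf with Hex Fiend. Each replacement below has the same byte length as the text it replaces, so no offsets in the file shift.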

    • Find: 0.314 0.314 0.314 rg
      Replace: 0.000 0.000 0.000 rg
    • Find: /CropBox [0.0 0.0 612.0 792.0]
      Replace: /CropBox [60 70 525 725.00000]
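
The two find/replace steps above could also be scripted instead of done by hand in a hex editor. The following is a minimal sketch using only the Java standard library; the class name `DecodedPdfPatcher` and its methods are hypothetical helpers, not part of PDFBox. It assumes the input is the already decoded file (decoded.pdf) and that it fits in memory. Like the manual edit, it refuses replacements of a different length, so byte offsets in the file stay where they were.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of PDFBox): same-length find/replace on a decoded PDF.
final class DecodedPdfPatcher {

    // Overwrites every occurrence of `find` with `replace` in place and returns
    // the number of occurrences. Both strings must have the same byte length so
    // that nothing after the match moves.
    static int replaceAll(byte[] data, String find, String replace) {
        byte[] from = find.getBytes(StandardCharsets.ISO_8859_1);
        byte[] to = replace.getBytes(StandardCharsets.ISO_8859_1);
        if (from.length == 0 || from.length != to.length) {
            throw new IllegalArgumentException(
                    "find and replace must be non-empty and equally long");
        }
        int count = 0;
        int i = 0;
        while (i <= data.length - from.length) {
            if (matchesAt(data, i, from)) {
                System.arraycopy(to, 0, data, i, to.length);
                count++;
                i += from.length; // skip past the bytes just written
            } else {
                i++;
            }
        }
        return count;
    }

    // True if `pattern` occurs in `data` starting at `offset`.
    private static boolean matchesAt(byte[] data, int offset, byte[] pattern) {
        for (int j = 0; j < pattern.length; j++) {
            if (data[offset + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args[0]);  // e.g. decoded.pdf
        Path out = Paths.get(args[1]); // e.g. patched.pdf
        byte[] data = Files.readAllBytes(in);
        int colors = replaceAll(data,
                "0.314 0.314 0.314 rg", "0.000 0.000 0.000 rg");
        int boxes = replaceAll(data,
                "/CropBox [0.0 0.0 612.0 792.0]", "/CropBox [60 70 525 725.00000]");
        Files.write(out, data);
        System.out.println(colors + " color operators and "
                + boxes + " CropBox entries replaced");
    }
}
```

Run it as `java DecodedPdfPatcher decoded.pdf patched.pdf`. It has the same limits as the manual edit: it only finds strings that are spelled exactly this way in the decoded file.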

    The file is now very large, so open it in Preview and export it to PDF. Sadly, this way of re-compressing throws away the table of contents.

    [Screenshot: find and replace in Hex Fiend]

    I've since found krop, a simple graphical tool for Linux that crops the pages of PDF files. It is much easier than editing the file in a hex editor. It doesn't change the text color, though.
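
As a programmatic alternative to the decode-and-hex-edit steps above, here is a minimal sketch that applies both changes with the PDFBox 2.0.x library directly to the original file. The class name `CropAndDarken` and its methods are hypothetical. The sketch assumes the grey color is set with `rg` in the page-level content streams; it does not descend into form XObjects. Note also that `rg` sets the fill color for everything, not only text. Because the document is loaded and saved as a whole rather than re-exported, the table of contents should survive.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSFloat;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSNumber;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.common.PDStream;

// Hypothetical helper built on PDFBox 2.0.x (the API differs in 3.x).
final class CropAndDarken {

    // Sets the CropBox and rewrites "0.314 0.314 0.314 rg" to black on every page.
    static void rewrite(PDDocument doc) throws IOException {
        // CropBox [60 70 525 725] is lower-left x/y and upper-right x/y,
        // while PDRectangle takes x, y, width, height.
        PDRectangle crop = new PDRectangle(60f, 70f, 525f - 60f, 725f - 70f);
        for (PDPage page : doc.getPages()) {
            page.setCropBox(crop);

            // Parse the page's content stream(s) into tokens.
            PDFStreamParser parser = new PDFStreamParser(page);
            parser.parse();
            List<Object> tokens = parser.getTokens();
            darken(tokens);

            // Write the tokens back as a single Flate-compressed stream.
            PDStream contents = new PDStream(doc);
            try (OutputStream out = contents.createOutputStream(COSName.FLATE_DECODE)) {
                new ContentStreamWriter(out).writeTokens(tokens);
            }
            page.setContents(contents);
        }
    }

    // Replaces the three operands of each matching "rg" operator with zeros.
    static void darken(List<Object> tokens) {
        for (int i = 3; i < tokens.size(); i++) {
            Object token = tokens.get(i);
            boolean isRg = token instanceof Operator
                    && "rg".equals(((Operator) token).getName());
            if (isRg && isGrey(tokens.get(i - 3))
                    && isGrey(tokens.get(i - 2)) && isGrey(tokens.get(i - 1))) {
                for (int k = i - 3; k < i; k++) {
                    tokens.set(k, new COSFloat(0f));
                }
            }
        }
    }

    // True if the token is a number close to 0.314.
    private static boolean isGrey(Object token) {
        return token instanceof COSNumber
                && Math.abs(((COSNumber) token).floatValue() - 0.314f) < 0.0005f;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            rewrite(doc);
            doc.save(new File(args[1]));
        }
    }
}
```

With the pdfbox-app jar on the classpath, this could be run as `java -cp pdfbox-app-2.0.25.jar:. CropAndDarken input.pdf output.pdf`. The WriteDecodedDoc step is not needed, since the parser decodes the content streams itself.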