I am trying to automate adding a title, bookmarks and similar metadata to some PDFs. My approach is to create a simple pdfmark script like this:
% pdfmark.ps
[ /Title (My document)
/Author(Me)
/DOCINFO pdfmark
[ /Title (First chapter)
/Page 1
/OUT pdfmark
Then I generate a new PDF with Ghostscript using:
gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=out.pdf in.pdf pdfmark.ps
If in.pdf doesn't have any pdfmark data, this works fine. If it does, things go wrong: for example, the title and author aren't modified, and bookmarks are appended instead of replaced.
Since I'd rather not modify the PostScript corresponding to the PDF, is there some command I can add to pdfmark.ps that deletes (or overwrites) the previous metadata?
I'll leave PostScript to others and show how to remove a PDF outline using the qpdf package (for qpdf and fix-qdf) and GNU sed.
From the qpdf manual:
In QDF mode, qpdf creates PDF files in what we call QDF form. A PDF file in QDF form, sometimes called a QDF file, is a completely valid PDF file that has %QDF-1.0 as its third line (after the pdf header and binary characters) and has certain other characteristics. The purpose of QDF form is to make it possible to edit PDF files, with some restrictions, in an ordinary text editor.
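As a quick sanity check on the third-line property quoted above, a minimal sketch follows. The function name is_qdf is made up for illustration, and the check assumes Unix line endings.

```shell
# is_qdf is a hypothetical helper, not part of qpdf.
# It succeeds if the third line of the given file is exactly %QDF-1.0.
# LC_ALL=C keeps sed from tripping over the binary bytes on line 2.
is_qdf() {
  [ "$(LC_ALL=C sed -n '3{p;q;}' "$1")" = '%QDF-1.0' ]
}
```

For example, is_qdf tmp.qdf && echo "QDF form" could be run on the intermediate file before editing it.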
(For a non-GNU/Linux system adapt the commands below.)
qpdf --qdf --compress-streams=n --decode-level=generalized \
    --object-streams=disable -- in.pdf - |
  sed --binary \
    -e '/^[ ][ ]*\/Outlines [0-9][0-9]* [0-9] R/ s/[1-9]/0/g' |
  fix-qdf > tmp.qdf
qpdf --coalesce-contents --compression-level=9 \
    --object-streams=generate -- tmp.qdf out.pdf
where:
- the first qpdf command converts the PDF file to QDF form for editing
- sed orphans the outlines in the QDF file by rooting them at the non-existent obj 0
- fix-qdf repairs the QDF after editing
- the second qpdf command converts and compresses the QDF to PDF
- qpdf input cannot be pipelined because it needs to seek, which is why tmp.qdf is written to disk
The sed command changes every nonzero digit to zero in the line containing the indented text /Outlines, so that the reference points at object 0.
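To see the effect of the substitution in isolation, here is a small sketch run on a single sample line. The object number 12 is invented for the demonstration, and --binary is left out because the input here is plain text.

```shell
# Sample line as it might appear in a QDF catalog dictionary
# (the object number 12 is made up for this demonstration).
line='  /Outlines 12 0 R'

# Same address and substitution as in the pipeline above.
result=$(printf '%s\n' "$line" |
  sed -e '/^[ ][ ]*\/Outlines [0-9][0-9]* [0-9] R/ s/[1-9]/0/g')

printf '%s\n' "$result"   # prints "  /Outlines 00 0 R"
```

Lines that do not match the address, such as a /Pages reference, pass through unchanged.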
Note that GNU sed is used for its non-standard --binary option, which avoids mishaps on an OS that distinguishes between text and binary files.
Similarly, to strip annotations, replace /Outlines with /Annots in the -e above, or add a second -e option with /Annots to do both.
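A sketch of the two-expression variant is shown here. The function name strip_refs is hypothetical, and --binary is omitted for portability (add it back with GNU sed where it matters). It assumes /Annots appears as a single indirect reference on its own line. A page whose /Annots is written as an inline array spanning several lines would not be matched by this pattern.

```shell
# strip_refs is a hypothetical wrapper: it reads QDF text on stdin and
# zeroes the object references on /Outlines and /Annots lines.
strip_refs() {
  sed -e '/^[ ][ ]*\/Outlines [0-9][0-9]* [0-9] R/ s/[1-9]/0/g' \
      -e '/^[ ][ ]*\/Annots [0-9][0-9]* [0-9] R/ s/[1-9]/0/g'
}
```

It would take the place of the sed stage in the pipeline, between the first qpdf command and fix-qdf.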
A patch utility other than sed will also do; often just one byte has to be changed.
To quickly strip all non-page data (docinfo, outlines and so on, but not annotations), qpdf's --empty option may be useful:
qpdf --coalesce-contents --compression-level=9 \
    --object-streams=generate \
    --empty --pages in.pdf 1-z -- out.pdf
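Combining this with the Ghostscript command from the question might look like the sketch below. The function name remark_pdf is hypothetical, mktemp is assumed to be available, and the approach assumes that the stripped file behaves like one that never carried pdfmark data. I have not verified that against every kind of input.

```shell
# remark_pdf is a hypothetical helper.
# Usage: remark_pdf in.pdf pdfmark.ps out.pdf
remark_pdf() {
  in=$1 marks=$2 out=$3
  tmp=$(mktemp) || return 1
  # Step 1: copy only the pages into an empty PDF, dropping docinfo and outlines.
  qpdf --empty --pages "$in" 1-z -- "$tmp" &&
  # Step 2: let Ghostscript add the new title, author and bookmarks.
  gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$out" "$tmp" "$marks"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

The intermediate file is needed for the same reason as tmp.qdf above: the tools want real files rather than a pipe.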