Tags: bash, find, exec, pdftools, pdfjam

Recursively (many subdirs) find PDF files and merge them into one PDF file (Linux, bash)


Surprisingly, I have seen many help pages on how to do this from within a single directory. The recursive variants I found either don't work for me (see my attempts below) or involve complicated constructs that I don't understand (even less than these) and would rather not use.

In short: I have PDFs scattered across many subdirectories and want to go through all of them and join the PDFs into one big PDF.

These mostly came from:

https://unix.stackexchange.com/questions/298031/compress-all-pdf-files-recursively

Merge / convert multiple PDF files into one PDF

First attempt (this works great, but only from within a single directory):

qpdf --empty --pages *.pdf -- out.pdf
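For reference, bash can make that same glob recursive: with the globstar option, ** matches any number of subdirectories, so find is not needed at all. A minimal sketch, assuming bash 4 or later; the function name merge_here is only for illustration, nocaseglob plays the role of find's -iname, and nullglob makes a pattern with no matches expand to nothing instead of staying literal.

```shell
#!/usr/bin/env bash
# merge_here OUT: merge all PDFs in the current directory tree into OUT.
# The ( ... ) body runs in a subshell, so the shopt changes do not leak out.
merge_here() (
  # globstar: ** spans any number of subdirectories
  # nocaseglob: also match .PDF (like find -iname)
  # nullglob: no match expands to nothing instead of the literal pattern
  shopt -s globstar nullglob nocaseglob
  files=(./**/*.pdf)
  if (( ${#files[@]} == 0 )); then
    echo "no PDF files found" >&2
    exit 1
  fi
  # The ./ prefix means no file name can be mistaken for a qpdf option.
  qpdf --empty --pages "${files[@]}" -- "$1"
)
```

Run it from the top directory, e.g. merge_here /tmp/out.pdf; writing the result outside the tree keeps a later run from merging it in again.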

From the top-level directory, this didn't work:

find . -type f -name "*.pdf" -exec bash -c 'qpdf --empty --pages "{}" -- merged.pdf;' {} \;
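The reason this fails is that -exec ... \; runs the command once per file, so every qpdf call builds merged.pdf from a single input and the next call replaces it. Putting echo in front of the command is a cheap way to see what find actually executes; the sketch below does that in a throwaway directory whose file names are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Dry run: echo prints each qpdf command instead of running it.
# The scratch directory and file names are hypothetical, for demonstration only.
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
touch "$demo/a/one.pdf" "$demo/b/two.pdf" "$demo/three.pdf"

# One output line per PDF = one separate qpdf run per PDF,
# and every run writes (and overwrites) the same merged.pdf.
runs=$(cd "$demo" && find . -type f -name '*.pdf' -exec echo qpdf --empty --pages {} -- merged.pdf \;)
printf '%s\n' "$runs"
rm -rf -- "$demo"
```

Switching to {} + would let find pass many names to one command, but find only accepts {} immediately before the +, while qpdf needs -- merged.pdf after the file list. That mismatch is why a bash -c wrapper or a shell array is needed.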

Second attempt:

find . -type f -name "*.pdf" | while read -r file; do pdfjam "$file" -o output.pdf; done

or

touch output.pdf
find . -type f -name "*.pdf" | while read -r file; do pdfjam "$file" output.pdf -o output.pdf; done
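Each pass of the while loop starts a fresh pdfjam with a single input, so output.pdf ends up holding only the last file; in the touch variant the first pass is also handed an empty file that is not a valid PDF, and every pass reads and writes output.pdf at once. pdfjam itself accepts several input files in one call, and since its options come before the file names, find can hand it all files directly with {} +. A minimal sketch; the function name merge_with_pdfjam is made up for illustration.

```shell
#!/usr/bin/env bash
# merge_with_pdfjam TOP OUT: merge every *.pdf below TOP into OUT with one pdfjam run.
# pdfjam takes its options first and the input files last, so {} can sit right
# before +, as find requires, and no bash -c wrapper is needed.
# Note: find does not sort, so the page order follows the directory traversal.
merge_with_pdfjam() {
  find "$1" -type f -iname '*.pdf' -exec pdfjam --outfile "$2" {} +
}
```

For example: merge_with_pdfjam . /tmp/output.pdf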

Third attempt:

find . -type f -name "*.pdf" -exec bash -c 'pdftk "{}" cat output "new.pdf";' {} \;

or

touch new.pdf    
find . -type f -name "*.pdf" -exec bash -c 'pdftk "{}" new.pdf cat output "new.pdf";' {} \;
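pdftk runs into the same per-file overwriting, and the touch variant additionally starts from an empty file that is not a valid PDF while reading and writing new.pdf in one command. pdftk takes any number of inputs before cat output, so the fix is again a single run. A sketch using xargs; the function name merge_with_pdftk is hypothetical, -r is a GNU xargs option that skips the run when nothing is found, and sort -z is there only to make the page order predictable.

```shell
#!/usr/bin/env bash
# merge_with_pdftk TOP OUT: merge every *.pdf under TOP into OUT with one pdftk run.
merge_with_pdftk() {
  # -print0 / sort -z / xargs -0 keep names with spaces or newlines intact;
  # -r (GNU xargs) skips running pdftk when no PDF was found.
  # xargs appends the file names after "sh" "$2", so inside the script
  # $0 is the placeholder sh, $1 is OUT, and after shift "$@" is the PDF list.
  find "$1" -type f -iname '*.pdf' -print0 | sort -z |
    xargs -0 -r sh -c 'out=$1; shift; pdftk "$@" cat output "$out"' sh "$2"
}
```

For example: merge_with_pdftk . /tmp/new.pdf. With a very long file list, xargs may split it over several pdftk runs, each rewriting the output.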

Fourth attempt:

python3 -m pip install --user pdftools
pdftools merge --input-dir ./top_directory --output out.pdf

  usage: pdftools [-h] [-V] <command> ...
  pdftools: error: unrecognized arguments: --input-dir

Fifth attempt (this seems the most successful, though the output file only contains the pages of the first file):

 find . -type f -name "*.pdf" -exec bash -c 'gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=out.pdf "{}";' {} \;

I was wondering about the difference between find ... {} \; and find ... {} +, so I also tried this:

Sixth attempt:

find . -type f -name "*.pdf" -exec bash -c 'gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=out.pdf ;' {}  +

which produced a blank page.
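With +, find does pass many file names at once, but it passes them to bash, not to gs: in bash -c 'script' a b c, the first extra argument becomes $0 and the remaining ones become $1, $2, ..., which the script only uses if it refers to them, for example as "$@". The script in the sixth attempt never does, so gs ran without any input file. A sketch of the fix; _ is just a placeholder for $0, and the function name merge_with_gs is made up for illustration.

```shell
#!/usr/bin/env bash
# How bash -c distributes extra arguments: the first becomes $0, the rest "$@".
demo=$(bash -c 'echo "\$0=$0, $# file(s): $*"' _ one.pdf two.pdf)
echo "$demo"   # prints: $0=_, 2 file(s): one.pdf two.pdf

# merge_with_gs TOP OUT: the sixth attempt with "$@" added, so gs gets the files.
# Inside the script $1 is OUT (shifted away); find appends the PDFs after it.
# Caveat: with very many files find may split the list into several gs runs,
# and each run would overwrite OUT.
merge_with_gs() {
  find "$1" -type f -iname '*.pdf' -exec bash -c '
    out=$1; shift
    gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile="$out" "$@"
  ' _ "$2" {} +
}
```

For example: merge_with_gs . /tmp/out.pdf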

It is clear to me that my trouble is with concatenating the files, probably in how I use find -exec, and not with the various tools.

EDIT

I guess I could use a two-step procedure:

find . -name "*pdf" -exec mv {} pdfs \;
qpdf --empty --pages *.pdf -- out.pdf

but I wanted a one-liner and, more importantly, to understand what I am doing wrong with find.
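As for the two-step version, it has a few traps as written: pdfs must already exist (otherwise the first mv renames a PDF to a file called pdfs and every later mv overwrites it), two files with the same name in different subdirectories overwrite each other, find may also visit files already moved into pdfs, and qpdf has to be run inside pdfs for *.pdf to see them. A sketch that avoids these by copying into a fresh temporary directory under numbered names; the function name collect_then_merge is hypothetical, and the code assumes bash and a sort that supports -z (GNU coreutils does).

```shell
#!/usr/bin/env bash
# collect_then_merge TOP OUT: copy every PDF under TOP into a scratch directory
# as 0001.pdf, 0002.pdf, ... (numbered names cannot collide, and the zero
# padding keeps the *.pdf glob in sorted order for up to 9999 files), then
# merge the copies with one qpdf call. The originals stay where they are.
collect_then_merge() {
  local top=$1 out=$2 stage f n=0 status
  stage=$(mktemp -d) || return 1
  while IFS= read -r -d '' f; do
    n=$((n + 1))
    cp -- "$f" "$stage/$(printf '%04d' "$n").pdf"
  done < <(find "$top" -type f -iname '*.pdf' -print0 | sort -z)
  if (( n == 0 )); then
    echo "no PDF files found under $top" >&2
    rm -rf -- "$stage"
    return 1
  fi
  qpdf --empty --pages "$stage"/*.pdf -- "$out"
  status=$?
  rm -rf -- "$stage"   # remove the scratch copies again
  return "$status"
}
```

For example: collect_then_merge . /tmp/out.pdf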

EDIT 2

I really just want the first page of each file, but that isn't a big deal.
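qpdf can do the first-page-only variant directly: after --pages, each file may be followed by a page range, and a range of 1 selects just its first page. A sketch that builds the file/range pairs in a bash array; first_pages is a hypothetical name.

```shell
#!/usr/bin/env bash
# first_pages TOP OUT: put page 1 of every PDF under TOP into OUT.
# After --pages, qpdf reads "file range" pairs; range 1 = first page only.
first_pages() {
  local top=$1 out=$2 f
  local -a args=()
  # NUL-separated names, sorted so the output order is predictable.
  while IFS= read -r -d '' f; do
    args+=("$f" 1)
  done < <(find "$top" -type f -iname '*.pdf' -print0 | sort -z)
  if (( ${#args[@]} == 0 )); then
    echo "no PDF files found under $top" >&2
    return 1
  fi
  qpdf --empty --pages "${args[@]}" -- "$out"
}
```

For example: first_pages . /tmp/first-pages.pdf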


Solution

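  • A very simple solution that uses -iname instead of -name (see man find), so that files ending in .PDF are matched too.

    I write the result to /tmp/ so that it does not interfere if you run the command multiple times: an out.pdf inside the searched tree would itself be found and merged in on the next run.

    Afterwards, copy /tmp/out.pdf to wherever you want it.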

      qpdf --empty --pages \
         $( find . -iname '*.pdf' 2>/dev/null ) -- /tmp/out.pdf
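One caveat: the unquoted $( find ... ) is split on whitespace and then glob-expanded, so a path such as ./My Docs/a.pdf reaches qpdf as two arguments. If file names may contain spaces, here is a sketch that keeps the same qpdf command but reads find's NUL-separated output into an array (it assumes bash 4.4+ for mapfile -d ''; merge_all is a hypothetical name).

```shell
#!/usr/bin/env bash
# merge_all TOP OUT: the qpdf command from the answer, but with the file list
# kept in an array so names containing spaces or glob characters stay intact.
merge_all() {
  local top=$1 out=$2
  local -a pdfs
  # -print0 + mapfile -d '' split only on NUL bytes; sort -z fixes the page order.
  mapfile -d '' pdfs < <(find "$top" -type f -iname '*.pdf' -print0 2>/dev/null | sort -z)
  if (( ${#pdfs[@]} == 0 )); then
    echo "no PDF files found under $top" >&2
    return 1
  fi
  qpdf --empty --pages "${pdfs[@]}" -- "$out"
}
```

Call it as merge_all . /tmp/out.pdf, then copy the result where you want it.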