I have a few thousand PDFs that I need merged based on filename.
Named like:
Lastname, Firstname_12345.pdf
Instead of overwriting or appending to the existing file, our software appends a number/datetime to the filename when there are additional pages, like:
Lastname, Firstname_12345_201305160953344627.pdf
The script doesn't need to touch the ones that don't have a second (or third) PDF. But all the ones that have multiples need to be merged into a new file, *_merged.pdf, and the originals deleted.
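In other words, for the example above I want the script to end up running something like this (using pdftk, as in my attempt below) and then deleting the two originals:

pdftk "Lastname, Firstname_12345.pdf" \
      "Lastname, Firstname_12345_201305160953344627.pdf" \
      cat output "Lastname, Firstname_12345_merged.pdf"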
I gave this my best effort and this is what I have so far.
#! /bin/bash
# list all pdfs to show shortest name first
LIST=$(ls -r *.pdf)
# Remove .pdf extension. merge pdfs. delete originals.
for x in "$LIST"
do
    y=${x%%.*}
    pdftk "$y"*.pdf cat output "$y"_merged.pdf
    find "$y"*.pdf -type f ! -iname "*_merged.pdf" -delete
done
This script works to a certain extent: it will merge and delete the originals, but it has nothing in it to skip the files that don't need merging, and when I run it in a folder with several test files it stops after one file. Can anyone point me in the right direction?
Since your file names contain spaces, the for loop won't work as is: unquoted, $LIST would be split at every space, and quoted (as you have it) the loop runs only once over the entire list. Once you have a proper list of file names, a test on the number of files matching "$y"*.pdf tells you whether that set needs to be merged.
#!/bin/bash
# Build an array so names with spaces stay intact.
LIST=( * )
# Remove .pdf extension. merge pdfs. delete originals.
for x in "${LIST[@]}" ; do
    y=${x%.pdf}
    # Merge only when more than one pdf shares this prefix.
    if [ $(ls "$y"*.pdf 2>/dev/null | wc -l ) -gt 1 ]; then
        pdftk "$y"*.pdf cat output "$y"_merged.pdf
        find "$y"*.pdf -type f ! -iname "*_merged.pdf" -delete
    fi
done
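If you'd rather not count files by parsing ls output (which can misbehave with unusual names), the same test can be done by expanding the glob into an array and checking its length. A rough sketch of that variant, assuming bash with nullglob enabled:

#!/bin/bash
shopt -s nullglob                  # unmatched globs expand to nothing

for x in *.pdf; do
    y=${x%.pdf}
    # Skip output files left over from an earlier run.
    [[ $y == *_merged ]] && continue

    # Collect every pdf sharing this prefix, ignoring previous merge output.
    matches=()
    for f in "$y"*.pdf; do
        [[ $f == *_merged.pdf ]] || matches+=( "$f" )
    done

    # Merge and remove the originals only when the prefix has multiple files.
    if (( ${#matches[@]} > 1 )); then
        pdftk "${matches[@]}" cat output "${y}_merged.pdf" &&
            rm -- "${matches[@]}"
    fi
done

Like the version above, this keys on the file-name prefix, so it assumes one document's name is never a leading substring of another's.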