I have thousands of images that need to be converted and combined into multiple PDF files. Some of the images are used multiple times. I'm looking for a solution to automate this.
I have all of the .tif files named and organized in a spreadsheet. I want to use that file list and run an automated script to save myself hundreds of hours converting these files one by one.
All the files are in the same folder.
I'm not a programmer. I've tried finding some kind of documentation, code, or third-party tool, but this seems to be an uncommon task. Thanks in advance.
I'm on Windows. Below is an example of the spreadsheet. I've got no problem using formulas to produce whatever format or code I need in Excel. These files are individual pages of many scanned documents. "First Page" refers to the beginning of a section. Example: 0066.tif-0068.tif is one document where 0066.tif is the title page of the document. 0070.tif-0081.tif is THREE separate documents scanned together, with 0070.tif as the title page for all three. So the three documents would be 0070.tif-0072.tif, 0070.tif & 0073.tif-0074.tif, and 0070.tif & 0075.tif-0081.tif. 0069.tif is a single-page document.
Document | Title Page | First Page | Last Page |
---|---|---|---|
P-05593.pdf | 0066.tif | 0066.tif | 0068.tif |
P-05594.pdf | 0069.tif | 0069.tif | 0069.tif |
P-05595.pdf | 0070.tif | 0071.tif | 0072.tif |
P-05596.pdf | 0070.tif | 0073.tif | 0074.tif |
P-05597.pdf | 0070.tif | 0075.tif | 0081.tif |
P-05598.pdf | 0082.tif | 0083.tif | 0084.tif |
P-05599.pdf | 0082.tif | 0085.tif | 0090.tif |
P-05600.pdf | 0091.tif | 0091.tif | 0093.tif |
P-05601.pdf | 0094.tif | 0094.tif | 0100.tif |
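To make the table's rule concrete, here is a minimal Python sketch of how one row expands into an ordered page list: the title page first (unless it is already the first page), then every page from First Page through Last Page. The function names `page_number` and `pages_for` are made up for illustration, and the sketch assumes every file is named as a four-digit number plus `.tif`, as in the example.

```python
def page_number(name):
    """Turn a file name like '0070.tif' into the integer 70."""
    return int(name.split(".")[0])


def pages_for(title, first, last):
    """Return the ordered list of .tif files that make up one document."""
    t, f, l = page_number(title), page_number(first), page_number(last)
    numbers = list(range(f, l + 1))
    if t != f:
        # Shared title page: put it in front of the section's own pages.
        numbers.insert(0, t)
    # Assumes four-digit, zero-padded file names.
    return [f"{n:04d}.tif" for n in numbers]


# Row P-05596.pdf from the table: title 0070, pages 0073 through 0074.
print(pages_for("0070.tif", "0073.tif", "0074.tif"))
```

A list like this could then be handed to whatever tool does the actual conversion.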
Update: I'm sure this is rare. I found a solution, just in case there is someone out there with a similar situation.
I used a combination of Excel and NConvert. I'm still working on an easier, faster way to identify which pages are which, so for now that part is still mostly manual. BUT, once that list is made, I use Excel formulas and VBA commands to export it as a .bat file that I can run to process all the files at once. I'm not sure whether I can post files, but here is the code output for the table of files I posted before, minus a few since this is getting lengthy:
C:\Users\username\NConvert\nconvert.exe -multi -dpi 200 -c 1 -out pdf -o P-05593.PDF -n 0066 0068 1 "####.tif"
C:\Users\username\NConvert\nconvert.exe -dpi 200 -c 1 -out pdf -o P-05594.PDF 0069.tif
C:\Users\username\NConvert\nconvert.exe -multi -dpi 200 -c 1 -out pdf -o P-05595.PDF -n 0070 0072 1 "####.tif"
C:\Users\username\NConvert\nconvert.exe -overwrite -out pdf -o TEMPA 0070.tif
C:\Users\username\NConvert\nconvert.exe -overwrite -multi -out pdf -o TEMPB -n 0073 0074 1 "####.tif"
C:\Users\username\NConvert\nconvert.exe -overwrite -D -multi -out pdf -dpi 200 -c 1 -xall -o P-05596.PDF TEMPA.pdf TEMPB.pdf
C:\Users\username\NConvert\nconvert.exe -overwrite -out pdf -o TEMPA 0070.tif
C:\Users\username\NConvert\nconvert.exe -overwrite -multi -out pdf -o TEMPB -n 0075 0081 1 "####.tif"
C:\Users\username\NConvert\nconvert.exe -overwrite -D -multi -out pdf -dpi 200 -c 1 -xall -o P-05597.PDF TEMPA.pdf TEMPB.pdf
C:\Users\username\NConvert\nconvert.exe -multi -dpi 200 -c 1 -out pdf -o P-05598.PDF -n 0082 0084 1 "####.tif"
C:\Users\username\NConvert\nconvert.exe -overwrite -out pdf -o TEMPA 0082.tif
C:\Users\username\NConvert\nconvert.exe -overwrite -multi -out pdf -o TEMPB -n 0085 0090 1 "####.tif"
C:\Users\username\NConvert\nconvert.exe -overwrite -D -multi -out pdf -dpi 200 -c 1 -xall -o P-05599.PDF TEMPA.pdf TEMPB.pdf
C:\Users\username\NConvert\nconvert.exe -multi -dpi 200 -c 1 -out pdf -o P-05600.PDF -n 0091 0093 1 "####.tif"
C:\Users\username\NConvert\nconvert.exe -multi -dpi 200 -c 1 -out pdf -o P-05601.PDF -n 0094 0100 1 "####.tif"
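For anyone who would rather not build these lines with Excel formulas, the same batch lines can be generated by a short script. The following is only a sketch in Python, not the VBA I actually used. It copies the command shapes from the lines above verbatim and does not introduce any NConvert options beyond those. The function names are hypothetical, the path is the placeholder used above, file names are assumed to be four digits plus `.tif`, and the First Page is assumed never to come before the Title Page. One case is not covered by the output above: a shared title page followed by a single page. The sketch guesses that the plain single-file form works there, which is an untested assumption.

```python
NCONVERT = r"C:\Users\username\NConvert\nconvert.exe"  # placeholder path from the thread
OPTS = "-dpi 200 -c 1 -out pdf"


def page_number(name):
    """Turn a file name like '0070.tif' into the integer 70."""
    return int(name.split(".")[0])


def span(first, last):
    """Input arguments for a run of pages: one file, or a numbered range."""
    if first == last:
        return f"{first:04d}.tif"
    return f'-n {first:04d} {last:04d} 1 "####.tif"'


def commands_for(document, title, first, last):
    """Return the batch lines for one spreadsheet row."""
    out = document.rsplit(".", 1)[0] + ".PDF"
    t, f, l = (page_number(x) for x in (title, first, last))
    if f - t <= 1:
        # Title page is the first page or sits directly before it: one range.
        multi = "-multi " if t != l else ""
        return [f"{NCONVERT} {multi}{OPTS} -o {out} {span(t, l)}"]
    # Shared title page: build two temporary PDFs, then merge them.
    multi = "-multi " if f != l else ""  # single-page case is an assumption
    return [
        f"{NCONVERT} -overwrite -out pdf -o TEMPA {t:04d}.tif",
        f"{NCONVERT} -overwrite {multi}-out pdf -o TEMPB {span(f, l)}",
        f"{NCONVERT} -overwrite -D -multi -out pdf -dpi 200 -c 1 -xall "
        f"-o {out} TEMPA.pdf TEMPB.pdf",
    ]


rows = [
    ("P-05593.pdf", "0066.tif", "0066.tif", "0068.tif"),
    ("P-05596.pdf", "0070.tif", "0073.tif", "0074.tif"),
]
for row in rows:
    print("\n".join(commands_for(*row)))
```

Redirecting the printed lines into a `.bat` file should give the same kind of script as the one above, with the rows coming from a CSV export of the spreadsheet instead of the hard-coded `rows` list.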
Cut down my processing time by a huge amount. 5,759 pages in 1,169 documents, all compiled in less than an hour with this method. Normally would have taken me weeks.
Edit:
Okay. I am returning to this thread because I solved my issue, but as cday pointed out in my post on Stack Overflow, my solution looks overcomplicated. They seem to be invested in the solution, so I'll try to explain what I did here, and I'll post an attachment if possible.
As stated before, I have multiple folders with 5,000+ raw .tif page scans each. Odd choice of filetype since whoever scanned them did not save with multiple pages, only one page per file. These scans are of numbered documents with title pages, some single with its own title page, others multiple that share a title page. I needed a way to break out these individual pages into organized lists for each document number, sometimes re-using those multiple-document title pages, so a straight file list probably would not have worked. At least not easily.
My initial and very slow process with this task was to manually scroll through each scan and type its file name into a spreadsheet. Then, highlight those files in explorer, right click, and combine in Adobe. Save as. Name the file accordingly. After all 1,600+ documents were done, I needed to spot check to make sure I didn't miss any. I always did. So this solution would not only GREATLY speed up the process, but it would ensure accuracy as well.
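The spot check for missed pages is the kind of thing a script can do directly from the list. As a rough Python sketch (the function names are hypothetical, and file names are assumed to be a number plus `.tif`), this reports every scan in the folder that no document row accounts for:

```python
def page_number(name):
    """Turn a file name like '0070.tif' into the integer 70."""
    return int(name.split(".")[0])


def missing_pages(rows, all_files):
    """Return the files that are not covered by any (title, first, last) row."""
    covered = set()
    for title, first, last in rows:
        covered.add(page_number(title))
        covered.update(range(page_number(first), page_number(last) + 1))
    return sorted(f for f in all_files if page_number(f) not in covered)


rows = [
    ("0066.tif", "0066.tif", "0068.tif"),
    ("0070.tif", "0071.tif", "0072.tif"),
]
folder = [f"{n:04d}.tif" for n in range(66, 73)]  # 0066.tif through 0072.tif
print(missing_pages(rows, folder))  # 0069.tif is in no row, so it is reported
```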
Long story short, I coded the following with VBA in Excel (with great effort, trial and error, and frustration):
I guess it's a bit difficult to explain without context. I am working on simplifying the formulas and the VBA code because this thing is UGLY. I was much more focused on function, not elegance. I'd attach the file but it looks like I'm not able to post Excel files. Link to my Dropbox below.
From cday https://newsgroup.xnview.com/index.php
I sent you an email a few days ago, I don't know if you received it?
Your problem interested me and I developed some simple code, a single-line .bat file that batch converts suitably formatted output from Excel. I haven't posted it yet because it needs a bug fix in one NConvert option and probably a small enhancement to another, both of which could be possible reasonably soon.
Looking quickly at your code above, it looks unnecessarily complex, but I'll reserve judgement on that.
If you are interested, it is probably better to contact me there, or maybe directly by PM or email.
Edit:
For the benefit of anyone else who might be interested, this was Michael Clark's original post on the XnView forum under a different username:
https://newsgroup.xnview.com/viewtopic.php?f=57&t=42567
And this was my draft solution later in the same thread: