I have two folders containing PDFs with identical file names. I want to iterate through the first folder, take the first 3 characters of each filename as the 'current' page name, use that value to grab the 2 corresponding PDFs from both folders, merge them, and write the result to a third folder.
The script below works as expected for the first iteration, but after that, each subsequent merged PDF includes all the previous ones (ballooning quickly to 72 pages within 8 iterations).
Some of this could be due to poor code, but I can't figure out where the problem is, or how to clear the inputs/outputs that might be preventing the script from writing only 2 pages per iteration:
import os
from PyPDF2 import PdfFileMerger
merger = PdfFileMerger()
rootdir = 'D:/Python/Scatterplots/BoundaryEnrollmentPatternMap'
for subdir, dirs, files in os.walk(rootdir):
    for currentPDF in files:
        #print os.path.join(file[0:3])
        pagename = os.path.join(currentPDF[0:3])
        print "pagename is: " + pagename
        print "File is: " + pagename + ".pdf"
        input1temp = 'D:/Python/Scatterplots/BoundaryEnrollmentPatternMap/' + pagename + '.pdf'
        input2temp = 'D:/Python/Scatterplots/TraditionalScatter/' + pagename + '.pdf'
        input1 = open(input1temp, "rb")
        input2 = open(input2temp, "rb")
        merger.append(fileobj=input1, pages=(0,1))
        merger.append(fileobj=input2, pages=(0,1))
        outputfile = 'D:/Python/Scatterplots/CombinedMaps/Sch_' + pagename + '.pdf'
        print merger.inputs
        output = open(outputfile, "wb")
        merger.write(output)
        output.close()
        #clear all inputs - necessary?
        outputfile = []
        output = []
        merger.inputs = []
        input1temp = []
        input2temp = []
        input1 = []
        input2 = []
print "done"
My code / work is based on this sample:
https://github.com/mstamy2/PyPDF2/blob/master/Sample_Code/basic_merging.py
I think the error is that merger is initialized before the loop, so it accumulates all the documents. Try moving the line merger = PdfFileMerger() into the loop body. Setting merger.inputs = [] doesn't seem to help in this case.
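As a rough sketch of that fix, the per-file work could be pulled into a small helper that builds a fresh merger on every call. The helper name merge_pair and its make_merger parameter are hypothetical, introduced only for illustration; make_merger is assumed to be a zero-argument callable such as the PdfFileMerger class used in the question.

```python
def merge_pair(path_a, path_b, out_path, make_merger, pages=(0, 1)):
    """Merge the selected pages of two PDFs into out_path.

    make_merger is a hypothetical parameter: any zero-argument callable
    that returns an object with append(fileobj=..., pages=...) and
    write(fileobj), for example PyPDF2's PdfFileMerger class.
    """
    # A new merger per call, so nothing carries over from earlier files.
    merger = make_merger()
    # The inputs stay open until the output is written, then are closed
    # automatically when the with block ends.
    with open(path_a, "rb") as in_a, open(path_b, "rb") as in_b:
        merger.append(fileobj=in_a, pages=pages)
        merger.append(fileobj=in_b, pages=pages)
        with open(out_path, "wb") as out:
            merger.write(out)


# Inside the question's inner loop this would be called roughly as:
#   merge_pair(input1temp, input2temp, outputfile, PdfFileMerger)
```

Passing the merger class in as an argument is only a convenience for trying the helper without PyPDF2 installed; calling PdfFileMerger() directly inside the loop has the same effect.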
A few notes about your code:
input1 = [] doesn't close the file. It only rebinds the name, so the file can stay open and the program ends up holding many open files. Call input1.close() instead.
[] is an empty list. If a variable should not hold any meaningful value, it is better to use None.
To remove a variable (e.g. output), use del output.
That said, clearing all the variables is not necessary. The garbage collector will free them.
Use os.path.join to build input1temp and input2temp.
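For the last note, a minimal sketch of building the three paths with os.path.join. The helper name pdf_paths and its root parameter are hypothetical; the folder names and the Sch_ prefix are taken from the question.

```python
import os


def pdf_paths(root, pagename):
    """Return (input1, input2, output) paths for one page name.

    root is assumed to be the folder that contains the three
    subfolders from the question, e.g. 'D:/Python/Scatterplots'.
    """
    filename = pagename + ".pdf"
    input1 = os.path.join(root, "BoundaryEnrollmentPatternMap", filename)
    input2 = os.path.join(root, "TraditionalScatter", filename)
    output = os.path.join(root, "CombinedMaps", "Sch_" + filename)
    return input1, input2, output
```

Keeping the root folder in one place also means the script only needs one change if the folders move.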