I'm having an issue merging multiple PDFs. I have to loop through two folders and merge each pair of files that match. Finding the matches was easy, but when I do:
```python
merger.append(input1)
merger.append(input2)
merger.write(output)
```
the merge works, but each iteration also includes the inputs from all the previous iterations, so the last output is a huge PDF in which the earlier pairs are repeated over and over.
```python
import PyPDF2 as pdf  # PyPDF2 1.x (PdfFileReader, getPage, extractText)

# merger is created once, before the loop (setup not shown in the original)
merger = pdf.PdfFileMerger()

# onlypdf, onlyfiles, nPdfs and nXl are the file lists and their lengths
for i in range(nPdfs):
    abr = onlypdf[i]
    abr = abr.replace('.pdf', '')
    for j in range(nXl):
        # read the first page of each file in the month folder
        pdf_file = open('SEPTIEMBRE DE 2020/' + onlyfiles[j], 'rb')
        read_pdf = pdf.PdfFileReader(pdf_file)
        number_of_pages = read_pdf.getNumPages()
        page = read_pdf.getPage(0)
        page_content = page.extractText()
        if abr in page_content:
            file1 = onlypdf[i]
            file2 = onlyfiles[j]
            print(file1)
            print(file2)
            print(file1 + ' is in ' + file2)
            input1 = open('Combinadora/documentos/' + file1, 'rb')
            input2 = open('SEPTIEMBRE DE 2020/' + file2, 'rb')
            merger.append(input1)
            merger.append(input2)
            input1.close()
            input2.close()
            print('file created')
            output = open(abr + '-' + file2, 'wb')
            merger.write(output)
            output.close()
```
This is my code. Am I getting something wrong in the loop?
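PyPDF2 is a great library, but I have also run into memory problems with it. The repetition you see happens because the same `merger` object is reused for every pair and keeps everything that was ever appended to it, so each `write()` outputs all the pages accumulated so far. What I generally do is create the merger in a separate process that exits once the job is done; alternatively, create a new merger for each pair and, after writing, close it (`merger.close()`) and delete it (`del merger`). Keep in mind that even if you find a clever way around this, memory leaks can still happen, so I strongly suggest creating a process per job and letting it exit.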
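A minimal sketch of both suggestions, assuming the PyPDF2 1.x API used in the question (`PdfFileMerger`, `PdfFileReader`, `getPage`, `extractText`; these were deprecated in PyPDF2 2.x and are no longer usable in 3.x). The helper names `first_page_text`, `find_pairs`, `merge_pair` and `merge_pair_in_subprocess` are hypothetical. The essential change is that a brand-new merger is created for every matching pair and closed after writing, so no pages carry over from earlier pairs; the subprocess wrapper additionally hands all of the child's memory back to the OS when it exits.

```python
import multiprocessing


def first_page_text(path):
    """Return the text of a PDF's first page (PyPDF2 1.x API, as in the question)."""
    from PyPDF2 import PdfFileReader  # imported lazily; assumes PyPDF2 1.x
    with open(path, 'rb') as f:
        return PdfFileReader(f).getPage(0).extractText()


def find_pairs(doc_names, page_texts):
    """Yield (doc, other, output_name) for every doc whose base name
    appears in the first-page text of another file.

    page_texts maps each file name in the second folder to its first-page text.
    """
    for doc in doc_names:
        abr = doc.replace('.pdf', '')
        for other, text in page_texts.items():
            if abr in text:
                yield doc, other, abr + '-' + other


def merge_pair(path1, path2, out_path):
    """Merge exactly two PDFs using a fresh merger."""
    from PyPDF2 import PdfFileMerger  # assumes PyPDF2 1.x
    merger = PdfFileMerger()  # new object: nothing left over from earlier pairs
    try:
        merger.append(path1)
        merger.append(path2)
        merger.write(out_path)
    finally:
        merger.close()  # drop the buffered pages and close the input files


def merge_pair_in_subprocess(path1, path2, out_path):
    """Run merge_pair in a child process; its memory is freed when it exits."""
    proc = multiprocessing.Process(target=merge_pair,
                                   args=(path1, path2, out_path))
    proc.start()
    proc.join()
    if proc.exitcode != 0:
        raise RuntimeError('merge failed for ' + out_path)
```

In the question's loop, the shared `merger.append(...)`/`merger.write(...)` calls would be replaced by one `merge_pair_in_subprocess(...)` (or plain `merge_pair(...)`) call per matching pair, passing the two input paths and `abr + '-' + file2` as the output path. With the `spawn` start method (the default on Windows and macOS), the calling code must sit under an `if __name__ == '__main__':` guard.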