anaconda python-3.6 pdf-conversion poppler

Converting PDFs to Images using Poppler. It works for one file, but not when looping through a folder of PDFs?


I'm trying to convert a folder of PDFs to JPG images using Poppler. Converting a single file works. However, when I loop over every PDF in a folder, I get the following error: "PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?" Since I know Poppler is installed and in PATH (it works for a single sample), I'm wondering whether my loop is somehow causing the issue. For context, this is being done on Windows, in Spyder under Anaconda, with Python 3.6.

from pdf2image import convert_from_path 
import os

for filename in os.listdir(r".\I. Original Data\PDF Files"):
    if filename.endswith(r".pdf"):
        with open(os.path.join(r".\I. Original Data\PDF Files", filename)) as f:
            pages = convert_from_path(f, 1000)
            image_counter = 1
            for page in pages:
                filename = ".\II. Transformation\page_" + str(image_counter) + ".jpg"
                page.save(filename, 'JPEG')
                image_counter = image_counter + 1 

I'm hoping to have a final product that can output each page of each PDF into a separate file within the referenced 'Transformation' folder.

Thank you!


Solution

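convert_from_path expects the path to the PDF, not an open file object, so build the path and pass it directly:

from pdf2image import convert_from_path
import os

for filename in os.listdir(r".\I. Original Data\PDF Files"):
    if not filename.lower().endswith(".pdf"):
        continue  # skip anything in the folder that is not a PDF
    # Pass the path string itself; do not open() the file first.
    filepath = os.path.join(r".\I. Original Data\PDF Files", filename)
    pages = convert_from_path(filepath, 1000)
    # Prefix each image with the PDF's name so pages from different PDFs do not overwrite each other.
    for image_counter, page in enumerate(pages, start=1):
        file = os.path.join(r".\II. Transformation", os.path.splitext(filename)[0] + "_page_" + str(image_counter) + ".jpg")
        page.save(file, 'JPEG')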
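
If you prefer to avoid backslash escaping altogether, the same idea can be sketched with pathlib. Treat this as a rough sketch rather than a definitive version: the function names list_pdfs, output_path and convert_folder are my own and not part of pdf2image, the dpi default of 200 is an assumption (the question uses 1000), and I have not run it against your folders. The misleading "Is poppler installed" message most likely appeared because the loop in the question handed convert_from_path an open file object instead of a path, so the sketch always passes a plain path string.

```python
from pathlib import Path


def list_pdfs(src_dir):
    """Return the PDF files directly inside src_dir, sorted by path."""
    return sorted(
        p for p in Path(src_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"  # also matches ".PDF"
    )


def output_path(pdf_path, page_number, out_dir):
    """Build '<out_dir>/<pdf name without extension>_page_<n>.jpg'."""
    return Path(out_dir) / f"{Path(pdf_path).stem}_page_{page_number}.jpg"


def convert_folder(src_dir, out_dir, dpi=200):
    """Save every page of every PDF in src_dir as a JPEG in out_dir."""
    # Imported here so the two helpers above work even without pdf2image/Poppler.
    from pdf2image import convert_from_path

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in list_pdfs(src_dir):
        # Pass a path string, not an open file object.
        pages = convert_from_path(str(pdf), dpi=dpi)
        for number, page in enumerate(pages, start=1):
            target = output_path(pdf, number, out_dir)
            page.save(str(target), "JPEG")
            written.append(target)
    return written
```

For the folders in the question, the call would be convert_folder(r".\I. Original Data\PDF Files", r".\II. Transformation", dpi=1000). Keeping the file naming in its own small function makes it easy to check the output names before running a slow, high-DPI conversion.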