
Convert PDF file pages to images with Wand


Beginner here:

My code runs fine when I use it on a single PDF, but as soon as I add a for loop over several files, it still runs but only converts the first page of each multi-page PDF instead of all of them.

For example, if my PDF is xyz.pdf with 2 pages, the code converts both pages to JPG and writes them out separately. But as soon as I run it on both xyz.pdf and abc.pdf, it converts only the first page of each.

What am I missing here?

import os
from wand.image import Image as wi

for pdf_file in os.listdir(pdf_dir):
    if pdf_file.endswith(".pdf"):
        pdf = wi(filename=os.path.join(pdf_dir, pdf_file), resolution=300)
        pdfimage = pdf.convert("jpeg")
        i = 1
        for img in pdfimage.sequence:
            page = wi(image=img)
            page.save(filename=os.path.join(pdf_dir, str(pdf_file[:-4] + ".jpg")))
            i += 1

Solution

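  • This works for me. The loop in the question saves every page under the same filename (the counter i is never used in the name), so each page overwrites the previous one. Putting the page index in the filename keeps them all: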

    import os
    from wand.image import Image as wi

    def convert_pdf(filename, output_path, resolution=150):
        # Base name of the PDF without its extension, e.g. "xyz" for "xyz.pdf"
        base_name = os.path.splitext(os.path.basename(filename))[0]
        with wi(filename=filename, resolution=resolution) as all_pages:
            for i, page in enumerate(all_pages.sequence):
                with wi(image=page) as img:
                    # The page index makes each output filename unique
                    image_filename = os.path.join(output_path, '{}-{}.jpg'.format(base_name, i))
                    img.save(filename=image_filename)


    # pdf_dir is the directory of PDFs from the question
    for pdf_file in os.listdir(pdf_dir):
        if pdf_file.endswith(".pdf"):
            convert_pdf(os.path.join(pdf_dir, pdf_file), pdf_dir)
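
If you want to check the naming on its own before touching any PDFs, here is a minimal sketch of just the filename part, with no Wand calls. The helper name `page_image_path`, the zero-padded page number, and the `pdfs`/`out` directory names are assumptions for illustration; they are not part of Wand or of the code above.

```python
import os


def page_image_path(pdf_path, page_index, output_dir, ext="jpg"):
    """Build a distinct output path for one page of a PDF.

    Hypothetical helper for illustration; not part of Wand.
    """
    # "pdfs/xyz.pdf" -> "xyz"
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    # Zero-pad the page index so the files sort in page order.
    name = "{}-{:03d}.{}".format(stem, page_index, ext)
    return os.path.join(output_dir, name)


# Two pages of the same PDF map to two different files,
# so the second save no longer overwrites the first.
paths = [page_image_path(os.path.join("pdfs", "xyz.pdf"), i, "out") for i in range(2)]
for p in paths:
    print(p)
```

Inside the page loop you would pass the resulting path to `img.save(filename=...)`. Zero-padding is optional; it only keeps the files in page order when a PDF has ten or more pages.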