Tags: python, image, pdf, imagemagick, batch-processing

Convert multiple multipage PDFs to JPGs in subfolders


Simple use case:

  • A folder with many (mostly multipage) PDF files.
  • A script should convert each PDF page to JPG and store it in a subfolder named after the PDF filename. (e.g. #33.pdf to folder #33)
  • Single JPG files should also have this filename plus a counter mirroring the sequential page number in the PDF. (e.g. #33_001.jpg)

I found a bunch of related questions, but nothing that quite does what I want, e.g.

How do I convert multiple PDFs into images from the same folder in Python?

A Python script would work fine, but any other way to do this on Windows 10 (ImageMagick, for example) is fine with me.


Solution

  • Your comment asks how a batch file can do what is required. For simplicity, the following processes only a single file, so Python will need to loop through the folder and call the batch file with each name in turn. That could also be done by adding a "for" loop in the batch file, but first see where problems arise, as many of my single test files threw differing errors.

    I have tried to cover several failure cases in this batch file, on my system, but there can still be issues, such as a file that has no valid fonts to display.

    For the most recent 64-bit Poppler utilities for Windows, see https://github.com/oschwartz10612/poppler-windows/releases/. For 32-bit, use the latest Xpdf version from http://www.xpdfreader.com/download.html, but that ships pdftopng.exe directly, so the batch file needs a few edits.

    pdf2dir.bat

    @echo off
    set "bin=C:\Apps\PDF\poppler\22.11.0\Library\bin"
    set "res=200"
    REM for type use one of 3 i.e. png jpeg jpegcmyk (PNG is best for documents)
    set "type=png"
    
    if exist "%~dpn1\*.%type%" echo: &echo Files already exist in "%~dpn1" skipping overwrite&goto pause
    if not exist "%~dpn1.pdf" echo: &echo "%~dpn0" File "%~dpn1.pdf" not found&goto pause
    
    if not exist "%~dpn1\*.*" md "%~dpn1"
    
    REM The following line deliberately opens the folder to show progress. Delete it or prefix it with REM for blind running
    explorer "%~dpn1"
    
    "%bin%\pdftoppm.exe" -%type% -r %res% "%~dpn1.pdf" "%~dpn1\%~n1"
    if errorlevel 1 echo: &echo Build of %type% files failed&goto pause
    if not exist "%~dpn1\*.%type%" echo: &echo Build of %type% files failed&goto pause
    
    :pause
    echo:
    pause
    :end
    
    
    • It requires the path to the Poppler binaries (pdftoppm) to be set correctly in the second line
    • It can be placed wherever desired, e.g. in the work folder or on the desktop
    • It allows drag and drop of one PDF onto it, which should work without needing to run it in a console
    • It can be run in a command console: type its name, add a space character after it, then drag and drop a single filename; any spaces in the name must be "double quoted"
    • It can be run from any shell or OS command as "path to/batchfile.bat" "c:\path to\file.pdf"
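
For the folder loop mentioned at the start, a minimal Python sketch could look like the following. It is an assumption-laden outline, not a tested drop-in: it calls pdftoppm.exe directly instead of the batch file (so there is no pause and no Explorer window), it reuses the Poppler path from the batch file as a placeholder, and the helper names build_command, rename_pages and convert_folder are made up for illustration. pdftoppm names its output with a hyphen and the page number (for example #33-01.jpg), so the sketch renames the pages afterwards to the #33_001.jpg pattern asked for in the question.

```python
import re
import subprocess
from pathlib import Path

# Assumption: same Poppler location as "bin" in pdf2dir.bat; adjust to your install.
PDFTOPPM = Path(r"C:\Apps\PDF\poppler\22.11.0\Library\bin") / "pdftoppm.exe"


def build_command(pdf, fmt="jpeg", res=200):
    """Build the pdftoppm call for one PDF; pages go to <folder>/<stem>/<stem>-N.<ext>."""
    pdf = Path(pdf)
    out_dir = pdf.with_suffix("")  # e.g. "#33.pdf" -> subfolder "#33"
    return [str(PDFTOPPM), f"-{fmt}", "-r", str(res), str(pdf), str(out_dir / pdf.stem)]


def rename_pages(out_dir, stem):
    """Rename '<stem>-<page>.<ext>' files to '<stem>_<page, 3 digits>.<ext>'."""
    pattern = re.compile(re.escape(stem) + r"-(\d+)(\.\w+)")
    renamed = []
    # sorted() materialises the listing before any file is renamed
    for page in sorted(Path(out_dir).iterdir()):
        match = pattern.fullmatch(page.name)
        if match:
            number, ext = int(match.group(1)), match.group(2)
            target = page.with_name(f"{stem}_{number:03d}{ext}")
            page.rename(target)
            renamed.append(target.name)
    return sorted(renamed)


def convert_folder(folder, fmt="jpeg", res=200):
    """Convert every PDF in `folder`, one subfolder per PDF; return the names that failed."""
    failed = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        out_dir = pdf.with_suffix("")
        if out_dir.is_dir() and any(out_dir.iterdir()):
            print(f"Skipping {pdf.name}: {out_dir} already has files")
            continue
        out_dir.mkdir(exist_ok=True)
        result = subprocess.run(build_command(pdf, fmt, res))
        if result.returncode != 0:  # any non-zero exit code counts as a failure
            failed.append(pdf.name)
            continue
        rename_pages(out_dir, pdf.stem)
    return failed
```

To try it, call convert_folder with the folder holding the PDFs, e.g. convert_folder(r"c:\path to\pdfs"), and check the returned list for files that pdftoppm could not convert. Like the batch file, it skips a PDF whose subfolder already contains files rather than overwriting them.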