
PyMuPDF - How to convert a PDF to images at 300 dpi, using the original document settings for the image size?


I'm currently looking at using the Python package PyMuPDF for a workflow that converts PDFs to images (in my case, TIFF files).

I am trying to mimic the behaviour of another program that I currently use for PDF -> image conversion. That program lets you configure the imaging settings as below:

Image Output Quality (DPI): (Defaults to 300dpi)

Basic Image Size: Original setting - renders the image with the original document settings.

My question is: is this possible within PyMuPDF? How can I set the output DPI for my images to 300 and set the image size to the original document size? I am quite new to this sort of processing for PDFs/images, so any help would be much appreciated.

Thanks in advance,


Solution

  • PyMuPDF is a Python wrapper around MuPDF.

    It has many powerful PDF manipulation options, including the ability to set the page scale and the resolution of page image outputs.

    However, while MuPDF supports TIFF as an input format, it cannot natively export single- or multi-page TIFF. You therefore need an additional conversion step, for example from a natively supported output such as PNG.

    The range of currently supported input and output image formats:

    Input   Output  Description
    JPEG    -       Joint Photographic Experts Group
    BMP     -       Windows Bitmap
    JXR     -       JPEG Extended Range
    JPX     -       JPEG 2000
    GIF     -       Graphics Interchange Format
    TIFF    -       Tagged Image File Format
    PNG     PNG     Portable Network Graphics
    PNM     PNM     Portable Anymap
    PGM     PGM     Portable Graymap
    PBM     PBM     Portable Bitmap
    PPM     PPM     Portable Pixmap
    PAM     PAM     Portable Arbitrary Map
    -       PSD     Adobe Photoshop Document
    -       PS      Adobe Postscript
    

    To export to TIFF you would need, say, PIL/Pillow, along the lines of:

    from PIL import Image
    import fitz  # PyMuPDF
    
    pix = fitz.Pixmap(...)
    # "RGB" assumes the pixmap has 3 bytes per pixel (no alpha channel)
    img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
    img.save("output.tif", "TIFF")
    
    

    However, to combine single pages into a multi-page TIFF you will need to experiment with the Pillow settings.

    [Update]

    I see you also asked this question in the PyMuPDF project; for the benefit of others, the answer was:

    Sounds like you will create a so-called "pixmap" for each page and save that as an image. PyMuPDF itself only supports a handful of image output formats, the most popular being PNG; the others are the PNM-type images. If you want to use other formats, you must use an additional package, presumably PIL/Pillow. PyMuPDF supports Pillow directly via its pixmap output methods. So a code snippet may look like this:

    import fitz  # PyMuPDF

    mat = fitz.Matrix(300 / 72, 300 / 72)  # sets zoom factor for 300 dpi (72 dpi is the default)
    doc = fitz.open("yourfile.pdf")
    for page in doc:
        pix = page.get_pixmap(matrix=mat)  # render the page at 300 dpi
        img_filename = "page-%04i.tiff" % page.number
        # more PIL save parameters can be added as keyword arguments
        pix.pil_save(img_filename, format="TIFF", dpi=(300, 300))
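
The 300 / 72 factor in the snippet above is what ties the output to the "original document size": PDF page dimensions are expressed in points (1/72 inch), so scaling by dpi / 72 yields an image whose pixel count, read at that dpi, corresponds to the same physical page size. The following is only a small arithmetic sketch of that relationship. The helper names are made up for illustration and are not part of PyMuPDF, and the pixel values are rounded estimates that may differ by a pixel from what a renderer actually produces.

```python
# Illustrative helpers (hypothetical names, not PyMuPDF API).
POINTS_PER_INCH = 72  # PDF page sizes are given in points, 1 pt = 1/72 inch


def zoom_for_dpi(dpi):
    """Scale factor one would pass as fitz.Matrix(zoom, zoom) for a target DPI."""
    return dpi / POINTS_PER_INCH


def expected_pixels(width_pt, height_pt, dpi):
    """Approximate (rounded) pixel size of a page rendered at the given DPI."""
    zoom = zoom_for_dpi(dpi)
    return round(width_pt * zoom), round(height_pt * zoom)


def physical_inches(width_px, height_px, dpi):
    """Physical size implied by a pixel size when the file is tagged with `dpi`."""
    return width_px / dpi, height_px / dpi


# US Letter is 612 x 792 points (8.5 x 11 inches).
width_px, height_px = expected_pixels(612, 792, 300)
print(width_px, height_px)                        # pixel size at 300 dpi
print(physical_inches(width_px, height_px, 300))  # back to inches
```

Tagging the saved file with the same dpi value (as the `dpi=(300, 300)` argument does) is what lets other programs recover the original page size from the pixel dimensions.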
    

    For more sophistication on PIL output, please consult their documentation. For example, TIFF supports multiple images in one file.
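
As a rough illustration of the multi-page point, here is a minimal Pillow-only sketch that writes several page images into one TIFF file. It assumes the page images already exist as PIL `Image` objects (for instance built from pixmaps with `Image.frombytes` as shown earlier). The function name `save_multipage_tiff` is hypothetical, and the choice of options is one possibility rather than a recommended configuration.

```python
from PIL import Image


def save_multipage_tiff(images, path, dpi=300):
    """Write a list of PIL images into a single multi-page TIFF file."""
    if not images:
        raise ValueError("need at least one page image")
    first, rest = images[0], images[1:]
    first.save(
        path,
        format="TIFF",
        save_all=True,        # write every frame, not just the first image
        append_images=rest,   # remaining pages, in order
        dpi=(dpi, dpi),       # resolution tag stored in the file
    )
```

Called as `save_multipage_tiff(page_images, "output.tif")`, this should produce one file with one frame per page; compression and other TIFF options are left at Pillow's defaults here.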