Tags: python, python-imaging-library, getpixel

Lost information getting pdf page as image


I am not an expert by any means. I am trying to extract a PDF page as an image so that I can process it later. I used the following code, which I put together from other recommendations on this site.

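import os

import fitz  # PyMuPDF
from PIL import Image

# Folder holding the PDF files.
folder = r'C:\Users\...'
files = os.listdir(folder)
# os.path.join adds the path separator if the folder name lacks one.
path = os.path.join(folder, files[21])
print(path)
doc = fitz.open(path)
page = doc.loadPage(2)  # page numbers are zero-based
zoom = 2  # render at twice the default resolution
mat = fitz.Matrix(zoom, zoom)
pix = page.getPixmap(matrix=mat)
img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)

density = img.getdata()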

Usually this would give me the pixel information of the image, but in this case it returns a list of white pixels. I have no idea why. The image (img) displays fine when I show it, but its data does not.

I would appreciate any help.
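
One way to check whether the rendered data really is all white is to count the non-white pixels in the raw sample bytes, before Pillow is involved at all. The following is only a minimal sketch: count_non_white is a hypothetical helper, not part of PyMuPDF or Pillow, and it assumes the samples are packed RGB or RGBA bytes, as the "RGB" mode in the question implies.

```python
def count_non_white(samples: bytes, channels: int = 3) -> int:
    """Count pixels whose red, green and blue components are not all 255.

    Assumes packed pixel data with at least three components per pixel
    (RGB, or RGBA when channels is 4). Any alpha component is ignored.
    """
    white = b"\xff\xff\xff"
    return sum(
        1
        for i in range(0, len(samples), channels)
        if samples[i:i + 3] != white
    )


# Synthetic 2x2 RGB buffer: three white pixels followed by one black pixel.
demo = b"\xff\xff\xff" * 3 + b"\x00\x00\x00"
print(count_non_white(demo))  # 1
```

With the variables from the question this would be called as count_non_white(pix.samples, pix.n), where pix.n is the number of components per pixel. Note also that img.getdata() starts at the top-left corner of the image, where a page usually has a white margin, so the first entries being white does not by itself mean the whole image is white. Pillow's img.getextrema() gives a quick per-band minimum and maximum for the same purpose.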


Solution

  • The code works if I take a shortcut and call doc.getPagePixmap directly instead of going through doc.loadPage and page.getPixmap:

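    import os

    import fitz  # PyMuPDF
    from PIL import Image

    # Folder holding the PDF files.
    folder = r'C:\Users\...'
    files = os.listdir(folder)
    path = os.path.join(folder, files[21])
    print(path)
    doc = fitz.open(path)
    # Render page index 2 directly, without loading the page first.
    pix = doc.getPagePixmap(2)
    img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)

    density = img.getdata()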
    

    I still don't know why the longer code fails, and the working method doesn't allow me to get a higher-resolution version of the extracted page.
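
On the resolution point: PDF page sizes are measured in points (1/72 inch), so a zoom factor of 2 corresponds to roughly 144 DPI. The sketch below shows that arithmetic and how the zoom matrix might be passed to the shorter call. The helper names are hypothetical, and it is an assumption here that getPagePixmap accepts a matrix keyword and forwards it to the page renderer in the installed PyMuPDF version.

```python
def zoom_for_dpi(dpi: float) -> float:
    """Zoom factor that maps PDF points (1/72 inch) to the requested DPI."""
    return dpi / 72.0


def expected_size(width_pt: float, height_pt: float, zoom: float) -> tuple:
    """Approximate pixel size of a page rendered at the given zoom."""
    return round(width_pt * zoom), round(height_pt * zoom)


def render_page(path: str, page_number: int, dpi: float = 144):
    """Hypothetical helper: render one page at roughly `dpi` dots per inch."""
    import fitz  # PyMuPDF, imported lazily so the helpers above work without it

    zoom = zoom_for_dpi(dpi)
    doc = fitz.open(path)
    # Assumption: getPagePixmap takes a `matrix` keyword, as in the PyMuPDF
    # releases that use camelCase names. Newer releases rename the method
    # to get_page_pixmap.
    return doc.getPagePixmap(page_number, matrix=fitz.Matrix(zoom, zoom))


print(zoom_for_dpi(144))           # 2.0
print(expected_size(612, 792, 2))  # (1224, 1584)
```

If that assumption holds, pix = render_page(path, 2) could replace the doc.getPagePixmap(2) line above, and the rest of the code would stay the same.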