I am not an expert by any means. I am trying to extract a PDF page as an image so I can do some processing on it later. I used the following code, which I put together from other answers on this site.
```python
import os

import fitz  # PyMuPDF
from PIL import Image

folder = r'C:\Users\...'
files = os.listdir(folder)
path = os.path.join(folder, files[21])
print(path)

doc = fitz.open(path)
page = doc.loadPage(2)  # page index 2, i.e. the third page
zoom = 2  # render at 2x the default 72 dpi
mat = fitz.Matrix(zoom, zoom)
pix = page.getPixmap(matrix=mat)
img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
density = img.getdata()
```
Usually this gives me the pixel data of the image, but in this case it returns a list of white pixels, and I have no idea why. The image (`img`) displays correctly when I show it, but its data does not.
I would appreciate any help.
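One way to narrow this down (a diagnostic sketch, not a confirmed diagnosis): `img.getdata()` returns a flat sequence that starts at the top-left corner, and on most pages the first rows are blank margin, so the first entries being white does not by itself mean the whole image is white. Two things are worth checking on the pixmap directly: whether `pix.samples` really holds 3 bytes per pixel (mode `"RGB"` in `Image.frombytes` assumes that, while a pixmap with an alpha channel holds 4), and whether any pixel is non-white at all. `count_non_white` and `matches_rgb` are hypothetical helper names in pure Python; they assume `pix.samples` is an unpadded buffer and that `pix.n` includes the alpha byte when `pix.alpha` is set.

```python
def count_non_white(samples: bytes, n: int, alpha: bool) -> int:
    """Count pixels whose colour channels are not all 255 (pure white).

    samples: raw, unpadded pixel bytes (e.g. pix.samples, assumed unpadded)
    n: bytes per pixel, including alpha if present (e.g. pix.n)
    alpha: whether the last byte of each pixel is alpha (e.g. pix.alpha)
    """
    if n <= 0 or len(samples) % n:
        raise ValueError("buffer length is not a multiple of n")
    colour = n - 1 if alpha else n  # ignore the trailing alpha byte
    count = 0
    for i in range(0, len(samples), n):
        # Iterating over a bytes slice yields ints in 0..255.
        if any(b != 255 for b in samples[i:i + colour]):
            count += 1
    return count


def matches_rgb(samples: bytes, width: int, height: int) -> bool:
    """True if the buffer has exactly 3 bytes per pixel, as mode "RGB" expects."""
    return len(samples) == width * height * 3


if __name__ == "__main__":
    # Tiny synthetic 3x1 RGB buffer: white, black, white.
    demo = bytes([255, 255, 255, 0, 0, 0, 255, 255, 255])
    print(count_non_white(demo, 3, False))  # prints 1
    print(matches_rgb(demo, 3, 1))  # prints True
```

With the real pixmap you would call `matches_rgb(pix.samples, pix.width, pix.height)` and `count_non_white(pix.samples, pix.n, pix.alpha)`. With Pillow alone, `img.getextrema()` answers the second question more quickly: a result of `((255, 255), (255, 255), (255, 255))` means every pixel in the RGB image is white.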
The code works if I take a shorter route and replace `doc.loadPage` (and the `getPixmap` call) with `doc.getPagePixmap`:
```python
import os

import fitz  # PyMuPDF
from PIL import Image

folder = r'C:\Users\...'
files = os.listdir(folder)
path = os.path.join(folder, files[21])
print(path)

doc = fitz.open(path)
pix = doc.getPagePixmap(2)  # render page index 2 directly from the document
img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
density = img.getdata()
```
I still don't know why the longer version fails, and the working version doesn't let me get a higher-resolution rendering of the extracted page.
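For the resolution part, here is a sketch under stated assumptions. It uses the snake_case PyMuPDF names (`load_page`, `get_pixmap`) that newer releases use in place of the camelCase ones in the code above, and it passes `alpha=False` explicitly so the samples are 3-byte RGB regardless of the installed version's default. `render_page_as_image` and `zoom_for_dpi` are hypothetical helper names. PyMuPDF renders at 72 dpi with the identity matrix, so the zoom factor for a target resolution is dpi / 72.

```python
def zoom_for_dpi(dpi: float) -> float:
    """Zoom factor for fitz.Matrix: PDF space is 72 points per inch."""
    return dpi / 72.0


def render_page_as_image(pdf_path: str, page_index: int, dpi: float = 144):
    """Render one page to a PIL RGB image at roughly `dpi` dots per inch.

    Assumes a PyMuPDF version with snake_case method names; older versions
    spell them loadPage / getPixmap.
    """
    import fitz  # PyMuPDF, imported here so this module loads without it
    from PIL import Image

    doc = fitz.open(pdf_path)
    try:
        page = doc.load_page(page_index)
        zoom = zoom_for_dpi(dpi)
        # alpha=False requests plain RGB samples (3 bytes per pixel).
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom), alpha=False)
        # Image.frombytes copies the data, so the document can be closed after.
        img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
    finally:
        doc.close()
    return img
```

For example, `render_page_as_image(path, 2, dpi=300)` would render the same page index as the question at about 300 dpi; raising `dpi` trades memory for detail.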