converting pdf file to excel with images in the table using python

The pdf files that I need to convert will have images in table. I want to convert the text as well as the images in tables of pdf to excel. Please suggest me suitable libraries for it.

Solution

You can use PikePDF to extract the images from the pdf:

from pikepdf import Pdf, PdfImage

filename = "sample.pdf"
example = Pdf.open(filename)

for i, page in enumerate(example.pages):
    for j, (name, raw_image) in enumerate(page.images.items()):
        image = PdfImage(raw_image)
        out = image.extract_to(fileprefix=f"{filename}-page{i:03}-img{j:03}")

After extracting the image you can then use OCR to convert the image to a table