The PDF files I need to convert contain images inside tables. I want to convert both the text and the images in those tables to Excel. Could you suggest suitable libraries for this?
You can use pikepdf to extract the images from the PDF:
from pikepdf import Pdf, PdfImage

filename = "sample.pdf"
example = Pdf.open(filename)

for i, page in enumerate(example.pages):
    for j, (name, raw_image) in enumerate(page.images.items()):
        image = PdfImage(raw_image)
        out = image.extract_to(fileprefix=f"{filename}-page{i:03}-img{j:03}")
After extracting the images, you can run OCR on each one to convert any tabular content it contains into table data.
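As a rough sketch of that OCR step, something like the following might work. It assumes pytesseract and Pillow are installed along with the Tesseract binary, none of which are mentioned above. The helper names (ocr_image, text_to_rows, write_csv) are made up for illustration, and splitting cells on runs of two or more spaces is a heuristic that will not suit every table layout. The result is written as CSV, which Excel can open directly.

```python
import csv
import re


def ocr_image(path):
    """Run Tesseract on one extracted image and return the recognized text."""
    # Imported lazily so the parsing helpers below work without OCR installed.
    from PIL import Image
    import pytesseract

    # Assumption: "--psm 6" (treat the image as one uniform block of text) and
    # preserve_interword_spaces=1 keep column gaps visible in the output.
    config = "--psm 6 -c preserve_interword_spaces=1"
    return pytesseract.image_to_string(Image.open(path), config=config)


def text_to_rows(text):
    """Split OCR text into rows of cells.

    Heuristic: cells are separated by a tab or by two or more whitespace
    characters. Blank lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}|\t", line))
    return rows


def write_csv(rows, out_path):
    """Write the rows to a CSV file that Excel can open."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

For example, write_csv(text_to_rows(ocr_image("sample.pdf-page000-img000.png")), "table.csv") would process one extracted image, assuming the image was saved as a PNG under that name. OCR quality varies a lot with image resolution, so check the resulting rows by hand.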