I have a lot of images pulled from a search engine, and I am using OCR to get decent text extraction from them, but some of the images do not contain any text.
So I would like to determine, in Python, whether an image contains text at all; if it doesn't, I can skip OCR for it. Ideally the method would have high recall.
Use pytesseract. Something like this:
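    from PIL import Image
    import pytesseract

    def contains_text(image_path):
        # Run OCR and strip whitespace: Tesseract output for an image
        # without text is often whitespace rather than an empty string.
        text = pytesseract.image_to_string(Image.open(image_path)).strip()
        if not text:
            return False  # No text detected
        # Return the recognized text so OCR does not have to be run again.
        return text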
I do not know of a way to detect that there is no text without trying to perform OCR (like above).
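If it helps, the same idea can be applied to a whole batch. This is only a rough sketch: `split_by_text`, `default_ocr` and the `min_chars` threshold are names I made up for illustration, not part of pytesseract. It counts alphanumeric characters so that stray punctuation produced by OCR noise is not treated as text, and the threshold is kept low by default to favor recall.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def default_ocr(image_path: str) -> str:
    # Imported lazily so split_by_text can also be used with another OCR function.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)


def split_by_text(
    image_paths: Iterable[str],
    ocr: Callable[[str], str] = default_ocr,
    min_chars: int = 1,
) -> Tuple[Dict[str, str], List[str]]:
    """Split paths into (path -> recognized text) and (paths with no text)."""
    with_text: Dict[str, str] = {}
    without_text: List[str] = []
    for path in image_paths:
        text = ocr(path).strip()
        # Count only letters and digits; a few stray symbols are likely noise.
        n_alnum = sum(ch.isalnum() for ch in text)
        if n_alnum >= min_chars:
            with_text[path] = text
        else:
            without_text.append(path)
    return with_text, without_text
```

Calling `split_by_text(["a.png", "b.png"])` would run Tesseract once per image and give back the recognized text for the images that have any, so nothing needs to be OCR'd twice. Raising `min_chars` filters out more false positives at the cost of recall.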