python image classification ocr

How to classify whether images contain text or not?


I have a lot of images extracted from a search engine, and I am using OCR to perform decent text extraction from these images, but some of the images do not contain any text.

Thus I would like to determine in Python whether an image contains text at all, so that if it doesn't, I don't have to perform OCR on it. Ideally this method would have high recall.


Solution

  • Use pytesseract. Something like this:

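    from PIL import Image
    import pytesseract
    
    def contains_text(image_path):
        # Tesseract may return only whitespace (e.g. a form feed) for
        # images without text, so strip before checking for emptiness.
        text = pytesseract.image_to_string(Image.open(image_path)).strip()
        
        if not text:
            return False  # No text detected
        return text  # Non-empty OCR result, which the caller can reuse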
    

    I do not know of a way to detect that there is no text without trying to perform OCR (like above).
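
If plain `image_to_string` turns out to be too noisy (stray characters recognized in photos, for example), one possible refinement is to look at Tesseract's per-word confidences via `pytesseract.image_to_data`. This is only a sketch: the helper name `has_confident_words` and the default threshold of 60 are my own assumptions, not something from pytesseract, and the threshold would need tuning on your images.

```python
def has_confident_words(data, min_conf=60.0):
    """Return True if any non-blank word has confidence >= min_conf.

    `data` is expected to look like the dict returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT),
    i.e. parallel lists under the keys "text" and "conf".
    The name and the default threshold are illustrative assumptions.
    """
    for word, conf in zip(data["text"], data["conf"]):
        # conf may be an int, a float, or a string depending on the version;
        # structural rows (blocks, lines) carry a confidence of -1.
        if word.strip() and float(conf) >= min_conf:
            return True
    return False


def contains_text(image_path, min_conf=60.0):
    # Imported lazily so the helper above can be used without Tesseract.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        data = pytesseract.image_to_data(
            img, output_type=pytesseract.Output.DICT
        )
    return has_confident_words(data, min_conf)
```

Since you want high recall, a lower `min_conf` is probably the safer direction: it lets more images through to the full OCR step at the cost of some false positives. Note that this still runs OCR once per image, so it filters the results rather than saving the OCR cost.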