Tags: python, python-3.x, image-recognition

Detect text in an image


I'm working on a Python project. One of the features I need is the ability to detect whether an image contains text. I do not need any kind of bounding box; I only need true or false, regardless of how much text the image has. I've been following the steps here but, like every link I managed to find, it eventually creates bounding boxes.

I have two questions:

  1. Is there any text detection mechanism that I can use to detect text without all the overhead of the bounding box process?
  2. OpenCV uses a neural net to detect text, which is stored in an external .pb file; I need to load it to use the network. Is there any way to embed this file within the .py file? This would avoid having two files. The idea is to be able to import the .py file and use it as a library, without having to ship the .pb file (which is the model that detects text) separately.

Thank you!


Solution

  • Is there any text detection mechanism that I can use to detect text without all the overhead of the bounding box process?

    The bounding boxes are the result of doing all the detection processing, and as such represent an intrinsic part of the process. If you don't care where the text is, you are free to ignore the resulting bounding boxes in your own code. But in order to detect whether there is text in the image, the algorithm (of whatever type) has to detect where the text is.
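
    If only a yes/no answer is needed, the detector's output can be reduced to a boolean right after detection. The following is a minimal sketch, assuming a hypothetical detector that returns a list of (box, confidence) pairs; the `has_text` name, the tuple layout, and the 0.5 threshold are illustrative assumptions, not part of OpenCV:

```python
def has_text(detections, min_confidence=0.5):
    """Return True if any detection meets the confidence threshold.

    `detections` is assumed to be an iterable of (box, confidence) pairs
    produced by some text detector (hypothetical format). The boxes
    themselves are never inspected.
    """
    return any(confidence >= min_confidence for _box, confidence in detections)


# Example with made-up detector output:
sample = [((10, 20, 110, 45), 0.93), ((200, 40, 260, 60), 0.31)]
print(has_text(sample))  # True: one region is above the threshold
print(has_text([]))      # False: nothing was detected
```

    The detection cost is still paid in full; this only keeps the box coordinates out of the calling code.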

    The DNN method used in the linked article may be overkill if you don't care where the text is. You could try other text detection algorithms and profile them to find a less computationally expensive one for your application. There will always be tradeoffs, typically between speed and accuracy.
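
    Profiling candidate detectors can be as simple as timing repeated calls on a representative image. This is a rough sketch using only the standard library; `average_runtime` and the two stand-in detectors are hypothetical placeholders for whatever detection functions you actually compare:

```python
import time


def average_runtime(detector, image, repeats=5):
    """Return the mean wall-clock time in seconds of detector(image)."""
    start = time.perf_counter()
    for _ in range(repeats):
        detector(image)
    return (time.perf_counter() - start) / repeats


# Stand-in detectors (hypothetical); replace with real detection calls.
def cheap_detector(image):
    return any(pixel > 128 for pixel in image)


def expensive_detector(image):
    return sorted(image)[-1] > 128


image = list(range(256)) * 100  # placeholder for real image data
for name, detector in [("cheap", cheap_detector),
                       ("expensive", expensive_detector)]:
    print(f"{name}: {average_runtime(detector, image):.6f} s per call")
```

    Timings should be compared alongside detection accuracy on your own images, since a faster method may miss text the slower one finds.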

  • OpenCV uses a neural net to detect text, which is stored in an external .pb file; I need to load it to use the network. Is there any way to embed this file within the .py file? This would avoid having two files. The idea is to be able to import the .py file and use it as a library, without having to ship the .pb file (which is the model that detects text) separately.

    Yes, you could embed the contents of the model .pb file directly into your Python code as a buffer object, and then use the alternate model loading mechanism to read the model from a buffer:

    retval = cv.dnn.readNetFromTensorflow(bufferModel[, bufferConfig])
    

    You could use the Unix hexdump command to convert the binary file into a hex sequence:

    hexdump -e '"    " 8/1 "0x%02x, " "\n"' your_training.pb
    

    which produces output like this:

    0x0a, 0x35, 0x0a, 0x0a, 0x62, 0x61, 0x74, 0x63,
    0x68, 0x5f, 0x73, 0x69, 0x7a, 0x65, 0x12, 0x0b,
    

    and then paste this into your source file, wrapped with:

    bufferModel = bytearray([
        0x0a, 0x35, 0x0a, 0x0a, 0x62, 0x61, 0x74, 0x63,
        0x68, 0x5f, 0x73, 0x69, 0x7a, 0x65, 0x12, 0x0b,
        # ...
    ])
    

    which you can then pass to OpenCV:

    import cv2 as cv

    retval = cv.dnn.readNetFromTensorflow(bufferModel)
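
    If hexdump is not available (for example on Windows), the same literal can be generated from Python itself. This is a small sketch, not part of OpenCV; the `bytes_to_python_literal` helper is a hypothetical name, and the demo uses the sixteen sample bytes shown above rather than a real model file:

```python
def bytes_to_python_literal(data, name="bufferModel", per_line=8):
    """Render bytes as Python source defining a bytearray called `name`."""
    lines = [f"{name} = bytearray(["]
    for offset in range(0, len(data), per_line):
        chunk = data[offset:offset + per_line]
        lines.append("    " + " ".join(f"0x{byte:02x}," for byte in chunk))
    lines.append("])")
    return "\n".join(lines) + "\n"


# Demo on the sixteen sample bytes; for a real model, read the .pb file
# with open(path, "rb").read() and write the result to a .py file.
sample = bytes([0x0a, 0x35, 0x0a, 0x0a, 0x62, 0x61, 0x74, 0x63,
                0x68, 0x5f, 0x73, 0x69, 0x7a, 0x65, 0x12, 0x0b])
source = bytes_to_python_literal(sample)
print(source)

# Round-trip check: executing the generated source rebuilds the bytes.
namespace = {}
exec(source, namespace)
assert bytes(namespace["bufferModel"]) == sample
```

    Note that each byte becomes about six characters of source text, so a large model produces a very large .py file.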