python, image, opencv, letters

Improve the quality of the letters in an image


I'm working with images that contain text. The images are receipts, and after a lot of transformations the text has lost quality. I'm using Python and OpenCV. I've tried many combinations of morphological transformations from the OpenCV "Morphological Transformations" documentation, but I don't get satisfactory results.

This is what I'm doing right now (the commented-out lines are things I've tried; the uncommented lines are what I'm actually using):

import cv2
import numpy as np

# img is the receipt image, already loaded earlier in the pipeline
kernel = np.ones((2, 2), np.uint8)
# opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
# closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
# dilation = cv2.dilate(opening, kernel, iterations=1)
# kernel = np.ones((3, 3), np.uint8)
# erode takes the local minimum, so dark text on a light background gets thicker
erosion = cv2.erode(img, kernel, iterations=1)
# gradient = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)
img = erosion.copy()

With this, from this original image:

enter image description here

I get this:

enter image description here

It's a little better, as you can see, but it's still not good enough. The OCR (Tesseract) doesn't recognize the characters here very well. I've trained it, but as you can see, every "e" looks different, and so on.

I get good results overall, but I think they would be even better if I solved this problem.

Maybe I could do something else, or use a better combination of morphological transformations. If there's another tool (PIL, ImageMagick, etc.) that would help, I can use it.

Here's the whole image, so you can see how it looks:

enter image description here

As I said, it's not that bad, but a little more "optimization" of the letters would be perfect.


Solution

  • After years of working on this topic, I can now say that what I wanted to do takes a big effort, is quite slow, and NEVER worked as I expected. The pixel irregularities in the characters are always unpredictable, which is why "easy algorithms" don't work.

    Question: Is it impossible, then, to have a decent OCR that can read damaged characters?

    Answer: No, it's not impossible. But it takes "a bit" more than just using erosion, morphological closing or something like that.

    Then, how? Neural Networks :)

    Here are two amazing papers that helped me a lot:

    Can we build language-independent OCR using LSTM networks?

    Reading Scene Text in Deep Convolutional Sequences

    And for those who aren't familiar with RNNs, I can suggest this:

    Understanding LSTM Networks

    There's also a Python library that works pretty well (and unfortunately even better for C++):

    ocropy

    I really hope this can help someone.
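    To give a rough idea of how LSTM-based OCR turns network output into text: as far as I understand them, the approaches in the papers above score every character class for each horizontal position ("frame") of a text line, and a CTC-style decoding step collapses those per-frame scores into a string. Below is a minimal sketch of the simplest variant, greedy decoding. The function name `ctc_greedy_decode`, the toy label set, and the hand-made scores are all assumptions for illustration; this is not code from ocropy or from the papers.

```python
import numpy as np


def ctc_greedy_decode(scores, labels, blank=0):
    """Greedy CTC decoding.

    scores: array of shape (frames, classes) with per-frame class scores.
    labels: list mapping class index -> character (entry at `blank` is unused).
    """
    best = scores.argmax(axis=1)  # most likely class for every frame
    chars = []
    prev = blank
    for k in best:
        # Emit a character only when the class changes and is not the blank.
        if k != prev and k != blank:
            chars.append(labels[k])
        prev = k
    return "".join(chars)


# Toy example: class 0 is the blank, 1 is "e", 2 is "t".
labels = ["", "e", "t"]
frame_classes = [1, 1, 0, 1, 2, 2, 0]
scores = np.eye(len(labels))[frame_classes]  # one-hot scores per frame

print(ctc_greedy_decode(scores, labels))  # eet
```

    Repeated frames of the same class collapse into one character, and the blank is what lets a doubled letter ("ee") survive. A slightly smeared or broken "e" can still win the per-frame vote, which is one plausible reason this kind of model copes better with damaged characters than pixel-level cleanup does.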