Link to the original image: https://ibb.co/0VC6vkX
I am currently working on an OCR project. I pre-processed the image and then applied the pre-trained EAST model for text detection.
import cv2
import numpy as np
from imutils.object_detection import non_max_suppression
import matplotlib.pyplot as plt
%matplotlib inline
img=cv2.imread('bw_image.jpg')
model=cv2.dnn.readNet('frozen_east_text_detection.pb')
#Prepare the image
#EAST requires the input dimensions to be multiples of 32, so round down to the nearest multiple
height,width,colorch=img.shape
new_height=(height//32)*32
new_width=(width//32)*32
print(new_height,new_width)
h_ratio=height/new_height
w_ratio=width/new_width
print(h_ratio,w_ratio)
#blobFromImage prepares the image: it resizes, subtracts the ImageNet mean, and swaps BGR to RGB
blob=cv2.dnn.blobFromImage(img,1,(new_width,new_height),(123.68,116.78,103.94),True, False)
model.setInput(blob)
#this model outputs geometry and score maps
(geometry,scores)=model.forward(model.getUnconnectedOutLayersNames())
#with the geometry and score maps in hand, we post-process them to obtain the final text boxes
rectangles=[]
confidence_score=[]
for i in range(geometry.shape[2]):
    for j in range(0,geometry.shape[3]):
        #skip locations with a low text-confidence score
        if scores[0][0][i][j]<0.1:
            continue
        #the output maps are at 1/4 of the input resolution; the geometry map
        #holds the distances from each location to the edges of the text box
        bottom_x=int(j*4 + geometry[0][1][i][j])
        bottom_y=int(i*4 + geometry[0][2][i][j])
        top_x=int(j*4 - geometry[0][3][i][j])
        top_y=int(i*4 - geometry[0][0][i][j])
        rectangles.append((top_x,top_y,bottom_x,bottom_y))
        confidence_score.append(float(scores[0][0][i][j]))
#use non-maximum suppression to keep only the required rectangles
final_boxes=non_max_suppression(np.array(rectangles),probs=confidence_score,overlapThresh=0.5)
#finally, to display these text boxes, iterate over them and scale them back
#to the original shape using the ratios we calculated earlier
img_copy=img.copy()
for (x1,y1,x2,y2) in final_boxes:
    x1=int(x1*w_ratio)
    y1=int(y1*h_ratio)
    x2=int(x2*w_ratio)
    y2=int(y2*h_ratio)
    #draw the rectangles on the image with the cv2.rectangle function
    cv2.rectangle(img_copy,(x1,y1),(x2,y2),(0,255,0),2)
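The result can then be displayed with matplotlib (a small addition of mine that uses the imports already at the top; OpenCV loads images as BGR, so convert to RGB before plotting):

plt.imshow(cv2.cvtColor(img_copy, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.show()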
This gives us the detected text as follows:
Now, for text recognition, I used the pre-trained OpenCV CRNN model as follows:
# Download the CRNN model and Load it
model1 = cv2.dnn.readNet('D:/downloads/crnn.onnx')
# Prepare the image
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blob = cv2.dnn.blobFromImage(img_gray, scalefactor=1/127.5, size=(100,32), mean=127.5)
# Pass the image to network and extract per-timestep scores
model1.setInput(blob)
scores = model1.forward()
# expected shape: (26, 1, 37) -> 26 timesteps, batch of 1, 37 classes (36 characters + the blank)
print(scores.shape)
alphabet_set = "0123456789abcdefghijklmnopqrstuvwxyz"
blank = '-'
char_set = blank + alphabet_set
# Decode the scores to text
def most_likely(scores, char_set):
    # take the highest-scoring character at each timestep
    text = ""
    for i in range(scores.shape[0]):
        c = np.argmax(scores[i][0])
        text += char_set[c]
    return text

def map_rule(text):
    # collapse repeated characters and drop the blanks (greedy CTC decoding)
    char_list = []
    for i in range(len(text)):
        if i == 0:
            if text[i] != '-':
                char_list.append(text[i])
        else:
            if text[i] != '-' and (not (text[i] == text[i - 1])):
                char_list.append(text[i])
    return ''.join(char_list)

def best_path(scores, char_set):
    text = most_likely(scores, char_set)
    final_text = map_rule(text)
    return final_text
out = best_path(scores, char_set)
print(out)
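As a quick sanity check of the decoding rule (a tiny example of my own, not an actual model output): consecutive repeats are collapsed and the blanks removed, so

print(map_rule('--hh-e-l-ll-oo'))   # prints: hello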
But applying this model to the image gives the following output:
saetan
I really don't understand this. Can anyone explain what the problem with the text recognition is? Is there a problem with the pre-trained CRNN model? Moreover, I also want to restructure the text after it has been recognized, the way it is structured in the original image. Once the recognition problem is solved, we will have the bounding box coordinates and the recognized text, so how can we restructure the text exactly? Any help will be appreciated.
Edit: I used the pytesseract image_to_string() and image_to_data() functions, but they don't give that good a performance. Is there any other pre-trained text recognition model I can use, if this CRNN model is not fit enough, so that I can replicate the success of my EAST text detection model? That way I could restructure my text accurately, as it is in the image, with the help of the coordinates (bounding boxes) obtained through the EAST model.
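For reference, this is roughly the kind of calls I made (a minimal sketch; the exact parameters I used may have differed; image_to_data returns per-word boxes and confidences as a dict of lists):

import pytesseract
from PIL import Image

pil_img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
print(pytesseract.image_to_string(pil_img))
data = pytesseract.image_to_data(pil_img, output_type=pytesseract.Output.DICT)
print(data['text'], data['left'], data['top'], data['conf'])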
Processing crops is really simple; just change your last loop a little:
import pytesseract
from PIL import Image
...
for x1,y1,x2,y2 in final_boxes:
    # instead of drawing the rectangles on the image with cv2.rectangle...
    # cv2.rectangle(img_copy,(x1,y1),(x2,y2),(0,255,0),2)
    # ...crop each detected box (with a 1px margin) and run tesseract on the crop
    img_crop = Image.fromarray(img[y1-1: y2+1, x1-1: x2+1])
    text = pytesseract.image_to_string(img_crop, config='--psm 8').strip()
    cv2.putText(img_copy, text, (x1, y1), 0, 0.7, (0, 0, 255), 2)
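To restructure the recognized text in reading order, here is a minimal sketch of my own (assuming roughly horizontal text lines; it expects you to first collect (box, text) pairs in the loop above, e.g. results.append(((x1, y1, x2, y2), text))):

def restructure(results, line_tol=10):
    # sort boxes top-to-bottom by their top-left y coordinate
    results = sorted(results, key=lambda r: r[0][1])
    lines = []
    for box, text in results:
        # a box joins the current line if its y is within line_tol of the line's first box
        if lines and abs(box[1] - lines[-1][0][0][1]) <= line_tol:
            lines[-1].append((box, text))
        else:
            lines.append([(box, text)])
    # within each line, sort words left-to-right and join everything
    return '\n'.join(' '.join(t for _, t in sorted(line, key=lambda r: r[0][0]))
                     for line in lines)

print(restructure(results))

The line_tol value should be tuned to the typical line spacing of your document.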