I'm a beginner in computer vision and OpenCV, but I have moderate experience with Python. I am trying to write a program that takes an image and divides it into tiles based on the structural organization of the text. For example, given a menu like the following,
I want to use computer vision to identify the table layout of the text and divide the image into tiles like the following:
As of now, my purpose isn't to extract the text using OCR. All I need to do is identify the (hidden) table structure in the image and divide it into individual cells, and extract them as sub-images. Any approaches I can use?
Sorry, I am really new to computer vision. Feel free to let me know if any libraries other than OpenCV are needed.
I see that you do not want OCR. However, I will still post a solution using EasyOCR, because it returns a bounding box for each piece of text it reads, and those boxes can serve as the tiles you are after.
import easyocr
import cv2 as cv
import numpy as np
import os
path = "menu.jpg"
assert os.path.exists(path)
#always a good idea to convert BGR to RGB when using OCR
img = cv.imread(path)
img = cv.cvtColor(img, cv.COLOR_BGR2RGB)
viz_img = np.copy(img)
#read the text
reader = easyocr.Reader(['en'])
text_data = reader.readtext(img, paragraph=True, x_ths=0.5) #with paragraph=True each entry is [box-coords, text]; no confidence is returned
print(text_data)
#visualize
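for data in text_data:
    # with paragraph=True each entry is [box, text]
    box, text = data
    top_left, top_right, bottom_right, bottom_left = box
    # OpenCV expects integer pixel coordinates
    tl = tuple(int(x) for x in top_left)
    br = tuple(int(x) for x in bottom_right)
    cv.rectangle(viz_img, tl, br, (0, 255, 0), 4)
    cv.putText(viz_img, text, br, cv.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)
# viz_img is RGB, but imwrite expects BGR, so convert back before saving
cv.imwrite('viz_with_text.jpg', cv.cvtColor(viz_img, cv.COLOR_RGB2BGR))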
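Since the question asks for the cells as sub-images, here is a minimal sketch of how the detected boxes could be turned into crops. The helper names box_to_rect and crop_cells are my own (hypothetical, not part of EasyOCR or OpenCV), and the demo uses a synthetic array in place of the menu image so it runs without any OCR model.

```python
import numpy as np

def box_to_rect(box, width, height):
    """Convert a 4-point box into (x0, y0, x1, y1), clamped to the image size."""
    xs = [int(round(p[0])) for p in box]
    ys = [int(round(p[1])) for p in box]
    x0, x1 = max(min(xs), 0), min(max(xs), width)
    y0, y1 = max(min(ys), 0), min(max(ys), height)
    return x0, y0, x1, y1

def crop_cells(img, boxes):
    """Return one sub-image (a copy) per box, skipping boxes with no area."""
    height, width = img.shape[:2]
    cells = []
    for box in boxes:
        x0, y0, x1, y1 = box_to_rect(box, width, height)
        if x1 > x0 and y1 > y0:
            # NumPy images are indexed [row, column], i.e. [y, x]
            cells.append(img[y0:y1, x0:x1].copy())
    return cells

# Synthetic stand-ins for the menu image and for the boxes found by the OCR step
demo_img = np.zeros((100, 200, 3), dtype=np.uint8)
demo_boxes = [
    [[10, 10], [60, 10], [60, 40], [10, 40]],
    [[150, 70], [250, 70], [250, 120], [150, 120]],  # partly outside the image
]
demo_cells = crop_cells(demo_img, demo_boxes)
print([cell.shape for cell in demo_cells])
```

With the variables from the code above, the boxes would be the first element of each entry in text_data, for example crop_cells(img, [box for box, text in text_data]). Each crop is still RGB at that point, so it should be converted back to BGR before being saved with cv.imwrite.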
The documentation of EasyOCR is here.
Let me explain what I did.
I passed paragraph=True so that EasyOCR merges neighbouring text boxes into paragraph-level boxes. To control how aggressively the boxes are merged, adjust the parameters x_ths (horizontal merging) and y_ths (vertical merging).
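One way to play with these two parameters is to run the reader over a small grid of values and compare how many boxes come out of each run. The sketch below assumes only that the reader object has a readtext method accepting paragraph, x_ths and y_ths, as used above. The function name sweep_paragraph_thresholds and the values in the usage comment are illustrative, not recommendations.

```python
from itertools import product

def sweep_paragraph_thresholds(reader, img, x_values, y_values):
    """Run readtext for every (x_ths, y_ths) pair and count the merged boxes."""
    counts = {}
    for x_ths, y_ths in product(x_values, y_values):
        result = reader.readtext(img, paragraph=True, x_ths=x_ths, y_ths=y_ths)
        counts[(x_ths, y_ths)] = len(result)
    return counts

# Usage with the `reader` and `img` objects from the code above (assumed in scope):
# counts = sweep_paragraph_thresholds(reader, img, [0.3, 0.5, 1.0], [0.3, 0.5])
# for (x_ths, y_ths), n in sorted(counts.items()):
#     print(f"x_ths={x_ths}, y_ths={y_ths}: {n} boxes")
```

Fewer boxes means more merging took place. Drawing the boxes for a couple of settings, as in the visualization loop, is still the most reliable way to judge which setting matches the table layout.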
Additional information: if the text is not detected properly, which will affect the output of the code, adjust the detection parameters text_threshold, low_text and link_threshold.
Please refer to the EasyOCR documentation I have linked above for more details on the parameters.
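To make that tuning a bit more systematic, here is a rough sketch that tries a few combinations of the three detection parameters and reports how many boxes each one yields. The first entry is what I believe the library defaults to be, and the other values are arbitrary starting points rather than tested recommendations. The names CANDIDATE_SETTINGS and compare_detection_settings are hypothetical.

```python
# Candidate detector settings; only the keyword names come from EasyOCR's readtext
CANDIDATE_SETTINGS = [
    {"text_threshold": 0.7, "low_text": 0.4, "link_threshold": 0.4},  # assumed defaults
    {"text_threshold": 0.6, "low_text": 0.3, "link_threshold": 0.4},  # more permissive
    {"text_threshold": 0.5, "low_text": 0.3, "link_threshold": 0.3},  # even more permissive
]

def compare_detection_settings(reader, img, settings=CANDIDATE_SETTINGS, **readtext_kwargs):
    """Call readtext once per setting and return (setting, number_of_boxes) pairs."""
    report = []
    for params in settings:
        result = reader.readtext(img, **params, **readtext_kwargs)
        report.append((params, len(result)))
    return report

# Usage with the `reader` and `img` objects from the code above (assumed in scope):
# for params, n in compare_detection_settings(reader, img, paragraph=True, x_ths=0.5):
#     print(params, "->", n, "boxes")
```

A box count alone does not say whether the detections are correct, so it is worth visualizing the result for any setting that looks promising.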
The result on the image you have provided is as follows.