(Examples are changed but the idea is the same)
I'm trying to find an SRD Model No. on a product label from a live camera feed.
Here's a label example:
The conditions are:
So the question is: is there a way to find a substring of an SRD Model No. in a string generated by OCR, other than hard-coding all possible variations?
Here is an example script following @Angus Comber's suggestion:
    import cv2
    import pytesseract
    from pytesseract import Output


    def extract_SRD(filename):
        # Load the image, convert it to grayscale, and blur lightly to reduce noise before OCR
        img = cv2.imread(filename)
        img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        img_blur = cv2.GaussianBlur(img_gray, (3, 3), 0)
        # Run Tesseract with page segmentation mode 6 and get the results as a dictionary
        mydata = pytesseract.image_to_data(img_blur, output_type=Output.DICT, config='--psm 6')
        # Take the second token after 'SRD', which skips the word 'Model'
        SRD = mydata['text'][mydata['text'].index('SRD')+2]
        return SRD


    filename = 'wm3tG.png'
    SRD = extract_SRD(filename)
    print(SRD)
This snippet returns: 5427G2
The important line here is SRD = mydata['text'][mydata['text'].index('SRD')+2]. This is where you define the logic used to retrieve the SRD code. In this example, I simply query the second string of characters after SRD, thus skipping the word "Model".
I would suggest fine-tuning this example to check whether a specific value in the output dictionary contains "SRD". Then you may simply look for the next string of characters:
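As a rough sketch of that idea (not tested against the real label), the lookup could work on the list of OCR tokens directly. The helper name find_code_after_keyword, the skip list of label words, and the sample token list are all assumptions made for illustration. The sample imitates mydata['text'], which usually also contains empty strings.

```python
def find_code_after_keyword(tokens, keyword="SRD", skip=("model", "no", "no.")):
    """Return the first meaningful token after one that contains `keyword`.

    `tokens` is a list of strings such as mydata['text'].
    `skip` holds label words (an assumption) that should not be returned.
    Returns None when no suitable token is found.
    """
    for i, token in enumerate(tokens):
        # Substring match, so 'SRD:' or 'SRD.' still counts as a hit
        if keyword.lower() in token.lower():
            # Scan forward, ignoring empty entries and known label words
            for candidate in tokens[i + 1:]:
                cleaned = candidate.strip()
                if cleaned and cleaned.lower() not in skip:
                    return cleaned
    return None


# Hypothetical token list shaped like mydata['text']
sample_tokens = ["", "Product", "", "SRD:", "Model", "No.", "5427G2", ""]
print(find_code_after_keyword(sample_tokens))
```

Unlike the fixed +2 offset, this does not raise a ValueError when "SRD" is missing or has punctuation attached. It returns None instead, so a live camera loop can simply move on to the next frame.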