When extracting integers from an image containing a 2D matrix, Tesseract fails to give the correct result, and the result varies every time we run the code. Can anybody suggest what is missing from the code below?
import cv2
import pytesseract
from PIL import Image

img = cv2.imread(img_path)
rows = img.shape[0]
cols = img.shape[1]
#print rows , cols
img = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# Apply dilation and erosion to remove some noise
#kernel = np.ones((5,5), np.uint64)
#img = cv2.dilate(img, kernel, iterations=1)
#img = cv2.erode(img, kernel, iterations=1)
# Write the image after noise removal
cv2.imwrite(src_path + "removed_noise1.png", img)
# Apply threshold to get image with only black and white
#img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY,225,95)
img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C , cv2.THRESH_BINARY ,251,95)
#print cv2.getGaussianKernel(ksize=221,sigma=41)
# Write the image after apply opencv to do some ...
cv2.imwrite(src_path + "thres1.png", img)
# Recognize text with tesseract for python
result = pytesseract.image_to_string(Image.open(src_path + "thres1.png"))
Input
Threshold call: adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 251, 95)
Output of threshold :
The expected output is 1 5 5 7 5 7 3 8 6 4 9 0 2 4 8 6 1 3 0 2 3 9 0 8 9. It can be in either row-major or column-major order, that doesn't matter, but we do need the output saved into a variable.
Try changing the last two adaptiveThreshold arguments (blockSize and C) from 251, 95 to something like 251, 40.
img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C , cv2.THRESH_BINARY ,251,40)
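To get the digits into a variable, here is a minimal sketch that applies the threshold suggested above and then parses the OCR text into a list of rows. It is not taken from the original code: the helper names `parse_digits` and `ocr_digit_grid` are hypothetical, the 5-column layout and single-digit cells are assumed from the expected output in the question, and the `--psm 6` mode and digit whitelist are my own suggestion that may or may not help with your Tesseract version.

```python
import re


def parse_digits(text, cols=5):
    """Collect every digit in the OCR text and group them into rows of `cols`.

    Assumes each matrix cell is a single digit, as in the question's output.
    """
    digits = [int(ch) for ch in re.findall(r"\d", text)]
    return [digits[i:i + cols] for i in range(0, len(digits), cols)]


def ocr_digit_grid(img_path, block_size=251, c=40, cols=5):
    """Threshold the image, run Tesseract restricted to digits, return rows."""
    # Imported here so parse_digits can be used without OpenCV/Tesseract installed.
    import cv2
    import pytesseract

    gray = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(img_path)

    # Same call as in the answer: blockSize=251, C=40.
    binary = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, block_size, c
    )

    # --psm 6 treats the image as one uniform block of text;
    # the whitelist asks Tesseract to consider digits only (an assumption
    # that this helps here, not something verified on the original image).
    config = "--psm 6 -c tessedit_char_whitelist=0123456789"
    text = pytesseract.image_to_string(binary, config=config)
    return parse_digits(text, cols)
```

Calling `matrix = ocr_digit_grid(img_path)` would then leave the rows in `matrix`. Since the order doesn't matter to you, flattening it with a list comprehension gives the row-major sequence.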
It seems you already have a perfect image, which doesn't require any changes before extracting the string from it. Pytesseract's image_to_string doesn't work on my system, so I used a word-by-word OCR that someone else made. This is definitely not the best solution in the world, but hey, if it works, it works. I have attached a few files (see the Google Drive links below).
Steps:
Please note:
TrainAndTest.py - https://drive.google.com/file/d/0B05aeuFExe2Aa3p3SWszN2xqU2c/view?usp=sharing
slice_image.py - https://drive.google.com/file/d/0B05aeuFExe2AN0t3UUlGZ3VjcW8/view?usp=sharing
training_chars.png - https://drive.google.com/file/d/0B05aeuFExe2ANjJNbzV5VTJyRTA/view?usp=sharing
classifications.txt - https://drive.google.com/file/d/0B05aeuFExe2AZU91bUpOblB3d2c/view?usp=sharing
flattened_images.txt - https://drive.google.com/file/d/0B05aeuFExe2AeXVnbXVXVTZ2RTQ/view?usp=sharing