python, tesseract

How to find specific text & print the next 2 words after it


My code is below.

I currently have an if statement that checks for a specific word, in this case 'INGREDIENTS'.

Next, instead of print("True"), I need to print the two words/strings that follow 'INGREDIENTS'. This word appears only once in the image.

For example, when I add print(text) to my script and run the .py file, this is the output:

Ground Almonds

INGREDIENTS: Ground Almonds(100%).

1kg

I just need to re-code this section:

if 'INGREDIENTS' in text:
    print("True")
else:
    print("False")

so the output is like this:

INGREDIENTS: Ground Almonds

Because the next two words/strings are Ground and Almonds.

Python Code

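from PIL import Image
import pytesseract

# Point pytesseract at the local Tesseract executable
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\gzi\AppData\Roaming\Python\Python37\site-packages\tesseract.exe'

img = Image.open('C:/Users/gzi/Desktop/work/lux.jpg')

# Run OCR and get the recognized text as one string
text = pytesseract.image_to_string(img, lang='eng')

# Currently this only reports whether the word was found
if 'INGREDIENTS' in text:
    print("True")
else:
    print("False")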

Solution

  • If you don't mind the percentage staying attached to the last word and want to avoid regex:

    string = 'INGREDIENTS: Ground Almonds(100%).'

    # Split on whitespace, then print the matching token plus the next two
    tokens = string.split()
    for n, token in enumerate(tokens):
        if 'INGREDIENTS' in token:
            print(' '.join(tokens[n:n+3]))
    

    Output:

    INGREDIENTS: Ground Almonds(100%).
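
If you do want to drop the percentage and get exactly the output asked for in the question, a regex is one option. The following is only a minimal sketch: `words_after` is a hypothetical helper name, the sample `text` is typed in by hand to mirror the OCR output shown in the question, and it assumes each word is a plain run of letters (so hyphenated words or words containing digits would be cut short).

```python
import re


def words_after(text, keyword, count=2):
    """Return the keyword followed by the next `count` words, or None if not found."""
    # \W* skips the colon and whitespace after the keyword.
    # Each ([A-Za-z]+) captures one word made of letters only (an assumption).
    word_groups = r"\W+".join([r"([A-Za-z]+)"] * count)
    pattern = re.escape(keyword) + r"\W*" + word_groups
    match = re.search(pattern, text)
    if match is None:
        return None
    return keyword + ": " + " ".join(match.groups())


# Stand-in for the string returned by pytesseract.image_to_string(...)
text = "Ground Almonds\n\nINGREDIENTS: Ground Almonds(100%).\n\n1kg"

print(words_after(text, "INGREDIENTS"))
```

With the sample text this prints INGREDIENTS: Ground Almonds. In the original script you would pass the `text` variable from pytesseract instead of the hard-coded string, and handle the None case where the `else` branch currently prints "False".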