Pytesseract / Recognizing chars + digits + spaces

I would like to recognize some text (with digits and spaces) from an image using the following code:

erg = pytesseract.image_to_string(img)

Generally this works fine, but I also get characters I don't want, like Ô:


() Preliminary Specification
(V) Final Specification
Module 18.5" Color TFT-LCD
Model Name (G18SHANOT.O
Customer Date ÔApproved by Date
Crystal Hsieh 2016/06/29
Approved by Propared by

So I tried to give tesseract a character whitelist, using the following code instead:

workString = f'-c tessedit_char_whitelist={string.digits}(){string.ascii_letters}'
erg = pytesseract.image_to_string(img, config=workString)

With that I get the following text. The Ô is no longer output, but unfortunately spaces are now missing:


Module 185ColorTFTLCD
ModelName (G18SHANOTO
Customer Date Approvedby Date
CrstalHsieh 2016(06)29
Approvedby Proparedby

Is there any way to whitelist the letters and digits but still output the spaces / blanks?


  • config = f"-c tessedit_char_whitelist='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.#-:/ '"

    Try this. I had a similar issue, and adding a space inside the inner quotes worked for me (the space is the last character in the string). Feel free to add or remove characters to control what tesseract includes or excludes.
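
    The whitelist above can also be built from the string module instead of typing it out. This is only a minimal sketch: build_whitelist_config and ocr_with_whitelist are hypothetical helper names, and it assumes pytesseract splits the config string shell-style (as shlex.split does), which would explain why the inner quotes are needed to keep the trailing space.

```python
import shlex
import string


def build_whitelist_config(extra=".#-:/ "):
    """Return a config string whitelisting ASCII letters, digits and `extra`.

    `extra` must not contain a single quote, because the value is wrapped
    in single quotes below.
    """
    allowed = string.ascii_letters + string.digits + extra
    # The single quotes keep the space attached to the value when the
    # string is split into arguments shell-style.
    return f"-c tessedit_char_whitelist='{allowed}'"


def ocr_with_whitelist(img, extra=".#-:/ "):
    """Run OCR on `img` using the whitelist config (needs pytesseract installed)."""
    import pytesseract  # imported lazily so the helper above works without it

    return pytesseract.image_to_string(img, config=build_whitelist_config(extra))


# Illustration of the assumed splitting behaviour, using shlex directly:
quoted_args = shlex.split(build_whitelist_config())
unquoted_args = shlex.split("-c tessedit_char_whitelist=abc ")
# quoted_args[1] still ends with the space; unquoted_args[1] has lost it.
```

    To keep the parentheses from the original whitelist in the question, something like ocr_with_whitelist(img, extra="().#-:/ ") should work under the same assumption.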