Problem with printing to console with pytesseract in Spyder


I am currently using Spyder via Anaconda with Python 3.8.5 on Windows 10. When I run this code:

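import cv2
import pytesseract

# Point pytesseract at the Tesseract executable (raw string, so single backslashes).
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

img_path = 'img/gotta-go-fast.jpg'

# OpenCV loads images as BGR; convert to RGB before running OCR.
img = cv2.imread(img_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

result = pytesseract.image_to_string(img, lang='eng')
print(result)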

The IPython console in Spyder is automatically cleared. The same does not happen if I only write the result to a text file, like this:

result = pytesseract.image_to_string(img, lang='eng')

with open('text_result.txt', mode='w') as file:
    file.write(result)

Is there any way to fix this?


Solution

  • Found what was happening. When pytesseract reads the text, a form feed character (\x0c) is added on a new line after the last line read, and printing that character is what makes the console appear cleared. Solved the issue by dropping that last line like this:

    result = pytesseract.image_to_string(img, lang='eng')
    arr = result.split('\n')[0:-1]
    result = '\n'.join(arr)
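
Dropping the last line works as long as the form feed is always alone on the final line. As a slightly more defensive variant, the sketch below removes the character wherever it appears. The helper name `clean_ocr_text` and the sample string are hypothetical, chosen only for illustration; the sample imitates the kind of output described above, so the snippet runs without Tesseract installed. Note that this also removes any form feeds separating pages in multi-page output, which may or may not be what you want.

```python
FORM_FEED = "\x0c"  # the control character found at the end of the OCR result


def clean_ocr_text(text: str) -> str:
    """Remove every form feed character, then strip trailing newlines.

    Hypothetical helper, not part of pytesseract.
    """
    return text.replace(FORM_FEED, "").rstrip("\n")


# Hypothetical sample imitating OCR output that ends with a form feed.
sample = "gotta go fast\n\x0c"

# repr() makes the otherwise invisible control character visible.
print(repr(sample))

# Safe to print: the form feed is gone.
print(clean_ocr_text(sample))
```

In the original script this would be used as `print(clean_ocr_text(result))`. Printing `repr(result)` first is a quick way to confirm that a trailing `\x0c` is really what the console is reacting to.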