I want to get the text out of an image. I tried Tesseract but had issues installing it, so I'm wondering if I can get some help with that, or with another way to do it. When I try to use Tesseract, it says "No module named PIL". But I know I have Pillow installed, and I thought the error was referring to it.
I've provided a Colab solution since that will probably be useful to the most people.
Install tesseract in our Colab environment.
!sudo apt install -y tesseract-ocr
!pip install pytesseract
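On the "No module named PIL" error from the question: one common cause is that Pillow was installed into a different Python environment than the one actually running the code. Here is a minimal sketch for checking that, using only the standard library. The helper name `module_available` is my own and not part of any of these packages.

```python
import importlib.util
import shutil
import sys


def module_available(name: str) -> bool:
    """Return True if the interpreter running this code can find `name`."""
    return importlib.util.find_spec(name) is not None


# Show which Python is running, so it can be compared with the one pip used.
print("Interpreter:", sys.executable)

# Pillow is imported as PIL, and OpenCV as cv2.
for mod in ("PIL", "pytesseract", "cv2"):
    status = "found" if module_available(mod) else "missing"
    print(f"{mod}: {status}")

# Path of the tesseract executable on PATH, or None if it is not installed.
print("tesseract binary:", shutil.which("tesseract"))
```

If `PIL` shows up as missing, installing with `python -m pip install pillow` from that same interpreter should put it in the right place.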
Import libraries and mount our Drive.
from google.colab import drive
from google.colab.patches import cv2_imshow
drive.mount('/content/drive/')
Set our pytesseract path, read in source image from our Drive, show our source image, then finally convert the image to text.
import cv2
import pytesseract
import numpy as np
# Point pytesseract at the tesseract binary installed by apt (must be a plain string).
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'
src = cv2.imread('/content/drive/MyDrive/Colab Notebooks/images/macbeth.png')
cv2_imshow(src)
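# OpenCV loads images as BGR, while pytesseract assumes RGB, so convert first.
output_txt = pytesseract.image_to_string(cv2.cvtColor(src, cv2.COLOR_BGR2RGB))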
print(type(output_txt))
print(output_txt)