python-3.x tesseract telegram-bot

I have a problem with pytesseract on Ubuntu


I'm trying to make a Telegram bot, one of whose functions is recognizing text in an image. Everything works fine on Windows, but as soon as I switch to Linux I keep running into the same kind of exception. At first I thought I was specifying the path in pytesseract.pytesseract.tesseract_cmd incorrectly (the sites I visited suggested exactly this), but after carefully rechecking everything I did not find any error. Here is my code:

from telebot import types
from googlesearch import search
from PIL import Image
import pytesseract
import cv2
import os
import numpy as np
import telebot
import config
 
bot = telebot.TeleBot(config.token)
@bot.message_handler(content_types=["photo"])
def answer_to_photo(message):
    statuss = ['creator', 'administrator', 'member']
    user_status = str(bot.get_chat_member(chat_id='chat id', user_id=message.from_user.id).status)    
    if user_status in statuss:
        pytesseract.pytesseract.tesseract_cmd = r'/home/shalor1k/.local/bin/pytesseract'

        # The last entry in message.photo is the largest available size
        file_info = bot.get_file(message.photo[-1].file_id)
        downloaded_file = bot.download_file(file_info.file_path)
        src = r'C:\bot\photo' + message.photo[-1].file_id
        with open(src, 'wb') as new_file:
            new_file.write(downloaded_file)
        bot.reply_to(message, 'Processing your photo')

        image = src

        preprocess = "thresh"

        image = cv2.imread(image)
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

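        if preprocess == "thresh":
            # Otsu's method picks the binarization threshold automatically
            gray = cv2.threshold(gray, 0, 255,
                cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

        elif preprocess == "blur":
            # Median blur removes salt-and-pepper noise
            gray = cv2.medianBlur(gray, 3)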

        filename = "{}.png".format(os.getpid())
        cv2.imwrite(filename, gray)

        text = pytesseract.image_to_string(Image.open(filename), lang = 'rus')
        os.remove(filename)
        os.remove(src)

The text of the exception:

File "main_bot_for_server.py", line 67, in answer_to_photo
    text = pytesseract.image_to_string(Image.open(filename), lang = 'rus')
  File "/home/shalor1k/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 370, in image_to_string
    return {
  File "/home/shalor1k/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 373, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "/home/shalor1k/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 282, in run_and_get_output
    run_tesseract(**kwargs)
  File "/home/shalor1k/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py", line 258, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (2, 'Usage: pytesseract [-l lang] input_file')

Solution

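  • The first problem was that the binaries of Tesseract OCR itself were not installed. pytesseract is only a Python wrapper around the tesseract executable, and tesseract_cmd in the code above points at the pytesseract script instead, which is why the exception shows the wrapper's own usage message ('Usage: pytesseract [-l lang] input_file'). On Ubuntu the engine is provided by the tesseract-ocr package; once it is installed, tesseract_cmd should point at the tesseract binary or can be left at its default.
  • The second problem was that the required language packs were not installed. With lang = 'rus' the Russian trained data is needed, which on Ubuntu is provided by the tesseract-ocr-rus package.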
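
Both conditions from the solution can be checked from Python before the bot starts. The following is only a minimal sketch using the standard library: the helper names parse_list_langs and check_tesseract are made up for illustration and are not part of pytesseract, and it assumes the tesseract executable is expected on PATH. It relies on the tesseract command-line flag --list-langs.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Turn the text printed by `tesseract --list-langs` into a list of codes."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # The first line is a header such as "List of available languages (2):"
    return [line for line in lines if not line.lower().startswith("list of")]


def check_tesseract(required_lang="rus"):
    """Return the path to the tesseract binary, or raise if it or the language is missing."""
    cmd = shutil.which("tesseract")
    if cmd is None:
        raise RuntimeError(
            "tesseract binary not found on PATH "
            "(on Ubuntu: sudo apt install tesseract-ocr)"
        )
    result = subprocess.run([cmd, "--list-langs"], capture_output=True, text=True)
    # Some versions print the list to stderr, so both streams are inspected
    langs = parse_list_langs(result.stdout + "\n" + result.stderr)
    if required_lang not in langs:
        raise RuntimeError(
            "language data '{}' is not installed "
            "(on Ubuntu: sudo apt install tesseract-ocr-{})".format(
                required_lang, required_lang
            )
        )
    return cmd


# Possible use in the bot, once at startup:
#     pytesseract.pytesseract.tesseract_cmd = check_tesseract("rus")
```

If check_tesseract raises, the message says which of the two packages is probably missing, which is easier to read than the TesseractError shown in the question.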