I'm trying to make a Telegram bot that translates text from images. I already have OCR (image to text) and translation working for a local file, but I can't get it to work with an image sent to the bot in Telegram.
After reading the documentation, I thought this would work:
```python
chat_id = message.chat.id
photo = message.photo[-1].file_id
new_file = await bot.get_file(photo)
new_file.download_to_drive()
```
But `await` doesn't seem to work here. I also tried this:
```python
updater = Updater(BOT_TOKEN, use_context = True)
dp = updater.dispatcher
dp.add_handler(MessageHandler(filters.PHOTO, handle_photo))
```
I wanted to filter messages that contain photos, but the dispatcher was removed in the latest version of python-telegram-bot (v20), so I don't know how to handle this now.
I'm not sure whether this helps, but this is how I use pytesseract for image to text in my code:
```python
text = pytesseract.image_to_string('img\\test2.png', config='-l jpn')
```
The whole code is as follows:
```python
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract
import os
import telebot
import asyncio
from dotenv import load_dotenv, find_dotenv
import matplotlib.pyplot as plt
import requests
import io
from telegram.ext import *
from googletrans import Translator

load_dotenv(find_dotenv())
BOT_TOKEN = os.environ.get('BOT_TOKEN')
bot = telebot.TeleBot(BOT_TOKEN)
pytesseract.pytesseract.tesseract_cmd = r'C:\\Program Files\\tesseract\\tesseract.exe'
loop = asyncio.get_event_loop()


@bot.message_handler(commands=['start', 'hello'])
def send_welcome(message):
    bot.reply_to(message, "Welcome")


@bot.message_handler(commands=['translate'])
def translate1(message):
    text = "Send a picture you want to translate"
    sent_msg = bot.send_message(message.chat.id, text, parse_mode="Markdown")
    bot.register_next_step_handler(sent_msg, translate2)


def translate2(message):
    coro = translate(message)
    asyncio.run(coro)


async def download(photo):
    new_file = await bot.get_file(photo)
    await new_file.download_to_drive()


async def translate(message):
    chat_id = message.chat.id
    photo = message.photo[-1].file_id
    new_file = await bot.get_file(photo)
    new_file.download_to_drive()
    image = Image.open(io.BytesIO(requests.get().content))
    text = pytesseract.image_to_string(image, config='-l jpn')
    translator = Translator()
    translated_text = translator.translate(text, dest='uk' )
    bot.send_message(chat_id=chat_id, text=translated_text)


bot.infinity_polling()
```
I figured out how to do it myself, so I'll post the answer in case someone runs into the same problem.
The whole code would be:
```python
import os
from dotenv import load_dotenv, find_dotenv
from PIL import Image
import pytesseract
from deep_translator import GoogleTranslator
import re
from typing import Final
from telegram import Update
from telegram.ext import Application, CommandHandler, MessageHandler, filters, ContextTypes

pytesseract.pytesseract.tesseract_cmd = r'C:\\Program Files\\tesseract\\tesseract.exe'

print('Starting up bot...')
load_dotenv(find_dotenv())
TOKEN: Final = os.environ.get('BOT_TOKEN')


async def start_command(update: Update, context: ContextTypes.DEFAULT_TYPE):
    await update.message.reply_text("Hello!")


async def error(update: Update, context: ContextTypes.DEFAULT_TYPE):
    print(f'Update {update} caused error {context.error}')


async def downloader(update: Update, context: ContextTypes.DEFAULT_TYPE):
    # Download file. For photos, effective_attachment is a sequence of
    # PhotoSize objects; the last entry is normally the largest size.
    new_file = await update.message.effective_attachment[-1].get_file()
    # Saved to the current working directory; returns a pathlib.Path
    file = await new_file.download_to_drive()
    return file


async def translate_msg(update: Update, context: ContextTypes.DEFAULT_TYPE):
    # Ignore updates that carry no message, chat or media attachment
    if (
        not update.message
        or not update.effective_chat
        or (
            not update.message.photo
            and not update.message.video
            and not update.message.document
            and not update.message.sticker
            and not update.message.animation
        )
    ):
        return
    file = await downloader(update, context)
    if not file:
        await update.message.reply_text("Something went wrong, try again")
        return
    image = Image.open(file)
    text = pytesseract.image_to_string(image, config='-l jpn')
    # Join the OCR output into a single line before translating
    new_text = re.sub(r"[\n\r]+", " ", text)
    translated_text = GoogleTranslator(source='ja', target='uk').translate(new_text)
    await update.message.reply_text(translated_text)


if __name__ == '__main__':
    # Application takes over the role of the removed Dispatcher (python-telegram-bot v20+)
    app = Application.builder().token(TOKEN).build()
    app.add_handler(CommandHandler('start', start_command))
    app.add_handler(MessageHandler(filters.PHOTO, translate_msg))
    app.add_error_handler(error)
    app.run_polling(poll_interval=3)
```
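The `re.sub` call in `translate_msg` presumably exists so the translator receives whole sentences rather than fragments split at the line breaks Tesseract reproduces from the image. Pulled out into a small helper (the name `clean_ocr_text` and the extra `.strip()` are my additions, not part of the code above), it behaves like this:

```python
import re


def clean_ocr_text(text: str) -> str:
    """Join OCR output into one line before translation.

    Uses the same regex as translate_msg: every run of newline and
    carriage-return characters becomes a single space. The trailing
    .strip() (an addition) drops the space left by a final line break.
    """
    return re.sub(r"[\n\r]+", " ", text).strip()
```

For example, `clean_ocr_text("こんにちは\n世界\r\n")` gives `"こんにちは 世界"`. Since Japanese doesn't separate words with spaces, replacing the line breaks with `""` instead of `" "` may be a reasonable alternative; that is a judgment call rather than something the code above requires.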
So the way to download a photo is:
```python
async def downloader(update: Update, context: ContextTypes.DEFAULT_TYPE):
    new_file = await update.message.effective_attachment[-1].get_file()
    file = await new_file.download_to_drive()
    return file
```
Quoting the official python-telegram-bot documentation for `telegram.File`:
async download_to_drive(custom_path=None, *, read_timeout=None, write_timeout=None, connect_timeout=None, pool_timeout=None)
Download this file. By default, the file is saved in the current working directory with file_path as file name. If the file has no filename, the file ID will be used as filename. If custom_path is supplied as a str or pathlib.Path, it will be saved to that path.
Changed in version 20.0:
• custom_path parameter now also accepts pathlib.Path as argument.
• Returns pathlib.Path object in cases where previously a str was returned.
• This method was previously called download. It was split into download_to_drive() and download_to_memory().
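Because `download_to_drive()` without arguments writes into the current working directory, the bot above leaves every received photo on disk. Here is a minimal sketch, assuming you want each file removed once OCR is done. It passes `custom_path` into a temporary directory. The helper name `ocr_via_temp_file`, the file name `photo.jpg` and the `ocr` callable are all hypothetical:

```python
import tempfile
from pathlib import Path
from typing import Any, Callable


async def ocr_via_temp_file(tg_file: Any, ocr: Callable[[Path], str]) -> str:
    """Download a file into a temporary directory, run OCR on it, then delete it.

    Sketch: tg_file is assumed to be a python-telegram-bot telegram.File
    (anything with an async download_to_drive(custom_path=...) works).
    ocr is a hypothetical callable that turns a file path into text.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        # custom_path accepts a str or pathlib.Path (python-telegram-bot >= 20.0);
        # download_to_drive returns the path it wrote to
        saved = await tg_file.download_to_drive(custom_path=Path(tmp_dir) / "photo.jpg")
        # Leaving this block deletes tmp_dir together with the downloaded photo
        return ocr(saved)
```

Inside `translate_msg` it could be called as `text = await ocr_via_temp_file(new_file, lambda p: pytesseract.image_to_string(str(p), config='-l jpn'))`. Passing a path string works with pytesseract, as the question's `image_to_string('img\\test2.png', ...)` call already shows.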
It's also possible to download into memory instead of to disk:
async download_to_memory(out, *, read_timeout=None, write_timeout=None, connect_timeout=None, pool_timeout=None)
Download this file into memory. out needs to be supplied with a io.BufferedIOBase, the file contents will be saved to that object using the out.write method.
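`download_to_memory()` avoids the disk entirely, and it also replaces the `requests.get()` call from the question. Here is a minimal sketch, assuming python-telegram-bot v20+, where `PhotoSize.get_file()` and `File.download_to_memory()` are coroutines. The helper name `download_largest_photo` is hypothetical:

```python
import io
from typing import Any


async def download_largest_photo(message: Any) -> bytes:
    """Return the bytes of the largest size of a photo message (sketch).

    message is assumed to be a telegram.Message whose photo attribute
    holds the available PhotoSize objects, with the largest one last.
    """
    # PhotoSize.get_file() asks Telegram for a downloadable File object
    tg_file = await message.photo[-1].get_file()
    # io.BytesIO is an io.BufferedIOBase, as download_to_memory expects
    buffer = io.BytesIO()
    await tg_file.download_to_memory(buffer)
    return buffer.getvalue()
```

In the handler, the result can go straight to Pillow: `image = Image.open(io.BytesIO(await download_largest_photo(update.message)))`.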
And `Message.effective_attachment` is documented as:
property effective_attachment
If this message is neither a plain text message nor a status update, this gives the attachment that this message was sent with. This may be one of
• telegram.Audio
• telegram.Dice
• telegram.Contact
• telegram.Document
• telegram.Animation
• telegram.Game
• telegram.Invoice
• telegram.Location
• telegram.PassportData
• List[telegram.PhotoSize]
• telegram.Poll
• telegram.Sticker
• telegram.SuccessfulPayment
• telegram.Venue
• telegram.Video
• telegram.VideoNote
• telegram.Voice
Otherwise None is returned.
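In that list only photos arrive as a sequence. Every other type is a single object. So `effective_attachment[-1]` in `downloader` works only because the handler is registered with `filters.PHOTO`. If the filter were widened to documents or videos (which `translate_msg` already checks for), indexing a `telegram.Document` would raise an error. Here is a small sketch that handles both shapes. The helper `pick_attachment` is hypothetical:

```python
from typing import Any, Optional


def pick_attachment(attachment: Any) -> Optional[Any]:
    """Return one object from Message.effective_attachment (sketch).

    Photos come as a list/tuple of PhotoSize objects, normally ordered from
    smallest to largest, so the last one is taken. Every other attachment
    type is already a single object and is returned unchanged.
    """
    if isinstance(attachment, (list, tuple)):
        return attachment[-1] if attachment else None
    # A single attachment object, or None for plain text messages
    return attachment
```

Only media types such as `telegram.PhotoSize`, `telegram.Document` or `telegram.Video` have a `get_file()` shortcut, while `telegram.Location` or `telegram.Poll` do not. You would still keep a media filter in front of a call like `await pick_attachment(update.message.effective_attachment).get_file()`.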
And `Bot.get_file`:
async get_file(file_id, *, read_timeout=None, write_timeout=None, connect_timeout=None, pool_timeout=None, api_kwargs=None)
Use this method to get basic info about a file and prepare it for downloading. For the moment, bots can download files of up to 20 MB in size. The file can then be e.g. downloaded with telegram.File.download_to_drive(). It is guaranteed that the link will be valid for at least 1 hour. When the link expires, a new one can be requested by calling get_file again.
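Because of the 20 MB limit, it can be worth checking the size Telegram reports before calling `get_file()`. `PhotoSize`, `Document` and similar objects carry a `file_size` attribute in bytes, but the Bot API marks it as optional. This hypothetical check treats "20 MB" as 20 × 1024 × 1024 bytes, which is an assumption, since the quoted docs don't say which megabyte they mean:

```python
from typing import Optional

# Assumption: the documented 20 MB limit is interpreted as 20 MiB
MAX_BOT_DOWNLOAD_BYTES = 20 * 1024 * 1024


def can_download(file_size: Optional[int], limit: int = MAX_BOT_DOWNLOAD_BYTES) -> bool:
    """Return False only when the reported size is known to exceed the limit.

    file_size is optional in the Bot API, so an unknown size (None) is
    treated as "try anyway", leaving any error to get_file() itself.
    """
    if file_size is None:
        return True
    return file_size <= limit
```

For example, `if not can_download(update.message.photo[-1].file_size): await update.message.reply_text("File is too big")`. Telegram compresses images sent as photos, so this mostly matters if the bot later accepts images sent as documents.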
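Apart from that, I had to drop `telebot` entirely. My original code mixed two different libraries: `telebot` (pyTelegramBotAPI), whose `TeleBot` methods are synchronous, and `telegram.ext` (python-telegram-bot), whose `get_file()` and `download_to_drive()` are coroutines. That mix is why `await bot.get_file(photo)` didn't work. The bot is now initialized as `Application.builder().token(TOKEN).build()` instead of `telebot.TeleBot(BOT_TOKEN)`, and the handlers are registered on that application.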
So now commands are registered like this (assuming `app = Application.builder().token(TOKEN).build()`):

```python
app.add_handler(CommandHandler('command_name_as_it_is_displayed_in_bot', function_that_should_be_called))
```
And non-command messages are registered like this (with the same `app`):

```python
# filters.PHOTO here; any other filter (e.g. filters.TEXT) goes in the same place
app.add_handler(MessageHandler(filters.PHOTO, function_that_should_be_called))
```