python image-processing telegram telegram-bot

How to receive an image in Telegram using Python?


I'm trying to make a translator bot that translates text from images. I already have image-to-text and translation working for a local file, but I can't get it to work with an image sent to the bot in Telegram.

After reading the documentation, I thought this would work:

chat_id = message.chat.id
photo = message.photo[-1].file_id

new_file = await bot.get_file(photo)
new_file.download_to_drive()

But await doesn't seem to work here. I also tried

updater = Updater(BOT_TOKEN, use_context = True)
dp = updater.dispatcher
dp.add_handler(MessageHandler(filters.PHOTO, handle_photo))

to filter for messages that contain photos, but the dispatcher was removed in the latest version of the library (python-telegram-bot v20), so I don't know how to handle this now.

I'm not sure if this helps answer the question, but I use pytesseract for image-to-text in my code like this: text = pytesseract.image_to_string('img\\test2.png', config='-l jpn')

Whole code is as follows:

try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract
import os
import telebot
import asyncio
from dotenv import load_dotenv, find_dotenv
import matplotlib.pyplot as plt
import requests
import io
from telegram.ext import *
from googletrans import Translator

load_dotenv(find_dotenv())

BOT_TOKEN = os.environ.get('BOT_TOKEN')

bot = telebot.TeleBot(BOT_TOKEN)
pytesseract.pytesseract.tesseract_cmd = r'C:\\Program Files\\tesseract\\tesseract.exe'

loop = asyncio.get_event_loop()

@bot.message_handler(commands=['start', 'hello'])
def send_welcome(message):
    bot.reply_to(message, "Welcome")

@bot.message_handler(commands=['translate'])
def translate1(message):
    text = "Send a picture you want to translate"
    sent_msg = bot.send_message(message.chat.id, text, parse_mode="Markdown")
    bot.register_next_step_handler(sent_msg, translate2)

def translate2(message):
    coro = translate(message)
    asyncio.run(coro)

async def download(photo):
    new_file = await bot.get_file(photo)
    await new_file.download_to_drive()

async def translate(message):
    
    chat_id = message.chat.id
    photo = message.photo[-1].file_id
    
    new_file = await bot.get_file(photo)
    new_file.download_to_drive()
    image = Image.open(io.BytesIO(requests.get().content))
    text = pytesseract.image_to_string(image, config='-l jpn')
    
    translator = Translator()
    translated_text = translator.translate(text, dest='uk' )
    
    bot.send_message(chat_id=chat_id, text=translated_text)

bot.infinity_polling()

Solution

  • I figured out how to do it myself, so I'll post the answer in case someone runs into the same problem.

    The whole code would be:

    import os
    from dotenv import load_dotenv, find_dotenv
    from PIL import Image
    import pytesseract
    from deep_translator import GoogleTranslator
    import re
    
    from typing import Final
    
    from telegram import Update
    from telegram.ext import Application, CommandHandler, MessageHandler, filters, ContextTypes
    
    pytesseract.pytesseract.tesseract_cmd = r'C:\\Program Files\\tesseract\\tesseract.exe'
    
    print('Starting up bot...')
    
    load_dotenv(find_dotenv())
    
    TOKEN: Final = os.environ.get('BOT_TOKEN')
    
    async def start_command(update: Update, context:ContextTypes.DEFAULT_TYPE):
        await update.message.reply_text("Hello!")
    
    async def error(update: Update, context: ContextTypes.DEFAULT_TYPE):
        print(f'Update {update} caused error {context.error}')
    
    async def downloader(update: Update, context: ContextTypes.DEFAULT_TYPE):
        # Download file
        new_file = await update.message.effective_attachment[-1].get_file()
        file = await new_file.download_to_drive()
        
        return file
    
    async def translate_msg(update: Update, context: ContextTypes.DEFAULT_TYPE):
        if (
            not update.message
            or not update.effective_chat
            or (
                not update.message.photo
                and not update.message.video
                and not update.message.document
                and not update.message.sticker
                and not update.message.animation
            )
        ):
            return
        file = await downloader(update, context)
       
        if not file:
            await update.message.reply_text("Something went wrong, try again")
            return
        
        image = Image.open(file)
        text = pytesseract.image_to_string(image, config='-l jpn')
        
        new_text = re.sub(r"[\n\r]+", " ", text)
        
        translated_text = GoogleTranslator(source='ja', target='uk').translate(new_text)
        await update.message.reply_text(translated_text)
    
    
    if __name__ == '__main__':
        app = Application.builder().token(TOKEN).build()    
        app.add_handler(CommandHandler('start', start_command))    
        app.add_handler(MessageHandler(filters.PHOTO, translate_msg))    
        app.add_error_handler(error)
        app.run_polling(poll_interval=3)
    

    So the way to download a photo is:

    async def downloader(update: Update, context: ContextTypes.DEFAULT_TYPE):
        
        new_file = await update.message.effective_attachment[-1].get_file()
        file = await new_file.download_to_drive()
        
        return file
    

    Quoting official documentation:

    async download_to_drive(custom_path=None, *, read_timeout=None, write_timeout=None, connect_timeout=None, pool_timeout=None)

    Download this file. By default, the file is saved in the current working directory with file_path as file name. If the file has no filename, the file ID will be used as filename. If custom_path is supplied as a str or pathlib.Path, it will be saved to that path.

    Changed in version 20.0:

    • custom_path parameter now also accepts pathlib.Path as argument.

    • Returns pathlib.Path object in cases where previously a str was returned.

    • This method was previously called download. It was split into download_to_drive() and download_to_memory().
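
    If you don't want downloads to land in the current working directory, custom_path can be built with pathlib. A minimal sketch: build_download_path is a hypothetical helper, and the folder name and the .jpg suffix are my own assumptions, not something the library requires.

```python
from pathlib import Path


def build_download_path(directory: str, file_unique_id: str, suffix: str = ".jpg") -> Path:
    """Return a path such as <directory>/<file_unique_id>.jpg, creating the folder if needed.

    The helper, the directory layout and the default suffix are illustrative assumptions.
    """
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
    return folder / f"{file_unique_id}{suffix}"


# Inside an async handler this could then be used as (not executed here):
#     path = build_download_path("downloads", new_file.file_unique_id)
#     saved = await new_file.download_to_drive(custom_path=path)
```

    Using the file's unique ID as the name keeps two different photos from overwriting each other.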

    It's also possible to use

    async download_to_memory(out, *, read_timeout=None, write_timeout=None, connect_timeout=None, pool_timeout=None)

    Download this file into memory. out needs to be supplied with a io.BufferedIOBase, the file contents will be saved to that object using the out.write method.
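
    To illustrate the in-memory variant, here is a rough sketch that collects the bytes in an io.BytesIO buffer. FakeFile is a hypothetical stand-in I made up so the snippet runs without a bot token or network access; in a real handler you would pass the object returned by get_file() instead.

```python
import asyncio
import io


async def file_to_bytes(tg_file) -> bytes:
    """Download a file into memory and return its raw bytes.

    `tg_file` is anything exposing an async `download_to_memory(out)` method,
    such as the object returned by `get_file()`.
    """
    buffer = io.BytesIO()
    await tg_file.download_to_memory(buffer)
    return buffer.getvalue()


class FakeFile:
    """Hypothetical stand-in for telegram.File so the sketch runs offline."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    async def download_to_memory(self, out) -> None:
        # Mirrors the documented behaviour: contents are written via out.write.
        out.write(self._payload)


data = asyncio.run(file_to_bytes(FakeFile(b"fake image bytes")))
print(len(data))
```

    The resulting bytes could then be opened with Image.open(io.BytesIO(data)), so nothing has to be written to disk.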

    Effective attachment is:

    property effective_attachment

    If this message is neither a plain text message nor a status update, this gives the attachment that this message was sent with. This may be one of

    • telegram.Audio

    • telegram.Dice

    • telegram.Contact

    • telegram.Document

    • telegram.Animation

    • telegram.Game

    • telegram.Invoice

    • telegram.Location

    • telegram.PassportData

    • List[telegram.PhotoSize]

    • telegram.Poll

    • telegram.Sticker

    • telegram.SuccessfulPayment

    • telegram.Venue

    • telegram.Video

    • telegram.VideoNote

    • telegram.Voice

    Otherwise None is returned.
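
    For a photo message the attachment is a list of sizes, which is why the code above indexes it with [-1]. That relies on the largest size coming last. If you'd rather not depend on the ordering, you could pick the largest size explicitly. A small sketch, where FakePhotoSize is a hypothetical stand-in carrying the file_id, width and height fields that telegram.PhotoSize also has:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FakePhotoSize:
    """Hypothetical stand-in with fields that telegram.PhotoSize also exposes."""
    file_id: str
    width: int
    height: int


def largest_photo(sizes: Sequence[FakePhotoSize]) -> FakePhotoSize:
    """Pick the size with the most pixels instead of relying on list order."""
    if not sizes:
        raise ValueError("message has no photo sizes")
    return max(sizes, key=lambda size: size.width * size.height)


sizes = [
    FakePhotoSize("thumb", 90, 60),
    FakePhotoSize("medium", 320, 213),
    FakePhotoSize("full", 1280, 853),
]
print(largest_photo(sizes).file_id)
```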

    And get_file:

    async get_file(file_id, *, read_timeout=None, write_timeout=None, connect_timeout=None, pool_timeout=None, api_kwargs=None)

    Use this method to get basic info about a file and prepare it for downloading. For the moment, bots can download files of up to 20 MB in size. The file can then be e.g. downloaded with telegram.File.download_to_drive(). It is guaranteed that the link will be valid for at least 1 hour. When the link expires, a new one can be requested by calling get_file again.
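
    Because of the 20 MB limit mentioned above, it may be worth checking the reported size before calling get_file. A hedged sketch: can_download and MAX_DOWNLOAD_BYTES are names I made up, and reading "20 MB" as 20,000,000 bytes is an assumption.

```python
from typing import Optional

# Assumption: "20 MB" is read as 20,000,000 bytes; adjust if your setup differs.
MAX_DOWNLOAD_BYTES = 20_000_000


def can_download(file_size: Optional[int], limit: int = MAX_DOWNLOAD_BYTES) -> bool:
    """Return True when the file is not known to exceed the download limit.

    A size is not always reported, so an unknown size (None) is treated as
    "try it and handle the error", i.e. True.
    """
    if file_size is None:
        return True
    return file_size <= limit
```

    In a handler this could be called with the file_size of the chosen photo size, replying with an error message instead of downloading when it returns False.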

    Apart from that, I had to register my handlers explicitly and initialize the bot as Application.builder().token(TOKEN).build() instead of telebot.TeleBot(BOT_TOKEN). With app = Application.builder().token(TOKEN).build(), commands are registered as:

    app.add_handler(CommandHandler('command_name_as_it_is_displayed_in_bot', function_that_should_be_called))
    

    And non-command messages are registered on the same app as:

    app.add_handler(MessageHandler(filters.<filter here>, function_that_should_be_called))