Detecting Bangla characters using pytesseract

I am trying to detect Bangla characters in images of Bangla number plates with Python, so I decided to use pytesseract. I used the following code:

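import pytesseract
from PIL import Image

# Point pytesseract at the Tesseract executable (Windows install path)
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
# Run OCR on the image with the Bengali language data ("ben")
text = pytesseract.image_to_string(Image.open('input.png'), lang="ben")
print(text)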

The problem is that the printed output is empty.

When I tried to write it to a text file, it shows up like this:

Example Picture: (Link)

Expected output (something like this, or at least close to it):

ঢাকা মেট্রো হ

৪৫ ২৩০৭

P.S.: I downloaded the Bengali language data while installing Tesseract-OCR-64, and I am running the code in VS Code.

Can anyone help me solve this problem, or give me an idea of how to approach it?

Solution

The solution to this problem is:

You need to segment the characters first (with any approach you like, whether deep learning or classical image processing) and feed pytesseract one character at a time. Use only clear photos.

Reason: Tesseract can detect Bangla in pictures that are clear and of reasonably good resolution. It has probably had considerably less training for this language on small images, which is quite understandable.

Code:

import pytesseract
from PIL import Image

# Segment the characters first, using any deep learning or image processing approach

pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
# Load one segmented character and run OCR on it
character = pytesseract.image_to_string(Image.open('char.png'), lang="ben")
print(character)
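
The segmentation step is left open in the code above. As one possible image processing approach (not necessarily the one used for this answer), here is a minimal sketch of vertical projection segmentation. It assumes the plate has already been binarized into rows of 0 (background) and 1 (ink); the `Bitmap` format and the names `column_spans` and `crop_columns` are hypothetical, chosen only for illustration. It splits wherever a fully blank column separates two glyphs, so it may work for separated digits but is unlikely to split Bangla letters that are joined by the headline (matra).

```python
from typing import List, Tuple

# Hypothetical input format: rows of 0 (background) / 1 (ink).
Bitmap = List[List[int]]


def column_spans(bitmap: Bitmap, min_width: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    if not bitmap:
        return []
    width = len(bitmap[0])
    # Vertical projection: number of ink pixels in each column.
    profile = [sum(row[x] for row in bitmap) for x in range(width)]

    spans = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a run of ink columns begins
        elif count == 0 and start is not None:
            if x - start >= min_width:
                spans.append((start, x))   # the run ended at a blank column
            start = None
    if start is not None and width - start >= min_width:
        spans.append((start, width))       # the run touches the right edge
    return spans


def crop_columns(bitmap: Bitmap, span: Tuple[int, int]) -> Bitmap:
    """Cut one character candidate out of the bitmap."""
    start, end = span
    return [row[start:end] for row in bitmap]


if __name__ == "__main__":
    # Two toy "characters" separated by blank columns.
    demo = [
        [0, 1, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 0, 1, 0],
        [0, 1, 1, 0, 0, 1, 0],
    ]
    for span in column_spans(demo):
        print(span, crop_columns(demo, span))
```

Each cropped span would then be saved as its own image (for example char.png) and passed to pytesseract as shown above. The `min_width` parameter is there to discard thin specks of noise; a suitable value depends on the image resolution.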