Tags: c#, ocr, tesseract

C# - Tesseract OCR: scan multiple languages at once


Any idea how to do this? Currently I create the engine for a single language like this:

    TesseractEngine engine = new TesseractEngine("./tessdata", "eng", EngineMode.Default);

For one language, passing its code (such as "eng") is enough. But what if I want to scan an image that contains text in several languages? By the way, I'm using the Tesseract package by Charles Weld. Thanks.


Solution

  • According to here, Tesseract supports combining languages with the + syntax, so you just need to join the language codes with a + sign:

    TesseractEngine engine = new TesseractEngine("./tessdata", "jpn+eng", EngineMode.Default); // jpn+eng for Japanese and English
    

    Also, as noted here:

    The output can be different based on the order of languages, so -l eng+hin can give different result than -l hin+eng.

    In my experience, the language listed first tends to be recognized more accurately, so put the main language of the image first.
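Putting both points together, here is a minimal C# sketch. TessLanguages, Build, Orderings and PickBestOrder are hypothetical helpers, not part of Charles Weld's package. The sketch assumes the usual layout where each language needs its own <code>.traineddata file (for example jpn.traineddata and eng.traineddata) in the folder passed as the first TesseractEngine argument. Build joins the codes with + in the order given and fails early if a data file is missing. PickBestOrder tries every ordering with a scoring function you supply, which is one way to check empirically which order works best for your own images.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helpers, NOT part of the Tesseract package: they only build the
// language string that is passed to new TesseractEngine(path, lang, mode).
public static class TessLanguages
{
    // Joins language codes with '+' in the given order (dominant language first),
    // skipping blanks and duplicates. Assumes each code needs a matching
    // <code>.traineddata file in the tessdata folder.
    public static string Build(string tessdataPath, params string[] codes)
    {
        var ordered = new List<string>();
        foreach (var raw in codes)
        {
            var code = (raw ?? "").Trim();
            if (code.Length == 0 || ordered.Contains(code))
                continue; // keep only the first occurrence of each code

            var file = Path.Combine(tessdataPath, code + ".traineddata");
            if (!File.Exists(file))
                throw new FileNotFoundException($"No language data for '{code}'.", file);

            ordered.Add(code);
        }

        if (ordered.Count == 0)
            throw new ArgumentException("At least one language code is required.", nameof(codes));

        return string.Join("+", ordered);
    }

    // Returns every ordering of the codes joined with '+', e.g. eng+hin and hin+eng.
    public static List<string> Orderings(IReadOnlyList<string> codes)
    {
        var results = new List<string>();
        Permute(new List<string>(codes), 0, results);
        return results;
    }

    // Recursive swap-based permutation: fixes position 'start', then permutes the rest.
    private static void Permute(List<string> items, int start, List<string> results)
    {
        if (start >= items.Count - 1)
        {
            results.Add(string.Join("+", items));
            return;
        }
        for (int i = start; i < items.Count; i++)
        {
            Swap(items, start, i);
            Permute(items, start + 1, results);
            Swap(items, start, i); // restore the previous order
        }
    }

    private static void Swap(List<string> items, int a, int b)
    {
        var tmp = items[a];
        items[a] = items[b];
        items[b] = tmp;
    }

    // Scores every ordering with the caller's function and returns the highest-scoring one.
    public static string PickBestOrder(IReadOnlyList<string> codes, Func<string, float> score)
    {
        string best = null;
        var bestScore = float.NegativeInfinity;
        foreach (var lang in Orderings(codes))
        {
            var s = score(lang);
            if (best == null || s > bestScore)
            {
                best = lang;
                bestScore = s;
            }
        }
        return best;
    }
}
```

For example, TessLanguages.Build("./tessdata", "jpn", "eng") returns "jpn+eng" when both data files are present, and that string can be passed as the second TesseractEngine argument. A scorer for PickBestOrder could create a TesseractEngine for each candidate string, run engine.Process on an image loaded with Pix.LoadFromFile, and return page.GetMeanConfidence(). Treat mean confidence only as a rough proxy for accuracy, and keep the language list short: the number of orderings grows factorially, and creating an engine is slow.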