How to get character-wise confidence in genesis Tesseract 4 using C#?


I have two problems.

  1. I would like to get character-wise confidence values. At the moment I only get the mean confidence for each word, for example "Hello" - meanConfidence: 90. I want it like this instead:
  • "H" - confidence: 90
  • "e" - confidence: 94
  • ...
  2. At the moment I get the OCR text and the segment rectangles separately. I need this information together, for example:
  • 100 100 100 100 "H"
  • 110 100 110 100 "e"
  • ...
...
private TesseractEngine tesseract = new TesseractEngine(path, "eng", EngineMode.LstmOnly);
....
using (var page = tesseract.Process(image, rec, PageSegMode.Auto))
{
    text = page.GetText(); // returns the OCR text of the whole rectangle
    confidence = page.GetMeanConfidence(); // returns the mean confidence for the processed region (here, the whole word)
    List<System.Drawing.Rectangle> rectangles = page.GetSegmentedRegions(PageIteratorLevel.Symbol); // returns the rectangle of each character of the word
}

Thanks for your help! :)


Solution

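  • You need to obtain a ResultIterator object (via the page.GetIterator() method) and then step through it at the PageIteratorLevel.Symbol level. At each step the iterator can give you the symbol's text, its confidence, and its bounding box, which solves both problems at once. See the PageSerializer class for an example.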
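
A minimal sketch of that approach follows. It assumes the wrapper exposes the same iterator API as the widely used Tesseract .NET wrapper, which the question's code (TesseractEngine, Process, GetSegmentedRegions) appears to match: Begin(), Next(), GetText(), GetConfidence() and TryGetBoundingBox(). The SymbolReader class and ReadSymbols method are hypothetical names introduced only for illustration, so treat this as a starting point rather than a definitive implementation.

```csharp
using System.Collections.Generic;
using Tesseract;

public static class SymbolReader
{
    // Hypothetical helper: collects the bounding box, text and confidence
    // of every symbol (character) that Tesseract recognized on the page.
    public static List<(Rect Bounds, string Text, float Confidence)> ReadSymbols(Page page)
    {
        var symbols = new List<(Rect Bounds, string Text, float Confidence)>();

        using (var iter = page.GetIterator())
        {
            iter.Begin();
            do
            {
                // Skip positions where no symbol box is available.
                if (iter.TryGetBoundingBox(PageIteratorLevel.Symbol, out Rect bounds))
                {
                    string text = iter.GetText(PageIteratorLevel.Symbol);
                    float confidence = iter.GetConfidence(PageIteratorLevel.Symbol);
                    symbols.Add((bounds, text, confidence));
                }
            } while (iter.Next(PageIteratorLevel.Symbol)); // advance one character at a time
        }

        return symbols;
    }
}
```

Inside the question's using block, calling SymbolReader.ReadSymbols(page) and looping over the result would give one entry per character, where Bounds.X1, Bounds.Y1, Bounds.X2 and Bounds.Y2 hold the rectangle and Text and Confidence hold the matching character and its score.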