I need your help. I am trying to get the emails in the image below as separate results, each with its own bounding box. Somehow Tesseract OCR does not recognize them as separate lines and returns them as a single result.
Current output - one block
Top: 182, Bottom: 512, Left: 533, Right: 852 -
BCF6CC517E7642BBB21AAF2068E54C28 - Test
D4852831D8CA439EB9D98B54629D1840 - Test
8DFFDO6FA3B44989B224DABDD9292B3E - Test
10E1D83F0D834000AF7BDSDEA48442E8 - Test
6FOA122825AA42159FDEESEBFFAC279B - Test
E719274DA1CE46ADASBDB659812ED684 - Test
ES18EE9D7D7B4AA3ABAT81523F748B24 - Test
?0304b4b-ba1d-4897-8ebe-20bcc3930201 - Test
2ebad2h1-c385-4d84-96c7-bc9082141e1c - Test
Desired output - multiple blocks per GUID
Top: 182, Bottom: 210, Left: 533, Right: 852 -
BCF6CC517E7642BBB21AAF2068E54C28 - Test
Top: 210, Bottom: 230, Left: 533, Right: 852 -
D4852831D8CA439EB9D98B54629D1840 - Test
Top: 230, Bottom: 250, Left: 533, Right: 852 -
8DFFDO6FA3B44989B224DABDD9292B3E - Test
...
I have tried most of the OcrEngineMode and PageSegmentationMode values, but none of them gave the desired result. I also scaled the image from 96 DPI to 300 DPI, which did not help. I went through the documentation as well and couldn't find a solution.
I am using Tesseract 4.
Thank you in advance for your time and help.
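When iterating over the results, set your PageIteratorLevel to
RIL_TEXTLINE. The iterator then steps through the page one text line at a
time, so each line comes back with its own text and bounding box instead of
the whole block being returned as a single result.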
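// Get an iterator over the recognition results.
var resultIterator = tessBaseAPI.GetIterator();
// Step through the results line by line rather than block by block.
var pageIteratorLevel = PageIteratorLevel.RIL_TEXTLINE;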
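
For completeness, here is a rough sketch of what the loop around those two lines could look like. I don't know which .NET wrapper you are using, so IResultIterator, OcrLine, LineReader and FakeIterator below are hypothetical stand-ins that I made up for illustration. The member names GetUTF8Text, BoundingBox and Next mirror Tesseract's C++ ResultIterator, but the exact signatures in your wrapper may differ, so treat this as a pattern to adapt rather than the wrapper's actual API.

```csharp
using System;
using System.Collections.Generic;

// Same level names and order as Tesseract's C++ PageIteratorLevel enum.
public enum PageIteratorLevel { RIL_BLOCK, RIL_PARA, RIL_TEXTLINE, RIL_WORD, RIL_SYMBOL }

// Hypothetical stand-in for the wrapper's result iterator (an assumption,
// not the wrapper's real type). Adapt the calls to your wrapper's signatures.
public interface IResultIterator
{
    string GetUTF8Text(PageIteratorLevel level);
    bool BoundingBox(PageIteratorLevel level, out int left, out int top, out int right, out int bottom);
    bool Next(PageIteratorLevel level);
}

// One recognized text line with its own bounding box.
public sealed class OcrLine
{
    public string Text = "";
    public int Left, Top, Right, Bottom;
}

public static class LineReader
{
    // Walks the iterator at RIL_TEXTLINE and collects one entry per line.
    public static List<OcrLine> ReadLines(IResultIterator iterator)
    {
        var lines = new List<OcrLine>();
        if (iterator == null) return lines;

        var level = PageIteratorLevel.RIL_TEXTLINE;
        do
        {
            string text = iterator.GetUTF8Text(level);
            bool hasBox = iterator.BoundingBox(level, out int left, out int top, out int right, out int bottom);

            // Skip positions that have no box or no visible text.
            if (hasBox && !string.IsNullOrWhiteSpace(text))
            {
                lines.Add(new OcrLine { Text = text.Trim(), Left = left, Top = top, Right = right, Bottom = bottom });
            }
        } while (iterator.Next(level)); // false once the last line has been visited

        return lines;
    }
}

// In-memory fake so the sketch can run without Tesseract installed.
public sealed class FakeIterator : IResultIterator
{
    private readonly OcrLine[] _lines;
    private int _index;

    public FakeIterator(params OcrLine[] lines) { _lines = lines; }

    public string GetUTF8Text(PageIteratorLevel level)
    {
        return _index < _lines.Length ? _lines[_index].Text : "";
    }

    public bool BoundingBox(PageIteratorLevel level, out int left, out int top, out int right, out int bottom)
    {
        left = top = right = bottom = 0;
        if (_index >= _lines.Length) return false;
        var line = _lines[_index];
        left = line.Left; top = line.Top; right = line.Right; bottom = line.Bottom;
        return true;
    }

    public bool Next(PageIteratorLevel level)
    {
        return ++_index < _lines.Length;
    }
}
```

In real code you would drop FakeIterator and either pass a thin adapter around your wrapper's iterator or inline the do/while loop directly against resultIterator. The important part is that the same level (RIL_TEXTLINE) is passed to the text call, the bounding box call and Next, so all three agree on what one step means.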