java eclipse tesseract tess4j

Java tesseract return co-ordinates of text location


I am using Java in Eclipse and want to return the co-ordinates of all the text that Tesseract recognizes. My code, which I got through tess4j, currently outputs all of the recognized text:

```java
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorConvertOp;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

import net.sourceforge.tess4j.*;

public class TesseractExample {

    public static void main(String[] args) throws IOException {
        // Load <working directory>/inDCM/surrey.png
        File imageFile = new File(System.getProperty("user.dir"), "inDCM/surrey.png");
        BufferedImage img = ImageIO.read(imageFile);
        if (img == null) {
            // ImageIO.read returns null when no reader supports the file format
            throw new IOException("Unsupported or unreadable image: " + imageFile);
        }

        // Convert the image to grayscale in place before OCR
        ColorConvertOp op = new ColorConvertOp(ColorSpace.getInstance(ColorSpace.CS_GRAY), null);
        op.filter(img, img);

        // Singleton accessor from early Tess4J releases; newer releases use "new Tesseract()"
        Tesseract instance = Tesseract.getInstance();
        try {
            String result = instance.doOCR(img);
            System.out.println("The result is: " + result);
        } catch (TesseractException e) {
            System.out.println("error: " + e);
        }
    }
}
```

Is it possible to retrieve the co-ordinates?


Solution

  • You can get the coordinates through the ResultIterator object available in the low-level TessBaseAPI. Code examples can be found in the unit tests in the project's repository.
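A minimal sketch of that approach, assuming a Tess4J 3.x/4.x-style layout where the native handle types live in `ITessAPI` and the direct-mapped calls in `TessAPI1` (older releases organize these types differently). It roughly follows the word-level loop in the Tess4J unit tests: recognize the image, walk the ResultIterator at `RIL_WORD`, and ask the underlying page iterator for each word's bounding box. The `tessdata` path, the `eng` language, the class name `WordBoxes` and the helper `toRectangle` are illustrative assumptions, not part of the original answer.

```java
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

import javax.imageio.ImageIO;

import com.sun.jna.Pointer;

import net.sourceforge.tess4j.ITessAPI.TessBaseAPI;
import net.sourceforge.tess4j.ITessAPI.TessPageIterator;
import net.sourceforge.tess4j.ITessAPI.TessPageIteratorLevel;
import net.sourceforge.tess4j.ITessAPI.TessResultIterator;
import net.sourceforge.tess4j.TessAPI1;

public class WordBoxes {

    // Hypothetical helper: converts a (left, top, right, bottom) box into an AWT Rectangle (x, y, width, height)
    static Rectangle toRectangle(int left, int top, int right, int bottom) {
        return new Rectangle(left, top, right - left, bottom - top);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: tessdata location and language; adjust for your installation
        String datapath = "tessdata";
        String language = "eng";

        BufferedImage src = ImageIO.read(new File(System.getProperty("user.dir"), "inDCM/surrey.png"));
        if (src == null) {
            throw new IOException("Image could not be read");
        }

        // Redraw into an 8-bit grayscale image so the buffer holds exactly 1 byte per pixel
        BufferedImage gray = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        g.drawImage(src, 0, 0, null);
        g.dispose();
        byte[] pixels = ((DataBufferByte) gray.getRaster().getDataBuffer()).getData();
        ByteBuffer buf = ByteBuffer.allocateDirect(pixels.length).order(ByteOrder.nativeOrder());
        buf.put(pixels);
        buf.flip();

        TessBaseAPI handle = TessAPI1.TessBaseAPICreate();
        try {
            if (TessAPI1.TessBaseAPIInit3(handle, datapath, language) != 0) {
                throw new IllegalStateException("Could not initialize Tesseract");
            }
            // 1 byte per pixel, width bytes per scanline
            TessAPI1.TessBaseAPISetImage(handle, buf, gray.getWidth(), gray.getHeight(), 1, gray.getWidth());
            TessAPI1.TessBaseAPIRecognize(handle, null); // null: no progress monitor

            TessResultIterator ri = TessAPI1.TessBaseAPIGetIterator(handle);
            if (ri == null) {
                return; // nothing was recognized
            }
            TessPageIterator pi = TessAPI1.TessResultIteratorGetPageIterator(ri);
            TessAPI1.TessPageIteratorBegin(pi);
            int level = TessPageIteratorLevel.RIL_WORD;
            do {
                Pointer ptr = TessAPI1.TessResultIteratorGetUTF8Text(ri, level);
                if (ptr == null) {
                    continue; // jumps to the loop condition, which advances the iterator
                }
                String word = ptr.getString(0, "UTF-8");
                TessAPI1.TessDeleteText(ptr);
                float conf = TessAPI1.TessResultIteratorConfidence(ri, level);

                // The bounding box is written into four single-element buffers
                IntBuffer left = IntBuffer.allocate(1), top = IntBuffer.allocate(1);
                IntBuffer right = IntBuffer.allocate(1), bottom = IntBuffer.allocate(1);
                TessAPI1.TessPageIteratorBoundingBox(pi, level, left, top, right, bottom);
                Rectangle box = toRectangle(left.get(0), top.get(0), right.get(0), bottom.get(0));
                System.out.printf("%s %s conf=%.1f%n", word, box, conf);
            } while (TessAPI1.TessPageIteratorNext(pi, level) != 0);
            TessAPI1.TessResultIteratorDelete(ri);
        } finally {
            TessAPI1.TessBaseAPIDelete(handle);
        }
    }
}
```

The box comes back as left/top/right/bottom pixel positions in the input image; `toRectangle` only converts it to x/y/width/height. Changing `level` to another `TessPageIteratorLevel` constant (for example a line or symbol level) should give boxes at that granularity instead. Newer Tess4J releases also appear to offer higher-level wrappers on `Tesseract`, such as `getSegmentedRegions` and `getWords`, that hide this loop; check the Javadoc for the version you use.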