I am working with Tess4J, the Java wrapper for Tesseract. It works well and lets me do what I need.
But I have come across an issue that I cannot solve without some guidance.
Let us say I have the following image:
This then provides me with the following output:
Column 1 Column 2 Column3
Row 1 Column 1 Rowt Column 3
Row 2 Column 1 Row 2 Column 2 Row 2 Column 3
Here is my code:
    String readFile(String inputFilePath) {
        Tesseract tesseract = new Tesseract();
        tesseract.setDatapath(path);
        tesseract.setLanguage("eng");
        tesseract.setTessVariable("user_defined_dpi", "300");
        String string = null;
        try {
            string = tesseract.doOCR(new File(inputFilePath));
        } catch (TesseractException e) {
            e.printStackTrace();
        }
        return string;
    }
Is there a way to get output that mimics the layout of the image, so that I can differentiate between the columns?
You can tell Tesseract to preserve the inter-word spaces and then count them to find the column boundaries:
    tesseract.setTessVariable("preserve_interword_spaces", "1");
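Once the spaces are preserved, one way to use them is to split each line wherever a long enough run of spaces occurs. The following is only a rough sketch: the class name ColumnSplitter, its methods, and the threshold of two spaces are my own assumptions and not part of Tess4J. The number of spaces Tesseract actually emits between columns depends on the image, so the threshold will probably need tuning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tess4J: turns OCR text with preserved
// spacing into rows of cells.
class ColumnSplitter {
    // Assumption: a run of at least this many spaces marks a column boundary.
    static final int MIN_GAP = 2;

    // Splits one OCR line into cells on runs of MIN_GAP or more spaces.
    // Single spaces inside a cell ("Row 2 Column 1") are left alone.
    static List<String> splitColumns(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split(" {" + MIN_GAP + ",}"));
    }

    // Splits the whole OCR result into rows of cells, skipping blank lines.
    static List<List<String>> toTable(String ocrText) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : ocrText.split("\\R")) {
            List<String> cells = splitColumns(line);
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }
}
```

You would call ColumnSplitter.toTable(string) on the value returned by doOCR. Note that this does not detect empty cells: if a row is missing a value in the middle column, the remaining cells simply shift left. In that case you would have to compare the character positions of the cells against those of the header row instead of only counting gaps.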