I want to extract the text from an image into a String variable. I am using Tess4J for this, and it works fine when I create a new project and test it:
    public static void main(String[] args) throws TesseractException {
        File image = new File("eurotext.tif");
        Tesseract instance = Tesseract.getInstance();
        String result = instance.doOCR(image);
        System.out.println(result);
    }
But when I try to integrate Tess4J into my own project, I get this exception:
java.lang.IllegalStateException: Input not set!
at com.sun.media.imageioimpl.plugins.tiff.TIFFImageReader.getNumImages(TIFFImageReader.java:28)
at net.sourceforge.vietocr.ImageIOHelper.getIIOImageList(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
Is there another way to extract the text without using OCR?
I also get the following error, and I don't know why. It is raised when TessBaseAPIGetUTF8Text is invoked:
java.lang.Error: Invalid memory access
at com.sun.jna.Native.invokePointer(Native Method)
at com.sun.jna.Function.invokePointer(Function.java:470)
at com.sun.jna.Function.invoke(Function.java:404)
at com.sun.jna.Function.invoke(Function.java:315)
at com.sun.jna.Library$Handler.invoke(Library.java:212)
at sun.proxy.$Proxy7.TessBaseAPIGetUTF8Text(Unknown Source)
at net.sourceforge.tess4j.Tesseract.getOCRText(Tesseract.java:336)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:232)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:173)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:158)
Has anyone already used Tess4J with Tomcat?
You are getting the "Invalid memory access" error because you must set the datapath of the traineddata to use. If you don't specify the language, it is assumed to be eng.
For instance, if your project path is PROJECT and your trained data is at PROJECT/data/tessdata/eng.traineddata:
    tess.setDatapath("data");
    tess.setLanguage("eng");
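One thing worth checking under Tomcat, although the question does not confirm it as the cause, is that relative paths such as "data" or "eurotext.tif" are resolved against the server process's working directory, not against the project directory. Here is a minimal sketch using only java.io.File that verifies both paths before Tess4J is called. The names TessPaths and check are hypothetical, and the PROJECT/data/tessdata layout is assumed from the example above.

```java
import java.io.File;

// Hypothetical helper: not part of Tess4J, only plain java.io.File checks.
class TessPaths {

    /**
     * Returns a description of the first problem found,
     * or null when both the trained data and the image look usable.
     * dataParent is the directory that contains the "tessdata" folder.
     */
    static String check(File dataParent, String language, File image) {
        File trained = new File(new File(dataParent, "tessdata"),
                                language + ".traineddata");
        if (!trained.isFile()) {
            return "missing trained data: " + trained.getAbsolutePath();
        }
        if (!image.isFile() || !image.canRead()) {
            return "unreadable image: " + image.getAbsolutePath();
        }
        return null;
    }

    public static void main(String[] args) {
        // Relative paths resolve against the current working directory.
        File dataParent = new File("data");
        File image = new File("eurotext.tif");

        String problem = check(dataParent, "eng", image);
        if (problem != null) {
            System.err.println(problem);
            return;
        }
        // Both paths exist; print the absolute datapath for use with Tess4J.
        System.out.println("datapath = " + dataParent.getAbsolutePath());
    }
}
```

If check returns null, dataParent.getAbsolutePath() can be passed to setDatapath and the absolute image file to doOCR, so the result no longer depends on where the server was started.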