I have a folder with a bit more than 50k images. Here is the code I have written.
public static File folder = new File("D:\\image\\");
public static File[] listofFiles = folder.listFiles();
private static int counter;
public static void main(String[] args) {
Tesseract tesseract = new Tesseract();
try {
tesseract.setDatapath("C:\\Users\\zirpm\\Documents\\Coden\\Libaries\\Tess4J\\tessdata");
for (int i = 0; i < listofFiles.length; i++) {
String text = tesseract.doOCR(new File("D:\\image\\"+listofFiles[i].getName()));
counter++;
System.out.println("Image Number: "+counter+" "+text);
}
}catch (TesseractException e) {
e.printStackTrace();
System.out.println("TESSERACT ERROR");
}
}
Sometimes it runs into the following error:
Cannot convert RAW image to Pix with bpp = 64
Please call SetImage before attempting recognition.
net.sourceforge.tess4j.TesseractException: java.lang.NullPointerException
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at com.krissemicolon.Main.main(Main.java:23)
Caused by: java.lang.NullPointerException
at net.sourceforge.tess4j.Tesseract.getOCRText(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
... 3 more
How can I skip the images that cause this error and move on to the next one?
Add another try-catch inside the loop:
public static File folder = new File("D:\\image\\");
public static File[] listofFiles = folder.listFiles();
private static int counter;
public static void main(String[] args) {
Tesseract tesseract = new Tesseract();
tesseract.setDatapath("C:\\Users\\zirpm\\Documents\\Coden\\Libaries\\Tess4J\\tessdata");
for (int i = 0; i < listofFiles.length; i++) {
try {
String text = tesseract.doOCR(new File("D:\\image\\"+listofFiles[i].getName()));
counter++;
System.out.println("Image Number: "+counter+" "+text);
} catch (TesseractException e) {
// OCR failed for this image: report it and continue with the next one
System.out.println("Skipping "+listofFiles[i].getName());
}
}
}
If a TesseractException occurs, the loop prints the name of the image and moves on to the next one. The counter and the output line sit inside the inner try, so they only run when OCR succeeded and text is in scope.
The outer try-catch block is removed here: once the inner block catches TesseractException, nothing else in main throws it, and Java rejects a catch clause for a checked exception that the try body cannot throw.
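With 50k images it can also help to keep a list of the files that were skipped, so they can be inspected afterwards. Below is a minimal sketch of the same skip-and-continue pattern. To keep it compilable without Tess4J on the classpath, it uses a hypothetical OcrEngine interface as a stand-in for tesseract.doOCR(File). The names OcrEngine, SkipFailedOcr and processAll are made up for illustration and are not part of Tess4J. In real code you would call tesseract.doOCR directly and catch TesseractException instead of Exception.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for tesseract.doOCR(File); not a Tess4J type.
interface OcrEngine {
    String doOCR(File image) throws Exception;
}

class SkipFailedOcr {
    // Runs OCR on every file and returns the files that had to be skipped.
    static List<File> processAll(File[] files, OcrEngine engine) {
        List<File> skipped = new ArrayList<>();
        int counter = 0;
        for (File file : files) {
            try {
                String text = engine.doOCR(file);
                // Only reached when OCR succeeded
                counter++;
                System.out.println("Image Number: " + counter + " " + text);
            } catch (Exception e) {
                // With Tess4J this would be: catch (TesseractException e)
                skipped.add(file);
                System.out.println("Skipping " + file.getName());
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        File[] files = { new File("a.png"), new File("bad.tif"), new File("c.png") };
        // Fake engine that fails for one file, to exercise the skip path
        OcrEngine fake = image -> {
            if (image.getName().startsWith("bad")) {
                throw new Exception("unsupported image");
            }
            return "text of " + image.getName();
        };
        List<File> skipped = processAll(files, fake);
        System.out.println("Skipped " + skipped.size() + " of " + files.length + " files");
    }
}
```

The try-catch wraps a single iteration, so one bad image costs one log line and the loop carries on. The returned list gives you the file names to check later, for example to see whether they all share an unusual bit depth.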