I'm trying to use Tess4J in Java EE (Payara Server). Is this possible, and if so, how?
The exact exception I'm getting:
e = (net.sourceforge.tess4j.TesseractException) net.sourceforge.tess4j.TesseractException: java.lang.RuntimeException: Need to install JAI Image I/O package.
I have added jai-imageio to my pom.xml, and I have also added it to the modules of Payara.
File pom.xml
<!-- -->
<version>3.4.1</version> <!-- used 3.4.2 as well -->
<!-- -->
<scope>runtime</scope> <!-- tried without this as well -->
Added JAR to
Tess4J code (any suggestions for improving it would also be appreciated):
ITesseract instance = new Tesseract();
instance.setDatapath(pLangaugePath); // C:\\t
instance.setLanguage(pLanguage); // eng
try {
    File[] tifFiles = PdfUtilities.convertPdf2Png(pFile);
    if (tifFiles != null) {
        for (File tifFile : tifFiles) {
            String ocrText = instance.doOCR(tifFile);
            if (StringUtils.isNotBlank(ocrText)) {
            }
        }
    }
} catch (TesseractException e) {
    LOG.error("Could not do ocr on image file created via pdf ", e);
}
I have tried the following two examples as well.
1.
try (PDDocument document = PDDocument.load(pFile)) {
    int totalPages = document.getNumberOfPages();
    PDFRenderer renderer = new PDFRenderer(document);
    for (int pi = 0; pi < totalPages; pi++) {
        BufferedImage image = renderer.renderImageWithDPI(pi, 75);
        String ocrText = instance.doOCR(image);
        if (StringUtils.isNotBlank(ocrText)) {
        }
    }
} catch (Exception e) {
    LOG.error("Could not do ocr on pdf", e);
}
2.
try {
    ITesseract instance = new Tesseract();
    instance.setDatapath(pLangaugePath); // C:\\t
    instance.setLanguage(pLanguage); // eng
    String ocrText = instance.doOCR(pFile);
    if (StringUtils.isNotBlank(ocrText)) {
    }
} catch (Exception e) {
    LOG.error("Could not do ocr on image file created via pdf ", e);
}
I also found this suggested solution, which didn't work, as well as another one that didn't work either.
Tess4J was known not to work with GlassFish because of a run-time exception caused by the JNA RESOURCE_PREFIX string constant being unavailable. This issue has been fixed in the latest releases, 3.4.9 (for Tesseract 3.05.01) and 4.0.2 (for Tesseract 4.0.0-beta.1). The library can now be used with GlassFish, and perhaps with Payara Server as well.
You may also need to call ImageIO.scanForPlugins(); before the OCR call. This is meant to ensure that an appropriate ImageReader is available to read the input images.
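As a rough way to check whether the rescan helps in your deployment, something like the following could be run from inside the application before calling doOCR. It is only a sketch: the class name ImageIoPluginCheck and the helper hasReaderFor are made up for illustration and are not part of Tess4J. It uses only the standard javax.imageio API. Which formats are reported depends on the JDK version and on which plugin JARs the application's class loader can see.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical diagnostic helper, not part of Tess4J.
class ImageIoPluginCheck {

    // Rescans the class path for Image I/O plugins, then reports whether
    // at least one ImageReader is registered for the given format name.
    static boolean hasReaderFor(String formatName) {
        ImageIO.scanForPlugins();
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // "png" is built into the JDK; "tiff" needs a plugin on older JDKs.
        for (String format : new String[] {"png", "tiff"}) {
            System.out.println(format + " reader available: " + hasReaderFor(format));
        }
    }
}
```

If the TIFF line reports false when run inside the server, the Image I/O plugin is probably not visible to the application's class loader, whatever is listed in pom.xml.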