Tags: java, macos, crash, tesseract, tess4j

Tesseract / Tess4J crashes on Mac OS X: Problematic frame: C [libtesseract.dylib+0xcf72] tesseract::TessResultRenderer::~TessResultRenderer()+0x10


I'm running a simple program on Mac OS X that uses Tesseract through its Java wrapper library, Tess4J. I tried both JDK 7 and JDK 8.

The code runs OCR on an image and then creates a PDF from it. It works and does what it's supposed to do (the PDF is created just fine), but at the end the JVM crashes and macOS shows a crash report.

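import java.io.File;
import java.util.ArrayList;
import java.util.List;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class TesseractTest {

    public static void main(String[] args) throws Exception {
        testTesseract();
    }

    private static void testTesseract() throws Exception {
        File imageFile = new File("/Users/mln/Desktop/urkunde.jpg");
        ITesseract instance = new Tesseract();  // JNA Interface Mapping

        // http://tess4j.sourceforge.net/tutorial/

        instance.setDatapath("/Users/mln/Desktop/tessdata");
        instance.setLanguage("deu");

        // Step 1: plain OCR on the image; prints the recognized text
        try {
            String result = instance.doOCR(imageFile);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }

        // Step 2: PDF output from the same image, on the same instance.
        // Despite its name, pdfFile is the *input image*; Tesseract writes the
        // PDF to the output base path with ".pdf" appended.
        List<ITesseract.RenderedFormat> list = new ArrayList<ITesseract.RenderedFormat>();
        list.add(ITesseract.RenderedFormat.PDF);
        File pdfFile = new File("/Users/mln/Desktop/urkunde.jpg");
        instance.createDocuments(pdfFile.getAbsolutePath(), "/Users/mln/Desktop/urkunde", list);
    }
}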

The line causing the crash is this last one:

instance.createDocuments(pdfFile.getAbsolutePath(), "/Users/mln/Desktop/urkunde", list);

Console output:

Warning in pixReadMemJpeg: work-around: writing to a temp file
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00000001295c9f72, pid=6336, tid=5891
#
# JRE version: Java(TM) SE Runtime Environment (8.0_31-b13) (build 1.8.0_31-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.31-b07 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [libtesseract.dylib+0xcf72]  tesseract::TessResultRenderer::~TessResultRenderer()+0x10
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/mln/Projects/jackrabbit-client/hs_err_pid6336.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

And the crash report:

Process:               java [6336]
Path:                  /Library/Java/JavaVirtualMachines/jdk1.8.0_31.jdk/Contents/Home/bin/java
Identifier:            net.java.openjdk.cmd
Version:               1.0 (1.0)
Code Type:             X86-64 (Native)
Parent Process:        idea [81650]
Responsible:           java [6336]
User ID:               501

Date/Time:             2016-10-28 11:09:35.377 +0200
OS Version:            Mac OS X 10.11.6 (15G1004)
Report Version:        11
Anonymous UUID:        6CF2EEC0-C9B5-315F-EB2E-5AEBDF0094FD

Sleep/Wake UUID:       F9F2D823-9374-4EC4-B8FD-9342826E1A37

Time Awake Since Boot: 600000 seconds
Time Since Wake:       10000 seconds

System Integrity Protection: enabled

Crashed Thread:        4

Exception Type:        EXC_BAD_ACCESS (SIGABRT)
Exception Codes:       EXC_I386_GPFLT
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
abort() called

Complete output on Pastebin: http://pastebin.com/v9gPd4hk


Solution

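  • I haven't tested this myself, but it looks like both createDocuments() and doOCR() call init() and dispose(). You could try overriding these methods so that each one is called only once. It's a bit of a shot in the dark, but it seems plausible. As a starting point for the override, here is the batch createDocuments(), with init() at the top and dispose() in the finally block: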

    @Override
    public void createDocuments(String[] filenames, String[] outputbases, List<RenderedFormat> formats) throws TesseractException {
        if (filenames.length != outputbases.length) {
            throw new RuntimeException("The two arrays must match in length.");
        }
    
        // Sets up the native engine; doOCR() does this on its own as well
        init();
        setTessVariables();
    
        try {
            for (int i = 0; i < filenames.length; i++) {
                File workingTiffFile = null;
                try {
                    String filename = filenames[i];
    
                    // if PDF, convert to multi-page TIFF
                    if (filename.toLowerCase().endsWith(".pdf")) {
                        workingTiffFile = PdfUtilities.convertPdf2Tiff(new File(filename));
                        filename = workingTiffFile.getPath();
                    }
    
                    TessResultRenderer renderer = createRenderers(outputbases[i], formats);
                    createDocuments(filename, renderer);
                    // Frees the native renderer; the reported crash is inside ~TessResultRenderer()
                    TessDeleteResultRenderer(renderer);
                } catch (Exception e) {
                    // skip the problematic image file
                    logger.error(e.getMessage(), e);
                } finally {
                    if (workingTiffFile != null && workingTiffFile.exists()) {
                        workingTiffFile.delete();
                    }
                }
            }
        } finally {
            // Releases the native engine again; doOCR() also calls dispose()
            dispose();
        }
    }
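To make the "call each one only once" idea concrete, here is a minimal, self-contained sketch in plain Java. Engine and SingleSessionEngine are hypothetical stand-ins, not Tess4J classes. Engine mimics a wrapper whose every operation runs its own init() ... dispose() pair, which is the pattern described above for doOCR() and createDocuments(). The subclass makes init() run only the first time and defers the single real release to close(). Applying this to Tesseract itself assumes init() and dispose() are overridable in your Tess4J version, and whether it actually avoids the crash is untested.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a JNA wrapper such as Tesseract (NOT Tess4J's real code).
// init()/dispose() would create/free a native handle; here they only log the call.
class Engine {
    final List<String> log = new ArrayList<>();

    protected void init() {
        log.add("init");
    }

    protected void dispose() {
        log.add("dispose");
    }

    // Like doOCR()/createDocuments(): every operation brackets itself with init()/dispose().
    public void runOperation(String name) {
        init();
        try {
            log.add(name);
        } finally {
            dispose();
        }
    }
}

// Hypothetical subclass: init() really runs only the first time, per-operation
// dispose() calls are ignored, and close() performs the single real release.
class SingleSessionEngine extends Engine implements AutoCloseable {
    private boolean initialized = false;
    private boolean closed = false;

    @Override
    protected void init() {
        if (closed) {
            throw new IllegalStateException("engine already closed");
        }
        if (!initialized) {
            initialized = true;
            super.init();
        }
    }

    @Override
    protected void dispose() {
        // Intentionally empty: the lifetime is owned by close(), not by each operation.
    }

    @Override
    public void close() {
        if (initialized && !closed) {
            closed = true;
            super.dispose();
        }
    }
}
```

Calling runOperation("ocr") and then runOperation("createDocuments") on a plain Engine logs init and dispose twice each. The same calls on a SingleSessionEngine, followed by close(), log them once each, and any operation after close() throws IllegalStateException instead of touching a released handle. The trade-off is that the caller now owns the lifetime: if close() is never called (for example via try-with-resources), the native handle would never be freed.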