docker tesseract ubuntu-20.04 tess4j leptonica

Tesseract does not work in a Docker image with Ubuntu 20.04 when called from a Java Spring app


I want to call Tesseract from a Java Spring Boot application running on Ubuntu 20.04 in a Docker image. The call fails with the following log entry:

java.lang.UnsatisfiedLinkError: Error looking up function 'boxaSizeConsistency': /lib/x86_64-linux-gnu/liblept.so.5: undefined symbol: boxaSizeConsistency
at com.sun.jna.Function.<init>(Function.java:252) ~[jna-5.13.0.jar!/:5.13.0 (b0)]
at com.sun.jna.NativeLibrary.getFunction(NativeLibrary.java:620) ~[jna-5.13.0.jar!/:5.13.0 (b0)]
at com.sun.jna.NativeLibrary.getFunction(NativeLibrary.java:596) ~[jna-5.13.0.jar!/:5.13.0 (b0)]
at com.sun.jna.NativeLibrary.getFunction(NativeLibrary.java:582) ~[jna-5.13.0.jar!/:5.13.0 (b0)]
at com.sun.jna.Native.register(Native.java:1904) ~[jna-5.13.0.jar!/:5.13.0 (b0)]
at com.sun.jna.Native.register(Native.java:1775) ~[jna-5.13.0.jar!/:5.13.0 (b0)]
at com.sun.jna.Native.register(Native.java:1493) ~[jna-5.13.0.jar!/:5.13.0 (b0)]
at net.sourceforge.lept4j.Leptonica1.<clinit>(Leptonica1.java:41) ~[lept4j-1.18.0.jar!/:na]
at net.sourceforge.lept4j.util.LeptUtils.convertImageToPix(LeptUtils.java:92) ~[lept4j-1.18.0.jar!/:na]
at net.sourceforge.tess4j.Tesseract.createDocuments(Tesseract.java:709) ~[tess4j-5.6.0.jar!/:5.6.0]

I added

<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>5.6.0</version>
</dependency>

to my Maven pom.xml and

RUN apt-get update \
    && apt-get install -qq \
    libleptonica-dev
    
RUN apt-get update \
    && apt-get install -qq \
    tesseract-ocr \
    tesseract-ocr-deu \
    libtesseract-dev

to my Dockerfile. I can build both my jar and my Docker image.

When I open a shell in the running container and call tesseract --version, it outputs:

# tesseract --version
tesseract 4.1.1
 leptonica-1.79.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4

So it seems to me that Tesseract and Leptonica are installed in compatible versions. Also, liblept is available at the requested path:

# ldconfig -p | grep lept
        liblept.so.5 (libc6,x86-64) => /lib/x86_64-linux-gnu/liblept.so.5
        liblept.so (libc6,x86-64) => /lib/x86_64-linux-gnu/liblept.so

Solution

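  • You need to use Java library versions that are compatible with your native libraries. The UnsatisfiedLinkError means that lept4j 1.18.0 (pulled in transitively by tess4j 5.6.0) looks up a function, boxaSizeConsistency, that the installed liblept.so.5 (Leptonica 1.79) does not export. For Leptonica 1.79, for example, use a lept4j 1.13.x version. You can specify the version in the POM.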
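One way to specify it in the pom.xml, sketched under assumptions: exclude the lept4j that tess4j brings in transitively and declare lept4j directly. The coordinates net.sourceforge.lept4j:lept4j follow the package name in the stack trace. The range [1.13,1.14) is only there to avoid guessing a specific 1.13 patch release; pin an exact version if you want reproducible builds. It is an assumption that tess4j 5.6.0 works against lept4j 1.13.x. Since tess4j 5.x appears to be built for Tesseract 5 while the image ships Tesseract 4.1.1, you may need to move tess4j to a release built for Tesseract 4.1 as well.

```xml
<dependencies>
    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>5.6.0</version>
        <exclusions>
            <!-- Drop the lept4j 1.18.0 that tess4j pulls in transitively.
                 Strictly, the direct declaration below already wins by
                 Maven's nearest-definition rule; this just makes it explicit. -->
            <exclusion>
                <groupId>net.sourceforge.lept4j</groupId>
                <artifactId>lept4j</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <!-- Use the lept4j 1.13.x line, which matches Leptonica 1.79 -->
    <dependency>
        <groupId>net.sourceforge.lept4j</groupId>
        <artifactId>lept4j</artifactId>
        <version>[1.13,1.14)</version>
    </dependency>
</dependencies>
```

Running mvn dependency:tree afterwards should show which lept4j version Maven actually resolved.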