
Dockerfile for tesseract 4.0


I am trying to create a Dockerfile for tesseract-ocr version 4.0. The contents of the Dockerfile are as follows:

FROM ubuntu:16.04
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y software-properties-common \
  && add-apt-repository -y ppa:alex-p/tesseract-ocr
RUN apt-get update && apt-get install -y tesseract-ocr

FROM python:3.6-alpine
ADD . /App
WORKDIR /App
COPY requirements.txt ./ 
COPY . . 
RUN pip install --no-cache-dir -r requirements.txt

I am able to build the Docker image, but when I spin up a container and try to run a tesseract command, I get

"tesseract" not found


Solution

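  • The solution was to build a single image based on Ubuntu 18.04, whose default repositories provide Tesseract 4.0, so no PPA is needed. In the original Dockerfile the second FROM starts a new build stage, so the final image is python:3.6-alpine and does not contain the tesseract binary installed in the first stage: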

    FROM ubuntu:18.04
    # Install Tesseract together with Python 3 and pip in a single layer.
    RUN apt-get update \
        && apt-get install -y tesseract-ocr python3 python3-pip \
        && apt-get clean \
        && apt-get autoremove -y

    WORKDIR /home/App
    # Install dependencies before copying the source so this layer stays cached.
    COPY requirements.txt ./
    RUN pip3 install -r requirements.txt
    COPY . .

    VOLUME ["/data"]
    EXPOSE 5000
    CMD ["python3","OCRRun.py"]
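
To confirm that the tesseract binary actually made it into the final image, a small check can be run inside the container before starting the application. The following is only a sketch: the function name `tesseract_version` is hypothetical and not part of the original answer, and it relies solely on the Python standard library, sticking to calls available in the Python 3.6 that Ubuntu 18.04 ships.

```python
import shutil
import subprocess


def tesseract_version(binary="tesseract"):
    """Return the first line printed by `<binary> --version`,
    or None if the binary is not on PATH.

    `tesseract_version` is a hypothetical helper for illustration.
    """
    path = shutil.which(binary)
    if path is None:
        # Same situation as the '"tesseract" not found' error in the question.
        return None
    # Merge stderr into stdout, since some Tesseract releases print
    # the version banner on stderr.
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""
```

Calling `tesseract_version()` in a container built from the two-stage Dockerfile in the question would be expected to return None, because that image is based on python:3.6-alpine and never had Tesseract installed; in the Ubuntu 18.04 image it should return the version banner.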