I'm trying to install tesseract-ocr in a Docker container based on the python:3.10 image. During the build, the installation appears to succeed, but afterwards I cannot find the files inside the container. If I then open a shell in the container and install it from there, it works.
The relevant parts of my Dockerfile look like this:
# debian based
FROM python:3.10
WORKDIR /code
RUN mkdir __logger
RUN apt-get update -y
RUN apt-get install apt-utils -y
# tesseract part, tried both apt & apt-get
RUN apt-get install tesseract-ocr -y
COPY ./requirements.txt ./
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "./app.py"]
Then I start the container with docker compose up, open a shell in it with docker exec -t -i my_container_name /bin/bash, and finally run find / -type d -name "*tesseract*", which yields no results.
If I run apt-cache search tesseract-ocr, I can see that the package is available. If I then run apt install tesseract-ocr inside the container terminal, I can see the files being installed. And if I run find / -type d -name "*tesseract*" again, I can see that tesseract is now installed:
root@06d4e841c6d2:/code# find / -type d -name "*tess*"
/usr/share/doc/tesseract-ocr-eng
/usr/share/doc/tesseract-ocr-osd
/usr/share/doc/tesseract-ocr
/usr/share/doc/libtesseract4
/usr/share/tesseract-ocr
/usr/share/tesseract-ocr/4.00/tessdata
/usr/share/tesseract-ocr/4.00/tessdata/tessconfigs
How can I make it work so that it is installed correctly during the build phase?
Here's a snippet of the logs towards the end of the build process for RUN apt-get install tesseract-ocr -y:
#18 4.079 Preparing to unpack .../5-tesseract-ocr-osd_1%3a4.00~git30-7274cfa-1.1_all.deb ...
#18 4.086 Unpacking tesseract-ocr-osd (1:4.00~git30-7274cfa-1.1) ...
#18 4.447 Selecting previously unselected package tesseract-ocr.
#18 4.451 Preparing to unpack .../6-tesseract-ocr_4.1.1-2.1_amd64.deb ...
#18 4.463 Unpacking tesseract-ocr (4.1.1-2.1) ...
#18 4.552 Setting up libarchive13:amd64 (3.4.3-2+deb11u1) ...
#18 4.574 Setting up tesseract-ocr-eng (1:4.00~git30-7274cfa-1.1) ...
#18 4.596 Setting up libgif7:amd64 (5.1.9-2) ...
#18 4.618 Setting up tesseract-ocr-osd (1:4.00~git30-7274cfa-1.1) ...
#18 4.640 Setting up liblept5:amd64 (1.79.0-1.1+deb11u1) ...
#18 4.665 Setting up libtesseract4:amd64 (4.1.1-2.1) ...
#18 4.688 Setting up tesseract-ocr (4.1.1-2.1) ...
#18 4.710 Processing triggers for libc-bin (2.31-13+deb11u6) ...
#18 DONE 4.8s
I'm unable to reproduce your problem. I built a Docker image from this truncated Dockerfile:
# debian based
FROM python:3.10
WORKDIR /code
RUN mkdir __logger
RUN apt-get update -y
RUN apt-get install apt-utils -y
# tesseract part, tried both apt & apt-get
RUN apt-get install tesseract-ocr -y
I built the image with docker build --tag stackoverflow:test . and then started a container from it, where I was able to find tesseract:
% docker run -it stackoverflow:test /bin/bash
root@2e2e3599c939:/code# find / -type d -name "*tess*"
/usr/share/doc/tesseract-ocr
/usr/share/doc/libtesseract4
/usr/share/doc/tesseract-ocr-osd
/usr/share/doc/tesseract-ocr-eng
/usr/share/tesseract-ocr
/usr/share/tesseract-ocr/4.00/tessdata
/usr/share/tesseract-ocr/4.00/tessdata/tessconfigs
So this problem is a bit of a stumper. Here are a few things you can try that might help:
Pass the --no-cache argument to docker build, so that every layer is rebuilt instead of being reused from the build cache.
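If a stale cached layer is indeed involved (an assumption; the root cause was not confirmed above), a Dockerfile along these lines may make the build less sensitive to it. This is only a sketch: it keeps apt-get update and apt-get install in a single RUN instruction so the two are always cached together, and the tesseract --version line is an added sanity check, not part of the original Dockerfile, that should fail the build if the binary is missing.

```dockerfile
# debian based
FROM python:3.10
WORKDIR /code

# Update and install in one layer, so a cached "apt-get update" layer
# is never reused together with a changed "apt-get install" line.
RUN apt-get update -y \
    && apt-get install -y tesseract-ocr \
    && rm -rf /var/lib/apt/lists/*

# Added check (not in the original Dockerfile): fail the build early
# if the tesseract binary did not end up on the PATH.
RUN tesseract --version
```

Since the question starts the container with docker compose up, it may also be worth noting that Compose reuses an already built image unless it is told to rebuild, for example with docker compose up --build.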