Tags: python, docker, docker-compose, poppler, pdf2image

pdf2image fails in docker container


I have a Python project running in a Docker container, but I can't get convert_from_path (from the pdf2image library) to work. It works locally on my Windows PC, but not in the Linux-based Docker container.

The error I get every time is: "Unable to get page count. Is poppler installed and in PATH?"
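Inside the container, a quick way to see what pdf2image will see is to look the Poppler tools up with the standard library. This is a minimal sketch, assuming pdf2image needs `pdfinfo` (which it uses to get the page count mentioned in the error) and `pdftoppm` (which it uses for rendering by default); `missing_poppler_tools` is a hypothetical helper, not part of pdf2image.

```python
import shutil

# Command-line tools that pdf2image runs; both ship in the poppler-utils package.
POPPLER_TOOLS = ("pdfinfo", "pdftoppm")


def missing_poppler_tools(poppler_path=None):
    """Return the Poppler tools that cannot be found.

    With poppler_path=None, search PATH (what convert_from_path does when
    poppler_path is omitted); otherwise look only in that directory.
    """
    return [
        tool
        for tool in POPPLER_TOOLS
        if shutil.which(tool, path=poppler_path) is None
    ]


if __name__ == "__main__":
    missing = missing_poppler_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Poppler tools found on PATH")
```

Running this inside the container (for example with `docker compose run` against the app service) shows whether the image actually contains the tools before pdf2image is involved at all.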

Relevant parts of my code look like this:

from pdf2image import convert_from_path
import os
from sys import exit

def my_function(file_source_path):
    # Attempt 1: poppler_path taken from the POPPLER_PATH environment variable
    try:
        pages = convert_from_path(file_source_path, 600, poppler_path=os.environ.get('POPPLER_PATH'))
    except Exception as e:
        print('Fail 1')
        print(e)
    # Attempt 2: no poppler_path, so pdf2image searches PATH
    try:
        pages = convert_from_path(file_source_path, 600)
    except Exception as e:
        print('Fail 2')
        print(e)
    # Attempt 3: hard-coded path suggested by other users
    try:
        pages = convert_from_path(file_source_path, 600, poppler_path=r'\usr\local\bin')
    except Exception as e:
        print('Fail 3')
        print(e)
        print(os.environ)
        exit('Exiting script')

In attempt 1, I point poppler_path at the Poppler installation on Windows. The POPPLER_PATH environment variable is set to '/code/poppler', which is a bind mount of the Windows folder:

[snippet from docker-compose.yml]
- type: bind
  source: "C:/Program Files/poppler-0.68.0/bin"
  target: /code/poppler
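A bind mount makes the Windows files visible inside the container, but it does not make them runnable: the Windows Poppler build consists of `.exe` files, which a Linux container cannot execute. The sketch below checks a directory for that situation. It assumes the Windows build names its tools `pdfinfo.exe` and `pdftoppm.exe`; `classify_poppler_dir` is a hypothetical helper.

```python
import os

POPPLER_TOOLS = ("pdfinfo", "pdftoppm")


def classify_poppler_dir(directory):
    """Report, per tool, whether `directory` holds something Linux could run.

    "ok"          -> an executable file with the bare tool name exists
    "windows_exe" -> only a Windows build (tool + ".exe") is present
    "missing"     -> neither exists
    """
    report = {}
    for tool in POPPLER_TOOLS:
        candidate = os.path.join(directory, tool)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            report[tool] = "ok"
        elif os.path.isfile(candidate + ".exe"):
            report[tool] = "windows_exe"
        else:
            report[tool] = "missing"
    return report


if __name__ == "__main__":
    # /code/poppler is the bind-mount target from the docker-compose.yml snippet.
    print(classify_poppler_dir("/code/poppler"))
```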

In attempt 2, I leave poppler_path out entirely, so pdf2image has to find Poppler on PATH. In attempt 3, I hard-code a path that other users reported worked for them locally.
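Attempt 3 has two problems on Linux. The backslashes are not path separators there, so r'\usr\local\bin' is a single relative name rather than a directory path, as `pathlib.PurePosixPath` shows below. And even written with forward slashes, /usr/local/bin is not where Debian's poppler-utils package (the one installed by the Dockerfile's apt-get step) puts its tools; they go to /usr/bin.

```python
from pathlib import PurePosixPath

# Attempt 3 passes a Windows-style string. On Linux the backslash is an
# ordinary character, so this is one relative name, not three directories.
windows_style = PurePosixPath(r"\usr\local\bin")
print(windows_style.parts)          # ('\\usr\\local\\bin',)
print(windows_style.is_absolute())  # False

# The same location written with forward slashes is a real absolute path.
posix_style = PurePosixPath("/usr/local/bin")
print(posix_style.parts)            # ('/', 'usr', 'local', 'bin')
print(posix_style.is_absolute())    # True
```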

Relevant parts of my Dockerfile look like this:

FROM python:3.10

WORKDIR /code

# install poppler (update and install in one RUN so a cached, stale package index is never reused)
RUN apt-get update && apt-get install -y poppler-utils

COPY ./requirements.txt ./
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "./app.py"]

Solution

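  • The issue was that my Docker image was not being rebuilt correctly. After clearing the build cache (for example with docker compose build --no-cache) and rebuilding, the second option worked together with the Dockerfile above.

    In short: install poppler-utils in the Dockerfile (RUN apt-get install poppler-utils -y) and call convert_from_path without poppler_path (pages = convert_from_path(file_source_path, 600)). The poppler-utils package installs pdfinfo and pdftoppm in /usr/bin, which is already on PATH, so pdf2image finds them automatically.

    The bind mount can also be removed from docker-compose.yml, along with the corresponding entry in the .env file. It could not have worked anyway: the Windows Poppler build consists of .exe files, which cannot run inside a Linux container.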
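If the same code still has to run on the Windows machine, where Poppler is not on PATH, one option is to pass poppler_path only when it points at an existing directory and otherwise pass None, which is convert_from_path's default and makes it search PATH. This is a minimal sketch; `resolve_poppler_path` and `pdf_to_images` are hypothetical names, and it assumes POPPLER_PATH is set only in the local Windows environment, with the bind mount and .env entry removed for Docker.

```python
import os


def resolve_poppler_path(env_var="POPPLER_PATH"):
    """Return the directory named by env_var if it exists, else None.

    None makes convert_from_path fall back to searching PATH, which is what
    works in the container once poppler-utils is installed.
    """
    candidate = os.environ.get(env_var)
    if candidate and os.path.isdir(candidate):
        return candidate
    return None


def pdf_to_images(file_source_path, dpi=600):
    # Imported lazily so resolve_poppler_path can be used without pdf2image.
    from pdf2image import convert_from_path

    return convert_from_path(
        file_source_path, dpi, poppler_path=resolve_poppler_path()
    )
```

Locally, POPPLER_PATH can point at the Windows bin folder; in the container it is simply unset, so the call behaves exactly like the working second option.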