python heroku ocr fastapi tesseract

Heroku Deployment: ocrmypdf.exceptions.MissingDependencyError: tesseract


I'm trying to deploy a FastAPI application to Heroku that uses the ocrmypdf package for OCR (Optical Character Recognition). The application works fine locally, but on Heroku, I get a missing dependency error for tesseract.

Here are the relevant logs:

    2023-09-28T04:57:02.190892+00:00 heroku[web.1]: State changed from starting to up
2023-09-28T04:57:04.351961+00:00 app[web.1]: [2023-09-28 04:57:04 +0000] [10] [INFO] Started server process [10]
2023-09-28T04:57:04.352002+00:00 app[web.1]: [2023-09-28 04:57:04 +0000] [10] [INFO] Waiting for application startup.
2023-09-28T04:57:04.352226+00:00 app[web.1]: [2023-09-28 04:57:04 +0000] [10] [INFO] Application startup complete.
2023-09-28T04:57:04.352573+00:00 app[web.1]: [2023-09-28 04:57:04 +0000] [8] [INFO] Started server process [8]
2023-09-28T04:57:04.352646+00:00 app[web.1]: [2023-09-28 04:57:04 +0000] [8] [INFO] Waiting for application startup.
2023-09-28T04:57:04.352835+00:00 app[web.1]: [2023-09-28 04:57:04 +0000] [8] [INFO] Application startup complete.
2023-09-28T04:57:04.353501+00:00 app[web.1]: [2023-09-28 04:57:04 +0000] [9] [INFO] Started server process [9]
2023-09-28T04:57:04.353548+00:00 app[web.1]: [2023-09-28 04:57:04 +0000] [9] [INFO] Waiting for application startup.
2023-09-28T04:57:04.353743+00:00 app[web.1]: [2023-09-28 04:57:04 +0000] [9] [INFO] Application startup complete.
2023-09-28T04:57:04.353866+00:00 app[web.1]: [2023-09-28 04:57:04 +0000] [7] [INFO] Started server process [7]
2023-09-28T04:57:04.353923+00:00 app[web.1]: [2023-09-28 04:57:04 +0000] [7] [INFO] Waiting for application startup.
2023-09-28T04:57:04.354135+00:00 app[web.1]: [2023-09-28 04:57:04 +0000] [7] [INFO] Application startup complete.
2023-09-28T04:57:04.420648+00:00 app[web.1]: 102.38.199.5:0 - "POST /upload/ HTTP/1.1" 500
2023-09-28T04:57:04.425146+00:00 app[web.1]: [2023-09-28 04:57:04 +0000] [10] [ERROR] Exception in ASGI application
2023-09-28T04:57:04.425147+00:00 app[web.1]: Traceback (most recent call last):
2023-09-28T04:57:04.425147+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
2023-09-28T04:57:04.425148+00:00 app[web.1]: result = await app(  # type: ignore[func-returns-value]
2023-09-28T04:57:04.425148+00:00 app[web.1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-09-28T04:57:04.425148+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
2023-09-28T04:57:04.425149+00:00 app[web.1]: return await self.app(scope, receive, send)
2023-09-28T04:57:04.425149+00:00 app[web.1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-09-28T04:57:04.425168+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/fastapi/applications.py", line 292, in __call__
2023-09-28T04:57:04.425169+00:00 app[web.1]: await super().__call__(scope, receive, send)
2023-09-28T04:57:04.425169+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
2023-09-28T04:57:04.425169+00:00 app[web.1]: await self.middleware_stack(scope, receive, send)
2023-09-28T04:57:04.425169+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
2023-09-28T04:57:04.425170+00:00 app[web.1]: raise exc
2023-09-28T04:57:04.425170+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
2023-09-28T04:57:04.425170+00:00 app[web.1]: await self.app(scope, receive, _send)
2023-09-28T04:57:04.425171+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/starlette/middleware/cors.py", line 91, in __call__
2023-09-28T04:57:04.425171+00:00 app[web.1]: await self.simple_response(scope, receive, send, request_headers=headers)
2023-09-28T04:57:04.425172+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/starlette/middleware/cors.py", line 146, in simple_response
2023-09-28T04:57:04.425172+00:00 app[web.1]: await self.app(scope, receive, send)
2023-09-28T04:57:04.425172+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
2023-09-28T04:57:04.425172+00:00 app[web.1]: raise exc
2023-09-28T04:57:04.425172+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
2023-09-28T04:57:04.425173+00:00 app[web.1]: await self.app(scope, receive, sender)
2023-09-28T04:57:04.425173+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
2023-09-28T04:57:04.425173+00:00 app[web.1]: raise e
2023-09-28T04:57:04.425173+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
2023-09-28T04:57:04.425173+00:00 app[web.1]: await self.app(scope, receive, send)
2023-09-28T04:57:04.425173+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
2023-09-28T04:57:04.425174+00:00 app[web.1]: await route.handle(scope, receive, send)
2023-09-28T04:57:04.425174+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
2023-09-28T04:57:04.425174+00:00 app[web.1]: await self.app(scope, receive, send)
2023-09-28T04:57:04.425174+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
2023-09-28T04:57:04.425174+00:00 app[web.1]: response = await func(request)
2023-09-28T04:57:04.425175+00:00 app[web.1]: ^^^^^^^^^^^^^^^^^^^
2023-09-28T04:57:04.425175+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/fastapi/routing.py", line 273, in app
2023-09-28T04:57:04.425175+00:00 app[web.1]: raw_response = await run_endpoint_function(
2023-09-28T04:57:04.425175+00:00 app[web.1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-09-28T04:57:04.425176+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/fastapi/routing.py", line 190, in run_endpoint_function
2023-09-28T04:57:04.425176+00:00 app[web.1]: return await dependant.call(**values)
2023-09-28T04:57:04.425176+00:00 app[web.1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-09-28T04:57:04.425176+00:00 app[web.1]: File "/app/app/main.py", line 109, in upload_files
2023-09-28T04:57:04.425176+00:00 app[web.1]: ocrmypdf.ocr(temp_pdf_path, output_pdf_path, deskew=True, force_ocr=True)
2023-09-28T04:57:04.425176+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/ocrmypdf/api.py", line 352, in ocr
2023-09-28T04:57:04.425177+00:00 app[web.1]: check_options(options, plugin_manager)
2023-09-28T04:57:04.425177+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/ocrmypdf/_validation.py", line 245, in check_options
2023-09-28T04:57:04.425177+00:00 app[web.1]: _check_plugin_options(options, plugin_manager)
2023-09-28T04:57:04.425177+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/ocrmypdf/_validation.py", line 238, in _check_plugin_options
2023-09-28T04:57:04.425177+00:00 app[web.1]: plugin_manager.hook.check_options(options=options)
2023-09-28T04:57:04.425177+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/pluggy/_hooks.py", line 493, in __call__
2023-09-28T04:57:04.425177+00:00 app[web.1]: return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
2023-09-28T04:57:04.425178+00:00 app[web.1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-09-28T04:57:04.425178+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/pluggy/_manager.py", line 115, in _hookexec
2023-09-28T04:57:04.425178+00:00 app[web.1]: return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
2023-09-28T04:57:04.425178+00:00 app[web.1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-09-28T04:57:04.425180+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/pluggy/_callers.py", line 113, in _multicall
2023-09-28T04:57:04.425180+00:00 app[web.1]: raise exception.with_traceback(exception.__traceback__)
2023-09-28T04:57:04.425181+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/pluggy/_callers.py", line 77, in _multicall
2023-09-28T04:57:04.425181+00:00 app[web.1]: res = hook_impl.function(*args)
2023-09-28T04:57:04.425182+00:00 app[web.1]: ^^^^^^^^^^^^^^^^^^^^^^^^^
2023-09-28T04:57:04.425182+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/ocrmypdf/builtin_plugins/tesseract_ocr.py", line 139, in check_options
2023-09-28T04:57:04.425182+00:00 app[web.1]: check_external_program(
2023-09-28T04:57:04.425183+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.11/site-packages/ocrmypdf/subprocess/__init__.py", line 340, in check_external_program
2023-09-28T04:57:04.425183+00:00 app[web.1]: raise MissingDependencyError(program)
2023-09-28T04:57:04.425183+00:00 app[web.1]: ocrmypdf.exceptions.MissingDependencyError: tesseract
2023-09-28T04:57:04.425738+00:00 heroku[router]: at=info method=POST path="/upload/" host=legal-tools-backend-036eb0ac010e.herokuapp.com request_id=5c6e9753-9172-4962-9196-fec0d86d0205 fwd="102.38.199.5" dyno=web.1 connect=0ms service=543ms status=500 bytes=193 protocol=https

I've already tried:

  • Adding the Tesseract buildpack to my Heroku app.
  • Including an Aptfile with tesseract-ocr listed.
  • Setting TESSDATA_PREFIX in the Procfile: web: TESSDATA_PREFIX=./.apt/usr/share/tesseract-ocr/4.00/tessdata gunicorn -w 4 -k uvicorn.workers.UvicornWorker app.main:app

I also tried passing the path that Heroku gave me in bash, like this:

    ocrmypdf.ocr(temp_pdf_path, output_pdf_path, deskew=True, force_ocr=True, tesseract_config={'tesseract_path': '/app/vendor/tesseract-ocr/bin/tesseract'})

Any ideas? It's driving me nuts.


Solution

  • I'm not sure how many of the things I tried are actually necessary, but it worked once I switched to this buildpack (keeping the other changes in place, since I haven't removed them): https://github.com/ElHappy/heroku-buildpack-tesseract
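
To narrow down which of these steps actually matters, it can help to check from inside the dyno whether a tesseract executable is visible at all. As far as I know, ocrmypdf finds tesseract by searching PATH, which is why the MissingDependencyError appears when the binary is installed somewhere PATH does not cover. The snippet below is only a rough sketch: find_tesseract and ensure_on_path are hypothetical helper names, /app/vendor/tesseract-ocr/bin is taken from the path in the question, and /app/.apt/usr/bin is an assumption about where an Aptfile install might end up.

```python
import os
import shutil


def find_tesseract(extra_dirs=()):
    """Return the full path of a tesseract executable, or None.

    extra_dirs are searched first, followed by the directories
    already listed in the PATH environment variable.
    """
    dirs = [*extra_dirs, *os.environ.get("PATH", "").split(os.pathsep)]
    search_path = os.pathsep.join(d for d in dirs if d)
    return shutil.which("tesseract", path=search_path)


def ensure_on_path(directory):
    """Prepend directory to PATH for this process, unless it is already there."""
    parts = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        os.environ["PATH"] = os.pathsep.join([directory, *parts])


if __name__ == "__main__":
    # Assumed candidate locations; adjust to whatever the buildpack really uses.
    candidates = ["/app/.apt/usr/bin", "/app/vendor/tesseract-ocr/bin"]
    found = find_tesseract(candidates)
    if found:
        # Make the binary's directory visible to anything that searches PATH.
        ensure_on_path(os.path.dirname(found))
        print("tesseract found at", found)
    else:
        print("tesseract not found in PATH or in", candidates)
```

Running this once in a one-off dyno (heroku run python, then pasting it in) should show whether the buildpack put the binary somewhere reachable. If it only turns up in one of the candidate directories, calling ensure_on_path before ocrmypdf.ocr(...) in the app may be enough, though I have not verified that on Heroku.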