I have an app deployed on Heroku that uses Pytesseract. To use Tesseract on the server, I had to install it through an Aptfile:
Aptfile
tesseract-ocr
After checking in Heroku Bash, I see that the installed version of Tesseract is 4.0.0. This version has some minor bugs that affect my app (for example, it doesn't filter characters as well as newer versions do). How can I install a specific version of Tesseract-OCR on the server?
Put the version after the package name.
From the Ubuntu Manpage for apt-get:
A specific version of a package can be selected for installation by following the package name with an equals and the version of the package to select
From the heroku-buildpack-apt README:
To find out what packages are available, see: https://packages.ubuntu.com
If you use the Heroku-20 stack (the current default stack), you should search for the packages for Ubuntu 20.04, because that is its base technology. From the Heroku Stacks article:
Stack Version | Base Technology | Available since | Supported through | Status
Heroku-20     | Ubuntu 20.04    | 2020            | April 2025        | Default
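Because the versions you can pin depend on the stack's base Ubuntu release, it can help to go straight from the stack name to the right page on packages.ubuntu.com. This is a minimal sketch with hypothetical names (`STACK_TO_UBUNTU`, `package_page_url`); the Heroku-18/Ubuntu 18.04 and Heroku-20/Ubuntu 20.04 pairings come from this answer, the codenames (bionic, focal) are Ubuntu's own, and the `https://packages.ubuntu.com/<codename>/<package>` URL shape is an assumption about how the site lays out its package pages.

```python
from typing import Dict, Tuple

# Hypothetical lookup: Heroku stack -> (Ubuntu release, Ubuntu codename).
# Only the two stacks discussed in this answer are listed.
STACK_TO_UBUNTU: Dict[str, Tuple[str, str]] = {
    "heroku-18": ("18.04", "bionic"),
    "heroku-20": ("20.04", "focal"),
}


def package_page_url(stack: str, package: str) -> str:
    """Return the packages.ubuntu.com page for `package` in the stack's base release.

    Assumes the site's /<codename>/<package> URL layout.
    """
    try:
        _release, codename = STACK_TO_UBUNTU[stack.lower()]
    except KeyError:
        raise ValueError(f"unknown stack: {stack!r}") from None
    return f"https://packages.ubuntu.com/{codename}/{package}"


if __name__ == "__main__":
    print(package_page_url("heroku-20", "tesseract-ocr"))
    # https://packages.ubuntu.com/focal/tesseract-ocr
```

The version shown on that page is what goes after the `=` in the Aptfile line.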
In the Ubuntu packages for tesseract-ocr in 20.04, the current package version is 4.1.1-2build2:
Package: tesseract-ocr (4.1.1-2build2) [universe]
In that case, the Aptfile line could be:
tesseract-ocr=4.1.1-2build2
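If you manage more than one pinned package, the `name=version` syntax quoted from the apt-get manpage is easy to generate. This is a minimal sketch; `aptfile_text` and `write_aptfile` are hypothetical helpers, not part of heroku-buildpack-apt, which simply reads the Aptfile at the root of your app.

```python
from pathlib import Path
from typing import Mapping, Optional


def aptfile_text(packages: Mapping[str, Optional[str]]) -> str:
    """Render Aptfile contents, one package per line.

    A package mapped to a version is written as `name=version` (apt-get's
    syntax for selecting a specific version); a package mapped to None is
    left unpinned, so apt installs the version the release provides.
    """
    lines = []
    for name, version in packages.items():
        lines.append(name if version is None else f"{name}={version}")
    return "\n".join(lines) + "\n"


def write_aptfile(packages: Mapping[str, Optional[str]], app_root: str = ".") -> Path:
    """Write the Aptfile into the app root and return its path."""
    path = Path(app_root) / "Aptfile"
    path.write_text(aptfile_text(packages))
    return path


if __name__ == "__main__":
    # Pin the Ubuntu 20.04 (Heroku-20) package version discussed above.
    print(aptfile_text({"tesseract-ocr": "4.1.1-2build2"}), end="")
    # tesseract-ocr=4.1.1-2build2
```

Running `write_aptfile({"tesseract-ocr": "4.1.1-2build2"})` from the app root produces the same one-line Aptfile shown above.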
That is how you install a specific version.
In your case, I guess you are using Heroku-18, because 4.00~git2288-10f4998a-2 is the version of tesseract-ocr for Ubuntu 18.04 according to Ubuntu packages, and trying to install a higher version there will probably fail because it is not available. If that is the case, I would recommend switching to Heroku-20, which should install a more recent version of that package by default.
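These version strings are not ordered as plain text. Ubuntu package versions have the form `[epoch:]upstream[-revision]`, and in dpkg's ordering a `~` sorts before anything, even the end of the string, so `4.00~git2288-10f4998a` sorts below a plain `4.00`. As a sketch, the Python port below follows dpkg's comparison as I understand it (`compare_versions` and its helpers are hypothetical names, not a library API, and ASCII version strings are assumed). It confirms that the Heroku-18 package sorts below the 20.04 one, so pinning `4.1.1-2build2` on Heroku-18 asks for something newer than that release ships.

```python
from typing import Tuple


def _at(s: str, i: int) -> str:
    """Character at index i, or "" past the end (plays the role of C's NUL)."""
    return s[i] if i < len(s) else ""


def _order(ch: str) -> int:
    """Weight of a character inside a non-digit run (ASCII assumed)."""
    if ch == "" or ch.isdigit():
        return 0
    if ch.isalpha():
        return ord(ch)
    if ch == "~":
        return -1  # '~' sorts before everything, even the end of the string
    return ord(ch) + 256  # other symbols sort after letters


def _verrevcmp(a: str, b: str) -> int:
    """Compare two upstream-version (or revision) strings, dpkg style."""
    i = j = 0
    while i < len(a) or j < len(b):
        # 1. Compare the non-digit run character by character.
        while (i < len(a) and not a[i].isdigit()) or (j < len(b) and not b[j].isdigit()):
            ac, bc = _order(_at(a, i)), _order(_at(b, j))
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # 2. Compare the following digit run numerically, ignoring leading zeros.
        while _at(a, i) == "0":
            i += 1
        while _at(b, j) == "0":
            j += 1
        first_diff = 0
        while _at(a, i).isdigit() and _at(b, j).isdigit():
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if _at(a, i).isdigit():
            return 1  # a's number has more digits, so it is larger
        if _at(b, j).isdigit():
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str) -> Tuple[int, str, str]:
    """Split [epoch:]upstream[-revision] into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")  # revision follows the LAST '-'
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 as version a sorts before, equal to, or after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    result = (ea > eb) - (ea < eb) or _verrevcmp(ua, ub) or _verrevcmp(ra, rb)
    return (result > 0) - (result < 0)


if __name__ == "__main__":
    print(compare_versions("4.00~git2288-10f4998a-2", "4.1.1-2build2"))  # -1
```

For an authoritative answer on an Ubuntu machine, `dpkg --compare-versions 4.00~git2288-10f4998a-2 lt 4.1.1-2build2` exits with status 0 when the first version is the lower one.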