python, ibm-cloud, tesseract, python-tesseract, ibm-cloud-plugin

Python app with tesseract does not work on Bluemix


I have a Python application that uses Tesseract to detect checkboxes in scanned images. It works perfectly on my local machine, but when I push the code to Bluemix with the python-tesseract buildpack, it fails to generate the output file, which suggests that Tesseract is not being found on Bluemix.

This is my manifest.yml:

applications:
- path: .
  memory: 512M
  instances: 1
  domain: mybluemix.net
  name: edge-noise-detector-bluemix
  host: edge-noise-detector-bluemix
  disk_quota: 1024M
  buildpack: https://github.com/LeoKotschenreuther/python-tesseract-buildpack.git

This is my requirements.txt:

Flask
numpy
Pillow==4.1.1
pycparser
pyOpenSSL
pyparsing
pytesseract
python-dateutil
python-swiftclient
pytz
PyWavelets
scikit-image
scipy
requests
matplotlib==1.4.3
opencv-python
cf_deployment_tracker
tesseract

Here are the logs from Bluemix:

Traceback (most recent call last):
  File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/app/.heroku/python/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "server.py", line 217, in predict_square_checkboxes
    ImgOcr = image_hocr_class.ocr_hocr('temporary.png')
  File "/home/vcap/app/src/image_hocr_class.py", line 39, in __init__
    self.HTMLTree = xml.etree.ElementTree.parse(self.HOCRFileName).getroot()
  File "/app/.heroku/python/lib/python3.6/xml/etree/ElementTree.py", line 1196, in parse
    tree.parse(source, parser)
  File "/app/.heroku/python/lib/python3.6/xml/etree/ElementTree.py", line 586, in parse
    source = open(source, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'images/8e297b93a39f1e08a490f72c8db53bf0.hocr'

This usually happens when pytesseract cannot locate the tesseract executable. I'm not sure how to get this to work on Bluemix. Has anyone got Python with Tesseract working on Bluemix? Please help.
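The traceback only shows the symptom: the `.hocr` file that `ElementTree.parse` expects was never written. A minimal sketch, assuming the hOCR file is produced by calling the `tesseract` command-line tool, of a helper that checks for the executable first and reports Tesseract's own error instead of a later missing-file error. `find_tesseract` and `run_hocr` are hypothetical names, not part of the original `image_hocr_class`.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path


def find_tesseract(cmd="tesseract"):
    """Return the full path of the tesseract executable, or None if missing.

    shutil.which searches PATH for a bare name and checks the given path
    directly if `cmd` contains a directory component.
    """
    return shutil.which(cmd)


def run_hocr(image_path, output_base, cmd="tesseract"):
    """Run tesseract in hOCR mode and return the parsed root element.

    Fails with a descriptive error if the binary is missing or tesseract
    exits with an error, instead of a later FileNotFoundError on the .hocr.
    """
    exe = find_tesseract(cmd)
    if exe is None:
        raise RuntimeError(
            "'{}' executable not found; is Tesseract installed in this "
            "environment?".format(cmd)
        )
    # `tesseract <image> <outputbase> hocr` asks for hOCR output;
    # recent releases write it to <outputbase>.hocr.
    # stdout/stderr=PIPE (rather than capture_output) keeps Python 3.6 support.
    result = subprocess.run(
        [exe, str(image_path), str(output_base), "hocr"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if result.returncode != 0:
        raise RuntimeError(
            "tesseract failed: " + result.stderr.decode(errors="replace")
        )
    hocr_file = Path(str(output_base) + ".hocr")
    if not hocr_file.exists():
        raise FileNotFoundError(
            "tesseract ran but did not produce {}".format(hocr_file)
        )
    return ET.parse(str(hocr_file)).getroot()
```

If the app goes through pytesseract, the path returned by `find_tesseract()` can also be assigned to `pytesseract.pytesseract.tesseract_cmd`, so pytesseract and any direct calls use the same binary.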


Solution

  • IBM Cloud gives you several ways to run your applications. Cloud Foundry runtimes are one of them, but they don't seem a good fit for your situation: whenever you need to install a dependency that the standard buildpack doesn't provide, you have to create a custom buildpack, which can be a rather complex task (https://docs.cloudfoundry.org/buildpacks/custom.html). Have you considered Docker/Kubernetes? If your application has several dependencies like this (Tesseract in your case), I would suggest creating a Kubernetes environment to build your app. Have a look at these resources: https://hub.docker.com/r/tesseractshadow/tesseract4re/ and https://console.bluemix.net/docs/containers/container_index.html#container_index
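Whichever route you take (custom buildpack or container image), it can help to confirm at startup that the environment really contains a working Tesseract, so a missing binary shows up in the deployment logs rather than as a missing `.hocr` file on the first request. A minimal sketch, assuming the app is allowed to shell out to `tesseract --version`; `tesseract_version` and `check_ocr_runtime` are hypothetical names.

```python
import shutil
import subprocess
import sys


def tesseract_version(cmd="tesseract"):
    """Return the first line of `tesseract --version`, or None if unavailable."""
    exe = shutil.which(cmd)
    if exe is None:
        return None
    # Some older releases print the version to stderr rather than stdout,
    # so both streams are merged into result.stdout here.
    result = subprocess.run(
        [exe, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    lines = result.stdout.decode(errors="replace").strip().splitlines()
    return lines[0] if lines else None


def check_ocr_runtime(cmd="tesseract"):
    """Exit with a readable message if the Tesseract binary is missing."""
    version = tesseract_version(cmd)
    if version is None:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(
            "'{}' not found; this environment does not provide "
            "Tesseract".format(cmd)
        )
    print("Using", version)
    return version
```

Calling `check_ocr_runtime()` near the top of `server.py`, before the Flask app starts serving, would make a broken deployment fail immediately with an explicit message.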