How to add options to python subprocess


I'm using Python 2.7.3.

I have a function that runs tesseract as a command-line program. Everything works fine, and now I would like to add a new parameter to the command: -l rus (selecting the Russian language). Even though this works on my command line, it doesn't seem to work from Python.

Command line:

$ /usr/local/bin/tesseract /Users/anthony/Downloads/rus.png outfile -l rus && more outfile.txt
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
Полу-Милорд, полу-купец,
Полу-мудрец, полу-невежда,
Полу-подлец, но есть надежда,

Что будет полным наконец.

Python function

  def ocr(self,path):
      path = "/Users/anthony/Downloads/rus.png"
      process = subprocess.Popen(['/usr/local/bin/tesseract', path,'outfile','-l rus'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
      out, err = process.communicate()
      print err
      print out
      with open('outfile.txt', 'r') as handle:
          contents = handle.read()
      os.remove(temp.name + '.txt')
      os.remove(temp.name)

      return contents, out

The above returns "HOIIY nony HOIIY nony Hony no ecTb HHJICXQRI 6y11e", which suggests that the -l rus flag is being ignored.

Question

How can I execute the following command as a python subprocess?

/usr/local/bin/tesseract /Users/anthony/Downloads/rus.png outfile -l rus

Solution

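  • Each element of the list passed to subprocess.Popen reaches the program as exactly one argument, so '-l rus' arrives as a single argument containing a space rather than as the flag -l followed by the value rus. Split it into two separate elements so that tesseract parses it correctly: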

    process = subprocess.Popen(
        ['/usr/local/bin/tesseract', path, 'outfile', '-l', 'rus'],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    

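    For longer commands it can be handy to build the list from a command string with str.split() or shlex.split(). str.split() works here only because none of the arguments contain spaces or quotes; shlex.split() also handles quoted arguments: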

    cmd = '/usr/local/bin/tesseract /Users/anthony/Downloads/rus.png outfile -l rus'
    
    process = subprocess.Popen(
        cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )