I'm using Python 2.7.3.
I have a function that runs tesseract from the command line. Everything works fine, and now I would like to add a new parameter to the command: -l rus (signifying the Russian language). Even though this works on my command line, it doesn't seem to work from Python.
Command line:
$ /usr/local/bin/tesseract /Users/anthony/Downloads/rus.png outfile -l rus && more outfile.txt
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
Полу-Милорд, полу-купец,
Полу-мудрец, полу-невежда,
Полу-подлец, но есть надежда,
Что будет полным наконец.
Python function:
def ocr(self,path):
    path = "/Users/anthony/Downloads/rus.png"
    process = subprocess.Popen(['/usr/local/bin/tesseract', path,'outfile','-l rus'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, err = process.communicate()
    print err
    print out
    with open('outfile.txt', 'r') as handle:
        contents = handle.read()
    os.remove(temp.name + '.txt')
    os.remove(temp.name)
    return contents, out
The above returns "HOIIY nony HOIIY nony Hony no ecTb HHJICXQRI 6y11e", which suggests that the -l rus flag is being ignored.
Question
How can I execute the following command as a Python subprocess?
/usr/local/bin/tesseract /Users/anthony/Downloads/rus.png outfile -l rus
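You need to split '-l rus' into two separate arguments. When Popen is given a list, each element reaches the program as exactly one argument, so tesseract receives the single string '-l rus' (embedded space included) instead of the -l option followed by its value rus: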
process = subprocess.Popen(
    ['/usr/local/bin/tesseract', path, 'outfile', '-l', 'rus'],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT
)
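To see the difference for yourself, here is a minimal sketch that uses a small child Python process as a stand-in for tesseract (the CHILD snippet and the child_args helper are made up for this illustration, they are not part of tesseract or subprocess). The child just echoes back the arguments it received:

```python
import subprocess
import sys

# Stand-in for tesseract: a child process that prints each argument
# it received on its own line. (Hypothetical helper for illustration.)
CHILD = "import sys; print('\\n'.join(sys.argv[1:]))"


def child_args(args):
    """Run the child with `args` and return the argument list it saw."""
    out = subprocess.check_output([sys.executable, "-c", CHILD] + list(args))
    return out.decode("utf-8").splitlines()


# One list element containing a space arrives as ONE argument.
joined = child_args(["outfile", "-l rus"])

# Two list elements arrive as TWO arguments, like on the shell.
split = child_args(["outfile", "-l", "rus"])

print(joined)
print(split)
```

The first call shows the child receiving "-l rus" as one argument; the second shows "-l" and "rus" arriving separately, which is what the shell does when you type the command by hand.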
It might be handy to use str.split() or shlex.split() for this:
cmd = '/usr/local/bin/tesseract /Users/anthony/Downloads/rus.png outfile -l rus'
process = subprocess.Popen(
    cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.STDOUT
)
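One caveat on choosing between the two: str.split() breaks on every run of whitespace, while shlex.split() follows shell-style quoting. That only matters once an argument contains a space. A rough sketch, assuming a hypothetical image path with a space in it (the "My Scans" directory is invented for the example):

```python
import shlex

# Hypothetical path containing a space, quoted as you would on the shell.
cmd = "/usr/local/bin/tesseract '/Users/anthony/My Scans/rus.png' outfile -l rus"

# str.split() ignores the quotes and cuts the path in two.
naive = cmd.split()

# shlex.split() honors the quotes and keeps the path as one argument.
parsed = shlex.split(cmd)

print(naive)
print(parsed)
```

For the command in the question, which has no quoted or space-containing arguments, both functions produce the same list, so either works there.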