My context
I'm using tesseract to extract text from an image.
I generate a .tsv file so I can retrieve the extracted text and run some regexes on it, and a .pdf file to get a searchable PDF.
Right now I do this by calling tesseract twice, once per output format.
But I feel this is not very efficient, since the same computations are presumably made twice.
What I wish
I'd like to make this faster. My idea is to call tesseract only once while specifying two output formats.
Is it possible? If so, how?
Tesseract accepts more than one output format in a single call. You can try the command:
tesseract yourimage.tif out pdf tsv
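
If it helps, here is a minimal sketch of wrapping that single call in a script. The function name ocr_pdf_and_tsv and its two arguments are made up for illustration and are not part of tesseract; the sketch assumes tesseract is on your PATH and that it appends .pdf and .tsv to the output base name, as it does for the command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): run tesseract once and
# request both output formats in the same invocation.
#   $1 = input image, $2 = output base name (without extension)
ocr_pdf_and_tsv() {
    image=$1
    base=$2

    # "pdf" and "tsv" are the same output names used in the command above.
    tesseract "$image" "$base" pdf tsv || return 1

    # After one successful run both files should exist next to each other.
    [ -f "$base.pdf" ] && [ -f "$base.tsv" ]
}
```

Calling ocr_pdf_and_tsv yourimage.tif out should then leave out.pdf and out.tsv in the current directory, and the function returns non-zero if tesseract fails or either file is missing.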