bash automation transformation simultaneous

How to automate file transformation with simultaneous execution?


I am working on transforming a large number of image files (PNG) into text files. I have basic code that does this one file at a time, which is very time consuming. My process converts each image to black and white and then uses tesseract to turn it into a text file. This works great, but it would take days to accomplish my task file by file. Here is my code:

for f in $1
do
 echo "Processing $f file..."
 convert "$f" -resample 200 -colorspace Gray "${f%.*}BW.png"
 echo "OCR'ing $f"
 tesseract "${f%.*}BW.png" "${f%.*}" -l tla -psm 6
 echo "Removing black and white for $f"
 rm "${f%.*}BW.png"
done
echo "Done!"

Is there a way to run this process on every file at the same time, that is, simultaneously instead of one by one? My goal is to significantly reduce the time it takes to transform these images into text files.

Thanks in advance.


Solution

  • I want to thank contributors @Songy and @shellter. To answer my own question: I ended up using GNU Parallel to run these processes five at a time. Here is the code that I used:

    parallel -j 5 convert {} "-resample 200 -colorspace Gray" {.}BW.png ::: *.png
    parallel -j 5 tesseract {} {} -l tla -psm 6 ::: *BW.png
    rm *BW.png
    

    I am now in the process of splitting my dataset in order to run this command simultaneously with different subgroups of my (very large) pool of images.

    Cheers
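
A possible variation, offered as a minimal sketch and not the code used above: the convert, tesseract and rm steps from the question can be chained per file and handed to `xargs -P`, so each image goes through the whole pipeline in one worker. The names `ocr_all` and `JOBS` are made up for illustration, the default of 5 only mirrors `-j 5`, and the `convert` and `tesseract` flags are copied unchanged from the question. It assumes an `xargs` that supports `-0` and `-P`, as the GNU and BSD versions do.

```shell
# Hypothetical setting: how many files to process at once (mirrors -j 5).
JOBS=${JOBS:-5}

# Hypothetical helper: run the question's per-file pipeline over every
# argument, with at most JOBS files in flight at a time.
ocr_all() {
  # Nothing to do when called without files.
  [ "$#" -gt 0 ] || return 0

  # Send names NUL-separated so spaces in file names survive, then let
  # xargs start one small sh per file.
  printf '%s\0' "$@" |
    xargs -0 -n 1 -P "$JOBS" sh -c '
      f=$1
      base=${f%.*}
      # Grayscale copy, then OCR only if the conversion succeeded.
      convert "$f" -resample 200 -colorspace Gray "${base}BW.png" &&
        tesseract "${base}BW.png" "$base" -l tla -psm 6
      # Remove the temporary black and white image either way.
      rm -f "${base}BW.png"
    ' _
}
```

Sourced into a shell, this would be called as `ocr_all *.png`, or as `JOBS=8 ocr_all *.png` to try more workers. Since xargs hands the next file to whichever worker finishes first, raising `JOBS` may be an alternative to splitting the file list by hand.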