I am trying to train Tesseract. The process involves creating triples of files: box files, text files, and image (tif) files.
The tool that creates the .box files sometimes creates empty files. Those empty files cause problems for the engine. So, I want to delete the empty box files as well as their partners.
The whole pattern looks like the following
File2.box is an empty file (has zero size). I want to find and delete it as well as its partners (duplicates) such as File2.gt.txt and File2.tif.
Is this doable?
Check this simple script. It uses the find command to search for all empty .box files (-type f -name "*.box" -size 0), deletes them with the -delete flag, and then removes the corresponding .gt.txt and .tif files by running the rm command through the -exec flag:
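#!/bin/bash
# Specify the directory where the files are located
directory="/path/to/files"
# Change to that directory; stop if it does not exist
cd "$directory" || exit
# Find and delete empty .box files along with their partners.
# -delete removes the empty .box file first; -exec then strips the .box
# suffix from the same path and removes the matching .gt.txt and .tif files.
find . -type f -name "*.box" -size 0 -delete -exec sh -c 'rm -f "${1%.box}.gt.txt" "${1%.box}.tif"' sh {} \;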
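
Since the deletion cannot be undone, it may be worth previewing what would be removed first. The following is a minimal dry-run sketch, not part of the script above: the function name list_empty_box_triples is made up for illustration, and it assumes the same naming scheme (NAME.box, NAME.gt.txt, NAME.tif). It only prints paths and deletes nothing.

```shell
#!/bin/bash
# Dry run: print every empty .box file together with the partner files
# that would be removed alongside it. Nothing is deleted here.
# list_empty_box_triples is a hypothetical helper name, not a standard tool.
list_empty_box_triples() {
    # Directory to scan; defaults to the current directory
    local dir="${1:-.}"
    find "$dir" -type f -name '*.box' -size 0 -exec sh -c '
        for box in "$@"; do
            # Strip the .box suffix to get the shared base name
            base="${box%.box}"
            printf "%s\n" "$box" "$base.gt.txt" "$base.tif"
        done
    ' sh {} +
}
```

Calling list_empty_box_triples /path/to/files prints three lines per empty box file. The partner paths are printed whether or not those files actually exist, so the output shows what the deleting command would target rather than what is on disk.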