bash, awk, duplicates, find

Find empty files and their duplicates (partners)


I am trying to train Tesseract. The process involves creating triples of files: box files, text files and image (tif) files.

The tool that creates the .box files sometimes creates empty files. Those empty files cause problems for the engine. So, I want to delete the empty box files as well as their partners.

The whole pattern looks like the following:

  • File1.box
  • File1.gt.txt
  • File1.tif
  • File2.box
  • File2.gt.txt
  • File2.tif

File2.box is an empty file (has zero size). I want to find and delete it as well as its partners (duplicates) such as File2.gt.txt and File2.tif.

Is this doable?


Solution

  • Try this simple script. It uses the find command to locate all empty .box files (-type f -name "*.box" -size 0), deletes each one with the -delete flag, and then removes the corresponding .gt.txt and .tif files by running rm through -exec. Because -exec comes after -delete, the partners are removed only when the .box file itself was deleted successfully:

    #!/bin/bash
    
    # Directory where the files are located
    directory="/path/to/files"
    
    # Change to that directory; stop if that fails
    cd "$directory" || exit
    
    # Find and delete empty .box files along with their partners
    find . -type f -name "*.box" -size 0 -delete -exec sh -c 'rm -f "${1%.box}.gt.txt" "${1%.box}.tif"' sh {} \;
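
Since the command above deletes files right away, it may be worth previewing what would be removed first. The following is a minimal dry-run sketch, not part of the original answer: the function name list_empty_triples is hypothetical, and it assumes the partners always share the .box file's base name. It only prints the paths of each empty .box file and its two partners, whether or not the partners exist.

```shell
#!/bin/bash

# Hypothetical helper: print each empty .box file and its partner paths.
# Nothing is deleted; the output is just a preview.
list_empty_triples() {
    local dir="${1:-.}"   # directory to scan, defaults to the current one
    find "$dir" -type f -name '*.box' -size 0 -exec sh -c '
        for box in "$@"; do
            base="${box%.box}"          # strip the .box suffix
            printf "%s\n" "$box" "$base.gt.txt" "$base.tif"
        done
    ' sh {} +
}
```

Running list_empty_triples /path/to/files prints three paths per empty .box file. If the list looks right, the deleting script above can then be run against the same directory.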