Delete files based on their presence in another directory


I have two directories, Faces and Faces_v2, and I want to delete every file in the Faces directory that also exists in the Faces_v2 directory. This is the output from fdupes:

383051 bytes each:
livecpk/Faces_v2/Asset/model/character/face/real/153809/sourceimages/#windx11/face_srm.ftex
livecpk/Faces/Asset/model/character/face/real/153809/sourceimages/#windx11/face_srm.ftex

6410 bytes each:
livecpk/Faces_v2/Asset/model/character/face/real/153809/sourceimages/#windx11/eye_occlusion_alp.ftex
livecpk/Faces/Asset/model/character/face/real/153809/sourceimages/#windx11/eye_occlusion_alp.ftex

327654 bytes each:
livecpk/Faces_v2/Asset/model/character/face/real/153809/sourceimages/#windx11/face_bsm_alp.ftex
livecpk/Faces/Asset/model/character/face/real/153809/sourceimages/#windx11/face_bsm_alp.ftex

452968 bytes each:
livecpk/Faces_v2/Asset/model/character/face/real/110651/sourceimages/#windx11/face_trm.ftex
livecpk/Faces/Asset/model/character/face/real/110651/sourceimages/#windx11/face_trm.ftex

640680 bytes each:
livecpk/Faces_v2/Asset/model/character/face/real/110651/sourceimages/#windx11/face_srm.ftex
livecpk/Faces/Asset/model/character/face/real/110651/sourceimages/#windx11/face_srm.ftex

849208 bytes each:
livecpk/Faces_v2/Asset/model/character/face/real/110651/sourceimages/#windx11/face_bsm_alp.ftex
livecpk/Faces/Asset/model/character/face/real/110651/sourceimages/#windx11/face_bsm_alp.ftex

Taking the above as an example, I want to delete:

  • livecpk/Faces/Asset/model/character/face/real/153809
  • livecpk/Faces/Asset/model/character/face/real/110651

because those directories already exist in livecpk/Faces_v2.


Basically, I accidentally pasted some files that were meant for the Faces_v2 directory into Faces, and now I want to clear out those duplicates without touching the other, non-duplicate files. How would I do that?


Solution

  • The fdupes output can probably be manipulated fairly easily to extract just the files to be deleted. Some extra information is needed to do this safely.

    Complications to writing the code include:

    • Does fdupes guarantee to output files in a particular order?
    • Could there be file groups that contain livecpk/Faces_v2 but not livecpk/Faces (or vice versa)? (i.e. groups that should be ignored)
    • How should a directory in livecpk/Faces be handled if it contains files that are in livecpk/Faces_v2 but also others that are not?
    • Extra code will be needed to remove empty directories after removing files inside them.
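One way to handle the first two points is to read the report group by group instead of line by line. This is a sketch only; it assumes the saved report is called fdupes.out and has the layout shown in the question (groups separated by blank lines, paths relative to the directory that contains livecpk), and the function name faces_copies_in_shared_groups is hypothetical. It prints a livecpk/Faces/ path only when its group also contains a livecpk/Faces_v2/ path, so the order of lines within a group does not matter and groups made up only of Faces files are left alone. It does not deal with partially duplicated directories or empty-directory cleanup.

```shell
# Hypothetical helper: faces_copies_in_shared_groups REPORT
# Assumes REPORT is saved fdupes output laid out like the sample above:
# groups separated by blank lines, one path per line, paths starting with
# livecpk/Faces/ or livecpk/Faces_v2/. Header lines such as
# "383051 bytes each:" match no pattern below and are skipped.
faces_copies_in_shared_groups() {
    awk '
        # Emit the buffered Faces paths if the group had a Faces_v2 member,
        # then reset the per-group state. "i" is a local variable.
        function flush(    i) {
            if (has_v2)
                for (i = 1; i <= n; i++)
                    print faces[i]
            n = 0
            has_v2 = 0
        }
        /^$/                   { flush(); next }   # blank line ends a group
        /^livecpk\/Faces_v2\// { has_v2 = 1; next }
        /^livecpk\/Faces\//    { faces[++n] = $0; next }
        END                    { flush() }         # last group may lack a blank line
    ' "$1"
}
```

After checking the printed list by eye, it could be fed to a loop such as `while IFS= read -r f; do rm -- "$f"; done`.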

    If fdupes.out has exactly the form shown, the files could be deleted just by using grep to extract the relevant lines and piping them to xargs rm.
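A minimal sketch of that grep-based approach, under the same assumption that fdupes.out matches the sample exactly and is used from the directory containing livecpk. The helper name remove_listed is hypothetical, and a while-read loop stands in for a plain xargs rm, because xargs treats quotes, backslashes and blanks in its input specially. Like the one-liner it mirrors, it removes every listed livecpk/Faces/ path, including any from groups without a Faces_v2 counterpart, so the matched lines are worth reviewing first.

```shell
# Hypothetical helper: remove_listed REPORT PREFIX
# Assumes REPORT is saved fdupes output with one path per line, relative
# to the current directory, exactly like the sample in the question.
remove_listed() {
    report=$1   # the saved fdupes output
    prefix=$2   # e.g. livecpk/Faces/ (the trailing slash keeps Faces_v2 out)
    # Keep only the lines that start with the prefix, then delete each one.
    # read -r plus IFS= passes every path to rm unchanged.
    grep "^$prefix" "$report" |
    while IFS= read -r f; do
        rm -- "$f"
    done
}
```

Usage: `remove_listed fdupes.out livecpk/Faces/`. Running just `grep '^livecpk/Faces/' fdupes.out` first shows what would be deleted.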


    You could also start from scratch. One idea:

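    top=$(pwd -P)                     # run this from the directory containing livecpk
    dir1="$top/livecpk/Faces"         # where the stray copies are
    dir2="$top/livecpk/Faces_v2"      # the directory to keep
    (
        if cd "$dir1"; then
    
            # Files: same relative path under dir2 and byte-for-byte identical.
            # {} inside "$dir2"/{} is substituted by GNU and BSD find
            # (POSIX leaves that implementation-defined).
            # Remove "echo" once the printed commands look right.
            find . -type f                    \
                -exec test -f "$dir2"/{} \;   \
                -exec cmp -s "$dir2"/{} {} \; \
                -exec echo rm {} \;
    
            # Directories, deepest first, skipping "." itself.
            find . -depth -type d ! -path .   \
                -exec test -d "$dir2"/{} \;   \
                -exec echo rmdir {} \;
        fi
    )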
    
    • for every file under dir1:
      • if a file with the same relative path exists under dir2, then
        • if they are bytewise identical, then
          • do something with the file (here: echo the rm command)
    • for every directory under dir1 (processed depth-first):
      • if a directory with the same relative path exists under dir2, then
        • do something with the directory (here: echo the rmdir command)
          • (attempting to delete a non-empty directory will fail)

    This spawns a lot of test and cmp processes, but that is probably not important, since the byte-by-byte file comparisons are much more expensive.
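If the process count ever does matter, one possible variant (a sketch, not part of the original answer) batches paths with find's `-exec ... {} +` form, so a single `sh -c` handles many files and the existence check runs as a shell builtin; `cmp` is still started once per candidate file. The function name remove_identical is hypothetical, and unlike the commands above it actually deletes rather than echoing, so trying it on a copy of the tree first seems wise.

```shell
# Hypothetical helper: remove_identical DIR1 DIR2
# Same logic as the find commands above, but each "sh -c" receives a whole
# batch of paths (find ... -exec ... {} +), so the test step is a shell
# builtin instead of a separate process. cmp still runs once per file.
# Unlike the original, this really deletes instead of echoing.
remove_identical() {
    dir1=$1
    # Make DIR2 absolute so it still resolves after the cd below.
    dir2=$(cd "$2" && pwd -P) || return 1
    (
        cd "$dir1" || exit 1

        # Files: delete if the same relative path exists under DIR2 and the
        # contents are byte-for-byte identical.
        find . -type f -exec sh -c '
            d2=$1; shift
            for f in "$@"; do
                if [ -f "$d2/$f" ] && cmp -s "$d2/$f" "$f"; then
                    rm -- "$f"
                fi
            done
        ' sh "$dir2" {} +

        # Directories, deepest first (-depth), skipping "." itself. rmdir
        # fails on non-empty directories; those errors are discarded.
        find . -depth -type d ! -path . -exec sh -c '
            d2=$1; shift
            for d in "$@"; do
                if [ -d "$d2/$d" ]; then
                    rmdir -- "$d" 2>/dev/null || :
                fi
            done
        ' sh "$dir2" {} +
    )
}
```

Usage, from the directory containing livecpk: `remove_identical livecpk/Faces livecpk/Faces_v2`. Files under livecpk/Faces that differ from, or are missing in, livecpk/Faces_v2 stay put, and so do the directories holding them.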