Find files with the same name in a different directory structure


I have a master directory A with high-resolution images (around 100 GB) in various subdirectories. I also have a selection of those images (same file names) at lower resolution in another directory B with different subdirectories (a few thousand files).

I would like to get a copy of the directory B structure in which each image is replaced by its high-resolution version. File size could serve as a proxy for resolution, as there is only one match in directory A.


Solution

  • It is somewhat unclear from the question why the script would need to consider file sizes or resolutions. I’m going to assume that (1) file names are unique across the entire (sub)directory structure under both A and B and (2) A always contains the equal- or higher-resolution version of each image, and some of those images have thumbnails (matched by file name) under B. An outline could look as follows:

    replace_files_by_name() {
      local -r dir_A="$1"  # full size ("source")
      local -r dir_B="$2"  # thumbnails ("index")
      local -r dir_C="$3"  # full size copy by index ("destination")
      local path
    
      # Create an index of file names and paths under $dir_A
      local -A path_index  # maps file names to paths under $dir_A
      while IFS= read -r path; do
        path_index["${path##*/}"]="$path"
      done < <(find "$dir_A" -type f)
    
      # Make a recursive copy of $dir_B called $dir_C.
      echo cp -a --reflink "$dir_B" "$dir_C"
      cp -a --reflink "$dir_B" "$dir_C"
    
      # Replace each file under $dir_C with its counterpart from $dir_A.
      find "$dir_C" -type f | while IFS= read -r path; do
        echo cp -a --reflink "${path_index["${path##*/}"]}" "$path"
        cp -a --reflink "${path_index["${path##*/}"]}" "$path"
      done
    }
    

    Side note 0: A bare --reflink means --reflink=always in GNU cp, so the copy fails on a file system that cannot clone files. On such a file system you will have to drop --reflink, at a considerable cost in time and disk space. This is a good reason to use a reasonably full-featured, CoW-capable (copy-on-write) file system such as Btrfs or ZFS.
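
    If it is not known in advance whether the file system can clone files, GNU cp also accepts --reflink=auto, which clones where possible and otherwise falls back to an ordinary copy. A minimal sketch, assuming GNU coreutils; the helper name copy_with_optional_reflink and the scratch files are made up for illustration and are not part of the outline above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone when the file system supports it,
# otherwise fall back to a regular (full) copy.
copy_with_optional_reflink() {
  local src="$1"
  local dst="$2"
  # --reflink=auto is a GNU cp option; a bare --reflink means
  # --reflink=always, which fails where cloning is unsupported.
  cp -a --reflink=auto -- "$src" "$dst"
}

# Usage: copy a small sample file inside a scratch directory.
demo_dir="$(mktemp -d)"
printf 'sample\n' > "$demo_dir/src.txt"
copy_with_optional_reflink "$demo_dir/src.txt" "$demo_dir/dst.txt"
cmp -s "$demo_dir/src.txt" "$demo_dir/dst.txt" && echo "copy is identical"
```

    The trade-off is that a silent fallback hides the extra time and disk space of a full copy, which may matter for a source of around 100 GB.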

    Side note 1: My outline skips all error checking and needs to be adjusted accordingly. (For example, what should happen when a file from C (B) is not found under A?)
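
    One possible way to add the missing checks is sketched below. It is only an illustration under stated assumptions: bash 4+ and GNU cp and find are assumed, the name replace_files_checked is hypothetical, --reflink=auto is used instead of --reflink so the sketch also runs without CoW support, and a file without a counterpart under A simply keeps its thumbnail and produces a warning. Other policies (abort, delete the orphan) are just as defensible.

```shell
#!/usr/bin/env bash
# Hypothetical, stricter variant of replace_files_by_name (sketch only).
replace_files_checked() {
  local -r dir_A="$1" dir_B="$2" dir_C="$3"
  local path name status=0
  local -A path_index  # maps file names to paths under $dir_A

  # "cp -a B C" would create C/B if C already existed, so refuse early.
  if [[ -e "$dir_C" ]]; then
    echo "destination already exists: $dir_C" >&2
    return 1
  fi

  # Index $dir_A; a duplicate file name violates assumption (1).
  while IFS= read -r -d '' path; do
    name="${path##*/}"
    if [[ -n "${path_index[$name]+set}" ]]; then
      echo "duplicate name: $path and ${path_index[$name]}" >&2
      return 1
    fi
    path_index["$name"]="$path"
  done < <(find "$dir_A" -type f -print0)

  cp -a --reflink=auto -- "$dir_B" "$dir_C" || return 1

  # Replace each file under $dir_C; keep the thumbnail when A has no match.
  while IFS= read -r -d '' path; do
    name="${path##*/}"
    if [[ -z "${path_index[$name]+set}" ]]; then
      echo "no counterpart under $dir_A for: $path" >&2
      status=1
      continue
    fi
    cp -a --reflink=auto -- "${path_index[$name]}" "$path" || status=1
  done < <(find "$dir_C" -type f -print0)

  return "$status"
}

# Usage on a scratch tree: "orphan" has no counterpart under A.
demo="$(mktemp -d)"
mkdir -p "$demo/A/1" "$demo/B/a" "$demo/B/b"
printf 'full one\n'     > "$demo/A/1/one"
printf 'thumb one\n'    > "$demo/B/a/one"
printf 'thumb orphan\n' > "$demo/B/b/orphan"
replace_files_checked "$demo/A" "$demo/B" "$demo/C" || echo "finished with warnings"
```

    Reading find output through process substitution with -print0 (rather than a pipe) keeps the loop in the current shell, so the status variable survives, and it also tolerates file names containing newlines.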

    Now let’s test the solution:

    set -eu
    mkdir -p ~/tmp/test
    cd ~/tmp/test
    
    # Create directories A and B and 5 different subdirectories in each.
    mkdir -p A/{1..5}/ B/{a..e}/
    
    # Place a file in each subdirectory.
    # A and B contain different subdirectory names but same file names.
    files=('one' 'two' 'three' 'four' 'five')
    for dir in A B; do
      subdirs=("${dir}/"*)
      ((${#subdirs[@]} == ${#files[@]}))
      for ((i = 0; i < ${#files[@]}; ++i)); do
        touch "${subdirs[i]}/${files[i]}"
      done
    done
    
    ##############################
    replace_files_by_name A B C ##
    ##############################
    
    rm -Rf ~/tmp/test  # cleanup
    

    The test above prints (and also executes) the following commands:

    cp -a --reflink B C
    cp -a --reflink A/1/one C/a/one
    cp -a --reflink A/2/two C/b/two
    cp -a --reflink A/3/three C/c/three
    cp -a --reflink A/4/four C/d/four
    cp -a --reflink A/5/five C/e/five
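
    With real images (the test files above are empty), the file-size proxy from the question can serve as a rough sanity check after the copy. The sketch below is an assumption-laden illustration, not part of the solution: the function name report_smaller_files is hypothetical, the directory arguments are expected without a trailing slash, and a smaller file under C is merely flagged, since size is only a proxy for resolution.

```shell
#!/usr/bin/env bash
# Hypothetical check: list files under $dir_C that are missing or smaller
# than the file at the same relative path under $dir_B.
report_smaller_files() {
  local -r dir_B="$1" dir_C="$2"
  local path rel size_B size_C
  while IFS= read -r -d '' path; do
    rel="${path#"$dir_B"/}"  # path relative to $dir_B
    if [[ ! -f "$dir_C/$rel" ]]; then
      echo "missing: $dir_C/$rel"
      continue
    fi
    size_B=$(wc -c < "$path")
    size_C=$(wc -c < "$dir_C/$rel")
    if (( size_C < size_B )); then
      echo "smaller: $dir_C/$rel"
    fi
  done < <(find "$dir_B" -type f -print0)
}

# Usage on a scratch tree: "one" grew as expected, "two" shrank.
demo="$(mktemp -d)"
mkdir -p "$demo/B/a" "$demo/C/a"
printf 'tiny\n'                 > "$demo/B/a/one"
printf 'much larger content\n'  > "$demo/C/a/one"
printf 'thumbnail still here\n' > "$demo/B/a/two"
printf 'x\n'                    > "$demo/C/a/two"
report_smaller_files "$demo/B" "$demo/C"
```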