How to delete older files but keep recent ones during backup?


I have a remote server that copies 30-some backup files to a local server every day, and I want to remove the old backups if and only if a newer backup was copied successfully.

With the scripts I have tried, I managed to delete older files, but as soon as the script found one new backup, it deleted ALL older files.

I have something like (picture this with 20 virtual machines):

vm001-2019-08-01.bck
vm001-2019-07-28.bck
vm002-2019-08-01.bck
vm003-2019-07-29.bck
vm004-2019-08-01.bck
vm004-2019-07-31.bck
vm004-2019-07-30.bck
vm004-2019-07-29.bck
...

I want to delete everything except the most recent backup of each machine, i.e. delete:

vm001-2019-07-28.bck
vm002-2019-07-29.bck
vm004-2019-07-31.bck
vm004-2019-07-30.bck
vm004-2019-07-29.bck

and keep only:

vm001-2019-08-01.bck
vm002-2019-08-01.bck
vm003-2019-07-29.bck
vm004-2019-08-01.bck

The problem I had is that if there is a recent backup of any machine, files like vm003-2019-07-29.bck get deleted because they are older, even though they belong to a different machine.

I know there are several variants of this question on the site, but I can't quite get this to work.

I've been trying variants of this code:

#!/bin/bash

for i in ./*.bck
do
  echo "found" "$i"
  if [[ -n $(find "$i" -type f -mmin -1440) ]]
  then
    echo "$i"
    find "$i" -type f -mmin +1440 -exec rm -f "$i" {} +
  fi
done

(The echos are for debugging purposes only)

At this time, this code finds the newer and the older files, but doesn't delete anything. If I replace the inner command with find "$i" -type f -mmin +1440 -exec echo "$i" {} +, it never prints anything, as if find "$i" were not finding anything, but when I run it on its own in the terminal (minus the -exec part), it does.

I've tested this script generating files with different timestamps using touch -d, but I had no success.


Solution

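  • Unless you put the -name test before the file name, find treats "$i" as a starting point to search from, not as a name to match. Because -name compares only the base name, "$i" must not carry the leading ./ from the loop in the question (strip it with "${i#./}" if you keep that loop). So your find command should be: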

    find -name "$i" -type f -mmin -1440
    

    which will search in the current directory. Or

    find /path/to/dir -name "$i" -type f -mmin -1440
    

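    which will search in a directory named "/path/to/dir". Note also that in the loop from the question, the inner find runs only when "$i" was modified less than 1440 minutes ago and then tests that same file for being older than 1440 minutes, so it can never match anything.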

    But, based on BashFAQ/099, I would do this to delete all but the newest file for each VM (untested):

    #!/bin/bash
    declare -A newest  # associative array to store name of newest file for each VM
    for f in *
    do
        vm=${f%%-*}    # extracts the VM name from the file name (i.e. vm001 from vm001-2019-08-01.bck)
        if [[ -f $f && $f -nt ${newest["$vm"]} ]]
        then
            newest["$vm"]=$f
        fi
    done
    
    for f in *
    do
        vm=${f%%-*}
        if [[ -f $f && $f != "${newest[$vm]}" ]]   # quoted so the name is compared literally, not as a pattern
        then
            rm -- "$f"
        fi
    done
    

    This is set up to run against files in the current directory. It assumes that the files are named as shown in the question (the VM name is separated from the rest of the file name by a hyphen). "Newest" here means the most recent modification time (the -nt test), not the date embedded in the file name. Associative arrays require Bash 4 or higher.
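
Since the script above is untested and deletes files, it may help to try the same two-pass idea with a dry run first. The following is only a sketch under stated assumptions: the function name prune_backups, its directory argument, and the --dry-run flag are hypothetical and not part of the original answer. It assumes Bash 4+ and files named <vm>-<date>.bck, and like the script above it picks the newest file by modification time.

```shell
#!/bin/bash
# Sketch only: prune_backups and --dry-run are hypothetical names for illustration.
# Keeps the newest *.bck file (by modification time) for each VM in a directory.

prune_backups() {
    local dir=$1 dry_run=${2:-}
    local -A newest=()      # VM name -> path of its newest backup
    local f base vm

    # Pass 1: remember the newest file for each VM.
    for f in "$dir"/*.bck; do
        [[ -f $f ]] || continue      # also skips the literal pattern when nothing matches
        base=${f##*/}                # strip the directory part
        vm=${base%%-*}               # text before the first hyphen, e.g. vm001
        [[ -n $vm ]] || continue     # ignore names that start with a hyphen
        if [[ -z ${newest[$vm]:-} || $f -nt ${newest[$vm]} ]]; then
            newest[$vm]=$f
        fi
    done

    # Pass 2: remove (or only list) every file that is not the newest for its VM.
    for f in "$dir"/*.bck; do
        [[ -f $f ]] || continue
        base=${f##*/}
        vm=${base%%-*}
        [[ -n $vm ]] || continue
        if [[ $f != "${newest[$vm]}" ]]; then
            if [[ $dry_run == --dry-run ]]; then
                echo "would remove: $f"
            else
                rm -- "$f"
            fi
        fi
    done
}
```

A call such as prune_backups /path/to/dir --dry-run only prints what would be removed; dropping the flag performs the deletion. Test files with chosen timestamps can be created with touch -d, as mentioned in the question. A VM that has only one backup is never touched, because that file is by definition its newest.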