Tags: bash, find, xargs, tail

Bash script to remove folders older than X, but skipping newest N... Compatible with special characters in path


I am writing a backup script in bash and want to delete all backups older than X days, while leaving at least the newest N backups. This seems so simple, yet I have not been able to find a solution to the whole problem.

In my case, each backup is a single folder, and all of them sit in the same parent folder. To decide which backups are the newest, the creation or modification date should be used. If this gets too complicated, I can also go alphabetically by filename (the folder names consist only of a constant part and a datestamp YYYYMMDD). I want my script to work for any paths and folder names, so I cannot assume that they do not contain spaces, linebreaks etc. It is also supposed to work on most modern Linux systems. Currently I am running it on Ubuntu 22.04.3 LTS.

I had a few different ideas, which all fall short in some way.

Parameters I used

target_basefolder="/path/to/backups/parent/folder"
min_n_backups=3
backup_keepdays=28

Version A - Oneliner

I tried this:

find "$target_basefolder" -mindepth 1 -maxdepth 1 -type d -print0 | sort -rz | xargs -r0 rm -rf --

But I do not know how to tell it to ignore the newest three results. I tried to put tail -n +$((min_n_backups+1)) in there, but could not find a way to tell tail to use NUL as the separator instead of newline (like the -0 option for xargs).

Version B - Two Parts

Count the folders first and then run the delete command only if there are enough newer backups.

n_backups=$(find "$target_basefolder" -mindepth 1 -maxdepth 1 -type d -ctime -"$backup_keepdays" -printf '.' | wc -m)
if (( n_backups > min_n_backups )) ; then
    find "$target_basefolder" -mindepth 1 -maxdepth 1 -type d -ctime +"$backup_keepdays" -print0 | sort -rz | xargs -r0 rm -rf --
fi

The problem here is that it will not delete anything if there are not enough newer backups present. For example if min_n_backups is 3 and there are only 2 backups newer than backup_keepdays, it will not delete any of the maybe 100 older backups instead of leaving just 1 of them and deleting the rest.

Version C - Simple Loop

Go through backup folders one by one and check their date.

icount=1
for ifolder in $(find "$target_basefolder" -mindepth 1 -maxdepth 1 -type d -print0 | sort -rz ) ; do
    is_old=$(find "$ifolder" -maxdepth 0 -type d -ctime +"$backup_keepdays" -printf '.' | wc -m)
    if (( icount > min_n_backups && is_old > 0 )) ; then
        rm -rf -- "$ifolder"
    fi
    ((icount++))
done

I have yet to test this, especially with paths containing spaces and linebreaks. I am not sure whether the for loop can handle a NUL-separated list, or whether I can pass find a folder name containing spaces or linebreaks. It feels overly complicated anyway, so I hope I won't have to go there.
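For reference, a for loop over $(...) cannot carry a NUL-separated list, because bash drops NUL bytes in command substitution and then splits the result on whitespace. A minimal sketch of the same idea with a NUL-safe while/read loop follows. The function name prune_loop and its three positional arguments are assumptions made for illustration, and it checks -mtime rather than -ctime so that the age can be set with touch when trying it out.

```shell
#!/usr/bin/env bash
# Sketch only: prune_loop is a hypothetical helper, not part of the question.
# Usage: prune_loop PARENT_DIR MIN_N_BACKUPS KEEP_DAYS
prune_loop() {
    local parent=$1 min_n=$2 days=$3
    local icount=0 ifolder
    # read -d '' splits on NUL bytes, so spaces and linebreaks in names survive.
    while IFS= read -r -d '' ifolder; do
        icount=$((icount + 1))
        # Always keep the first min_n entries of the reverse name-sorted list.
        (( icount > min_n )) || continue
        # find prints a dot only if this folder's mtime is older than $days days.
        if [[ -n $(find "$ifolder" -maxdepth 0 -type d -mtime +"$days" -printf '.') ]]; then
            rm -rf -- "$ifolder"
        fi
    done < <(find "$parent" -mindepth 1 -maxdepth 1 -type d -print0 | sort -rz)
}
```

With the parameters above it would be called as prune_loop "$target_basefolder" "$min_n_backups" "$backup_keepdays". The list is still ordered by name, so it relies on the datestamp in the folder names.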

Version D - Brute Force

I had the idea to move the newest N backup folders to a different folder, then delete the folders older than M, then move the newest N folders back. This could get slow if the folders are big. And it does not feel "right" somehow...

I hope someone can help me solve my Version A or give me hints on the other versions.

Thanks in advance!


Solution

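  • Assuming tail -z is available (GNU coreutils), would you please try the following. It sorts by folder name, so it relies on the YYYYMMDD datestamp in the names: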

    # count the directories newer than $backup_keepdays days (these must remain)
    keep=$(find "$target_basefolder" -maxdepth 1 -mindepth 1 -type d -mtime -"$backup_keepdays" -printf "." | wc -c)
    
    # take MAX($keep, $min_n_backups)
    (( keep > min_n_backups )) && min_n_backups=$keep
    
    # keep the first $min_n_backups directories of the sorted list; remove from entry $((min_n_backups+1)) onward
    find "$target_basefolder" -maxdepth 1 -mindepth 1 -type d -print0 | sort -rz | tail -zn +$((min_n_backups+1)) | xargs -0r rm -rf --
    

    A small tip: replace xargs -0r rm -rf -- with tr '\0' '\n' to print the directories to be removed as a dry run.
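The question also asked about ordering by modification date instead of by name. A possible variant, sketched under the assumption that GNU find, sort, tail and date are available, prefixes each path with its mtime via find's -printf '%T@' and sorts numerically on that. The function name prune_by_mtime, its arguments and the --dry-run switch are hypothetical and not part of the answer above. Note that the cutoff here is exactly KEEP_DAYS times 24 hours before now, which differs slightly from the whole-day rounding of -mtime.

```shell
#!/usr/bin/env bash
# Sketch only: prune_by_mtime is a hypothetical helper. Requires GNU tools.
# Usage: prune_by_mtime PARENT_DIR MIN_N_BACKUPS KEEP_DAYS [--dry-run]
prune_by_mtime() {
    local parent=$1 min_n=$2 days=$3 mode=${4:-}
    local cutoff rec mtime path
    # Epoch second before which a backup counts as too old.
    cutoff=$(date -d "$days days ago" +%s)
    while IFS= read -r -d '' rec; do
        mtime=${rec%%$'\t'*}   # text before the first tab: epoch seconds with fraction
        path=${rec#*$'\t'}     # everything after the first tab: the path
        # Skip anything modified at or after the cutoff.
        (( ${mtime%.*} < cutoff )) || continue
        if [[ $mode == --dry-run ]]; then
            printf 'would remove: %q\n' "$path"
        else
            rm -rf -- "$path"
        fi
    done < <(find "$parent" -mindepth 1 -maxdepth 1 -type d -printf '%T@\t%p\0' |
             sort -z -rn | tail -z -n +"$((min_n + 1))")
}
```

Called as prune_by_mtime "$target_basefolder" "$min_n_backups" "$backup_keepdays" --dry-run, it only lists the candidates. Because the newest MIN_N_BACKUPS entries are cut off before the age check, a folder is removed only if it is both outside the newest N and older than the cutoff.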