bash delete-file rm

Is this shell command to delete all but last X directories safe?


I've seen a lot of warnings against the dangers of filenames with funny characters wreaking havoc in shell scripts.

I've scoured SO and seen dozens of variants of xargs and -exec rm -rf {} \;, along with plenty of "don't use ls for scripting" advice, and I've come up with what I think is "safe" to run:

find /path/to/dir -mindepth 1 -maxdepth 1 -type d -print0 | sort -z | head -z -n -10 | xargs -r0 rm -rf

I've got a directory full of subdirectories in this format:

# find /srv/mywebsite/releases -mindepth 1 -maxdepth 1 -type d | sort
/srv/mywebsite/releases/2017-01-01T01:43:23Z
/srv/mywebsite/releases/2017-01-01T02:09:44Z
/srv/mywebsite/releases/2017-01-01T02:20:06Z
...
/srv/mywebsite/releases/2017-04-22T01:34:45Z
/srv/mywebsite/releases/2017-04-30T03:24:19Z
/srv/mywebsite/releases/2017-05-02T01:48:39Z

I want to delete all but the last 10 of them, sorted by the date in the directory name, not the directory mod/create-time. This is just a precaution in case one of the dirs gets touched and mtime/ctime doesn't match.

I think my shell command above should do exactly that, but I just want to double check that it won't blow up my server if one of the dirs ever contains a * or . or something.


Solution

  • This is safe, in that:

    • No shell evaluation whatsoever is run on the names. This specifically includes glob expansion, so a name containing a * will not result in additional rm arguments.
    • Since all names are prefixed with /path/to/dir, we don't need to worry about leading dashes being interpreted as options. (In a scenario where you did have this concern, xargs -r0 rm -rf -- would be appropriate; per POSIX utility syntax guideline #10, passing the string -- ensures that all subsequent arguments are parsed as positional).
    • Since all names are separated with NULs, and NULs can't exist in names, we can't have a single name result in multiple arguments to rm. (Poorly-written scripts often make a similar assumption about newlines, but that assumption is unfounded).
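One way to convince yourself of the three points above is to run the same pipeline against deliberately awkward names in a throwaway directory. The following is only a sketch under assumptions: the scratch directory, the test names, and the keep count of 1 (instead of 10) are all made up for the demo, and it needs GNU find, sort, head and xargs.

```shell
#!/bin/sh
# Sketch only: exercise the pipeline on awkward names in a scratch directory.
set -eu

scratch=$(mktemp -d)                 # hypothetical throwaway location
mkdir "$scratch/releases" "$scratch/bystander"

# Names containing a glob character, a space, a leading dash, and a newline.
nl='
'
mkdir "$scratch/releases/a*" \
      "$scratch/releases/b c" \
      "$scratch/releases/-rf" \
      "$scratch/releases/d${nl}e" \
      "$scratch/releases/zz-keep"

# Same pipeline as in the question, but keeping only the last 1 name.
# LC_ALL=C pins the sort to plain byte order for this demo.
find "$scratch/releases" -mindepth 1 -maxdepth 1 -type d -print0 |
  LC_ALL=C sort -z | head -z -n -1 | xargs -r0 rm -rf --

# List what is left; only the name that sorts last should survive.
survivors=$(find "$scratch/releases" -mindepth 1 -maxdepth 1 -printf '%f\n')
echo "left in releases: $survivors"
```

Only zz-keep should remain under releases, and the sibling bystander directory should be untouched, since each awkward name reaches rm as exactly one argument.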

    Inasmuch as you're depending on the names representing UTC timestamps in a specific format (and on new names continuing to match that format so they can be appropriately compared against old names), you might want to add an appropriate filter, making the full command something like:

    find /path/to/dir -mindepth 1 -maxdepth 1 -type d \
      -regextype posix-extended \
      -regex '.*/[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}T[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}Z$' \
      -print0 | sort -z | head -z -n -10 | xargs -r0 rm -rf --
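Before pointing the filtered command at real data, it may be worth checking which names the -regex test actually accepts. The sketch below reuses the same expression but prints the matches instead of deleting anything; the scratch directory and the sample names are assumptions for illustration, and -regextype and -printf are GNU find features.

```shell
#!/bin/sh
# Sketch only: show which directory names pass the timestamp filter.
set -eu

root=$(mktemp -d)                    # hypothetical stand-in for /path/to/dir
mkdir "$root/2017-01-01T01:43:23Z" \
      "$root/2017-05-02T01:48:39Z" \
      "$root/2017-05-02" \
      "$root/current"

# Same tests as the command above, with -printf in place of -print0 | ... rm.
matched=$(find "$root" -mindepth 1 -maxdepth 1 -type d \
  -regextype posix-extended \
  -regex '.*/[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}T[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}Z$' \
  -printf '%f\n' | LC_ALL=C sort)

printf '%s\n' "$matched"
```

Only the two full timestamps should be listed; the date-only name and current fall outside the pattern, so the real command would never hand them to rm.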
    

    None of this is particularly portable: both the original code and the above suggestion require non-POSIX extensions to find, sort, head and xargs, and the naming convention itself wouldn't be allowed on Windows filesystems (where : is reserved). If you're running a modern GNU toolchain on a UNIXy platform, though, this looks good to me.
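If the GNU extensions are the concern, one possible alternative (a different technique from the pipeline above, not a drop-in equivalent) is to let the shell's own pathname expansion do the matching and sorting. This sketch assumes the fixed timestamp format from the question; prune_releases is a hypothetical helper name, and the scratch data exists only for the demo.

```shell
#!/bin/sh
# Sketch only: prune old release directories using plain sh globbing.
set -eu

# prune_releases DIR KEEP  (hypothetical helper)
prune_releases() {
  dir=$1
  keep=$2
  # The trailing slash limits matches to directories; the shell returns
  # glob matches already sorted, and these names sort chronologically.
  set -- "$dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z/
  # An unmatched pattern stays as literal text, so confirm it exists.
  [ -d "$1" ] || return 0
  # Remove from the oldest end until only KEEP names are left.
  while [ "$#" -gt "$keep" ]; do
    rm -rf -- "${1%/}"
    shift
  done
}

# Demo on scratch data: five releases plus one unrelated directory.
root=$(mktemp -d)
for day in 01 02 03 04 05; do
  mkdir "$root/2017-01-${day}T00:00:00Z"
done
mkdir "$root/current"

prune_releases "$root" 2

set -- "$root"/*/
remaining=$#
echo "$remaining directories remain"
```

The glob doubles as the format filter, so names that don't look like timestamps are never candidates for deletion. The trade-off is that every matching name is held in the argument list at once, which is fine for a few hundred releases but not for arbitrarily large directories.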