Tags: bash, shell, sh, xargs

Files with quotes, spaces causing bad behavior from xargs


I want to find some files and compute their checksums by piping the list to shasum:

find . -type f | xargs shasum

But there are files with quotes in their names in my directory, for example the file named

file with "special" characters.txt

The output of the pipeline looks like this:

user@home ~ $ find . -type f | xargs shasum
da39a3ee5e6b4b0d3255bfef95601890afd80709  ./empty1.txt
da39a3ee5e6b4b0d3255bfef95601890afd80709  ./empty2.txt
da39a3ee5e6b4b0d3255bfef95601890afd80709  ./empty3.txt
shasum: ./file: 
shasum: with: No such file or directory
shasum: special: No such file or directory
shasum: characters.txt: No such file or directory
25ea78ccd362e1903c4a10201092edeb83912d78  ./file1.txt
25ea78ccd362e1903c4a10201092edeb83912d78  ./file2.txt

The spaces and quotes within the filename cause problems.

How can I tell shasum to process the files correctly?


Solution

  • The short explanation is that xargs is widely considered broken by design unless you use extensions to the standard that disable its attempt to parse and honor quotes and escape sequences in its input. See the xargs section of UsingFind for more details.
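
To make the parsing behavior concrete, here is a minimal sketch that feeds the problematic name from the question through xargs with no options. The variable names are illustrative, and printf stands in for shasum so that each argument the command receives is visible.

```shell
# The problematic name from the question, fed to xargs as a single line.
name='./file with "special" characters.txt'

# Without -0 or -d, xargs splits its input on whitespace and consumes the
# quotes, so printf receives several separate arguments instead of one name.
split=$(printf '%s\n' "$name" | xargs printf '[%s]\n')
printf '%s\n' "$split"
```

Each bracketed line is one argument as the command saw it, which is why shasum in the question reported separate errors for ./file, with, special and characters.txt.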


    Using NUL Delimited Streams

    On a system with GNU or modern BSD extensions (including MacOS X), you can (and should) NUL-delimit the output from find:

    find . -type f -print0 | xargs -0 shasum --
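
As a rough check that NUL delimiting keeps awkward names intact, the sketch below builds a scratch directory containing a name with spaces and quotes and a name with an embedded newline, then counts the arguments the command receives. The scratch directory, the nl and count variables, and the use of sh -c in place of shasum are assumptions made for illustration.

```shell
# Scratch directory with three files, two of which have awkward names.
tmp=$(mktemp -d "${TMPDIR:-/tmp}/xargs-demo.XXXXXX")
nl='
'
touch "$tmp/plain.txt" \
      "$tmp/file with \"special\" characters.txt" \
      "$tmp/two${nl}lines.txt"

# sh -c 'echo "$#"' prints how many arguments it was given, so the result
# shows whether each filename arrived as exactly one argument.
count=$(find "$tmp" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)
echo "arguments received: $count"

rm -rf "$tmp"
```

With three files on disk, a count of three indicates that no name was split or merged on its way through the pipe.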
    

    Using find -exec

    That said, you can do even better by getting xargs out of the loop entirely in a way that's fully compliant with modern (~2006) POSIX:

    find . -type f -exec shasum -- '{}' +
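
The following sketch illustrates what the trailing + does, assuming a scratch directory with two files. It uses sh -c as a stand-in for shasum so the number of arguments per invocation can be observed; the variable names are hypothetical.

```shell
# Scratch directory with one plain name and the awkward name from the question.
tmp=$(mktemp -d "${TMPDIR:-/tmp}/find-exec-demo.XXXXXX")
touch "$tmp/a.txt" "$tmp/file with \"special\" characters.txt"

# With '+', find appends as many names as fit to a single invocation,
# so the command runs once and sees both files as separate arguments.
batched=$(find "$tmp" -type f -exec sh -c 'echo "$#"' sh '{}' +)

# With '\;', find runs the command once per file; counting the output
# lines gives the number of invocations.
invocations=$(find "$tmp" -type f -exec sh -c 'echo "$#"' sh '{}' \; | wc -l | tr -d ' ')

echo "arguments in the batched run: $batched"
echo "invocations with \\;: $invocations"

rm -rf "$tmp"
```

In both forms find hands each name to the command directly as an argument, with no intermediate text stream to parse, which is why quoting and whitespace in names cause no trouble here.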
    

    Note that the -- argument tells shasum that all subsequent arguments are filenames. If you'd used find * -type f ..., then you could have a result starting with a dash; using -- ensures that this result isn't interpreted as a set of options.
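
The effect of -- can be seen with any utility that follows the usual option conventions. The sketch below uses wc rather than shasum, purely as an assumption for portability, with a hypothetical file named -c.txt.

```shell
# Scratch directory holding a file whose name begins with a dash.
tmp=$(mktemp -d "${TMPDIR:-/tmp}/dashdash-demo.XXXXXX")
printf 'data' > "$tmp/-c.txt"

# Without --, the name is parsed as a cluster of options and wc fails,
# leaving nothing on standard output.
without=$(cd "$tmp" && wc -c -c.txt 2>/dev/null)

# With --, option parsing stops and the name is treated as a file.
with=$(cd "$tmp" && wc -c -- -c.txt)

echo "without --: '${without}'"
echo "with --:    '${with}'"

rm -rf "$tmp"
```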


    Using Newline Delimiters (And Security Risks Thereof)

    If you have GNU xargs, but don't have the option of a NUL-delimited input stream, then xargs -d $'\n' (in shells such as bash with ksh extensions) will avoid the quoting and escaping behavior:

    xargs -d $'\n' shasum -- <files.txt
    

    However, this is suboptimal, because newline literals are actually possible inside filenames, thus making it impossible to distinguish between a newline that separates two names and a newline that is part of an actual name. Consider the following scenario:

    mkdir -p ./file.txt$'\n'/etc/passwd$'\n'/
    touch ./file.txt$'\n'/etc/passwd$'\n'file.txt file.txt
    find . -type f | xargs -d $'\n' shasum --
    

    This will have output akin to the following:

    da39a3ee5e6b4b0d3255bfef95601890afd80709  ./file.txt
    da39a3ee5e6b4b0d3255bfef95601890afd80709  ./file.txt
    c0c71bac843a3ec7233e99e123888beb6da8fbcf  /etc/passwd
    da39a3ee5e6b4b0d3255bfef95601890afd80709  file.txt
    

    ...thus allowing an attacker who can control filenames to cause a shasum for an arbitrary file outside the intended directory structure to be added to your output.
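
One way to see why NUL delimiters close this hole is to count records in both encodings for the directory layout described above. The sketch below rebuilds that layout in a scratch directory; the tmp, nl, newline_records and nul_records names are illustrative, and counting delimiters stands in for actually running shasum.

```shell
# Recreate the malicious layout: a directory whose name ends in a newline,
# containing etc/ and a "passwd<newline>" directory holding file.txt.
tmp=$(mktemp -d "${TMPDIR:-/tmp}/newline-demo.XXXXXX")
nl='
'
mkdir -p "$tmp/file.txt${nl}/etc/passwd${nl}"
touch "$tmp/file.txt${nl}/etc/passwd${nl}file.txt" "$tmp/file.txt"

# Newline-delimited output: the embedded newlines inflate two real files
# into four apparent records, one of which reads as /etc/passwd.
newline_records=$(cd "$tmp" && find . -type f | wc -l | tr -d ' ')

# NUL-delimited output: NUL cannot occur in a filename, so the number of
# delimiters matches the number of files.
nul_records=$(cd "$tmp" && find . -type f -print0 | tr -cd '\0' | wc -c | tr -d ' ')

echo "newline-delimited records: $newline_records"
echo "NUL-delimited records:     $nul_records"

rm -rf "$tmp"
```

Only the NUL-delimited count agrees with the two files that actually exist, which is the property that makes -print0 with xargs -0, or find -exec, safe against hostile filenames.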