
Is there an "escape converter" for file and directory names available?


The day came when I had to write a Bash script that walks arbitrary directory trees, looks at arbitrary files, and tries to determine something by comparing them. I thought it would be a simple couple-of-hours job, tops. Not so!

My hangup is that sometimes some idiot -ahem!- excuse me, lovely user chooses to put spaces in directory and file names. This causes my script to fail.

The perfect solution, aside from threatening the guillotine for those who insist on using spaces in such places (not to mention the guys who put this in operating systems' code!), might be a routine that "escapes" the file and directory names for us, kind of like how cygwin has routines to convert from unix to dos filename formats. Is there anything like this in a standard Unix / Linux distribution?

Note that the simple "for file in *" construct doesn't work so well when one is trying to compare directory trees, because it ONLY works on the current directory, and, in this case as in many others, constantly cd-ing to various directory locations brings its own problems. So, in doing my homework, I found the question "Handle special characters in bash for...in loop". The solution proposed there hangs up on spaces in directory names, but that can be overcome simply, like this:

dir="dirname with spaces"
ls -1 "$dir" | while IFS= read -r x; do
   echo "$x"
done

PLEASE NOTE: The above code isn't particularly wonderful because variables set inside the while loop are INACCESSIBLE outside that loop. This is because Bash runs each part of a pipeline in a subshell, so the loop that receives the ls command's output cannot change variables in the calling shell. This is a key motivating factor for my query!
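To make that limitation concrete, here is a minimal sketch, assuming Bash with its default settings (the lastpipe option off). The scratch directory and the file names are made up for the demonstration; only the piped while loop mirrors the code above.

```bash
#!/usr/bin/env bash
# Hypothetical scratch directory and file names, created only for this demo.
tmp=$(mktemp -d)
mkdir "$tmp/dirname with spaces"
touch "$tmp/dirname with spaces/a file" "$tmp/dirname with spaces/b file"

count=0
ls -1 "$tmp/dirname with spaces" | while read -r x; do
    count=$((count + 1))    # changes only the subshell's copy of count
done

# The loop ran in a subshell, so the parent shell's count is unchanged.
echo "after the piped loop: count=$count"

rm -rf "$tmp"
```

The loop body runs once per file, yet the value reported afterwards is still the initial one, which is exactly the problem described.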

...OK, the "ls | while read" loop above helps in many situations, but "escaping" the characters would be pretty powerful too. For example, dir might instead contain:

dir\ with\ spaces

Does this already exist and I've just been overlooking it?

If not, does anyone have an easy proposal to write one - maybe with sed or lex? (I'm far from competent with either.)


Solution

  • Make a really nasty filename for testing:

    mkdir escapetest
    cd escapetest && touch "m'i;x&e\"d u(p\nmulti)\nlines'\nand\015ca&rr\015re;t"
    

    [ Edit: Chances are that I intended that touch command to be:

    touch $'m\'i;x&e\"d u(p\nmulti)\nlines\'\nand\015ca&rr\015re;t'
    

    which puts more ugly characters in the filename. The output will look a little different. ]

    Then run this:

    find -print0 | while read -d '' -r line; do echo -en "--[${line}]--\t\t"; echo "$line"|sed -e ':t;N;s/\n/\\n/;bt' | sed 's/\([ \o47()"&;\\]\)/\\\1/g;s/\o15/\\r/g'; done
    

    The output should look like this:

    --[./m'i;x&e"d u(p
    multi)
    lines'
    re;t]--         ./m\'i\;x\&e\"d\ u\(p\\nmulti\)\\nlines\'\\nand\\015ca\&rr\\015re\;t
    

    This consists of a condensed version of Pascal Thivent's sed monster, plus handling for carriage returns and newlines and maybe a bit more.

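    The first sed pass joins the lines of a filename that contains newlines into a single line, with each newline replaced by a literal "\n". The second pass puts a backslash in front of every character from a fixed list: space, single quote (written \o47, a GNU sed octal escape), parentheses, double quote, ampersand, semicolon and backslash. Its last expression replaces carriage returns with "\r".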

    One thing to note is that, as you know, a while loop handles spaces where a for loop won't. By having find terminate each name with a null character (-print0) and setting read's delimiter to null (-d ''), you can also handle newlines in filenames. The -r option makes read keep backslashes as they are instead of interpreting them.

    Edit:

    Another way to escape the special characters, this time without using sed, uses the quoting and variable-creating feature of the Bash printf builtin (this also illustrates using process substitution rather than a pipe):

    while IFS= read -d '' -r file; do echo "$file"; printf -v name "%q" "$file"; echo "$name"; done < <(find . -print0)
    

    The variable $name will be available outside the loop, since using process substitution prevents the creation of a subshell around the loop.
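As a rough check of the printf approach, the sketch below feeds a few awkward names through the same loop and re-parses each escaped form with eval to see whether it yields the original path. The scratch directory, the file names, and the variables escaped, roundtrip and mismatches are assumptions made up for this illustration, not part of the answer above; IFS= is added so that read keeps leading and trailing whitespace.

```bash
#!/usr/bin/env bash
# Hypothetical scratch directory and file names, made up for this sketch.
tmp=$(mktemp -d)
touch "$tmp/plain" "$tmp/with space" "$tmp/semi;colon" "$tmp/"$'new\nline'

escaped=()      # shell-quoted form of every path found
mismatches=0    # number of names that did not survive the round trip

while IFS= read -r -d '' file; do
    printf -v name '%q' "$file"     # quote the path for reuse as shell input
    escaped+=("$name")
    eval "roundtrip=$name"          # let the shell parse the quoted form again
    [[ $roundtrip == "$file" ]] || mismatches=$((mismatches + 1))
done < <(find "$tmp" -type f -print0)

# Both variables were set inside the loop and are still visible here,
# because the loop was fed by process substitution rather than a pipe.
echo "files seen: ${#escaped[@]}, mismatches: $mismatches"
printf '%s\n' "${escaped[@]}"

rm -rf "$tmp"
```

The eval is there only to test the quoting; in a real script the unescaped "$file" in double quotes is usually all that is needed, and the escaped form matters mainly when a name has to be pasted into generated shell code.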