
Unescaping filenames generated by ls -R


I have a text file containing the output of a recursive directory listing that generally looks like this:

./subfolder/something with spaces:
something\ with\ spaces.txt*
something\ with\ spaces.dat*

./subfolder/yet another thing:
yet\ another\ thing.txt*
yet\ another\ thing.dat*

I need to get a list of the full paths to each .txt file:

./subfolder/something with spaces/something with spaces.txt
./subfolder/yet another thing/yet another thing.txt

I've almost got a solution for this, but what's the best way to unescape the filenames in Python? I don't know exactly which characters ls -R escaped (space and = are two of them, though). I also don't have access to the drive containing these files, so running a better command to obtain the list is out of the question, unfortunately.


Solution

  • I'm not sure if there's a built-in for this, but a simple regex can be used.

    re.sub(r'(?<!\\)\\', '', filename)
    

    This removes every backslash except one that directly follows another backslash. That appears to match what the shell does when you echo these values on the terminal (I've only tested this in bash).

    bash-3.2$ echo foo\\bar
    foo\bar
    bash-3.2$ echo foo\ bar
    foo bar
    bash-3.2$ echo foo\=bar
    foo=bar
    

    Here's a complete Python example:

    import re
    
    def unescape(filename):
        # Remove every backslash that is not preceded by another backslash.
        return re.sub(r'(?<!\\)\\', '', filename)
    
    print(unescape(r'foo\ bar'))
    print(unescape(r'foo\=bar'))
    print(unescape(r'foo\\bar'))
    

    Output:

    foo bar
    foo=bar
    foo\bar
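
One case the lookbehind pattern does not cover is an escaped backslash followed directly by another escape, such as `foo\\\ bar`, where it leaves two backslashes instead of one. A possible alternative is to consume each backslash together with the character after it. The sketch below does that and also shows one way to turn the listing from the question into full `.txt` paths. The names `unescape_pairs` and `txt_paths` are made up for this illustration. It assumes every backslash simply precedes the literal character it escapes (C-style sequences such as `\n` or octal codes, which some `ls` quoting modes may emit, are not handled), that directory header lines end in `:` and are not escaped, and that a trailing `*` is a marker added by `ls` rather than part of the name.

```python
import re


def unescape_pairs(name):
    """Replace each backslash-plus-character pair with the character alone."""
    return re.sub(r'\\(.)', r'\1', name)


def txt_paths(listing):
    """Return full paths of the .txt entries in `ls -R` style text."""
    paths = []
    directory = None
    for line in listing.splitlines():
        if not line.strip():
            continue  # blank separator between directory blocks
        if line.endswith(':'):
            # Assumption: a line ending in ':' is a directory header.
            directory = line[:-1]
            continue
        if line.endswith('*'):
            # Assumption: the trailing '*' is a marker, not part of the name.
            line = line[:-1]
        name = unescape_pairs(line)
        if directory is not None and name.endswith('.txt'):
            paths.append(directory + '/' + name)
    return paths


sample = r"""./subfolder/something with spaces:
something\ with\ spaces.txt*
something\ with\ spaces.dat*

./subfolder/yet another thing:
yet\ another\ thing.txt*
yet\ another\ thing.dat*
"""

for path in txt_paths(sample):
    print(path)
# ./subfolder/something with spaces/something with spaces.txt
# ./subfolder/yet another thing/yet another thing.txt
```

A file whose real name ends in `:` or `*` would be misread by this sketch, so it is worth spot-checking the result against the raw listing.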