I have a text file containing the output of a recursive directory listing that generally looks like this:
./subfolder/something with spaces:
something\ with\ spaces.txt*
something\ with\ spaces.dat*
./subfolder/yet another thing:
yet\ another\ thing.txt*
yet\ another\ thing.dat*
I need to get a list of the full paths to each .txt file:
./subfolder/something with spaces/something with spaces.txt
./subfolder/yet another thing/yet another thing.txt
I've almost got a solution for this, but what's the best way to unescape the filenames in Python? I don't know exactly which characters ls -R escaped (space and = are two of them, though). I don't have access to the drive containing these files, either, so using a better command to obtain the list is out of the question, unfortunately.
I'm not sure if there's a built-in for this, but a simple regex could be used.
re.sub(r'(?<!\\)\\', '', filename)
This removes every backslash that does not immediately follow another backslash, so an escaped backslash (\\) collapses to a single one. That appears to match what happens when you echo these values in a terminal (I've only tested this in bash).
bash-3.2$ echo foo\\bar
foo\bar
bash-3.2$ echo foo\ bar
foo bar
bash-3.2$ echo foo\=bar
foo=bar
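One edge case worth noting: the lookbehind pattern leaves an extra backslash when an escaped backslash is immediately followed by another escaped character (for example `a\\\ b`). A hedged alternative, assuming the listing uses shell-style escaping where a backslash always escapes exactly the next character, is to consume backslash-plus-character pairs from left to right. The name `unescape_pairs` is hypothetical and not part of the original answer.

```python
import re

def unescape_pairs(filename):
    # Assumption: a backslash always escapes exactly the next character.
    # Replace each backslash-plus-character pair with the character alone.
    # re.DOTALL lets '.' also match a newline, in case one was escaped.
    return re.sub(r'\\(.)', r'\1', filename, flags=re.DOTALL)

print(unescape_pairs(r'foo\ bar'))   # foo bar
print(unescape_pairs(r'foo\\bar'))   # foo\bar
print(unescape_pairs(r'a\\\ b'))     # a\ b
```

For the three inputs tested with echo above, this gives the same results as the lookbehind version; the two only differ when escapes sit directly next to each other.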
Here's a complete Python example:
import re

def unescape(filename):
    # Drop every backslash that is not itself preceded by a backslash.
    return re.sub(r'(?<!\\)\\', '', filename)

print(unescape(r'foo\ bar'))
print(unescape(r'foo\=bar'))
print(unescape(r'foo\\bar'))
Output:
foo bar
foo=bar
foo\bar
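To tie this back to the question, here is a minimal sketch of turning the whole listing into full .txt paths. It rests on several assumptions drawn only from the sample: directory headers start with `.` and end with `:`, header lines are not escaped, and a trailing unescaped `*` is a marker added by ls rather than part of the name. The function name `txt_paths` is hypothetical.

```python
import re

def unescape(filename):
    # Drop every backslash that is not itself preceded by a backslash.
    return re.sub(r'(?<!\\)\\', '', filename)

def txt_paths(listing):
    paths = []
    current_dir = None
    for line in listing.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        # Assumption: a header looks like "./some/dir:" and is not escaped.
        if line.startswith('.') and line.endswith(':'):
            current_dir = line[:-1]
            continue
        if current_dir is None:
            continue  # entry seen before any directory header
        # Assumption: a trailing unescaped '*' is an ls marker, not part of the name.
        if line.endswith('*') and not line.endswith('\\*'):
            line = line[:-1]
        name = unescape(line)
        if name.endswith('.txt'):
            paths.append(current_dir + '/' + name)
    return paths

listing = r'''./subfolder/something with spaces:
something\ with\ spaces.txt*
something\ with\ spaces.dat*
./subfolder/yet another thing:
yet\ another\ thing.txt*
yet\ another\ thing.dat*
'''

for path in txt_paths(listing):
    print(path)
```

On the sample listing this prints the two paths requested in the question. A file whose own name starts with `.` and ends with `:` would be mistaken for a header, so real data may need a stricter header check.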