I'm programming an IRC and XMPP bot that needs to convert user-provided input to a filename. I have already written a function to do this. Is it sane enough?
Here is the code:
import os
import string

allowednamechars = string.ascii_letters + string.digits + '_+/$.-'

def stripname(name, allowed=""):
    """ strip all not allowed chars from name. """
    n = name.replace(os.sep, '+')
    n = n.replace("@", '+')
    n = n.replace("#", '-')
    n = n.replace("!", '.')
    res = u""
    for c in n:
        if ord(c) < 31: continue
        elif c in allowednamechars + allowed: res += c
        else: res += "-" + str(ord(c))
    return res
It's a whitelist with extra code to remove control characters and replace os.sep, as well as some replacements to make the filename Google App Engine compatible.
The bot in question is at http://jsonbot.googlecode.com.
So what do you think of it?
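urllib.quote(name.encode("utf8"))
will produce something human-readable that is also mostly safe. Note that quote() leaves "/" and "." unescaped by default, so a sequence such as "../.." survives; pass safe="" as the second argument if slashes must be encoded too. Example: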
In [1]: urllib.quote(u"foo bar$=+:;../..(boo)\u00c5".encode('utf8'))
Out[1]: 'foo%20bar%24%3D%2B%3A%3B../..%28boo%29%C3%85'
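The `../..` in the output above survives because `quote()` treats `/` as safe by default. If the result has to be a single path component, a minimal sketch along these lines may help. The helper name `safe_filename` is hypothetical, and the import fallback is only there so the same code runs on Python 2 (`urllib.quote`) and Python 3 (`urllib.parse.quote`):

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def safe_filename(name):
    """Percent-encode name, including '/' (hypothetical helper, not part of the bot)."""
    # safe="" overrides the default safe="/" so slashes are encoded as %2F.
    return quote(name.encode("utf8"), safe="")


print(safe_filename(u"foo bar$=+:;../..(boo)\u00c5"))
# foo%20bar%24%3D%2B%3A%3B..%2F..%28boo%29%C3%85
```

Dots are never percent-encoded by `quote()`, so an input that is exactly `.` or `..` would still come through unchanged and probably needs a separate check.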