Are there any standalone-ish solutions for normalizing international Unicode text to safe ids and filenames in Python?
E.g. turn My International Text: åäö
to my-international-text-aao
plone.i18n does a really good job, but unfortunately it depends on zope.security, zope.publisher, and some other packages, which makes it a fragile dependency.
What you want to do is also known as "slugifying" a string. Here's a possible solution:
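import re
from unicodedata import normalize

# Runs of whitespace and ASCII punctuation that separate words.
_punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.:]+')

def slugify(text, delim=u'-'):
    """Generates a slightly worse ASCII-only slug."""
    result = []
    for word in _punct_re.split(text.lower()):
        # Decompose accented characters, then drop anything non-ASCII.
        word = normalize('NFKD', word).encode('ascii', 'ignore')
        if word:
            result.append(word)
    # Python 2: join the byte strings and return a unicode object.
    return unicode(delim.join(result))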
Usage:
>>> slugify(u'My International Text: åäö')
u'my-international-text-aao'
You can also change the delimiter:
>>> slugify(u'My International Text: åäö', delim='_')
u'my_international_text_aao'
Source: Generating Slugs
For Python 3: pastebin.com/ft7Yb3KS (thanks @MrPoxipol).
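In case the link goes away, here is a minimal sketch of how the same function could look on Python 3. It is my own port of the code above, not necessarily identical to the pastebin version; the only real changes are decoding the ASCII bytes back to str and dropping the unicode() call.

```python
import re
from unicodedata import normalize

# Same separator set as the Python 2 version above.
_punct_re = re.compile(r'[\t !"#$%&\'()*\-/<=>?@\[\\\]^_`{|},.:]+')

def slugify(text, delim='-'):
    """Generate an ASCII-only slug from a str (Python 3 port, a sketch)."""
    result = []
    for word in _punct_re.split(text.lower()):
        # NFKD splits accented letters into base letter + combining mark;
        # encoding with 'ignore' drops the marks and any other non-ASCII
        # character, and decode() turns the bytes back into str.
        word = normalize('NFKD', word).encode('ascii', 'ignore').decode('ascii')
        if word:
            result.append(word)
    return delim.join(result)

print(slugify('My International Text: åäö'))
print(slugify('My International Text: åäö', delim='_'))
```

Note that characters without a Unicode decomposition (e.g. ø or ß) are dropped rather than transliterated, so 'Smørrebrød' becomes 'smrrebrd'. If that matters for your data, you would need a transliteration table on top of this.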