Advanced slug/URL parameterization (incl. transliteration) library in Python


I'm new to Python and am looking for a slug/URL parameterization library that offers functionality similar to that of the Ruby Stringex library. For example:

# A simple prelude
"simple English".to_url => "simple-english"
"it's nothing at all".to_url => "its-nothing-at-all"
"rock & roll".to_url => "rock-and-roll"

# Let's show off
"$12 worth of Ruby power".to_url => "12-dollars-worth-of-ruby-power"
"10% off if you act now".to_url => "10-percent-off-if-you-act-now"

# You don't even wanna trust Iconv for this next part
"kick it en Français".to_url => "kick-it-en-francais"
"rock it Español style".to_url => "rock-it-espanol-style"
"tell your readers 你好".to_url => "tell-your-readers-ni-hao"

I've come across webhelpers.text.urlify, which claims to do this; however, the results weren't close. Any help is much appreciated.


Solution

  • Check out slugify, which is based on Django's own slugify template filter but adds NFKD normalization. Here's the relevant code (Python 2, since it relies on the unicode built-in):

    re.sub(r'[-\s]+', '-',
                unicode(
                    re.sub(r'[^\w\s-]', '',
                        unicodedata.normalize('NFKD', string)
                        .encode('ascii', 'ignore'))
                    .strip()
                    .lower()))
    

    It's not nearly as powerful as Ruby's Stringex, but you could easily extend it to expand ampersands, dollar signs, and similar symbols into words. Also take a look at Unidecode, a Python port of the Text::Unidecode Perl module, which is the same thing Stringex uses for Unicode transliteration.
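
As a rough illustration of that kind of extension, here is a minimal Python 3 sketch. It is not the slugify library's actual code: the SYMBOL_WORDS table, the dollar-amount rule, and the ascii_fold helper are assumptions made up for this example, loosely modeled on the Stringex outputs in the question. The default transliteration uses only the standard library, so characters with no ASCII decomposition (such as 你好) are simply dropped.

```python
import re
import unicodedata

# Hypothetical symbol expansions, chosen to mimic the Stringex examples.
SYMBOL_WORDS = {
    "&": " and ",
    "%": " percent ",
}


def ascii_fold(text):
    """Strip diacritics via NFKD; characters with no ASCII base are dropped."""
    normalized = unicodedata.normalize("NFKD", text)
    return normalized.encode("ascii", "ignore").decode("ascii")


def slugify(text, transliterate=ascii_fold):
    """Build a URL slug; `transliterate` is any str -> str callable."""
    # Naive rule (an assumption): "$12" becomes "12 dollars".
    text = re.sub(r"\$(\d+)", r"\1 dollars", text)
    # Spell out the remaining known symbols.
    for symbol, word in SYMBOL_WORDS.items():
        text = text.replace(symbol, word)
    # Reduce to ASCII before filtering.
    text = transliterate(text)
    # Drop anything that is not a word character, whitespace, or hyphen.
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", text)


print(slugify("rock & roll"))
print(slugify("$12 worth of Ruby power"))
print(slugify("kick it en Français"))
```

If Unidecode is installed, passing its unidecode function as the transliterate argument, e.g. slugify(text, transliterate=unidecode.unidecode), should romanize non-Latin scripts instead of dropping them, which is closer to the Stringex behaviour for the Chinese example.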