ruby-on-rails ruby unicode sanitization slug

Sanitizing Unicode strings for URL slugs (Ruby/Rails)


I have UTF-8 encoded post titles which I'd rather show using the appropriate characters in slugs. An example is Amazon Japan's URL here.

How can any arbitrary string be converted to a safe URL slug such as this, with Ruby (or Rails)?

(There are some related PHP posts, but nothing I could find for Ruby.)


Solution

  • From reading here, it seems that one solution is this:

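    require 'open-uri' # also loads 'uri', which defines URI.encode
    # 20 arbitrary bytes, tagged as binary (ASCII-8BIT) rather than UTF-8
    str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a".force_encoding('ASCII-8BIT')
    # Percent-encodes the bytes that are not URL-safe.
    # Note: URI.encode was removed in Ruby 3.0, so this needs an older Ruby.
    puts URI::encode(str)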
    

    Here is the documentation for open-uri, and here is some info on UTF-8 encoded URL schema.

    EDIT: Having looked into this more, I noticed that encode is just an alias for URI.escape, which is documented here. An example taken from the docs is below:

    require 'uri'
    
    enc_uri = URI.escape("http://example.com/?a=\11\15")
    p enc_uri
    # => "http://example.com/?a=%09%0D"
    
    p URI.unescape(enc_uri)
    # => "http://example.com/?a=\t\r"
    
    p URI.escape("@?@!", "!?")
    # => "@%3F@%21"
    

    Let me know if this is what you were looking for.

    EDIT #2: I was interested and kept looking a little more. According to the comments, Ryan Bates' RailsCasts episode on FriendlyId also seems to work with Chinese characters.
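As a rough sketch of the idea above (keep the Unicode characters in the slug, then percent-encode the result for the URL), something like the following may work. The helper names `unicode_slug` and `encoded_slug` are hypothetical, not part of Ruby or Rails, and the choice of which characters to keep is an assumption. `ERB::Util.url_encode` from the standard library is used here because `URI.encode`/`URI.escape` were removed in Ruby 3.0.

```ruby
require 'erb'

# Hypothetical helper (not from the answer above): builds a slug that keeps
# Unicode letters and digits and collapses every other run of characters to "-".
def unicode_slug(title)
  title.unicode_normalize(:nfc)      # compose characters consistently
       .downcase                     # lowercase letters that have a case
       .gsub(/[^\p{L}\p{N}]+/, '-')  # runs of non-letters/non-digits become one hyphen
       .gsub(/\A-+|-+\z/, '')        # trim leading and trailing hyphens
end

# Percent-encodes the UTF-8 bytes of the slug so it is safe in a URL path.
def encoded_slug(title)
  ERB::Util.url_encode(unicode_slug(title))
end

puts unicode_slug('Ruby on Rails: 日本語のタイトル!')
# => "ruby-on-rails-日本語のタイトル"

puts encoded_slug('日本')
# => "%E6%97%A5%E6%9C%AC"
```

Browsers generally display the percent-encoded form as the original characters in the address bar, so the encoded slug is what goes into the link while the readable one can still be stored and matched in the database.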