Search code examples
language-agnosticurlinternationalizationasciiurl-encoding

Is it advisable to have non-ascii characters in the URL?


We are currently working on a I18N project. I am wondering what are the complications of having the non-ascii characters in the URL. If its not advisable, what are the alternatives to deal with this problem?

EDIT (in response to Maxym's answer): The site is going to be local to specific country and I need not worry about the world wide public accessing this site. I understand that from usability point of view, It is really annoying. What are the other technical problem associated with this?


Solution

  • It is possible to use non-ASCII/non-Latin domain names using IDNA. Further, you can always use percent encoding (like %20 for space) in URLs. RFC 3986 recommends UTF-8 encoding combined with percents:

    the data should first be encoded as octets according to the UTF-8 character encoding; then only those octets that do not correspond to characters in the unreserved set should be percent-encoded. (...) For example, the character A would be represented as "A", the character LATIN CAPITAL LETTER A WITH GRAVE would be represented as "%C3%80", and the character KATAKANA LETTER A would be represented as "%E3%82%A2".

    Modern clients (web browsers) are able to transform back and forth between percent encoding and Unicode, so the URL is transferred as ASCII but looks pretty for the user.

    Make sure you're using a web framework/CMS that understands this encoding as well, to simplify URL input from webmasters/content editors.