What are the eligible characters in a URL's Fragment (location.hash)?


Context: I am building an app that stores its data in location.hash. I want to percent-encode as few characters as possible to keep the URL readable.

As explained in this answer, the set of reserved characters differs for each component of the URL. So what are the restrictions for the fragment (location.hash) specifically?

Related post: Unicode characters in URLs


Solution

  • According to RFC 3986, Uniform Resource Identifier (URI): Generic Syntax (Section 3.5 and the rules it references):

    fragment      = *( pchar / "/" / "?" )
    pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
    unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
    pct-encoded   = "%" HEXDIG HEXDIG
    sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="
    

    Unpacking all that, and ignoring percent-encoding, I find the following set of characters:

    abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~!$&'()*+,;=:@/?
    

    Although the RFC does not mandate a particular character encoding and its grammar deals in characters only (not bytes), Section 2.3 makes clear that ALPHA means ASCII letters only, i.e. the 26 letters of the basic Latin alphabet in upper and lower case (%41-%5A and %61-%7A). Any non-ASCII character must therefore be percent-encoded.
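To check whether a string already satisfies the `fragment` rule, the ABNF above can be transcribed into a regular expression. This is a minimal JavaScript sketch: the names `isValidFragment` and `FRAGMENT_RE` are hypothetical, and it checks only the strict RFC 3986 grammar, not what a particular browser will tolerate. Note that `location.hash` includes the leading `#`, which is not part of the fragment itself.

```javascript
// Hypothetical helper: checks a string against the RFC 3986 `fragment` rule.
// Literal characters allowed by the rule:
//   unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
//   sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
//   pchar adds ":" and "@"; fragment adds "/" and "?".
// ALPHA and DIGIT are ASCII only, hence the explicit A-Z / a-z / 0-9 ranges.
const FRAGMENT_CHAR = "[A-Za-z0-9._~!$&'()*+,;=:@/?-]"; // trailing "-" is a literal hyphen
// pct-encoded = "%" HEXDIG HEXDIG (hex digits are case-insensitive)
const PCT_ENCODED = "%[0-9A-Fa-f]{2}";
const FRAGMENT_RE = new RegExp(`^(?:${FRAGMENT_CHAR}|${PCT_ENCODED})*$`);

function isValidFragment(fragment) {
  return FRAGMENT_RE.test(fragment);
}

// Usage in a browser: validate the part of location.hash after the "#".
// isValidFragment(location.hash.slice(1));
```

With this check, `caf%C3%A9` passes while `café`, `a b`, `100%` (a bare `%`) and `[x]` (`[` and `]` are gen-delims) are rejected.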
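To store data with as few escapes as possible, one option is to leave every character from the set above as-is and percent-encode everything else. The JavaScript sketch below does that; `encodeFragment` and `decodeFragment` are hypothetical names, and encoding non-ASCII characters as UTF-8 bytes is an assumption (it is what `encodeURIComponent` does and what RFC 3986 Section 2.5 recommends for new schemes, but the RFC does not require it here). A literal `%` is always escaped as `%25`, so the result can be decoded unambiguously.

```javascript
// The characters RFC 3986 allows literally in a fragment (same set as listed above).
const FRAGMENT_SAFE = new Set(
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~!$&'()*+,;=:@/?"
);
const utf8 = new TextEncoder();

// Hypothetical helper: percent-encode only what the fragment rule forbids.
function encodeFragment(text) {
  let out = "";
  for (const ch of text) { // for...of walks code points, so emoji stay in one piece
    if (FRAGMENT_SAFE.has(ch)) {
      out += ch;
    } else {
      // Assumption: non-ASCII text is encoded as UTF-8 before percent-encoding.
      // "%" is not in FRAGMENT_SAFE, so a literal "%" always becomes "%25".
      // Caveat: TextEncoder turns a lone surrogate into U+FFFD, so it will not round-trip.
      for (const byte of utf8.encode(ch)) {
        out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
      }
    }
  }
  return out;
}

// Hypothetical inverse: decodeURIComponent decodes every %XX sequence as UTF-8,
// which is safe here because encodeFragment never leaves a bare "%".
function decodeFragment(fragment) {
  return decodeURIComponent(fragment);
}
```

For example, `encodeFragment("café au lait")` gives `caf%C3%A9%20au%20lait`, which could then be assigned to `location.hash`. Compared with `encodeURIComponent`, which also escapes `$ & + , / : ; = ? @`, this keeps those characters readable. If the app uses some of them as its own separators (for example `&` and `=` between key/value pairs), values containing them still need escaping.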