url urlencode rfc3986 url-fragment

Must the percent character be percent-encoded in a URL fragment?


The fragment portion of a URL reaches from the first # character to the end of the URL. Since reserved characters like %, ? and & have no special meaning in the fragment, there should be no need to percent-encode them in a URL fragment, right?

In other words: I believe the URL fragments #% and #%25 should both be allowed and unequal. The following snippet supports this point of view, because loading it (in Google Chrome) with http://server/#% highlights the first paragraph whereas http://server/#%25 highlights the second. (Somewhat unexpectedly, http://server/#%2525 also highlights the second.)

<style>
  p:target { background-color: yellow; }
</style>
<p id="%">Buy today</p>
<p id="%25">and get 25% off!</p>
<a href="#%">http://server/#%</a>
<a href="#%25">http://server/#%25</a>
<a href="#%2525">http://server/#%2525</a>

Is that behavior correct? I like it, but it seems to contradict this statement in RFC 3986, Section 2.4:

Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI.


Solution

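  • Not quite. RFC 3986 defines the fragment as *( pchar / "/" / "?" ), and pchar admits a % only as the start of a percent-encoded triplet (% followed by two hexadecimal digits). Characters such as ? and & may indeed appear unescaped in a fragment, but % is not a reserved character like them: it is the escape indicator in every URI component, the fragment included. So #%25 is a valid fragment that carries a literal % as data, whereas #% is not a valid RFC 3986 URI reference.

    The statement you quote therefore applies to the fragment just as it does to the path and query: to use % as data, write it as %25. Browsers are more lenient than the RFC. The WHATWG URL Standard, which browsers follow, reports a % that is not followed by two hexadecimal digits as a validation error but keeps the character as written instead of rejecting the URL, which is why http://server/#% still works in Chrome.

    The highlighting you observed follows from how the HTML Standard locates the target element: the browser first looks for an element whose id equals the fragment exactly as written, and only if none is found does it percent-decode the fragment and look again. #% matches id="%" directly, #%25 matches id="%25" directly, and #%2525 matches nothing as written but decodes to %25, which selects the second paragraph. So the behavior is what browsers are specified to do, and it does not contradict RFC 3986; it only shows that browsers tolerate a fragment the RFC grammar does not allow.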
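
    For reference, RFC 3986 defines fragment = *( pchar / "/" / "?" ), where a % may only appear as part of a percent-encoded triplet. The following is a minimal Python sketch, not a full URI parser, that tests the three fragments from the question against that grammar and emulates a two-step id lookup (the fragment as written first, then percent-decoded). The function names is_rfc3986_fragment and find_target are hypothetical helpers introduced for illustration, and the two-step lookup is an assumption modeled on the HTML Standard's description of how the target element is found.

```python
import re
from typing import Optional, Set
from urllib.parse import unquote, urlsplit

# RFC 3986: fragment = *( pchar / "/" / "?" )
#           pchar    = unreserved / pct-encoded / sub-delims / ":" / "@"
# A "%" is only allowed as the start of a pct-encoded triplet (%XX).
_FRAGMENT_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*"
)


def is_rfc3986_fragment(fragment: str) -> bool:
    """Return True if the fragment matches the RFC 3986 grammar."""
    return _FRAGMENT_RE.fullmatch(fragment) is not None


def find_target(fragment: str, ids: Set[str]) -> Optional[str]:
    """Hypothetical emulation of the lookup: raw match first, then decoded."""
    if fragment in ids:
        return fragment
    decoded = unquote(fragment)  # decodes one level: "%2525" -> "%25"
    return decoded if decoded in ids else None


ids = {"%", "%25"}  # the two element ids from the question's snippet
for url in ("http://server/#%", "http://server/#%25", "http://server/#%2525"):
    fragment = urlsplit(url).fragment  # urlsplit does not decode the fragment
    print(url, is_rfc3986_fragment(fragment), find_target(fragment, ids))
```

    Under these assumptions, only #% fails the grammar check, while the lookup selects id="%" for #% and id="%25" for both #%25 and #%2525, matching what the question reports for Chrome.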