
JavaScript `URL`: when to encode when setting `pathname`?


When setting the pathname of a URL, when should you encode the value you are setting it to?

When I say URL I mean this API: https://developer.mozilla.org/en-US/docs/Web/API/URL

When I say "setting the pathname" I mean to do this:

url.pathname = 'some/path/to/a/resource.html';

Based on the MDN documentation, I would think the answer is "you shouldn't need to", as there is an example covering this case:

URLs are encoded according to the rules found in RFC 3986. For instance:

url.pathname = 'démonstration.html';
console.log(url.href); // "http://www.example.com/d%C3%A9monstration.html"

However, I have run into a case where it seems I do need to encode the value I am setting pathname to:

url.pathname = 'atest/New Folder1234/!@#$%^&*().html';
console.log(url.href);

I would expect this to output: https://example.com/atest/New%20Folder1234/!%40%23%24%25%5E%26*().html

But instead I am getting: https://example.com/atest/New%20Folder1234/!@%23$%^&*().html
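For reference, a minimal reproduction of the behavior. It assumes a base URL of `https://example.com/`, since the question does not show how `url` was constructed. The pathname setter encodes the space and `#` but leaves `@`, `$`, `%` and `&` untouched. Whether `^` is encoded may depend on the engine version, so the comments do not rely on it.

```javascript
// Assumption: the base URL; the question does not show how `url` was created.
const url = new URL('https://example.com/');

// Assigning the raw string: the setter prepends "/" and percent-encodes
// only the characters in the URL Standard's path percent-encode set.
url.pathname = 'atest/New Folder1234/!@#$%^&*().html';
console.log(url.href);

// '#' was encoded as %23 inside the path, so no fragment was created.
console.log(url.hash === '');
```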

It seems that, to get what I expect, I have to do:

url.pathname = 'atest/New Folder1234/!@#$%^&*().html'.split('/').map(encodeURIComponent).join('/');
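The workaround can be wrapped in a small helper. `encodePathSegments` is a hypothetical name, not part of the URL API, and the base URL is again an assumption. The setter leaves existing `%XX` sequences as they are, so the pre-encoded segments are not double-encoded.

```javascript
// Hypothetical helper (not part of the URL API): encode each path segment
// with encodeURIComponent and keep '/' as the segment separator.
function encodePathSegments(path) {
  return path.split('/').map(encodeURIComponent).join('/');
}

// Assumption: base URL chosen for illustration.
const url = new URL('https://example.com/');
url.pathname = encodePathSegments('atest/New Folder1234/!@#$%^&*().html');
console.log(url.href);
```

One design caveat: because the helper splits on `/`, a segment that must contain a literal slash cannot be expressed this way. Callers with that requirement would need to encode segments individually before joining them. Also note that `encodeURIComponent` itself leaves `! * ( ) ' ~` unencoded.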

What is going on here? I cannot find anything on the MDN doc page for either URL or pathname that explains this. I took a quick look through RFC 3986, but it only seems to describe the URI syntax. I have run some experiments to find some sort of pattern, but nothing stands out to me.


Solution

  • See the WHATWG URL Standard's definition of the path state, in particular...

    UTF-8 percent-encode c using the path percent-encode set and append the result to buffer.

    with the path percent-encode set being defined as...

    the query percent-encode set and U+003F (?), U+0060 (`), U+007B ({), and U+007D (}).

    and the query percent-encode set being...

    the C0 control percent-encode set and U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>).

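    The C0 control percent-encode set, in turn, is the C0 controls and all code points greater than U+007E (~), which is why é in the MDN example gets encoded. You can keep diving down the rabbit hole if you want, but I feel that's enough.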

    Note that none of these sets includes @, $, %, ^, or &, which are the characters you pointed out.

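    Compare these to the ECMAScript specification for Encode (the abstract operation behind encodeURIComponent), which is much more thorough: encodeURIComponent percent-encodes everything except ASCII letters, digits, and - _ . ! ~ * ' ( ).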
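To make the comparison concrete, here is a sketch of set membership for the path percent-encode set exactly as quoted above. `inPathPercentEncodeSet` is a hypothetical name. This is not a full path parser: it ignores how `/`, `\` and `%` are handled, and newer revisions of the URL Standard may add characters to the set. It lists which characters from the question `encodeURIComponent` encodes but the quoted path set does not.

```javascript
// Hypothetical helper: membership test for the path percent-encode set
// as quoted in this answer (older URL Standard wording).
function inPathPercentEncodeSet(ch) {
  const cp = ch.codePointAt(0);
  // C0 control percent-encode set: C0 controls and code points > U+007E (~).
  if (cp <= 0x1f || cp > 0x7e) return true;
  // Query set adds: space " # < >  /  path set adds: ? ` { }
  return ' "#<>?`{}'.includes(ch);
}

// Characters from the question that encodeURIComponent changes
// but the quoted path percent-encode set would leave alone.
const sample = '!@#$%^&*() ';
const onlyEncodedByComponent = [...sample].filter(
  (ch) => !inPathPercentEncodeSet(ch) && encodeURIComponent(ch) !== ch,
);
console.log(onlyEncodedByComponent.join(''));
```

Under the quoted definitions this prints `@$%^&`, the same characters listed above. `#` and the space are in both sets, while `!`, `*`, `(` and `)` are in neither.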