
Why do URI-encoded ('#') anchors cause a 404, and how can I deal with it in JS?


prettyPhoto utilizes hashtags, but if they get encoded (to %23), most browsers will bring up a 404 error. This has been discussed before:

You get a 404 error because the #callback part is not part of the URL. It's a bookmark that is used by the browser, and it's never sent in the request to the server. If you encode the hash, it becomes part of the file name instead.

  1. Why would a hash become part of the file just because it's URI-encoded? Isn't it a bug?

  2. I'm asking because prettyPhoto uses hashtags and suffers from the same issue. I think adding a '?' before the hash is the most elegant solution; I'm just at a bit of a loss as to how to do it in the existing code:

    // Return the gallery fragment (without the leading '#'), or false if absent.
    function getHashtag() {
      var url = location.href;
      var start = url.indexOf('#gallery');
      return (start !== -1) ? decodeURI(url.substring(start + 1)) : false;
    }

    // theRel and rel_index are not defined in this excerpt.
    function setHashtag() {
      if (typeof theRel == 'undefined') return;
      location.hash = theRel + '/' + rel_index + '/';
    }

    function clearHashtag() {
      if (location.href.indexOf('#gallery') !== -1) location.hash = "";
    }
  3. Any other suggestions? I'll look into tweaking my 404 page, but that seems more like handling a problem rather than preventing it.

Thanks!

EDIT: Since there's evidently nothing wrong with the way prettyPhoto handles those hashes, I ended up adding these rules to my Apache server:

RewriteRule ^(.*).shtml(%23|#)$ /$1.shtml [R=301,NE,L]
RewriteRule ^(.*).shtml([^g]+)gallery(.+)$ /$1.shtml#gallery$3 [R=301,NE,L]

They successfully handle the cases where %23 caused issues.


Solution

    1. Why would a hash become part of the file just because it's URI-encoded? Isn't it a bug?

    If you point your browser to http://example.com/index.html#title, the browser interprets this as a request for the file index.html from the server example.com; the #title part is not sent to the server. Once the request is complete, the browser looks in the document for an element with the id 'title' or an anchor named 'title' (e.g. <a name="title">My title</a>) and scrolls to it.

    If you instead point to http://example.com/index.html%23title, the browser makes a request for the file index.html%23title from example.com, which probably doesn't exist on the server, giving you a 404. See the difference?
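The difference can be checked directly with the standard URL parser that ships with browsers and Node.js. This is only a small illustration using the two example.com addresses from above; the variable names are made up for the demo.

```javascript
// Parse both forms with the WHATWG URL API (available in browsers and Node.js).
const plain = new URL('http://example.com/index.html#title');
const encoded = new URL('http://example.com/index.html%23title');

// A literal '#' starts the fragment, which stays on the client.
console.log(plain.pathname);   // "/index.html"
console.log(plain.hash);       // "#title"

// An encoded '#' (%23) is ordinary path data, so it is part of the requested file name.
console.log(encoded.pathname); // "/index.html%23title"
console.log(encoded.hash);     // ""
```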

    And it's not a bug. It's part of the URI standard: see RFC 2396, published in 1998 and since superseded by RFC 3986, which keeps '#' as the fragment delimiter. Quoting RFC 2396:

    The character "#" is excluded because it is used to delimit a URI from a fragment identifier in URI references (Section 4).

    As for 2 and 3, there's not enough context in your example code to tell what you're trying to do. How are you calling your code? What are you trying to do with prettyPhoto that isn't working? Are you trying to redirect to a specific photo or gallery from a user click or other JavaScript event? Are you trying to open the gallery when someone visits a particular page?

    I checked the linked question about Twitter/OAuth, but I don't see how that ties into the code you provided. I started poking at prettyPhoto as well, but I don't see how your code relates to that either.

    Instead of changing your 404 page, maybe what you need is an in-code handler or server rewrite rule that takes not-found requests with a %23 in them and redirects the user to the decoded URL. That could have some drawbacks, but it would be fairly elegant if you're taking incoming requests from other sources you can't control. What is your server environment? (language, server tech, who owns the machine, etc.)
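As a rough sketch of the "redirect to the decoded URL" idea, here is one way it might look in JavaScript. The function name repairEncodedHash is hypothetical (it is not part of prettyPhoto or any library), and the sketch assumes the stray %23 sits in the path and that the URL has no real fragment yet.

```javascript
// Hypothetical helper: turn ".../page.shtml%23gallery/1/" into ".../page.shtml#gallery/1/".
// Returns the repaired URL as a string, or null if there is nothing to repair.
function repairEncodedHash(href) {
  const url = new URL(href);
  const at = url.pathname.indexOf('%23');

  // Leave the URL alone if there is no encoded '#' or a real fragment already exists.
  if (at === -1 || url.hash !== '') return null;

  // Everything after the first %23 becomes the fragment; the rest stays the path.
  const fragment = url.pathname.slice(at + 3);
  url.pathname = url.pathname.slice(0, at);
  url.hash = fragment;
  return url.href;
}

console.log(repairEncodedHash('http://example.com/page.shtml%23gallery/1/'));
// "http://example.com/page.shtml#gallery/1/"
```

On a custom 404 page this could be wired up with something like `var fixed = repairEncodedHash(location.href); if (fixed) location.replace(fixed);`. A server-side redirect avoids the extra round trip through the 404 page, so this is more of a fallback for setups where the server configuration can't be changed.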

    I'd be happy to update my answer with a solution or a workaround for you.