html, href, url-encoding

What other characters besides the ampersand (&) should be encoded in HTML href/src attributes?


Is the ampersand the only character that should be encoded in an HTML attribute?

It's well known that this won't pass validation:

<a href="http://domain.com/search?q=whatever&lang=en"></a>

That's because the ampersand should be written as &amp;. Here's a direct link to the validation failure.

This guy lists a bunch of characters that should be encoded, but he's wrong. If you encode the first "/" in http:// the href won't work.

In ASP.NET, is there a helper method already built to handle this? Stuff like Server.UrlEncode and HtmlEncode obviously don't work - those are for different purposes.

I can build my own simple extension method (like .ToAttributeView()) which does a simple string replace.


Solution

  • Other than standard URI encoding of the values, & is the only character related to HTML entities that you have to worry about, simply because it is the character that begins every HTML entity. Take, for example, the following URL:

    http://query.com/?q=foo&lt=bar&gt=baz
    

    Even though there are no trailing semicolons, &lt; is the entity for < and &gt; is the entity for >, so some old browsers would translate this URL to:

    http://query.com/?q=foo<=bar>=baz
    

    So you need to write & as &amp; to prevent this from happening to links within an HTML-parsed document.
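
The two steps described above (URI-encode the values, then escape & for the HTML attribute) can be sketched as follows. This is only an illustration in Python, not the ASP.NET helper the question asks about. The function name href_attribute is hypothetical, and html.unescape is used purely as a stand-in for a lenient parser that decodes entities without a trailing semicolon.

```python
from html import escape, unescape
from urllib.parse import urlencode


def href_attribute(base_url, params):
    """Hypothetical helper: build a URL and escape it for an HTML attribute."""
    # Step 1: URI-encode the parameter names and values, so characters such
    # as & or spaces inside a value cannot be mistaken for URL syntax.
    url = base_url + "?" + urlencode(params)
    # Step 2: HTML-escape the finished URL, so each separating & becomes &amp;.
    return escape(url, quote=True)


# The unescaped URL from the answer, decoded by a lenient parser:
raw = "http://query.com/?q=foo&lt=bar&gt=baz"
print(unescape(raw))   # http://query.com/?q=foo<=bar>=baz

# The same URL built with the ampersands escaped survives decoding intact:
link = href_attribute("http://query.com/", {"q": "foo", "lt": "bar", "gt": "baz"})
print(link)            # http://query.com/?q=foo&amp;lt=bar&amp;gt=baz
print(unescape(link))  # http://query.com/?q=foo&lt=bar&gt=baz
```

The order matters: URI encoding applies to the individual values, while the HTML escaping applies once to the complete URL just before it is written into the attribute.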