Tags: html, html-entities, ampersand

Can I use unencoded ampersands (&) in HTML?


I'm building a website where I have to work with less-than-perfect master data (I guess I'm not the only one :-))

In my case I have to render an XML file to HTML (using XSL). Sometimes the master data already uses HTML entities (e.g. &eacute; in French words), so there I have to use disable-output-escaping="yes" in order to avoid double encoding.

The easiest solution is to disable output escaping altogether, so I never run the risk of double encoding.

The only characters in this master data that are left unencoded are the ampersands. But when I output them 'raw' (so & rather than &amp;), all browsers seem to be OK with it.

So the question: what are the consequences of using unencoded ampersands in HTML?


Solution

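  • AFAIK bare ampersands are illegal in HTML 4 and XHTML; HTML5 only tolerates them where they cannot be mistaken for the start of a character reference. With that out of the way, let's look at the consequences: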

    • You are now relying on the browser's ability to detect and gracefully recover from the problem. Note that in order to do this, the browser has to guess: "& " is "clearly" an ampersand followed by a space, and &copy; is clearly the copyright symbol. But what about the text fragment edit&copy? The browser I'm using right now mangles it.
    • If you are using XHTML, or if the content is ever going to be inserted into an XML document, the result will be a hard parser error.

    Since it's more difficult to detect and account for these cases manually than it is to replace all ampersands that are not part of entities (say with a regex), you should really do the latter.
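
To make the points above concrete, here is a minimal Python sketch (not the asker's XSL pipeline). The names `BARE_AMPERSAND`, `escape_bare_ampersands` and `is_well_formed_xml` are made up for illustration. Python's `html.unescape`, which is documented as following the HTML 5 rules for character references, stands in for a browser here; real browsers may behave differently. The regex assumes that anything shaped like `&name;`, `&#123;` or `&#x7B;` is an intentional entity and leaves it alone.

```python
import html
import re
import xml.etree.ElementTree as ET

# Matches "&" unless it already starts something shaped like a character
# reference: a named one (&eacute;), a decimal one (&#233;) or a hex one
# (&#xE9;). Assumption: every "&name;" in the data is meant as an entity.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Replace every "&" that does not start an entity with "&amp;"."""
    return BARE_AMPERSAND.sub("&amp;", text)


def is_well_formed_xml(fragment: str) -> bool:
    """Return True if the fragment parses as XML inside a <p> element."""
    try:
        ET.fromstring(f"<p>{fragment}</p>")
        return True
    except ET.ParseError:
        return False


# Existing entities are kept, the bare ampersand is escaped.
fixed = escape_bare_ampersands("caf&eacute; & cr&#232;me")

# The ambiguous case from the answer: an HTML5-style decoder turns the
# unescaped fragment into a copyright sign, the escaped one survives.
mangled = html.unescape("edit&copy")
preserved = html.unescape(escape_bare_ampersands("edit&copy"))

# An XML parser rejects the bare ampersand outright.
raw_ok = is_well_formed_xml("Tom & Jerry")
escaped_ok = is_well_formed_xml(escape_bare_ampersands("Tom & Jerry"))
```

One limitation of this sketch: an unknown name such as `&foo;` is also treated as an entity and left untouched. If that matters for your data, checking the name against `html.entities.html5` would be a stricter variant.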