I'm building a website where I have to work with less than perfect master data (I guess I'm not the only one :-)).
In my case I have to render an XML file to HTML (using XSL). Sometimes the master data already contains HTML entities (e.g. `&eacute;` in French words), so there I have to use `disable-output-escaping="yes"` to avoid double encoding.
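A minimal Python sketch of the double-encoding problem. It uses the standard library's `html.escape` as a stand-in for the XSLT processor's normal output escaping (an assumption, since the processor isn't named here) and `html.unescape` as a rough stand-in for how a browser decodes character references:

```python
import html

# Master data that already contains an HTML entity (e-acute, as in "café").
raw = "caf&eacute;"

# Escaping it again turns the entity's '&' into '&amp;': double encoding.
double_encoded = html.escape(raw)
print(double_encoded)                 # caf&amp;eacute;

# References are decoded only once, so the reader sees the entity text.
print(html.unescape(double_encoded))  # caf&eacute;

# Passing the stored entity through unescaped is what yields "café".
print(html.unescape(raw))             # café
```

This is why output escaping has to be disabled for fields that already carry entities.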
The easiest solution is to disable output escaping altogether, so I never run the risk of double encoding.
The only characters in this master data that are missing encoding are the ampersands. But when I output them raw (so `&` rather than `&amp;`), all browsers seem to be OK with it.
So the question: what are the consequences of using unencoded ampersands in HTML?
AFAIK bare ampersands are illegal in HTML. With that out of the way, let's look at the consequences:
An `&` followed by a space is "clearly" just an ampersand, and `&copy;` is clearly the copyright symbol. But what about the text fragment `edit&copy`? The browser I'm using right now mangles it. Since it's harder to detect and handle these cases manually than it is to replace every ampersand that is not part of an entity (say, with a regex), you should really do the latter.
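A minimal sketch of that regex approach in Python. The function name `escape_bare_ampersands` is hypothetical, and the rule it applies is an assumption: an `&` is left alone only when it starts a complete numeric reference (`&#233;`, `&#xE9;`) or a named reference ending in `;` that appears in the standard library's HTML5 entity table (`html.entities.html5`). Every other `&` becomes `&amp;`. Python's `html.unescape`, which follows the HTML5 rules for decoding references in text (including legacy names like `&copy` without a semicolon), is used to show the `edit&copy` case:

```python
import html
import re
from html.entities import html5  # HTML5 named references, e.g. "eacute;" -> "é"

# An ampersand, optionally followed by something shaped like a complete reference.
_AMP = re.compile(r"&(#[0-9]+;|#[xX][0-9A-Fa-f]+;|[A-Za-z][A-Za-z0-9]*;)?")


def escape_bare_ampersands(text: str) -> str:
    """Hypothetical helper: encode every '&' that does not start a known, complete reference."""

    def repl(m: re.Match) -> str:
        ref = m.group(1)
        if ref is None:
            return "&amp;"      # bare '&', e.g. "Tom & Jerry" or "edit&copy"
        if ref.startswith("#") or ref in html5:
            return m.group(0)   # &#233;, &#xE9;, &eacute;, &amp; stay as they are
        return "&amp;" + ref    # "&bogus;" is not a known name, so encode its '&'

    return _AMP.sub(repl, text)


fragment = "edit&copy"
print(html.unescape(fragment))                              # edit©  (the mangling)
print(html.unescape(escape_bare_ampersands(fragment)))      # edit&copy
print(escape_bare_ampersands("Tom & Jerry: caf&eacute;"))   # Tom &amp; Jerry: caf&eacute;
```

Because existing references (including `&amp;`) are preserved, running the text through this before emitting it with `disable-output-escaping="yes"` should not reintroduce double encoding. Numeric references are only checked for shape, not for valid code points.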