Tags: c++, parsing, ampersand, expat-parser

Parsing ampersands with expat fails. Invalid token?


I get an expat error only when parsing specific characters; other HTML code is parsed just fine. I'm using expat's UTF-8 library (libexpatMT.lib) and working with char and std::string in a wrapper. No wide chars are used.

// The ampersand leads to: Expat error: *not well-formed (invalid token)*
<a href="http://www.myurl.com?a=b&c=d">Link</a>
<span>Tom & Jerry</span>
<h1>K&auml;se</h1>

I'm confused why the ampersand is an invalid token here, since it is used even within HTML entities like &amp;. Replacing the ampersands with &amp; or custom spacers doesn't work either.

Any suggestions? The ampersand is the issue here.


Solution

  • In XML, a literal ampersand must always be escaped as &amp;, even inside attribute values such as URLs. So the valid value is <a href="http://www.myurl.com?a=b&amp;c=d">Link</a>
    Correct web pages do that. Browsers are quite tolerant of the error you made, though.
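
If the input cannot be fixed at the source, one option is to escape stray ampersands before the text reaches expat. The following is only a minimal sketch: escapeBareAmpersands and startsXmlReference are hypothetical helpers, not part of expat. It assumes that only the five predefined XML entities (&amp; &lt; &gt; &quot; &apos;) and numeric character references should pass through untouched, since those are the only references XML defines without a DTD. HTML-only entities such as &auml; are therefore escaped too and end up as literal text.

```cpp
#include <cctype>
#include <cstring>
#include <string>

// Hypothetical helper (not an expat function): returns true if the '&' at
// position `pos` starts a reference that XML accepts without a DTD, i.e. one
// of the five predefined entities or a numeric reference like &#228; / &#xE4;.
static bool startsXmlReference(const std::string& s, std::size_t pos) {
    static const char* const kPredefined[] = {
        "&amp;", "&lt;", "&gt;", "&quot;", "&apos;"};
    for (const char* ent : kPredefined) {
        if (s.compare(pos, std::strlen(ent), ent) == 0) return true;
    }

    // Numeric character reference: "&#" digits ";" or "&#x" hex-digits ";"
    std::size_t i = pos + 1;
    if (i >= s.size() || s[i] != '#') return false;
    ++i;
    const bool hex = (i < s.size() && s[i] == 'x');
    if (hex) ++i;
    std::size_t digits = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (!(hex ? std::isxdigit(c) : std::isdigit(c))) break;
        ++i;
        ++digits;
    }
    return digits > 0 && i < s.size() && s[i] == ';';
}

// Hypothetical helper: replaces every '&' that does not start a reference
// recognized above with "&amp;". Everything else is copied unchanged.
static std::string escapeBareAmpersands(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '&' && !startsXmlReference(in, i)) {
            out += "&amp;";
        } else {
            out += in[i];
        }
    }
    return out;
}
```

Running each of the three sample lines from the question through escapeBareAmpersands before handing them to the parser should get rid of the invalid token error. If the umlaut in K&auml;se has to survive as a character, it would need to be written as a numeric reference (&#228;) or as a plain UTF-8 character instead. Blindly rewriting markup like this is a workaround, so producing well-formed XML in the first place remains the cleaner fix.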