Tags: c++, xml, sax, xerces, xerces-c

Xerces SAX: the last HTML double quote (&quot;) in a value is ignored


I am using Xerces SAX to parse an XML file.

Values enclosed between two &quot; entities are not parsed correctly: only the first quote is returned, and the last one is missing. Example:

<Rating_Text>&quot;a3&quot;</Rating_Text>

is parsed as:

"a3

whereas it should be:

"a3"

How can I fix this?

//...
m_pXMLReader->setFeature(XMLUni::fgSAX2CoreValidation, true);
//...
void CXMLMsg::characters(
    const XMLCh* const chars,
    const XMLSize_t    length
)
{
    char* szData = XMLString::transcode(chars);
    if (!isspace(*szData))
    {
        //
    }
    XMLString::release(&szData);
}

In the debugger, I checked that the last double quote is not present in "chars", and that the length matches (it does not count the last double quote). It looks as if Xerces is ignoring the last ".

If I replace the &quot; entities with literal double quotes ("), I get the entire value in the callback. So why does Xerces split the string when the double quotes are written as entities?


Solution

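  • I found a workaround: the characters callback now only appends each chunk to a string holding the element value, and the processing happens in the endElement callback. A SAX parser may deliver one element's text in several characters calls, so a single call is not guaranteed to contain the whole value.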
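
The approach described above (accumulate in characters, process in endElement) can be sketched roughly as follows. This is only an illustration written with plain C++ types so that it compiles without Xerces: the class TextCollector and the function demoSplitValue are hypothetical names, and in a real handler the callbacks would take the XMLCh / XMLSize_t arguments shown in the question. It also appends exactly "length" characters rather than assuming the chunk is null-terminated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a SAX handler (not the Xerces API itself).
// char16_t / std::size_t are used here in place of XMLCh / XMLSize_t.
class TextCollector {
public:
    // An element opens: start with an empty buffer.
    void startElement() { m_text.clear(); }

    // May be called several times for one element's text.
    // Append exactly `length` characters from the chunk.
    void characters(const char16_t* chars, std::size_t length) {
        assert(chars != nullptr);
        m_text.append(chars, length);
    }

    // The element closes: the buffer now holds the complete value,
    // so this is the place to process it (here it is simply stored).
    void endElement() {
        m_values.push_back(m_text);
        m_text.clear();
    }

    const std::vector<std::u16string>& values() const { return m_values; }

private:
    std::u16string m_text;                 // text of the element being read
    std::vector<std::u16string> m_values;  // completed element values
};

// Simulates a parser delivering the value in two chunks, as in the
// question: first `"a3`, then the final `"` on its own.
inline std::u16string demoSplitValue() {
    TextCollector collector;
    collector.startElement();
    collector.characters(u"\"a3", 3);
    collector.characters(u"\"", 1);
    collector.endElement();
    return collector.values().front();
}
```

Clearing the buffer in startElement is enough for leaf elements such as Rating_Text; elements with mixed content (text interleaved with child elements) would need a buffer per nesting level.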