I have a Xerces (2.6) DOMNode object whose content is encoded in UTF-8. I read its text value like this:
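    CBuffer DomNodeExtended::getText( const DOMNode* node ) const {
        // transcode() converts to the local code page, not to UTF-8
        char* p = XMLString::transcode( node->getNodeValue( ) );
        CBuffer xNodeText( p );
        // Memory returned by transcode() must be freed with XMLString::release
        XMLString::release( &p );
        return xNodeText;
    }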
Here CBuffer is just a buffer object whose contents are later persisted unchanged in a DB.
This works as long as the text contains only plain ASCII characters. If it contains, for example, Chinese characters, they get lost in the transcode operation (XMLString::transcode converts to the local code page rather than to UTF-8).
I've searched a lot for a solution. It looks like with Xerces 3 the DOMWriter class should solve the problem. With Xerces 2.6 I'm trying XMLTranscoder, but without success so far. Could anybody help?
Edit: It looks like I was wrong and the DOMWriter class is already available in Xerces 2.6. I'm now trying a solution based on it.
I've now solved it as follows, though I'm still not sure this is the optimal solution:
    CBuffer DomNodeExtended::getText( const DOMNode* node ) const
    {
        // Get a DOM implementation that supports the Load/Save ("LS") feature
        XMLCh tempStr[100];
        XMLString::transcode("LS", tempStr, 99);
        DOMImplementation *impl =
            DOMImplementationRegistry::getDOMImplementation(tempStr);
        DOMWriter* myWriter = ((DOMImplementationLS*)impl)->createDOMWriter();

        // Serialize the node into an XMLCh (UTF-16) string
        XMLCh *strNodeValue = myWriter->writeToString(*node);

        // Transcode the XMLCh string into UTF-8 bytes
        XMLTransService::Codes resCode;
        XMLTranscoder* t =
            XMLPlatformUtils::fgTransService->makeNewTranscoderFor(
                "UTF-8", resCode, 16*1024);
        unsigned int charsEaten = 0;
        unsigned int charsReturned = 0;
        // Fixed-size output buffer: at most 16*1024 bytes are written,
        // so longer text is not fully converted
        char bytesNodeValue[16*1024+4];
        charsReturned = t->transcodeTo( strNodeValue,
                                        XMLString::stringLen(strNodeValue),
                                        (XMLByte*) bytesNodeValue,
                                        16*1024,
                                        charsEaten,
                                        XMLTranscoder::UnRep_Throw);
        CBuffer xNodeText( bytesNodeValue, charsReturned);

        // Release everything obtained from Xerces
        XMLString::release(&strNodeValue);
        myWriter->release();
        delete t;
        return xNodeText;
    }
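For anyone wondering what the transcoding step actually produces, here is a rough standalone sketch of a UTF-16 to UTF-8 conversion using only the standard library. It is not Xerces code: `utf16ToUtf8` is a hypothetical helper name, and it assumes the input holds the same 16-bit UTF-16 code units that XMLCh holds. Unlike the fixed 16 KB buffer above, the output string grows as needed.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Xerces): encode UTF-16 code units as UTF-8.
// Throws std::invalid_argument on unpaired surrogates.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    out.reserve(in.size() * 3);  // each 16-bit code unit yields at most 3 bytes
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {            // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes: most CJK characters land here
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes: characters outside the BMP
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

This needs a C++11 compiler for `std::u16string`. To use it on Xerces data, the XMLCh string would first have to be copied into a `std::u16string`, which only works if XMLCh is a 16-bit type on your platform (an assumption worth checking). In practice the XMLTranscoder approach above stays closer to the library.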