Tags: c++, encoding, xerces, iconv, transcode

Xerces 3.2 XMLString::transcode not working on special characters


I have this XML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cmh>
<value atr="éè€ç"></value> 
</cmh>

And this simple C++ program using Xerces 3.2.3:

...
// const XMLCh* xmlch_OptionA = currentElement->getAttribute(XMLString::transcode("atr")); // this call always works
const char* a = "éèç€"; // string literals are const in C++
// char* a = XMLString::transcode(xmlch_OptionA); // this one does not work with these characters
std::cout << sizeof(char) << " " << a << std::endl;
// Print the first four bytes of the string in hex
std::cout << std::hex
          << (unsigned int)(a[0] & 0xFF) << " " << (unsigned int)(a[1] & 0xFF) << " "
          << (unsigned int)(a[2] & 0xFF) << " " << (unsigned int)(a[3] & 0xFF) << std::endl;
...

Output:

1 éèç€
c3 a9 c3 a8

This program works fine with the hard-coded literal, but when I retrieve the char* from the XML file with XMLString::transcode (see the commented-out lines), I get an empty result and I can't figure out why. I built Xerces with iconv as its transcoder; isn't it supposed to handle this case correctly? Or is there a way to achieve the same result without using transcode()?

Wrong output:

1
0 0 0 0

NB: Of course, it works if I replace "éèç€" with something like "abcd".


Solution

  • The problem came from the Docker image I was using (gcc:10.2): the en_US.UTF-8 locale was not installed on it. I installed it and added this call at the beginning of my program:

    setlocale(LC_ALL, "en_US.UTF-8");

    XMLString::transcode works fine now.
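
The same locale dependence can be reproduced without Xerces. The sketch below is not the Xerces implementation; it assumes that the iconv-based transcoder ends up relying on a locale-dependent C conversion such as wcstombs, and it shows how the result of that conversion depends on whether a UTF-8 locale has been selected with setlocale. The helper names use_utf8_locale and to_local_code_page are hypothetical, and "C.UTF-8" is only tried as a fallback that may or may not exist on a given system.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: try a few UTF-8 locale names and report whether one
// could be selected. setlocale returns nullptr when the requested locale is
// not installed, which was the situation in the gcc:10.2 image.
bool use_utf8_locale() {
    const char* candidates[] = {"en_US.UTF-8", "C.UTF-8"};
    for (const char* name : candidates) {
        if (std::setlocale(LC_ALL, name) != nullptr) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: convert a wide string to the multibyte encoding of the
// current locale. Returns false when wcstombs reports a character that the
// current locale cannot represent.
bool to_local_code_page(const std::wstring& wide, std::string& out) {
    // MB_CUR_MAX is the maximum number of bytes per character in the current locale.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        return false;
    }
    out.assign(buf.data(), n);
    return true;
}
```

A program starts in the "C" locale, so calling to_local_code_page on L"\u00e9\u00e8" before use_utf8_locale() may fail or produce something other than UTF-8, depending on the C library, while plain ASCII such as L"abcd" converts either way. Checking the return value of setlocale is also a cheap way to detect a missing locale early.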