
UTF-8 and TinyXML


For some reason I cannot read data from an XML file properly. For example, instead of "Schrüder" I get something like "SchrÃ¼der".

My code:

tinyxml2::XMLDocument doc;

bool open(string path) {
    if(doc.LoadFile(path.c_str()) == XML_SUCCESS)
        return true;
    return false;
}



int main() {
    if(open("C:\\Users\\Admin\\Desktop\\Test.xml"))
        cout << "Success" << endl;

    XMLNode * node = doc.RootElement();
    string test = node->FirstChildElement()->GetText();

    cout << test << endl;
    return 0;
}

Part of XML:

<?xml version="1.0" encoding="UTF-8"?>
<myXML>
    <my:TXT_UTF8Test>Schrüder</my:TXT_UTF8Test>
</myXML>

Note that if I convert the file to ANSI and change the declared encoding to "ISO-8859-15", it works fine.

I read that something like "LoadFile( filename, TIXML_ENCODING_UTF8 )" should help. However, that's not the case (error: Invalid arguments; it only expects a const char*). I have the latest version of TinyXML2 (I guess?). I downloaded it just a couple of minutes ago from https://github.com/leethomason/tinyxml2.

Any ideas?

Edit: When I write the string to a .xml or .txt file, it works fine, so there might be some problem with the Eclipse IDE console. However, when I try to send the string via e-mail, I get the same problem. Here's the MailSend code:

bool sendMail(std::string params) {

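    // ShellExecuteA takes ANSI strings; a return value of 32 or less signals an error.
    // Casting to INT_PTR avoids truncating the returned handle on 64-bit builds.
    if( reinterpret_cast<INT_PTR>(ShellExecuteA(NULL, "open", "H:\\MailSend\\MailSend_anhang.exe", params.c_str(), NULL, SW_HIDE)) <= 32 )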
        return false;
    return true;

}

I call it in the main method like this:

sendMail("-f:[email protected] -t:[email protected] -s:Subject -b:Body " + test);

Solution

  • I think the problem is with your terminal. Can you try running your test code in a different terminal, one with known-good UTF-8 support?

    Output with terminal in UTF-8 mode:

    $ ./a.out 
    Success
    Schrüder
    

    Output with terminal in ISO-8859-15 mode:

    $ ./a.out 
    Success
    SchrÃŒder
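
    The difference between the two outputs comes from decoding alone: "ü" is stored in UTF-8 as the two bytes 0xC3 0xBC, and a terminal in ISO-8859-15 mode shows each byte as its own character. A minimal sketch of that misreading follows. The helper names (append_utf8, iso8859_15_to_codepoint, misread_as_iso8859_15) are made up for this illustration and are not part of TinyXML2.

    ```cpp
    #include <iostream>
    #include <string>

    // Append the UTF-8 encoding of code point `cp` to `out`.
    // Only code points below U+10000 are handled, which is all this demo needs.
    void append_utf8(std::string& out, unsigned cp) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }

    // Map one ISO-8859-15 byte to its Unicode code point.
    // ISO-8859-15 matches ISO-8859-1 except for eight positions.
    unsigned iso8859_15_to_codepoint(unsigned char b) {
        switch (b) {
            case 0xA4: return 0x20AC; // euro sign
            case 0xA6: return 0x0160; // S with caron
            case 0xA8: return 0x0161; // s with caron
            case 0xB4: return 0x017D; // Z with caron
            case 0xB8: return 0x017E; // z with caron
            case 0xBC: return 0x0152; // OE ligature
            case 0xBD: return 0x0153; // oe ligature
            case 0xBE: return 0x0178; // Y with diaeresis
            default:   return b;      // same as ISO-8859-1
        }
    }

    // Return (as UTF-8) what a terminal in ISO-8859-15 mode would display
    // for `bytes`: every single byte is treated as one character.
    std::string misread_as_iso8859_15(const std::string& bytes) {
        std::string out;
        for (unsigned char b : bytes) {
            append_utf8(out, iso8859_15_to_codepoint(b));
        }
        return out;
    }

    int main() {
        // "Schrüder" written as explicit UTF-8 bytes; the literal is split
        // so that the 'd' is not swallowed by the \xBC escape.
        const std::string utf8 = "Schr\xC3\xBC" "der";
        std::cout << misread_as_iso8859_15(utf8) << '\n';
    }
    ```

    On a UTF-8 terminal this prints SchrÃŒder, the same garbled form as in the ISO-8859-15 output above, even though the input bytes are valid UTF-8.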
    

    Also, please try to follow http://sscce.org/. For posterity's sake, here is your code with everything needed to compile (17676169.cpp):

    #include <tinyxml2.h>
    #include <string>
    #include <iostream>
    
    using namespace std;
    using namespace tinyxml2;
    
    tinyxml2::XMLDocument doc;
    
    bool open(string path) {
        if(doc.LoadFile(path.c_str()) == XML_SUCCESS)
            return true;
        return false;
    }
    
    
    
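    int main() {
        // Stop early if the file cannot be loaded; the document would be empty.
        if(!open("Test.xml")) {
            cerr << "Could not load Test.xml" << endl;
            return 1;
        }
        cout << "Success" << endl;
    
        // RootElement(), FirstChildElement() and GetText() can each return a
        // null pointer, so check before constructing a std::string.
        XMLElement * root = doc.RootElement();
        XMLElement * child = root ? root->FirstChildElement() : nullptr;
        const char * text = child ? child->GetText() : nullptr;
        string test = text ? text : "";
    
        cout << test << endl;
        return 0;
    }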
    

    compiled with:

    g++ -o 17676169 17676169.cpp -ltinyxml2
    

    and a uuencoded Test.xml, to ensure the exact same data is used:

    begin 660 Test.xml
    M/#]X;6P@=F5R<VEO;CTB,2XP(B!E;F-O9&EN9STB551&+3@B/SX*/&UY6$U,
    M/@H@("`@/&UY.E185%]55$8X5&5S=#Y38VARP[QD97(\+VUY.E185%]55$8X
    /5&5S=#X*/"]M>5A-3#X*
    `
    end
    

    Edit 1:

    If you want to confirm this theory, run this in Eclipse:

    #include <iostream>
    #include <string>
    #include <fstream>
    #include <iterator>
    
    int main()
    {
        std::ifstream ifs("Test.xml");
        std::string xml_data((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());
        std::cout << xml_data;
    }
    

    Output with terminal in UTF-8 mode:

    $ ./17676169.cat 
    <?xml version="1.0" encoding="UTF-8"?>
    <myXML>
        <my:TXT_UTF8Test>Schrüder</my:TXT_UTF8Test>
    </myXML>
    

    Output with terminal in ISO-8859-15 mode:

    $ ./17676169.cat 
    <?xml version="1.0" encoding="UTF-8"?>
    <myXML>
        <my:TXT_UTF8Test>SchrÃŒder</my:TXT_UTF8Test>
    </myXML>
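
    To take the terminal out of the picture entirely, the bytes of the string can be printed in hexadecimal. This is only a sketch: to_hex is a hypothetical helper, and the hard-coded literal stands in for the value that GetText() returns in the program above. If the dump shows c3 bc between 72 and 64, the data is intact UTF-8 and only the display is wrong.

    ```cpp
    #include <iostream>
    #include <string>

    // Return the bytes of `s` as space-separated, lowercase hex pairs.
    std::string to_hex(const std::string& s) {
        static const char digits[] = "0123456789abcdef";
        std::string out;
        for (unsigned char c : s) {
            if (!out.empty()) {
                out += ' ';
            }
            out += digits[c >> 4];    // high nibble
            out += digits[c & 0x0F];  // low nibble
        }
        return out;
    }

    int main() {
        // Stand-in for the string read from Test.xml: "Schrüder" as UTF-8 bytes.
        // The literal is split so that the 'd' is not swallowed by the \xBC escape.
        const std::string test = "Schr\xC3\xBC" "der";
        std::cout << to_hex(test) << '\n';  // prints: 53 63 68 72 c3 bc 64 65 72
    }
    ```

    The output contains only ASCII characters, so it looks the same in any terminal regardless of its encoding setting.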