c++ xml parsing rapidxml

RapidXML: "expected <" error at end-of-file related to whitespace bug?


I created a C++ application that reads XML files with the RapidXML parser. For one XML file, structured exactly like another one that parsed fine, the parser threw an error:

"expected <"

The last five characters before the error were from the closing tag of the root element, so the error happened at the end-of-file:

</UW>

I suspect this error is related to a whitespace-skipping bug reported for RapidXML v1.12 (I am using v1.13). I used no parsing flags (doc.parse<0>(bfr);).

According to this site, the bug was believed to be caused by a faulty implementation of the "parse_trim_whitespace" parse flag. A patch was provided on that site, but there also seemed to be a problem with that patch.

The following is the XML document that caused this error. Besides the reason for the error, what I also don't understand is why it didn't occur when parsing another file with the same kind of content. My application also successfully parses several other files before reaching this one.

<?xml version="1.0" encoding="UTF-8"?>
<UW>
    <Bez>EV005</Bez>
    <Herst>Trumpf</Herst>
    <Gesw>16</Gesw>
    <Rad>1.6</Rad>
    <Hoehe>100</Hoehe>
    <Wkl>30</Wkl>
    <BgVerf>Freibiegen</BgVerf>
    <MaxBel>50</MaxBel>
    <Kontur>0</Kontur>
    <Grafik>0</Grafik>
</UW>

The part of my application where the error occurs (this is the inside of a loop):

    // Get "Bezeichnung" attribute
    attr = subnode->first_attribute("Bezeichnung");
    if ( !attr ){   err(ERR_FILE_INVALID,"Werkzeuge.xml");  return 0; }
    name = attr->value();
    // Get file name/URL
    string fileName = name;
    fileName.append(".xml");
    // Open file
    ifstream werkzeugFile(concatURL(PFAD_WERKZEUGE,fileName));
    if(!werkzeugFile.is_open()) {   err(ERR_FILE_NOTFOUND,fileName);    return 0;   }
    // Get length
    werkzeugFile.seekg(0,werkzeugFile.end);
    int len = werkzeugFile.tellg();
    werkzeugFile.seekg(0,werkzeugFile.beg);
    // Allocate buffer
    char * bfr = new char [len+1];
    werkzeugFile.read(bfr,len);
    werkzeugFile.close();
    // Parse
    SetWindowText(hwndProgress,"Parsing data: Werkzeuge/*.xml");
    btmDoc.parse<0>(bfr);

    // Get type of tool & check validity
    xml_node<> *rt_node = btmDoc.first_node();
    if ( strcmp(rt_node->name(),"OW") == 0 ){
        isOW = true;
    }
    else if ( strcmp(rt_node->name(),"UW") == 0 ){
        isUW = true;
    }
    else {  err(ERR_FILE_INVALID,fileName); return 0;   }

    // Prepare for next loop iteration
    delete[] bfr;
    btmDoc.clear();
    subnode = subnode->next_sibling();

Solution

  • Ah, I think I see it. Two things:

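    First, the ifstream is suspicious -- shouldn't it be opened in binary mode if you're jumping around in it using byte offsets (and somebody else is doing the parsing)? In text mode on Windows, each \r\n line ending is translated to a single \n, so read() can deliver fewer than len characters. Pass std::ios::in | std::ios::binary as the second argument to the ifstream constructor.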

    Second, your memory management seems fine, except that you allocate one extra byte (the +1) but never seem to use it. I'm assuming you're missing bfr[len] = '\0'; after the contents are read in. That would explain the odd parse error at the end of the file: the XML parser doesn't know it has reached the end, because it's parsing a null-terminated string that isn't null-terminated, and it tries to parse random bytes of memory ;-)