c++ xml boost boost-propertytree

End-tag XML validation incorrect in Boost ptree read_xml


I am trying to do some simple XML parsing using Boost property trees (ptree) in C++. However, it seems that the read_xml function only throws an error if an end tag is missing entirely. For example, the following throws an error:

<?xml version="1.0" encoding="utf-8"?>
<Grandparent>
<Parent>test<Parent>
</Grandparent>

Note that the end tag of Parent is missing its forward slash, and this is reported as an error. Omitting the closing tag altogether, as in <Parent>test, also throws an error, which is expected.

However, if the closing tag name doesn't match the opening tag name, no error is thrown. For example:

<?xml version="1.0" encoding="utf-8"?>
<Grandparent>
<Parent>test</Child>
</Grandparent>

The above parses just fine. My code is very simple as below:

#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>

using boost::property_tree::ptree;
ptree pt;
read_xml(xmlpath, pt);

Is there anything that I am overlooking here?


Solution

  • Yes.

    Most importantly: Boost Property Tree is not an XML library.

    Secondly, the rapidxml implementation used under the hood has closing-tag validation as opt-in:

    if (Flags & parse_validate_closing_tags)
    {
        // Skip and validate closing tag name
        Ch *closing_name = text;
        skip<node_name_pred, Flags>(text);
        if (!internal::compare(node->name(), node->name_size(), closing_name, text - closing_name, true))
            BOOST_PROPERTY_TREE_RAPIDXML_PARSE_ERROR("invalid closing tag name", text);
    }
    

    As luck would have it, Boost Property Tree didn't opt in. In fact, it doesn't let you opt in:

    /// Text elements should be put in separate keys,
    /// not concatenated in parent data.
    static const int no_concat_text  = 0x1;
    /// Comments should be omitted.
    static const int no_comments     = 0x2;
    /// Whitespace should be collapsed and trimmed.
    static const int trim_whitespace = 0x4;
    
    inline bool validate_flags(int flags)
    {
        return (flags & ~(no_concat_text | no_comments | trim_whitespace)) == 0;
    }
    

    No other flags are allowed.
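To make the mask concrete, here is a small standalone sketch that copies the three constants and validate_flags from the excerpt above and probes them with an extra bit. The name hypothetical_validate_closing_tags and its value 0x8 are invented for illustration only; they are not Boost or rapidxml constants.

```cpp
// Copied from the excerpt above so the check can be tried in isolation.
static const int no_concat_text  = 0x1;
static const int no_comments     = 0x2;
static const int trim_whitespace = 0x4;

inline bool validate_flags(int flags)
{
    // Any bit outside the three known flags makes the result false.
    return (flags & ~(no_concat_text | no_comments | trim_whitespace)) == 0;
}

// Hypothetical extra bit standing in for "validate closing tags".
// The value is made up for this sketch; it is not a real Boost constant.
static const int hypothetical_validate_closing_tags = 0x8;
```

With these definitions, validate_flags(no_comments | trim_whitespace) is true, while validate_flags(trim_whitespace | hypothetical_validate_closing_tags) is false, so an additional parser flag cannot be passed through this check.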

    I suggest you turn to an XML library if you need XML parsing.
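
For illustration only, the kind of check that the opt-in closing-tag validation performs can be sketched with a stack of open element names. This is a toy written for this answer, assuming very simple input: the function name closing_tags_match is made up, it is not rapidxml's or Boost's code, and it ignores CDATA, DOCTYPE, entities, and '>' inside attribute values. A real XML library remains the better choice.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy check: returns true only if every closing tag name matches the most
// recently opened element and nothing is left open at the end.
// Not a real XML parser (see the assumptions in the text above).
bool closing_tags_match(const std::string& xml)
{
    std::vector<std::string> open;  // names of currently open elements
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        if (xml.compare(pos, 4, "<!--") == 0) {        // skip comments
            std::size_t close = xml.find("-->", pos + 4);
            if (close == std::string::npos) return false;
            pos = close + 3;
            continue;
        }
        std::size_t end = xml.find('>', pos);
        if (end == std::string::npos) return false;    // unterminated tag
        std::string tag = xml.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty()) return false;
        if (tag[0] == '?' || tag[0] == '!') continue;  // declaration, PI, etc.
        if (tag[0] == '/') {                           // closing tag
            std::string name = tag.substr(1);
            name.erase(name.find_last_not_of(" \t\r\n") + 1);  // trim right
            if (open.empty() || open.back() != name) return false;
            open.pop_back();
            continue;
        }
        bool self_closing = tag.back() == '/';
        std::string name = tag.substr(0, tag.find_first_of(" \t\r\n/"));
        if (name.empty()) return false;
        if (!self_closing) open.push_back(name);       // opening tag
    }
    return open.empty();
}
```

Under these assumptions, both problem documents from the question are rejected: the one ending in <Parent>test<Parent> leaves an element open, and the one with </Child> fails the name comparison, which is the case read_xml lets through.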