
Boost property tree to parse custom configuration format


Following the link below, which @sehe provided in the post "Boost_option to parse a configuration file", I need to parse configuration files that may contain comments.

https://www.boost.org/doc/libs/1_76_0/doc/html/property_tree/parsers.html#property_tree.parsers.info_parser

Since the files contain comments (with a leading #), should a Spirit grammar be used to strip the comments in addition to calling read_info()? I am referring to info_grammar_spirit.cpp in the /property_tree/examples folder.


Solution

  • You would do well to avoid depending on implementation details, so instead I'd suggest pre-processing your config file just to deal with the comments.

    A simple replacement of "//" with ";" (the comment character of the INFO format) may be enough.

    Building on the previous answer:

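    // Needs <fstream>, <iterator>, <sstream>, <string>,
    // <boost/algorithm/string/replace.hpp> and <boost/property_tree/info_parser.hpp>
    std::string tmp;
    {
        std::ifstream ifs(file_name.c_str());
        tmp.assign(std::istreambuf_iterator<char>(ifs), {}); // read the whole file
    } // closes file

    // Turn "//" comments into INFO-style ';' comments
    boost::algorithm::replace_all(tmp, "//", ";");
    std::istringstream preprocessed(tmp);
    read_info(preprocessed, pt);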
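
The question mentions comments with a leading '#', which a "//" replacement does not cover. Here is a minimal sketch of the same pre-processing idea for that case, using only the standard library. The name hash_comments_to_info is hypothetical, and the sketch assumes a comment is any line whose first non-blank character is '#'; a '#' later in a line is left alone.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: rewrites lines that start with '#' (after optional
// spaces or tabs) so they start with ';', which the INFO parser treats as a
// comment. Every output line is terminated with '\n'.
std::string hash_comments_to_info(std::string const& input) {
    std::istringstream in(input);
    std::string out, line;
    while (std::getline(in, line)) {
        auto first = line.find_first_not_of(" \t");
        if (first != std::string::npos && line[first] == '#')
            line[first] = ';'; // only the leading marker changes
        out += line;
        out += '\n';
    }
    return out;
}
```

The returned string can be wrapped in a std::istringstream and passed to read_info in the same way as tmp.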
    

    Now if you change the input to include comments:

    Resnet50 {
        Layer CONV1 {
            Type: CONV // this is a comment
            Stride { X: 2, Y: 2 }       ; this too
            Dimensions { K: 64, C: 3, R: 7, S: 7, Y:224, X:224 }
        }
    
        // don't forget the CONV2_1_1 layer
        Layer CONV2_1_1 {
            Type: CONV
            Stride { X: 1, Y: 1 }       
            Dimensions { K: 64, C: 64, R: 1, S: 1, Y: 56, X: 56 }
        }
    }
    

    It still parses as expected, if we also extend the debug output to verify:

    ptree const& resnet50 = pt.get_child("Resnet50");
    for (auto& entry : resnet50) {
        std::cout << entry.first << " " << entry.second.get_value("") << "\n";
    
        std::cout << " --- Echoing the complete subtree:\n";
        write_info(std::cout, entry.second);
    }
    

    Prints

    Layer CONV1
     --- Echoing the complete subtree:
    Type: CONV
    Stride
    {
        X: 2,
        Y: 2
    }
    Dimensions
    {
        K: 64,
        C: 3,
        R: 7,
        S: 7,
        Y:224, X:224
    }
    Layer CONV2_1_1
     --- Echoing the complete subtree:
    Type: CONV
    Stride
    {
        X: 1,
        Y: 1
    }
    Dimensions
    {
        K: 64,
        C: 64,
        R: 1,
        S: 1,
        Y: 56,
        X: 56
    }
    

    See it Live On Coliru

    Yes, But...?

    What if "//" occurs in a string literal? Won't it get replaced as well? Yes.
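
To make that caveat concrete, here is a small standard-library sketch of the same blind replacement. replace_all_naive is a hypothetical stand-in for boost::algorithm::replace_all, written out so the example has no dependencies; unlike the Boost function it returns a modified copy instead of editing in place.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a blind search-and-replace: every occurrence of
// `from` becomes `to`, with no awareness of quoted strings.
std::string replace_all_naive(std::string text, std::string const& from,
                              std::string const& to) {
    if (from.empty()) return text; // nothing sensible to search for
    for (std::size_t pos = 0;
         (pos = text.find(from, pos)) != std::string::npos;
         pos += to.size()) {
        text.replace(pos, from.size(), to);
    }
    return text;
}
```

Applied to the line url "http://example.com" // homepage, both occurrences of "//" are rewritten, so the quoted value becomes "http:;example.com" along with the real comment.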

    This is not a library-quality solution. You should not expect one, because you didn't have to put in any effort to parse your bespoke configuration file format.

    You are the only party who can judge whether the shortcomings of this approach are a problem for you.

    However, short of just copying and modifying Boost's parser or implementing your own from scratch, there's not a lot one can do.

    For The Masochists

    If you don't want to reimplement the entire parser, but still want the "smarts" to skip string literals, here's a pre_process function that does all that. This time, it truly employs Boost Spirit:

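    #include <boost/spirit/home/x3.hpp>
    #include <stdexcept>
    #include <string>

    std::string pre_process(std::string const& input) {
        std::string result;
        using namespace boost::spirit::x3;

        // A quoted string with backslash escapes, copied through verbatim
        auto static string_literal
            = raw[ '"' >> *('\\'>> char_ | ~char_('"')) >> '"' ];

        // ';' and "//" line comments both come out as ';' comments;
        // block comments are dropped from the output
        auto static comment
            = char_(';') >> *~char_("\r\n")
            | "//" >> attr(';') >> *~char_("\r\n")
            | omit["/*" >> *(char_ - "*/") >> "*/"];

        // Anything that does not start a string literal or a comment
        auto static other
            = +(~char_(";\"") - "//" - "/*");

        auto static content
            = *(string_literal | comment | other) >> eoi;

        // Fail unless the whole input was consumed
        if (!parse(begin(input), end(input), content, result)) {
            throw std::invalid_argument("pre_process");
        }
        return result;
    }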
    

    As you can see, it recognizes string literals (with escapes) and treats "//" and ';' style linewise comments as equivalent. To "show off" I threw in /* block comments */, which cannot be represented in proper INFO syntax, so we just omit[] them.
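
For readers who would rather not pull in Spirit, the same three rules can be approximated with a hand-written scanner. This is only a sketch under stated assumptions, not an equivalent of the grammar: pre_process_plain is a hypothetical name, and it tolerates an unterminated string literal or block comment by consuming the rest of the input, where the Spirit version is written to throw std::invalid_argument.

```cpp
#include <cstddef>
#include <string>

// Hypothetical hand-rolled scanner: copies string literals verbatim, turns
// "//" line comments into ';' comments, and drops /* ... */ block comments.
std::string pre_process_plain(std::string const& in) {
    std::string out;
    std::size_t i = 0;
    std::size_t const n = in.size();
    while (i < n) {
        char const c = in[i];
        if (c == '"') {
            // String literal: copy up to the closing quote, honoring '\' escapes
            out += in[i++];
            while (i < n && in[i] != '"') {
                if (in[i] == '\\' && i + 1 < n) out += in[i++];
                out += in[i++];
            }
            if (i < n) out += in[i++]; // closing quote
        } else if (c == ';' || in.compare(i, 2, "//") == 0) {
            // Line comment: emit ';' and copy the rest of the line
            out += ';';
            i += (c == ';') ? 1 : 2;
            while (i < n && in[i] != '\r' && in[i] != '\n') out += in[i++];
        } else if (in.compare(i, 2, "/*") == 0) {
            // Block comment: skip past the closing "*/" (or to the end of input)
            std::size_t const end = in.find("*/", i + 2);
            i = (end == std::string::npos) ? n : end + 2;
        } else {
            out += in[i++];
        }
    }
    return out;
}
```

The trade-off is the usual one: the scanner has no dependencies, while the grammar states the rules declaratively and is easier to extend.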

    Now let's test with a funky example (extended from the "Complicated example demonstrating all INFO features" from the documentation):

    #include <boost/property_tree/info_parser.hpp>
    #include <iostream>
    #include <sstream>
    using boost::property_tree::ptree;
    
    int main() {
        boost::property_tree::ptree pt;
        std::istringstream iss(
                pre_process(R"~~( ; A comment
    key1 value1   // Another comment
    key2 "value with /* no problem */ special // characters in it {};#\n\t\"\0"
    {
       subkey "value split "\
              "over three"\
              "lines"
       {
          a_key_without_value ""
          "a key with special characters in it {};#\n\t\"\0" ""
          "" value    /* Empty key with a value */
          "" /*also empty value: */ ""       ; Empty key with empty value!
       }
    })~~"));
    
        read_info(iss, pt);
    
        std::cout << " --- Echoing the parsed tree:\n";
        write_info(std::cout, pt);
    }
    

    Prints (Live On Coliru)

     --- Echoing the parsed tree:
    key1 value1
    key2 "value with /* no problem */ special // characters in it {};#\n    \"\0"
    {
        subkey "value split over threelines"
        {
            a_key_without_value ""
            "a key with special characters in it {};#\n     \"\0" ""
            "" value
            "" ""
        }
    }