
Why does parsing a blank line with Spirit produce an empty key value pair in map?


I'm trying to use Spirit.Qi to parse a simple file format that has key value pairs separated with an equals sign. The file also supports comments and blank lines, as well as quoted values.

I can get nearly all of this to work as expected; however, any blank line or comment causes an empty key value pair to be added to the map. When the map is replaced with a vector, no blank entries are produced.

Example Program:

#include <fstream> 
#include <iostream> 
#include <string> 
#include <map> 

#include "boost/spirit/include/qi.hpp" 
#include "boost/spirit/include/karma.hpp" 
#include "boost/fusion/include/std_pair.hpp" 

using namespace boost::spirit; 
using namespace boost::spirit::qi; 

//////////////////////////////////////////////////////////////////////////////// 
int main(int argc, char** argv) 
{ 
   std::ifstream ifs("file"); 
   ifs >> std::noskipws; 

   std::map< std::string, std::string > vars; 

   auto value = as_string[*print]; 
   auto quoted_value = as_string[lexeme['"' >> *(print-'"') >> '"']]; 
   auto key = as_string[alpha >> *(alnum | char_('_'))]; 
   auto kvp = key >> '=' >> (quoted_value | value); 

   phrase_parse( 
      istream_iterator(ifs), 
      istream_iterator(), 
      -kvp % eol, 
      ('#' >> *(char_-eol)) | blank, 
      vars); 

   std::cout << "vars[" << vars.size() << "]:" << std::endl; 
   std::cout << karma::format(*(karma::string << " -> " << karma::string << karma::eol), vars); 

   return 0; 
}

Input File:

one=two
three=four

# Comment
five=six

Output:

vars[4]:
 ->
one -> two
three -> four
five -> six

Where is the empty key value pair coming from? And how can I prevent it from being generated?


Solution

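  • Firstly, your program has undefined behaviour (and indeed it crashes on my system). The reason is that you can't store stateful parser expressions in auto variables: the expression templates hold references to temporaries that no longer exist by the time the parser runs.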

    See "Assigning parsers to auto variables", "boost spirit V2 qi bug associated with optimization level" and others. See e.g. these answers for useful strategies to get around this limitation.

    Secondly, the empty entry comes from the grammar.

    There's a difference between

      (-kvp) % qi::eol

    and

      -(kvp % qi::eol)

    The first means "optionally parse a kvp" followed by "push the result into the attribute container", once per list element. On a blank line the kvp does not match, but the optional still succeeds, so a default-constructed (empty) pair gets pushed.

    The latter means "optionally parse 1 or more kvp into a container". Note that this won't push an empty value for a kvp that wasn't matched.

    Fixed/demo

    I suggest

    • making key and value lexemes as well (just by dropping the Skipper on the rule declarations, really). You probably didn't want 'key name 1=value 1' to parse as "keyname1" -> "value1". You probably didn't want to allow 'key # no value\n' either.
    • using BOOST_SPIRIT_DEBUG to see what's going on
    • not blanket using namespace boost::spirit. It's a bad idea. Trust me :/
    • rule declarations may appear to be verbose, but they do reduce the cruft in the rule definitions
    • using +eol instead of eol allows for the empty lines, which appears to be what you want

    Live On Coliru

    #define BOOST_SPIRIT_DEBUG
    #include "boost/spirit/include/qi.hpp" 
    #include "boost/spirit/include/karma.hpp" 
    #include "boost/fusion/include/std_pair.hpp" 
    #include <fstream>
    #include <iostream> // std::cout is used in main
    #include <map>
    #include <string>
    
    namespace qi    = boost::spirit::qi;
    namespace karma = boost::spirit::karma;
    
    template <typename It, typename Skipper, typename Data>
    struct kvp_grammar : qi::grammar<It, Data(), Skipper> {
        kvp_grammar() : kvp_grammar::base_type(start) {
            using namespace qi;
    
            value        = raw [*print];
            quoted_value = '"' >> *~char_('"') >> '"';
            key          = raw [ alpha >> *(alnum | '_') ];
    
            kvp          = key >> '=' >> (quoted_value | value);
            start        = -(kvp % +eol);
    
            BOOST_SPIRIT_DEBUG_NODES((value)(quoted_value)(key)(kvp))
        }
      private:
        using Pair = std::pair<std::string, std::string>;
        qi::rule<It, std::string(), Skipper> value;
        qi::rule<It, Pair(),        Skipper> kvp;
        qi::rule<It, Data(),        Skipper> start;
        // lexeme:
        qi::rule<It, std::string()> quoted_value, key;
    };
    
    template <typename Map>
    bool parse_vars(std::istream& is, Map& data) {
        using It = boost::spirit::istream_iterator;
        using Skipper = qi::rule<It>;
    
        kvp_grammar<It, Skipper, Map> grammar;
        It f(is >> std::noskipws), l;
    
        Skipper skipper = ('#' >> *(qi::char_-qi::eol)) | qi::blank;
        return qi::phrase_parse(f, l, grammar, skipper, data); 
    }
    
    int main() { 
        std::ifstream ifs("input.txt"); 
    
        std::map<std::string, std::string> vars; 
    
        if (parse_vars(ifs, vars)) {
            std::cout << "vars[" << vars.size() << "]:" << std::endl; 
            std::cout << karma::format(*(karma::string << " -> " << karma::string << karma::eol), vars); 
        }
    }
    

    Output (currently broken on Coliru):

    vars[3]:
    five -> six
    one -> two
    three -> four
    

    With debug info:

    <kvp>
      <try>one=two\nthree=four\n\n</try>
      <key>
        <try>one=two\nthree=four\n\n</try>
        <success>=two\nthree=four\n\n# C</success>
        <attributes>[[o, n, e]]</attributes>
      </key>
      <quoted_value>
        <try>two\nthree=four\n\n# Co</try>
        <fail/>
      </quoted_value>
      <value>
        <try>two\nthree=four\n\n# Co</try>
        <success>\nthree=four\n\n# Comme</success>
        <attributes>[[t, w, o]]</attributes>
      </value>
      <success>\nthree=four\n\n# Comme</success>
      <attributes>[[[o, n, e], [t, w, o]]]</attributes>
    </kvp>
    <kvp>
      <try>three=four\n\n# Commen</try>
      <key>
        <try>three=four\n\n# Commen</try>
        <success>=four\n\n# Comment\nfiv</success>
        <attributes>[[t, h, r, e, e]]</attributes>
      </key>
      <quoted_value>
        <try>four\n\n# Comment\nfive</try>
        <fail/>
      </quoted_value>
      <value>
        <try>four\n\n# Comment\nfive</try>
        <success>\n\n# Comment\nfive=six</success>
        <attributes>[[f, o, u, r]]</attributes>
      </value>
      <success>\n\n# Comment\nfive=six</success>
      <attributes>[[[t, h, r, e, e], [f, o, u, r]]]</attributes>
    </kvp>
    <kvp>
      <try>five=six\n</try>
      <key>
        <try>five=six\n</try>
        <success>=six\n</success>
        <attributes>[[f, i, v, e]]</attributes>
      </key>
      <quoted_value>
        <try>six\n</try>
        <fail/>
      </quoted_value>
      <value>
        <try>six\n</try>
        <success>\n</success>
        <attributes>[[s, i, x]]</attributes>
      </value>
      <success>\n</success>
      <attributes>[[[f, i, v, e], [s, i, x]]]</attributes>
    </kvp>
    <kvp>
      <try></try>
      <key>
        <try></try>
        <fail/>
      </key>
      <fail/>
    </kvp>