Tags: c++, parsing, boost, boost-spirit-x3

Boost Spirit X3 Skip Parser Implementation?


In the grammar I am currently parsing with X3, whitespace and Perl-style comments should be ignored.

It seems to me that a skip parser in X3 is just a normal parser, and whatever input it consumes is considered "skipped." I came up with this:

namespace x3 = boost::spirit::x3;
auto const blank_comment = 
   x3::blank | x3::lexeme[ '#' >> *(x3::char_ - x3::eol) >> x3::eol ];

On a very basic input (a couple of comment lines and one quoted-string line), this seems to work well (Live on Coliru).

However, since I can't find any documentation on the matter, and the details of the existing skip parsers are tucked away in an intricate system of templates, I was hoping for some input.

  1. Is this the proper way of defining a "skip parser"? Is there a standard method?
  2. Are there performance concerns with an implementation like this? How would it be improved?

I previously searched SO for details and found an answer using Qi (Custom Skip Parser with Boost::Spirit). As I never learned Qi, many of the details are hard to follow. The method I described above seems more intuitive.


Solution

  • Yeah that's fine.

    The skipper looks close to optimal. You could optimize the quoted_string rule from your sample by reordering the alternatives and using character-set negation (operator~):

    Live On Coliru

    #include <boost/spirit/home/x3.hpp>
    #include <iostream>
    #include <string>
    
    namespace parser {
        namespace x3 = boost::spirit::x3;
        // an escape ('\\' followed by any char), or any char except '"' and newline
        auto const quoted_string = x3::lexeme [ '"' >>  *('\\' >> x3::char_ | ~x3::char_("\"\n")) >> '"' ];
        // skipper: whitespace, or a '#' comment up to and including the end of line
        auto const space_comment = x3::space | x3::lexeme[ '#' >> *(x3::char_ - x3::eol) >> x3::eol];
    }
    
    int main() {
        std::string result, s1 = "# foo\n\n#bar\n   \t\"This is a simple string, containing \\\"escaped quotes\\\"\"";
    
        // space_comment is the skipper; lexeme[] disables it inside the quoted string
        boost::spirit::x3::phrase_parse(s1.begin(), s1.end(), parser::quoted_string, parser::space_comment, result);
    
        std::cout << "Original: `" << s1 << "`\nResult: `" << result << "`\n";
    }
    

    Prints

    Original: `# foo
    
    #bar
        "This is a simple string, containing \"escaped quotes\""`
    Result: `This is a simple string, containing "escaped quotes"`