Boost Spirit (classic): Inline parser, working except for skipping comments


EDIT: As mentioned in the comment on sehe's answer, it turns out the code below works just fine; it was my handling of the iterators (not shown here) that was faulty. Sorry, my bad. Voting to close as off-topic / non-reproducible.

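EDIT2: Elaboration: if you're using boost::spirit::istream_iterator to feed an ifstream to the parse function (like I did), do not forget to call unsetf( std::ios::skipws ) on that ifstream first. Otherwise the stream drops all whitespace, including the newlines that terminate comments, before the parser ever sees it, and your parsing will fail.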
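To make the effect of the skipws flag concrete, here is a minimal sketch that needs only the standard library. It assumes that std::istream_iterator< char > is a fair stand-in for boost::spirit::istream_iterator in this respect (both pull characters through formatted extraction, as far as I can tell); the helper names slurp and readWithFlag are made up for illustration.

```cpp
#include <ios>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Collects every character the stream hands out through a formatted-extraction
// iterator. std::istream_iterator< char > stands in for
// boost::spirit::istream_iterator here (an assumption for illustration).
std::string slurp( std::istream & in )
{
    return std::string( std::istream_iterator< char >( in ),
                        std::istream_iterator< char >() );
}

// Hypothetical helper: reads the same text with or without the skipws flag.
std::string readWithFlag( const std::string & text, bool keepWhitespace )
{
    std::istringstream in( text );

    if ( keepWhitespace )
    {
        // The call the EDIT2 note is about: stop the stream from
        // discarding whitespace before the iterator sees it.
        in.unsetf( std::ios::skipws );
    }

    return slurp( in );
}
```

With the flag left at its default, readWithFlag( "# c\ncodepage = \"X\";\n", false ) yields #ccodepage="X"; with the newline after the comment gone, so a comment skipper has nothing to stop at. Passing true returns the text unchanged.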


I've got a DSL (domain-specific language) file that looks like this:

# Comment (optional)

codepage = "ISO-8859-2";
...

I.e., either the codepage specification is the first non-comment statement in the file, or the file is considered to be of the default codepage.

I commandeered Boost Spirit for the task. I had to stay with Spirit Classic for technical reasons (cough AIX / XLC cough), and after some head-scratching through the tutorials, which invariably aim at much more involved setups than I need here, I came up with this little piece of code:

#include <boost/spirit/include/support_istream_iterator.hpp>
#include <boost/spirit/include/classic_core.hpp>
#include <boost/spirit/include/classic_rule.hpp>
#include <boost/spirit/include/classic_utility.hpp>

#include <string>

namespace spirit_classic = boost::spirit::classic;

template< typename Iterator >
static std::string getCodepage( Iterator first, Iterator last )
{
    std::string codepage;

    spirit_classic::parse(
        first,
        last,
        spirit_classic::as_lower_d[ "codepage" ]
            >> spirit_classic::ch_p( '=' )
            >> spirit_classic::lexeme_d[ spirit_classic::ch_p( '"' )
                >> ( +( spirit_classic::anychar_p - spirit_classic::ch_p( '"' ) ) )[spirit_classic::assign_a( codepage )]
                >> spirit_classic::ch_p( '"' ) ]
        >> spirit_classic::ch_p( ';' ),
        spirit_classic::space_p | spirit_classic::comment_p( '#' )
        );

    if ( codepage.empty() )
    {
        codepage = "UTF-8";
    }

    return codepage;
}

This works pretty well... except for the skipper:

...
spirit_classic::space_p | spirit_classic::comment_p( '#' )
...

This skips whitespace all right, but utterly fails at skipping comments (i.e. anything from '#' to end-of-line), which is what I understood comment_p( '#' ) to achieve.

So apparently I have understood something wrong. I just cannot figure out what. Help?
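For reference, the behaviour expected from the skipper can be written out by hand. This is only a sketch of the intent described above (skip whitespace, and skip anything from '#' up to the end of the line); it is not how Spirit implements space_p or comment_p, and the function name skipSpaceAndComments is hypothetical.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for the skipper's intent: advance past any run of
// whitespace and '#' comments, and return the first position that is neither.
std::string::const_iterator skipSpaceAndComments( std::string::const_iterator it,
                                                  std::string::const_iterator last )
{
    while ( it != last )
    {
        if ( std::isspace( static_cast< unsigned char >( *it ) ) )
        {
            ++it; // plain whitespace, including newlines
        }
        else if ( *it == '#' )
        {
            // Comment: consume up to (not including) the newline; the
            // whitespace branch above eats the newline on the next pass.
            while ( it != last && *it != '\n' )
            {
                ++it;
            }
        }
        else
        {
            break; // first character of real content
        }
    }

    return it;
}
```

Applied to the text "# Comment\n\n  codepage", the returned iterator points at the 'c' of codepage. Note that the comment branch depends on the newline still being present in the input.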


Solution

  • I don't have much insight here, and I'm only testing with MSVC/GCC here, but perhaps the problem is with comment_p trying to consume eol (which is eaten by the space_p skipper instead)?

    So either you could use spirit_classic::blank_p (and be explicit about your eols), or you might have luck reversing the branches of the skipper:

        spirit_classic::comment_p( '#' ) | spirit_classic::space_p
    

    See it Live On Coliru:

    #include <boost/spirit/include/support_istream_iterator.hpp>
    #include <boost/spirit/include/classic_core.hpp>
    #include <boost/spirit/include/classic_rule.hpp>
    #include <boost/spirit/include/classic_utility.hpp>
    
    #include <iostream>
    #include <string>
    
    namespace spirit_classic = boost::spirit::classic;
    
    template< typename Iterator >
    static std::string getCodepage( Iterator first, Iterator last )
    {
        std::string codepage;
    
        spirit_classic::parse(
            first,
            last,
            spirit_classic::as_lower_d[ "codepage" ]
                >> spirit_classic::ch_p( '=' )
                >> spirit_classic::lexeme_d[ spirit_classic::ch_p( '"' )
                    >> ( +( spirit_classic::anychar_p - spirit_classic::ch_p( '"' ) ) )[spirit_classic::assign_a( codepage )]
                    >> spirit_classic::ch_p( '"' ) ]
            >> spirit_classic::ch_p( ';' ),
            spirit_classic::comment_p( '#' ) | spirit_classic::space_p
            );
    
        if ( codepage.empty() )
        {
            codepage = "UTF-8";
        }
    
        return codepage;
    }
    
    int main()
    {
        std::string input = "# Comment (optional)\n"
            "\n"
            "\n"
            "\n"
            "codepage = \"ISO-8859-2\"; \n";
    
        std::cout << getCodepage(input.begin(), input.end());
    }
    

    Prints

    ISO-8859-2