Boost Spirit (classic): Inline parser, working except for skipping comments

EDIT: As mentioned in the comment on sehe's answer, it turns out the code below works just fine; it was my handling of the iterators (not shown here) that was faulty. Sorry, my bad. Voting to close as off-topic / non-reproducible.

EDIT2: Elaboration... if you're using boost::spirit::istream_iterator to feed an ifstream to the parse function (like I did), do not forget to call unsetf( std::ios::skipws ) on that ifstream first, or your parsing will fail...
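A minimal sketch of why the flag matters. It uses std::istream_iterator< char > from the standard library as a stand-in for boost::spirit::istream_iterator, so it only illustrates the stream-side effect and does not involve Spirit at all; readThroughStream is a made-up helper name:

```cpp
#include <ios>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: reads every character of `text` through a stream
// iterator and returns what actually arrived on the other side.
// If keepWhitespace is true, skipws is cleared first, as advised above.
std::string readThroughStream( const std::string & text, bool keepWhitespace )
{
    std::istringstream in( text );

    if ( keepWhitespace )
    {
        in.unsetf( std::ios::skipws );
    }

    // Formatted extraction of char honours the skipws flag of the stream.
    std::istream_iterator< char > first( in );
    std::istream_iterator< char > last;

    return std::string( first, last );
}
```

With skipws left at its default, readThroughStream( "# c\nx = 1;\n", false ) yields "#cx=1;": the stream has already dropped every blank and newline, so the parser never sees the end of the comment line. With keepWhitespace set to true, the text arrives unchanged.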

I've got a DSL (domain specific language) file that looks like this:

# Comment (optional)

codepage = "ISO-8859-2";
...

I.e., either the codepage specification is the first non-comment statement in the file, or the file is considered to be of the default codepage.

I commandeered Boost Spirit for the task. I had to stay with Spirit Classic for technical reasons (cough AIX / XLC cough), and after some head-scratching through the tutorials -- which invariably aim at much more involved setups, being much more complicated than I would like this to be -- I came up with this little piece of code:

#include <boost/spirit/include/support_istream_iterator.hpp>
#include <boost/spirit/include/classic_core.hpp>
#include <boost/spirit/include/classic_rule.hpp>
#include <boost/spirit/include/classic_utility.hpp>

#include <string>

namespace spirit_classic = boost::spirit::classic;

template< typename Iterator >
static std::string getCodepage( Iterator first, Iterator last )
{
    std::string codepage;

    spirit_classic::parse(
        first,
        last,
        spirit_classic::as_lower_d[ "codepage" ]
            >> spirit_classic::ch_p( '=' )
            >> spirit_classic::lexeme_d[ spirit_classic::ch_p( '"' )
                >> ( +( spirit_classic::anychar_p - spirit_classic::ch_p( '"' ) ) )[spirit_classic::assign_a( codepage )]
                >> spirit_classic::ch_p( '"' ) ]
        >> spirit_classic::ch_p( ';' ),
        spirit_classic::space_p | spirit_classic::comment_p( '#' )
        );

    if ( codepage.empty() )
    {
        codepage = "UTF-8";
    }

    return codepage;
}

This works pretty well... except for the skipper:

...
spirit_classic::space_p | spirit_classic::comment_p( '#' )
...

This skips whitespace all right -- but utterly fails at skipping comments (i.e. anything from '#' to end-of-line), which is what I understood comment_p( '#' ) to achieve.
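To make the expectation concrete, this is roughly what I want the skipper to do, written by hand in plain C++ without Spirit. skipIgnorable is a made-up name, and the sketch only describes the intended behaviour, not how Spirit implements it:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not part of Spirit: advances past whitespace and
// '#'-to-end-of-line comments and returns the position of the first
// character the actual grammar should get to see.
template< typename Iterator >
Iterator skipIgnorable( Iterator first, Iterator last )
{
    while ( first != last )
    {
        if ( std::isspace( static_cast< unsigned char >( *first ) ) )
        {
            ++first;
        }
        else if ( *first == '#' )
        {
            // Consume the comment up to, but not including, the newline;
            // the newline is whitespace and gets skipped on the next pass.
            while ( first != last && *first != '\n' )
            {
                ++first;
            }
        }
        else
        {
            break;
        }
    }

    return first;
}
```

Applied to the sample file above, the returned iterator should point at the 'c' of codepage.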

So apparently I have understood something wrong. I just cannot figure out what. Help?

Solution

I don't have much insight here, and I'm only testing with MSVC/GCC, but perhaps the problem is with comment_p trying to consume the eol (which is eaten by the space_p branch of the skipper instead)?

So either you could use spirit_classic::blank_p (and be explicit about your eols), or you might have luck reversing the branches of the skipper:

    spirit_classic::comment_p( '#' ) | spirit_classic::space_p

See it Live On Coliru:

#include <boost/spirit/include/support_istream_iterator.hpp>
#include <boost/spirit/include/classic_core.hpp>
#include <boost/spirit/include/classic_rule.hpp>
#include <boost/spirit/include/classic_utility.hpp>

#include <iostream>
#include <string>

namespace spirit_classic = boost::spirit::classic;

template< typename Iterator >
static std::string getCodepage( Iterator first, Iterator last )
{
    std::string codepage;

    spirit_classic::parse(
        first,
        last,
        spirit_classic::as_lower_d[ "codepage" ]
            >> spirit_classic::ch_p( '=' )
            >> spirit_classic::lexeme_d[ spirit_classic::ch_p( '"' )
                >> ( +( spirit_classic::anychar_p - spirit_classic::ch_p( '"' ) ) )[spirit_classic::assign_a( codepage )]
                >> spirit_classic::ch_p( '"' ) ]
        >> spirit_classic::ch_p( ';' ),
        spirit_classic::comment_p( '#' ) | spirit_classic::space_p
        );

    if ( codepage.empty() )
    {
        codepage = "UTF-8";
    }

    return codepage;
}

int main()
{
    std::string input = "# Comment (optional)\n"
        "\n"
        "\n"
        "\n"
        "codepage = \"ISO-8859-2\"; \n";

    std::cout << getCodepage(input.begin(), input.end());
}

Prints

ISO-8859-2