EDIT: As mentioned in the comment on sehe's answer, it turns out the code below works just fine; it was my handling of the iterators (not shown here) that was faulty. Sorry, my bad. Voting to close as off-topic / non-reproducible.
EDIT2: Elaboration... if you're using boost::spirit::istream_iterator to feed an ifstream to the parse function (like I did), do not forget to call unsetf( std::ios::skipws ) on that ifstream first, or your parsing will fail...
I've got a DSL (domain specific language) file that's looking like this:
# Comment (optional)
codepage = "ISO-8859-2";
...
I.e., either the codepage specification is the first non-comment statement in the file, or the file is considered to be of the default codepage.
I commandeered Boost Spirit for the task. I had to stay with Spirit Classic for technical reasons (cough AIX / XLC cough), and after some head-scratching through the tutorials -- which invariably aim at setups far more involved than what I need here -- I came up with this little piece of code:
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <boost/spirit/include/classic_core.hpp>
#include <boost/spirit/include/classic_rule.hpp>
#include <boost/spirit/include/classic_utility.hpp>
#include <string>

namespace spirit_classic = boost::spirit::classic;

template< typename Iterator >
static std::string getCodepage( Iterator first, Iterator last )
{
    std::string codepage;
    spirit_classic::parse(
        first,
        last,
        spirit_classic::as_lower_d[ "codepage" ]
            >> spirit_classic::ch_p( '=' )
            >> spirit_classic::lexeme_d[ spirit_classic::ch_p( '"' )
                >> ( +( spirit_classic::anychar_p - spirit_classic::ch_p( '"' ) ) )[spirit_classic::assign_a( codepage )]
                >> spirit_classic::ch_p( '"' ) ]
            >> spirit_classic::ch_p( ';' ),
        spirit_classic::space_p | spirit_classic::comment_p( '#' )
    );
    if ( codepage.empty() )
    {
        codepage = "UTF-8";
    }
    return codepage;
}
This works pretty well... except for the skipper:
...
spirit_classic::space_p | spirit_classic::comment_p( '#' )
...
This skips whitespace all right -- but utterly fails at skipping comments (i.e. anything from '#' to end-of-line), which I understood comment_p( '#' ) to achieve.
So apparently I have understood something wrong. I just cannot figure out what. Help?
I don't have much insight here, and I'm only testing with MSVC/GCC, but perhaps the problem is with comment_p trying to consume eol (which is eaten by the space_p skipper instead)?
So either you could use spirit_classic::blank_p (and be explicit about your eols), or you might have luck reversing the branches of the skipper:
spirit_classic::comment_p( '#' ) | spirit_classic::space_p
See it Live On Coliru:
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <boost/spirit/include/classic_core.hpp>
#include <boost/spirit/include/classic_rule.hpp>
#include <boost/spirit/include/classic_utility.hpp>
#include <iostream>   // std::cout
#include <string>

namespace spirit_classic = boost::spirit::classic;

template< typename Iterator >
static std::string getCodepage( Iterator first, Iterator last )
{
    std::string codepage;
    spirit_classic::parse(
        first,
        last,
        // codepage = "<name>";
        spirit_classic::as_lower_d[ "codepage" ]
            >> spirit_classic::ch_p( '=' )
            >> spirit_classic::lexeme_d[ spirit_classic::ch_p( '"' )
                >> ( +( spirit_classic::anychar_p - spirit_classic::ch_p( '"' ) ) )[spirit_classic::assign_a( codepage )]
                >> spirit_classic::ch_p( '"' ) ]
            >> spirit_classic::ch_p( ';' ),
        // skipper, with the comment branch tried first
        spirit_classic::comment_p( '#' ) | spirit_classic::space_p
    );
    if ( codepage.empty() )
    {
        codepage = "UTF-8";   // default when no codepage statement was found
    }
    return codepage;
}

int main()
{
    std::string input = "# Comment (optional)\n"
                        "\n"
                        "\n"
                        "\n"
                        "codepage = \"ISO-8859-2\"; \n";
    std::cout << getCodepage(input.begin(), input.end());
}
Prints
ISO-8859-2