I have a file containing data in the form:
fractal mand1 {
;lkkj;kj;
}
fractal mand2 {
if (...) {
blablah;
}
}
fractal julia1 {
a = ss;
}
I want to extract the names of the data containers, so in this specific case I want to retrieve a vector containing mand1, mand2, julia1.
I've read the sample about parsing a number list into a vector, but I want to maintain the grammar in a separate file.
I've created a struct representing the grammar, and I use it to parse the string containing the data. I would expect output like
mand1
mand2
julia1
Instead I obtain
mand1 {
;lkkj;kj;
}
fractal mand2 {
if (...) {
blablah;
}
}
fractal julia1 {
a = ss;
}
My parser recognizes the first fractal term, but then it parses the rest of the file as a single string item instead of parsing it the way I want.
What am I doing wrong?
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <vector>
#include <iostream>
using boost::spirit::ascii::space;
using boost::spirit::ascii::space_type;
using boost::spirit::qi::phrase_parse;
using boost::spirit::qi::lit;
using boost::spirit::qi::lexeme;
using boost::spirit::qi::skip;
using boost::spirit::ascii::char_;
using boost::spirit::ascii::no_case;
using boost::spirit::qi::rule;
typedef std::string::const_iterator sit;
template <typename Iterator>
struct FractalListParser : boost::spirit::qi::grammar<Iterator, std::vector<std::string>(), boost::spirit::ascii::space_type> {
    FractalListParser() : FractalListParser::base_type(start) {
        no_quoted_string %= *(lexeme[+(char_ - '"')]);
        start %= *(no_case[lit("fractal")] >> no_quoted_string >> '{' >> *(skip[*(char_)]) >> '}');
    }
    rule<Iterator, std::string(), space_type> no_quoted_string;
    rule<Iterator, std::vector<std::string>(), space_type> start;
};
int main() {
    const std::string fractalListFile(R"(
fractal mand1 {
;lkkj;kj;
}
fractal mand2 {
if (...) {
blablah;
}
}
fractal julia1 {
a = ss;
}
)");
    std::cout << "Read Test:" << std::endl;
    FractalListParser<sit> parser;
    std::vector<std::string> data;
    bool r = phrase_parse(fractalListFile.begin(), fractalListFile.end(), parser, space, data);
    for (auto& i : data) std::cout << i << std::endl;
    return 0;
}
If you add error handling, you'll find that the parse did not really succeed: phrase_parse reports success, but nothing was effectively parsed and all of the input remains unparsed:
Output:
Read Test:
Parse success:
----
mand1 {
;lkkj;kj;
}
fractal mand2 {
if (...) {
blablah;
}
}
fractal julia1 {
a = ss;
}
Remaining unparsed input: 'fractal mand1 {
;lkkj;kj;
}
fractal mand2 {
if (...) {
blablah;
}
}
fractal julia1 {
a = ss;
}
'
You probably want to ignore the "body" (between {}). Therefore I suppose you actually wanted to omit the attribute:
>> '{' >> *(omit[*(char_)]) >> '}'
rather than skip[*(char_)].
The expression *char_ is greedy, and will always match to the end of input... You probably wanted to limit the charset:
in the "name": *~char_("\"{") to avoid "eating" all of the body as well. To avoid matching spaces use graph (e.g. +(graph - '"')). In case you want to parse "identifiers", be explicit, e.g. alpha > *(alnum | char_('_'))
in the body: *~char_('}') or *(char_ - '}') (the latter being less efficient).
The nesting of optional quantifiers is not productive:
*(omit[*(char_)])
This will just have very slow worst-case runtime (because *char_ could be empty, and *(omit[*(char_)]) could also be empty). Say what you mean instead:
omit[*char_]
The simplest way to have a lexeme is to drop the skipper from the rule declaration (see also Boost spirit skipper issues)
Program logic:
Since your sample contains nested blocks (mand2 for example), you need to treat the blocks recursively in order to avoid treating the first } as the end of the outer block:
block = '{' >> -block % (+~char_("{}")) >> '}';
Loose hints:
use BOOST_SPIRIT_DEBUG to find out where parsing is rejected/matched. E.g. after refactoring the rules a bit, we got the output (On Coliru):
Read Test:
<start>
<try>fractal mand1 {\n </try>
<no_quoted_string>
<try>mand1 {\n ;lkkj;kj</try>
<success> {\n ;lkkj;kj;\n}\n\n</success>
<attributes>[[m, a, n, d, 1]]</attributes>
</no_quoted_string>
<body>
<try>{\n ;lkkj;kj;\n}\n\nf</try>
<fail/>
</body>
<success>fractal mand1 {\n </success>
<attributes>[[]]</attributes>
</start>
Parse success:
Remaining unparsed input: 'fractal mand1 {
;lkkj;kj;
}
fractal mand2 {
if (...) {
blablah;
}
}
fractal julia1 {
a = ss;
}
'
That output helped me spot that I actually forgot the - '}' part in the body rule... :)
No need for %= when there are no semantic actions involved in that rule definition (docs)
you probably want to make sure fractal is actually a separate word, so you don't match fractalset multi { .... }
With these in place we can have a working demo:
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

template <typename Iterator>
struct FractalListParser : qi::grammar<Iterator, std::vector<std::string>(), qi::space_type> {
    FractalListParser() : FractalListParser::base_type(start) {
        using namespace qi;

        identifier = alpha > *(alnum | char_('_'));
        block      = '{' >> -block % +~char_("{}") >> '}';

        start = *(
                no_case["fractal"] >> identifier >> block
            );

        BOOST_SPIRIT_DEBUG_NODES((start)(block)(identifier))
    }

    qi::rule<Iterator, std::vector<std::string>(), qi::space_type> start;
    // lexemes (just drop the skipper)
    qi::rule<Iterator, std::string()> identifier;
    qi::rule<Iterator> block; // leaving out the attribute means implicit `omit[]`
};

int main() {
    using It = boost::spirit::istream_iterator;
    It f(std::cin >> std::noskipws), l;

    std::cout << "Read Test:" << std::endl;

    FractalListParser<It> parser;
    std::vector<std::string> data;
    bool r = qi::phrase_parse(f, l, parser, qi::space, data);

    if (r) {
        std::cout << "Parse success:\n";
        for (auto& i : data)
            std::cout << "----\n" << i << "\n";
    } else {
        std::cout << "Parse failed\n";
    }

    // report any input the grammar did not consume
    if (f != l)
        std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
Prints:
Read Test:
Parse success:
----
mand1
----
mand2
----
julia1