So I have a parser that parses strings like 7.5*[someAlphanumStr] or 7.5[someAlphanumStr] into this struct:
    struct summand {
        float factor;
        std::string name;
        summand(const float & f):factor(f), name(""){}
        summand(const std::string & n):factor(1.0f), name(n){}
        summand(const float & f, const std::string & n):factor(f), name(n){}
        summand():factor(0.0f), name(""){}
    };
In addition, I need to be able to parse strings like [someAlphanumStr]*7.4, [someAlphanumStr]5, 7.4 and [someAlphanumStr]. In the last two cases (7.4 and [someAlphanumStr]) I want the omitted field to take a default value, which is why I wrote the single-argument constructors for my struct summand.
Below is my code and the output it produces:
    #include <boost/config/warning_disable.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <boost/fusion/include/adapt_struct.hpp>
    #include <boost/fusion/include/io.hpp>
    #include <iostream>
    #include <string>
    #include <vector>

    namespace client
    {
        namespace spirit = boost::spirit;
        namespace qi = boost::spirit::qi;
        namespace ascii = boost::spirit::ascii;

        struct summand {
            float factor;
            std::string name;
            summand(const float & f):factor(f), name(""){}
            summand(const std::string & n):factor(1.0f), name(n){}
            summand(const float & f, const std::string & n):factor(f), name(n){}
            summand():factor(0.0f), name(""){}
        };
    }

    BOOST_FUSION_ADAPT_STRUCT(client::summand,
        (float, factor)
        (std::string, name)
    )

    namespace client {
        template <typename Iterator>
        struct summand_parser : qi::grammar<Iterator, summand(), ascii::space_type>
        {
            summand_parser() : summand_parser::base_type(summand_rule)
            {
                using namespace ascii;
                summand_rule %= (qi::float_ >> -qi::lit('*') >> '[' >> qi::lexeme[alpha >> *alnum] >> ']')|('[' >> qi::lexeme[alpha >> *alnum] >> ']' >> -qi::lit('*') >> qi::float_)|(qi::float_)|('[' >> qi::lexeme[alpha >> *alnum] >> ']');
            }
            qi::rule<Iterator, summand(), ascii::space_type> summand_rule;
        };
    }

    void parseSummandsInto(std::string const& str, client::summand& summands)
    {
        typedef std::string::const_iterator It;
        static const client::summand_parser<It> g;
        It iter = str.begin(),
           end = str.end();
        bool r = phrase_parse(iter, end, g, boost::spirit::ascii::space, summands);
        if (r && iter == end)
            return;
        else
            throw "Parse failed";
    }

    int main()
    {
        std::vector<std::string> inputStrings = {"7.5*[someAlphanumStr]", "7.5[someAlphanumStr]", "[someAlphanumStr]*7.4", "[someAlphanumStr]5", "7.4", "[someAlphanumStr]"};
        std::for_each(inputStrings.begin(), inputStrings.end(), [&inputStrings](std::string & inputStr) {
            client::summand parsed;
            parseSummandsInto(inputStr, parsed);
            std::cout << inputStr << " -> " << boost::fusion::as_vector(parsed) << std::endl;
        });
    }
Results (Coliru):
    + clang++ -std=c++11 -O0 -Wall -pedantic main.cpp
    + ./a.out
    + c++filt -t
    7.5*[someAlphanumStr] -> (7.5 someAlphanumStr)
    7.5[someAlphanumStr] -> (7.5 someAlphanumStr)
    [someAlphanumStr]*7.4 -> (115 )
    [someAlphanumStr]5 -> (115 )
    7.4 -> (7.4 )
    [someAlphanumStr] -> (115 omeAlphanumStr)
Thanks to all for the clear answers and advice; I'm especially grateful to @sehe.
The way to get anything done with Spirit[1] is to take small steps and simplify rigorously along the way.
Don't live with "cruft" (like randomly repeated sub-expressions). Also, being explicit is good. In this case, I'd start by extracting the repeated sub-expressions and reformatting for legibility:
    name_rule   = '[' >> qi::lexeme[alpha >> *alnum] >> ']';
    factor_rule = qi::float_;
    summand_rule %=
          (factor_rule >> -qi::lit('*') >> name_rule)
        | (name_rule   >> -qi::lit('*') >> factor_rule)
        | (factor_rule)
        | (name_rule)
        ;
There, much better already, and I haven't changed a thing. But wait! It doesn't compile anymore once the sub-rules are declared with their natural attribute types:
    qi::rule<Iterator, std::string(), ascii::space_type> name_rule;
    qi::rule<Iterator, float(), ascii::space_type> factor_rule;
It turns out that the grammar only "happened" to compile because Spirit's attribute compatibility rules are so lax/permissive that the characters matched for the name were just being assigned to the factor part (that's where 115 came from: 0x73 is ASCII for s, from someAlphanumStr).
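To make the origin of that 115 concrete, here is a minimal plain C++ sketch of the conversion that bites here. It does not use Spirit at all; `first_char_as_float` is a hypothetical helper that only mimics a matched character ending up in a float slot, and it assumes an ASCII execution character set.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Spirit): converts the first character of
// a matched name to float, the way a char attribute lands in a float slot.
float first_char_as_float(const std::string& name) {
    assert(!name.empty());
    // Implicit char -> float conversion is legal C++, so nothing complains.
    return static_cast<float>(name[0]);
}
```

Calling `first_char_as_float("someAlphanumStr")` yields the character code of `s`, which is the bogus factor seen in the question's output, and explains why the name printed there is missing its first letter.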
OOPS/TL;DW: I once had quite a lengthy analysis written up here, but I clobbered it by closing my browser, and SO had only an old draft cached server-side :( I'll boil it down to the bottom line now:
Guideline: Use either constructor overloads to assign to your exposed attribute type, or use Fusion sequence adaptation, but don't mix the two: they will interfere in surprising/annoying ways.
Don't worry, I won't let you go empty-handed, of course. I'd just 'manually' direct the factor and name components into their respective 'slots' (members)[2].
Inherited attributes are a sweet way to keep this legible and convenient:
    // assuming the above rules redefined to take ("inherit") a summand& attribute:
    qi::rule<Iterator, void(summand&), ascii::space_type> name_rule, factor_rule;
Just add a simple assignment in the semantic action:
    name_rule   = as_string [ '[' >> lexeme[alpha >> *alnum] >> ']' ]
                            [ _name   = _1 ];
    factor_rule = double_   [ _factor = _1 ];
Now, the 'magic dust' is of course in how the _name and _factor actors are defined. I prefer using binds for this over phx::at_c<N>, due to maintenance costs:
    static const auto _factor = phx::bind(&summand::factor, qi::_r1);
    static const auto _name   = phx::bind(&summand::name, qi::_r1);
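For readers unfamiliar with binding a data member, the idea can be sketched in plain C++ without Phoenix. The `member_slot` template and `fill_slots` function below are hypothetical names made up for illustration; they only approximate what a bound member pointer gives you, namely something that maps a `summand&` to a reference to one of its members, so that assigning through it writes into the struct.

```cpp
#include <cassert>
#include <string>

struct summand {
    float factor;
    std::string name;
    summand() : factor(1.f), name("") {}
};

// Hypothetical stand-in for a bound data member: holds a pointer-to-member
// and, given a summand&, returns a reference to that member.
template <typename T>
struct member_slot {
    T summand::*ptr;
    T& operator()(summand& s) const { return s.*ptr; }
};

// Writes both values through the slots, by member name rather than by index.
void fill_slots(summand& s, float f, const std::string& n) {
    const member_slot<float>       factor_slot = { &summand::factor };
    const member_slot<std::string> name_slot   = { &summand::name };
    assert(&factor_slot(s) == &s.factor);  // the slot aliases the real member
    factor_slot(s) = f;
    name_slot(s)   = n;
}
```

Because the slots are named after members rather than positions, reordering the fields of `summand` would not silently redirect an assignment, which is the maintenance argument against index-based access.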
See? That's pretty succinct and clearly shows what is happening. Also, there's no actual need to have Fusion adaptation for summand here.
Now, finally, we can simplify the main rule as well:
    summand_rule =
          factor_rule (_val) >> - ( -lit('*') >> name_rule   (_val) )
        | name_rule   (_val) >> - ( -lit('*') >> factor_rule (_val) )
        ;
What this does is simply fold the single-component branches into the dual-component branches by making the trailing part optional.
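The shape of that rule can also be read as an ordinary hand-written parser. The following is only a rough sketch of the same grammar without Spirit, to show how the optional tail works; every function name in it is hypothetical, and it leans on `std::strtof`, which is more liberal about number formats than the Qi numeric parsers.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

struct summand {
    float factor;
    std::string name;
    summand() : factor(1.f), name("") {}
};

// Skip whitespace, playing the role of the space skipper.
static void skip_ws(const char*& p) {
    while (std::isspace(static_cast<unsigned char>(*p))) ++p;
}

// factor: a float literal, written straight into s.factor.
static bool parse_factor(const char*& p, summand& s) {
    skip_ws(p);
    char* end = 0;
    const float f = std::strtof(p, &end);
    if (end == p) return false;  // no digits consumed: not a factor
    s.factor = f;
    p = end;
    return true;
}

// name: '[' alpha alnum* ']', written straight into s.name.
static bool parse_name(const char*& p, summand& s) {
    skip_ws(p);
    const char* q = p;
    if (*q != '[') return false;
    ++q;
    skip_ws(q);
    if (!std::isalpha(static_cast<unsigned char>(*q))) return false;
    const char* first = q;
    while (std::isalnum(static_cast<unsigned char>(*q))) ++q;
    const char* last = q;
    skip_ws(q);
    if (*q != ']') return false;
    assert(first != last);  // the leading alpha guarantees one character
    s.name.assign(first, last);
    p = q + 1;
    return true;
}

// Optional tail: an optional '*' followed by the other component.
// If the component is missing, nothing is consumed.
template <typename Parser>
static void parse_optional_tail(const char*& p, summand& s, Parser second) {
    const char* q = p;
    skip_ws(q);
    if (*q == '*') ++q;
    if (second(q, s)) p = q;
}

bool parse_summand(const std::string& input, summand& out) {
    const char* p = input.c_str();
    summand s;  // default-constructed: unmatched members keep their defaults
    if (parse_factor(p, s))    parse_optional_tail(p, s, parse_name);
    else if (parse_name(p, s)) parse_optional_tail(p, s, parse_factor);
    else return false;
    skip_ws(p);
    if (*p != '\0') return false;  // require the whole input to match
    out = s;
    return true;
}
```

Each component parser writes only its own member, so whichever component is absent simply keeps the value the default constructor gave it.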
Note how the summand default constructor takes care of the default values:
    struct summand {
        float factor;
        std::string name;
        summand() : factor(1.f), name("") {}
    };
Notice how this removed quite a bit of complexity.
See the fully adapted sample running Live on Coliru, which prints:
    7.5*[someAlphanumStr] -> (7.5 someAlphanumStr)
    7.5[someAlphanumStr] -> (7.5 someAlphanumStr)
    [someAlphanumStr]*7.4 -> (7.4 someAlphanumStr)
    [someAlphanumStr]5 -> (5 someAlphanumStr)
    7.4 -> (7.4 )
    [someAlphanumStr] -> (1 someAlphanumStr)
Full listing:
    #define BOOST_SPIRIT_USE_PHOENIX_V3
    //#define BOOST_SPIRIT_DEBUG
    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    namespace client {
        namespace qi    = boost::spirit::qi;
        namespace phx   = boost::phoenix;
        namespace ascii = boost::spirit::ascii;

        struct summand {
            float factor;
            std::string name;
            // defaults used when a component is absent from the input
            summand() : factor(1.f), name("") {}
        };
    }

    namespace client {
        template <typename Iterator>
        struct summand_parser : qi::grammar<Iterator, summand(), ascii::space_type>
        {
            summand_parser() : summand_parser::base_type(summand_rule)
            {
                using namespace ascii;

                // actors that refer to the members of the inherited summand&
                static const auto _factor = phx::bind(&summand::factor, qi::_r1);
                static const auto _name   = phx::bind(&summand::name, qi::_r1);

                name_rule   = qi::as_string [ '[' >> qi::lexeme[alpha >> *alnum] >> ']' ]
                                      [ _name = qi::_1 ] ;
                factor_rule = qi::double_ [ _factor = qi::_1 ] ;

                // either component may come first; the trailing one is optional
                summand_rule =
                      factor_rule (qi::_val) >> - ( -qi::lit('*') >> name_rule   (qi::_val) )
                    | name_rule   (qi::_val) >> - ( -qi::lit('*') >> factor_rule (qi::_val) )
                    ;

                BOOST_SPIRIT_DEBUG_NODES((summand_rule)(name_rule)(factor_rule))
            }

            qi::rule<Iterator, void(summand&), ascii::space_type> name_rule, factor_rule;
            qi::rule<Iterator, summand(), ascii::space_type> summand_rule;
        };
    }

    bool parseSummandsInto(std::string const& str, client::summand& summand)
    {
        typedef std::string::const_iterator It;
        static const client::summand_parser<It> g;
        It iter(str.begin()), end(str.end());
        bool r = phrase_parse(iter, end, g, boost::spirit::ascii::space, summand);
        // succeed only if the whole input was consumed
        return (r && iter == end);
    }

    int main()
    {
        std::vector<std::string> inputStrings = {
            "7.5*[someAlphanumStr]",
            "7.5[someAlphanumStr]",
            "[someAlphanumStr]*7.4",
            "[someAlphanumStr]5",
            "7.4",
            "[someAlphanumStr]",
        };
        std::for_each(inputStrings.begin(), inputStrings.end(), [&inputStrings](std::string const& inputStr) {
            client::summand parsed;
            if (parseSummandsInto(inputStr, parsed))
                std::cout << inputStr << " -> (" << parsed.factor << " " << parsed.name << ")\n";
            else
                std::cout << inputStr << " -> FAILED\n";
        });
    }
[1] And arguably, anything else in technology
[2] You can keep the BOOST_FUSION_ADAPT_STRUCT, but as you can see it's no longer required