I have a little grammar that I want to use for a work project. A minimal working example is:
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-local-typedefs"
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
#pragma GCC diagnostic ignored "-Wunused-variable"
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/qi_grammar.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#pragma GCC diagnostic pop // pops
#include <iostream>
#include <string>
#include <vector>
int main()
{
    typedef unsigned long long ull;
    std::string curline = "1;2;;3,4;5";
    std::cout << "parsing: " << curline << "\n";
    namespace qi = boost::spirit::qi;
    auto ids = -qi::ulong_long % ','; // '-' allows for empty vecs.
    auto match_type_res = ids % ';' ;
    std::vector<std::vector<ull> > r;
    qi::parse(curline.begin(), curline.end(), match_type_res, r);
    std::cout << "got: ";
    for (auto v: r){
        for (auto i : v)
            std::cout << i << ",";
        std::cout << ";";
    }
    std::cout <<"\n";
}
On my personal machine this produces the correct output:
parsing: 1;2;;3,4;5
got: 1,;2,;;3,4,;5,;
But at work it produces:
parsing: 1;2;;3,4;5
got: 1,;2,;;3,
In other words, it fails to parse the vector of long integers as soon as there's more than one element in it.
Now, I have identified that the work system is using boost 1.56, while my private computer is using 1.57. Is this the cause?
Knowing we have some real Spirit experts here on Stack Overflow, I was hoping someone might know where this issue is coming from, or could at least narrow down the number of things I need to check.
If the boost version is the problem, I can probably convince the company to upgrade, but a workaround would be welcome in any case.
You're invoking Undefined Behaviour in your code, specifically where you use auto to store a parser expression. The expression template contains references to temporaries that become dangling at the end of the containing full-expression¹.
UB implies that anything can happen. Both compilers are right! And the best part is, you will probably see varying behaviour depending on the compiler flags used.
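To make the dangling-reference problem concrete, here is a toy sketch of an expression template. It is not Spirit or Proto code: Leaf, SumByRef, SumByValue, make_leaf and this deep_copy are hypothetical stand-ins, assumed only to mimic how a node stores references to its operands and how a deep copy replaces them with owned values.

```cpp
// Toy stand-ins, NOT Spirit/Proto types; all names here are hypothetical.
struct Leaf { int value; };

Leaf make_leaf(int v) { Leaf l = { v }; return l; }

// Proto-style node: stores references to its operands.
struct SumByRef {
    Leaf const& lhs;
    Leaf const& rhs;
    SumByRef(Leaf const& l, Leaf const& r) : lhs(l), rhs(r) {}
    int eval() const { return lhs.value + rhs.value; }
};

// What a deep copy yields: the node owns its operands by value.
struct SumByValue {
    Leaf lhs;
    Leaf rhs;
    int eval() const { return lhs.value + rhs.value; }
};

SumByRef operator+(Leaf const& a, Leaf const& b) { return SumByRef(a, b); }

// Toy analogue of a deep copy: copies the referenced operands.
SumByValue deep_copy(SumByRef const& e) {
    SumByValue owned = { e.lhs, e.rhs };
    return owned;
}

// Fine: the temporaries live until the end of this full-expression.
int inline_sum() { return (make_leaf(1) + make_leaf(2)).eval(); }

// Fine: the copy is taken while the temporaries are still alive.
int copied_sum() {
    auto owned = deep_copy(make_leaf(1) + make_leaf(2));
    return owned.eval();
}

// NOT fine, so left commented out: the node outlives its operands.
//   auto dangling = make_leaf(1) + make_leaf(2);
//   int broken = dangling.eval();   // reads destroyed temporaries: UB
```

The commented-out lines have the same shape as storing a parser expression in auto, while copied_sum has the shape of wrapping that expression in a copy before storing it.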
Fix it by using one of:
- qi::copy (or boost::proto::deep_copy before v.1.55 IIRC)
- BOOST_SPIRIT_AUTO instead of BOOST_AUTO (mostly helpful iff you also support C++03)
- qi::rule<> and qi::grammar<> (the non-terminals) to type-erase the expressions. This has performance impact too, but also gives more features like lexeme[] (see here)

Note also that Spirit X3 promises to drop these restrictions on the use with auto. It's basically a whole lot more lightweight due to the use of C++14 features. Keep in mind that it's not stable yet.
GCC with -O2 exhibits the undefined results: Live On Coliru
The fixed version:
//#pragma GCC diagnostic push
//#pragma GCC diagnostic ignored "-Wunused-local-typedefs"
//#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
//#pragma GCC diagnostic ignored "-Wunused-variable"
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/qi.hpp>
//#pragma GCC diagnostic pop // pops
#include <iostream>
#include <string>
#include <vector>
int main() {
    typedef unsigned long long ull;
    std::string const curline = "1;2;;3,4;5";
    std::cout << "parsing: '" << curline << "'\n";
    namespace qi = boost::spirit::qi;
#if 0 // THIS IS UNDEFINED BEHAVIOUR:
    auto ids = -qi::ulong_long % ','; // '-' allows for empty vecs.
    auto grammar = ids % ';';
#else // THIS IS CORRECT:
    auto ids = qi::copy(-qi::ulong_long % ','); // '-' allows for empty vecs.
    auto grammar = qi::copy(ids % ';');
#endif
    std::vector<std::vector<ull> > r;
    qi::parse(curline.begin(), curline.end(), grammar, r);
    std::cout << "got: ";
    for (auto v: r){
        for (auto i : v)
            std::cout << i << ",";
        std::cout << ";";
    }
    std::cout <<"\n";
}
Printing (also with GCC -O2!):
parsing: '1;2;;3,4;5'
got: 1,;2,;;3,4,;5,;
¹ (that's basically "at the next semicolon" here, but stated in standardese)