The objective is to not accept any characters from 80h through FFh in the input string. I was under the impression that
    using ascii::char_;
would take care of this. But as you can see from the example code below, it happily prints Parsing succeeded.
In the following Spirit mailing list post, Joel suggested making the parse fail on these non-ASCII characters, but I'm not sure whether he ever followed through: [Spirit-general] ascii encoding assert on invalid input ...
Here is my example code:
#include <iostream>
#include <string>
#include <boost/spirit/home/x3.hpp>

namespace client::parser
{
    namespace x3 = boost::spirit::x3;
    namespace ascii = boost::spirit::x3::ascii;

    using ascii::char_;
    using ascii::space;
    using x3::lexeme;
    using x3::skip;

    const auto quoted_string = lexeme[char_('"') >> *(char_ - '"') >> char_('"')];
    const auto entry_point = skip(space)[quoted_string];
}

int main()
{
    for (std::string const input : { "\"naughty \x80" "bla bla bla\"" }) {
        std::string output;
        // x3::parse is found via ADL (entry_point's type lives in boost::spirit::x3)
        if (parse(input.begin(), input.end(), client::parser::entry_point, output)) {
            std::cout << "Parsing succeeded\n";
            std::cout << "input: " << input << "\n";
            std::cout << "output: " << output << "\n";
        } else {
            std::cout << "Parsing failed\n";
        }
    }
}
How can I change the example to make Spirit fail on this invalid input?
Furthermore, and closely related: how should I use the character parser that takes a charset-defining string, i.e. char_(charset) from the X3 docs: Character Parsers (develop branch)?
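For what it's worth, my reading of the docs is that the charset argument is a set-string made of single characters and '-'-delimited ranges, along these lines (my own guessed example, not taken from the docs):

#include <boost/spirit/home/x3.hpp>

namespace x3 = boost::spirit::x3;

// char_(charset) should match one character from the set;
// '-' forms ranges, so this would match a letter, a digit or '_'.
const auto ident_char = x3::ascii::char_("a-zA-Z0-9_");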
The documentation is sorely lacking when it comes to describing even the basic functionality. Why can't the Boost top-level people require library authors to come up with documentation at least on the level of cppreference.com?
Nothing wrong with the docs here. It's just a library bug.
The code for any_char says:
template <typename Char, typename Context>
bool test(Char ch_, Context const&) const
{
    return ((sizeof(Char) <= sizeof(char_type)) || encoding::ischar(ch_));
}
For the ascii encoding char_type is char, so sizeof(Char) <= sizeof(char_type) is always true for plain char input and the || short-circuits before encoding::ischar is ever consulted. It should have said:
template <typename Char, typename Context>
bool test(Char ch_, Context const&) const
{
    // Both conditions must hold: the value fits the encoding's char type
    // AND it is a valid character in that encoding.
    return ((sizeof(Char) <= sizeof(char_type)) && encoding::ischar(ch_));
}
That change makes your program behave as expected and required. It also matches the Qi behaviour:
#include <boost/spirit/include/qi.hpp>
#include <cassert>

int main() {
    namespace qi = boost::spirit::qi;
    char const* input = "\x80";
    assert(!qi::parse(input, input + 1, qi::ascii::char_));
}
Filed a bug here: https://github.com/boostorg/spirit/issues/520
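In the meantime, if you can't patch your Boost copy, a user-side workaround is to guard the character parser with a semantic action that fails the parse on non-ASCII bytes. A minimal sketch, using the documented x3::_attr/x3::_pass mechanics (require_ascii and ascii_char are my own names):

#include <iostream>
#include <string>
#include <boost/spirit/home/x3.hpp>

namespace client::parser
{
    namespace x3 = boost::spirit::x3;

    // Fail the parse (clear _pass) for any byte outside 00h..7Fh.
    const auto require_ascii = [](auto& ctx) {
        x3::_pass(ctx) = static_cast<unsigned char>(x3::_attr(ctx)) < 0x80;
    };
    const auto ascii_char = x3::char_[require_ascii];

    const auto quoted_string
        = x3::lexeme[x3::char_('"') >> *(ascii_char - '"') >> x3::char_('"')];
    const auto entry_point = x3::skip(x3::ascii::space)[quoted_string];
}

int main()
{
    std::string const input = "\"naughty \x80" "bla bla bla\"";
    std::string output;
    if (parse(input.begin(), input.end(), client::parser::entry_point, output))
        std::cout << "Parsing succeeded\n";
    else
        std::cout << "Parsing failed\n"; // now hit for the \x80 input
}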