For my expression parser project I would like to use CSV-like escaping: "" to escape ".
Examples:
"\"hello\"",
" \" hello \" ",
" \" hello \"\"stranger\"\" \" ",
Online compile & try: https://wandbox.org/permlink/5uchQM8guIN1k7aR
My current parsing rule only parses the first 2 tests:
qi::rule<std::string::const_iterator, qi::blank_type, utree()> double_quoted_string
= '"' >> qi::no_skip[+~qi::char_('"')] >> '"';
I've found this Stack Overflow question, and one answer uses Spirit: "How can I read and parse CSV files in C++?"
start = field % ',';
field = escaped | non_escaped;
escaped = lexeme['"' >> *( char_ -(char_('"') | ',') | COMMA | DDQUOTE) >> '"'];
non_escaped = lexeme[ *( char_ -(char_('"') | ',') ) ];
DDQUOTE = lit("\"\"") [_val = '"'];
COMMA = lit(",") [_val = ','];
(I don't know how to link answers, so if interested, search for "You gotta feel proud when you use something so beautiful as boost::spirit".)
Sadly it does not compile for me, and even years of C++ error message analysis didn't prepare me for Spirit's error message floods :)
And if I understand it correctly, the rule will wait for , as a string delimiter, which is maybe not the right thing for my expression parser project:
expression = "strlen( \"hello \"\"you\"\" \" )+1";
expression = "\"hello \"";
expression = "strlen(concat(\"hello\",\"you\")+3";
Or does the rule need to wait optionally for , and ) in this case?
I hope I'm not asking too many silly questions, but the answers help me a lot to get into Spirit. The expression parser itself is nearly working, except for string escaping.
Thanks for any help.
UPDATE: the rule below seems to work for me; at least it parses the strings, but it removes the escaped " from the string. Is there a better debug output available for strings? " " " " "h" "e" "l" "l" "o" " " "s" "t" "r" "a" "n" "g" "e" "r" " " isn't really that readable.
qi::rule<std::string::const_iterator, utree()> double_quoted_string
= qi::lexeme['"' >> *(qi::char_ - (qi::char_('"')) | qi::lit("\"\"")) >> '"'];
You can simplify the question down to this: how do you make a double-quoted string accept "double double quotes" to escape an embedded double-quote character?
A simple string parser without escapes:
qi::rule<It, std::string()> s = '"' >> *~qi::char_('"') >> '"';
Now, to also accept the single escaped " as desired, simply add:
s = '"' >> *("\"\"" >> qi::attr('"') | ~qi::char_('"')) >> '"';
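To make the order of the alternatives concrete, here is a hand-written sketch of the same scanning logic in plain C++, without Spirit. The helper name `parse_double_quoted` and its signature are assumptions made up for illustration; it is not part of Boost and only approximates what the rule does.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not a Boost.Spirit API): scans one double-quoted
// string starting at input[pos]. On success it returns the unescaped content
// and moves pos just past the closing quote; on failure pos is left unchanged.
std::optional<std::string> parse_double_quoted(std::string_view input, std::size_t& pos)
{
    std::size_t i = pos;
    if (i >= input.size() || input[i] != '"')
        return std::nullopt;               // no opening quote
    ++i;

    std::string out;
    while (i < input.size()) {
        if (input[i] != '"') {
            out += input[i++];             // plays the role of ~qi::char_('"')
        } else if (i + 1 < input.size() && input[i + 1] == '"') {
            out += '"';                    // plays the role of "\"\"" >> qi::attr('"')
            i += 2;
        } else {
            pos = i + 1;                   // lone quote: this is the closing quote
            return out;
        }
    }
    return std::nullopt;                   // input ended before a closing quote
}
```

Because the scan ends at the first quote that is not doubled, the string does not have to wait for a following `,` or `)`. Called on `strlen( "hello ""you"" " )+1` with `pos` at the opening quote, this sketch yields `hello "you" ` and leaves ` )+1` for the rest of the expression parser.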
no_skip is sloppy: it would parse "foo bar" and " foo bar " to foo bar (trimming the whitespace). Instead, drop the skipper from the rule to make it implicitly lexeme (again).
#define BOOST_SPIRIT_DEBUG
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
namespace fu = boost::fusion;

int main()
{
    auto tests = std::vector<std::string>{
        R"( "hello" )",
        R"( " hello " )",
        R"( " hello ""escaped"" " )",
    };
    for (const std::string& str : tests) {
        auto iter = str.begin(), end = str.end();
        // No skipper on the rule, so it is implicitly a lexeme.
        qi::rule<std::string::const_iterator, std::string()> double_quoted_string
            = '"' >> *("\"\"" >> qi::attr('"') | ~qi::char_('"')) >> '"';
        std::string ut;
        // The blank skipper only applies outside the rule (around the quotes).
        bool r = qi::phrase_parse(iter, end, double_quoted_string >> qi::eoi, qi::blank, ut);
        std::cout << str << " ";
        if (r) {
            std::cout << "OK: " << std::quoted(ut, '\'') << "\n";
        }
        else {
            std::cout << "Failed\n";
        }
        if (iter != end) {
            std::cout << "Remaining unparsed: " << std::quoted(std::string(iter, end)) << "\n";
        }
        std::cout << "----\n";
    }
}
Prints
"hello" OK: 'hello'
----
" hello " OK: ' hello '
----
" hello ""escaped"" " OK: ' hello "escaped" '
----