Tags: c++, boost-spirit

Boost Spirit wide-char rule produces NUL chars


With this rule

name_valid %= (lexeme[+(boost::spirit::standard_wide::alpha | lit('_'))]);

of type

typedef qi::rule<Iterator, std::wstring()> name_valid;

In a debug build everything is fine: the attribute of name_valid contains the correct string. In a release build with VC2017, however, I get a NUL char for inputs such as this:

Input  : a_b  
Output : a(NUL)b

I found out that I have to rewrite the rule as shown below. I can't see a lit as a wide-char operation. Am I missing something here?

 name_valid %= +(boost::spirit::standard_wide::alpha | wide::char_(L'_'));

Solution

  • I found out that I have to rewrite the rule like this

    Well, if the goal was to match '_' as part of a name, then you NEED to write that anyway, because +(alpha | '_') exposes an attribute that is the character sequence of all matched alpha characters. It does not include '_', since literals do not expose an attribute.

    Can't see a lit as a wide-char operation.

    That's qi::lit(L'_')

    Am I missing something here?

    What I think is happening is that alpha|'_' synthesizes an optional<char>. Apparently, the propagation rules are so relaxed that optional<char> can be assigned to char through its conversion-to-bool operation (resulting in a NUL character). Wide characters have nothing to do with it:

    Live On Coliru

    #include <boost/spirit/include/qi.hpp>
    #include <iostream>
    #include <string>
    namespace qi = boost::spirit::qi;
    namespace enc = boost::spirit::standard;
    
    int main() {
        std::string const input = "A.B";
        auto f = input.begin(), l = input.end();
    
        std::string output;
        if (qi::parse(f, l, +(enc::alpha | '.'), output)) {
            std::cout << "Parsed: '" << output << "'\n";
        } else {
            std::cout << "Failed\n";
        }
    
        if (f!=l)
            std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
    }
    

    Prints (shown as a hex dump, so the NUL byte between A and B is visible)

    00000000: 5061 7273 6564 3a20 2741 0042 270a       Parsed: 'A.B'.
    

    Testing The Hypothesis:

    Splitting it up in separate rules makes it visible: Live On Coliru

    qi::rule<It, char()> c = enc::alpha | '.';
    qi::rule<It, std::string()> s = +c;
    BOOST_SPIRIT_DEBUG_NODES((s)(c))
    

    Prints

    <s>
      <try>A.B</try>
      <c>
        <try>A.B</try>
        <success>.B</success>
        <attributes>[A]</attributes>
      </c>
      <c>
        <try>.B</try>
        <success>B</success>
        <attributes>[NUL]</attributes>
      </c>
      <c>
        <try>B</try>
        <success></success>
        <attributes>[B]</attributes>
      </c>
      <c>
        <try></try>
        <fail/>
      </c>
      <success></success>
      <attributes>[[A, NUL, B]]</attributes>
    </s>
    

    This highlights that the char exposed by c indeed becomes the NUL char. The following, however, makes clear that this wasn't completely intentional: Live On Coliru

    qi::rule<It, boost::optional<char>()> c = enc::alpha | '.';
    qi::rule<It, std::string()> s = +c;
    BOOST_SPIRIT_DEBUG_NODES((s)(c))
    

    which will abort with an assertion:

    sotest: /home/sehe/custom/boost_1_65_0/boost/optional/optional.hpp:1106: boost::optional::reference_const_type boost::optional<char>::get() const [T = char]: Assertion `this->is_initialized()' failed.
    

    Out of curiosity: this fixes it: Live On Coliru

    qi::rule<It, std::string()> c = enc::alpha | '.';
    qi::rule<It, std::string()> s = +c;
    

    Prints

    Parsed: 'AB'
    

    which is fully as expected, since the '.' literal exposes no attribute.

    Summary

    Automatic attribute propagation rules are powerful, but can be surprising.

    Don't play fast and loose with attribute compatibility: say what you mean. In your case alpha | char_('_') is conceptually the only thing that SHOULD do what you expect.