I'm triying to write a parser for javascript identifiers so far this is what I have:
// All this rules have string as attribute.
identifier_ = identifier_start
>>
*(
identifier_part >> -(qi::char_(".") > identifier_part)
)
;
identifier_part = +(qi::alnum | qi::char_("_"));
identifier_start = qi::char_("a-zA-Z$_");
This parser work fine for the list of "good identifiers" in my tests:
"x__",
"__xyz",
"_",
"$",
"foo4_.bar_3",
"$foo.bar",
"$foo",
"_foo_bar.foo",
"_foo____bar.foo"
but I'm having trouble with one of the bad identifiers: foo$bar
. This is supposed to fail, but it success!! And the sintetized attribute has the value "foo"
.
Here is the debug ouput for foo$bar
:
<identifier_>
<try>foo$bar</try>
<identifier_start>
<try>foo$bar</try>
<success>oo$bar</success>
<attributes>[[f]]</attributes>
</identifier_start>
<identifier_part>
<try>oo$bar</try>
<success>$bar</success>
<attributes>[[f, o, o]]</attributes>
</identifier_part>
<identifier_part>
<try>$bar</try>
<fail/>
</identifier_part>
<success>$bar</success>
<attributes>[[f, o, o]]</attributes>
</identifier_>
What I want is to the parser fails when parsing foo$bar
but not when parsing $foobar
.
What I'm missing?
You don't require that the parser needs to consume all input.
When a rule stops matching before the $
sign, it returns with success, because nothing says it can't be followed by a $
sign. So, you would like to assert that it isn't followed by a character that could be part of an identifier:
identifier_ = identifier_start
>>
*(
identifier_part >> -(qi::char_(".") > identifier_part)
) >> !identifier_start
;
A related directive is distinct
from the Qi repository: http://www.boost.org/doc/libs/1_55_0/libs/spirit/repository/doc/html/spirit_repository/qi_components/directives/distinct.html