I would like to use nom parser combinators to recognize identifiers of this kind:
"a"
"a1"
"a_b"
"aA"
"aB_3_1"
The first character of the identifier must be a lowercase alphabetic character; after that, any combination of alphanumeric characters and underscores (so [a-zA-Z0-9_]*) may follow, with the restriction that two or more consecutive underscores must not occur and the identifier must not end with an underscore. These cases should be rejected:
"Aa"
"aB_"
"a__a"
"_a"
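To pin the rules down independently of nom, they can be written as a plain predicate. This is only a minimal sketch: is_valid_identifier is a hypothetical name, it assumes ASCII input (as the character class above suggests), and it checks a whole string rather than parsing a prefix.

```rust
/// Hypothetical reference predicate (not part of nom): returns true when `s`
/// is a valid identifier under the rules listed above. ASCII only (assumption).
fn is_valid_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    // The first character must be a lowercase ASCII letter.
    match chars.next() {
        Some(c) if c.is_ascii_lowercase() => {}
        _ => return false,
    }
    // Remember whether the previous character was an underscore, so that
    // both "__" and a trailing '_' can be rejected.
    let mut prev_underscore = false;
    for c in chars {
        if c == '_' {
            if prev_underscore {
                return false; // two underscores in a row
            }
            prev_underscore = true;
        } else if c.is_ascii_alphanumeric() {
            prev_underscore = false;
        } else {
            return false; // character outside [a-zA-Z0-9_]
        }
    }
    // An identifier must not end with an underscore.
    !prev_underscore
}
```

A predicate like this can serve as an oracle when testing a parser: every string in the first list should be accepted and every string in the second list rejected.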
So far I have come up with this solution, but I am unsure whether my approach is correct:
pub fn identifier(s: &str) -> IResult<&str, &str> {
    let (i, _) = verify(anychar, |c: &char| c.is_lowercase())(s)?;
    let (j, _) = alphanumeric0(i)?;
    let (k, _) = recognize(opt(many1(preceded(underscore, alphanumeric1))))(j)?;
    Ok((k,s))
}
Also, I need to wrap this identifier parser in recognize when using it, like this:
pub fn identifier2(s: &str) -> IResult<&str, &str> {
    (recognize(identifier))(s)
}
Here's the variant I came up with. It's mostly the same as yours; I made the following changes:
- I added all_consuming, which ensures that the entire input matches. The bug in your proposed implementation is that "aBa_" would successfully match the identifier "aBa" and leave the trailing "_" unparsed (returning it as the remaining input).
- I combined the parsers into a single expression with pair instead of chaining ? statements.
- I changed many1 to many0_count, simply because the latter doesn't allocate a vector.
pub fn identifier<'a, E: ParseError<&'a str>>(s: &'a str) -> IResult<&'a str, &'a str, E> {
    recognize(all_consuming(pair(
        verify(anychar, |&c| c.is_lowercase()),
        many0_count(preceded(opt(char('_')), alphanumeric1)),
    )))(s)
}
This function as written passes all the test cases you provided. If you specifically don't want the all_consuming, perhaps because this is being used as part of a larger set of parsers, you'll have to check manually that the recognized identifier isn't cut off just before a _ character (that is, that the remaining input doesn't start with _).
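For the case without all_consuming, here's a rough sketch of that manual check. It is written in plain Rust, deliberately without nom, so it shows only the logic and doesn't depend on any particular nom version. identifier_prefix is a hypothetical name, and I'm assuming ASCII-only identifiers, as in your [a-zA-Z0-9_] character class. Note that it returns (identifier, rest), the reverse of nom's (rest, output) order.

```rust
/// Hypothetical helper (plain Rust, no nom): splits `input` into
/// `(identifier, rest)` and fails when the identifier would be followed
/// directly by an underscore. ASCII only (assumption).
fn identifier_prefix(input: &str) -> Option<(&str, &str)> {
    let bytes = input.as_bytes();
    // First character: a lowercase ASCII letter.
    if !bytes.first()?.is_ascii_lowercase() {
        return None;
    }
    let mut end = 1;
    loop {
        // Optionally step over a single underscore ...
        let mut next = end;
        if bytes.get(next) == Some(&b'_') {
            next += 1;
        }
        // ... which must be followed by at least one alphanumeric character.
        let run_start = next;
        while bytes.get(next).map_or(false, |b| b.is_ascii_alphanumeric()) {
            next += 1;
        }
        if next == run_start {
            break; // nothing consumed, so the underscore (if any) is not taken
        }
        end = next;
    }
    // The manual check: refuse a match that stops right before a '_'.
    if bytes.get(end) == Some(&b'_') {
        return None;
    }
    // Only ASCII bytes were consumed, so `end` is a valid char boundary.
    Some((&input[..end], &input[end..]))
}
```

With this, an input such as "aB_3 + x" yields the identifier "aB_3" and leaves " + x" for the surrounding parsers, while "aB_" and "a__a" are rejected outright instead of being silently truncated.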