Search code examples
parsingrustnom

Nom 7 doesn't backtrack when `alt` branch fails to parse


I am encountering some odd behavior in Nom 7 that I don't understand. I was under the impression that a failed parse in a branch of alt would backtrack and try the next branch of the alt, but this seems not to be the case.

In my case, I am trying to parse text of the form {stuff between braces}, where if the stuff is a valid identifier (alphanumerics and underscores), I return Ok(Good(stuff)), if it's not a valid identifier I return Ok(Bad(stuff)), and if the text isn't a bunch of stuff between curly braces than a nom::Err is returned.

use nom::{
    branch::alt,
    bytes::complete::is_not,
    character::complete::{char, multispace0, satisfy},
    combinator::{map, opt, recognize},
    multi::many0_count,
    sequence::{delimited, pair},
    IResult,
};

#[derive(Debug)]
enum Res<'a> {
    Good(&'a str),
    Bad(&'a str),
}

fn ident(input: &str) -> IResult<&str, &str> {
    recognize(pair(
        satisfy(|c| c.is_alphabetic() || c == '_'),
        many0_count(satisfy(|c| c.is_alphanumeric() || c == '_')),
    ))(input)
}

fn brace_expr(input: &str) -> IResult<&str, Res<'_>> {
    delimited(
        char('{'),
        alt((
            // Try to parse an identifier optionally surrounded by whitespace
            delimited(multispace0, map(ident, Res::Good), multispace0),
            // Otherwise, just take everything between the braces
            map(opt(is_not("}")), |s| Res::Bad(s.unwrap_or_default())),
        )),
        char('}'),
    )(input)
}

fn main() {
    println!("1. {:?}", brace_expr("{}"));
    println!("2. {:?}", brace_expr("{a}"));
    println!("3. {:?}", brace_expr("{ a }"));
    println!("4. {:?}", brace_expr("{?a}"));
    println!("5. {:?}", brace_expr("{ ?a }"));
    println!("6. {:?}", brace_expr("{a?}"));
    println!("7. {:?}", brace_expr("{ a? }"));
}

Output

1. Ok(("", Bad("")))
2. Ok(("", Good("a")))
3. Ok(("", Good("a")))
4. Ok(("", Bad("?a")))
5. Ok(("", Bad(" ?a ")))
6. Err(Error(Error { input: "?}", code: Char }))
7. Err(Error(Error { input: "? }", code: Char }))

I understand 1-5. They attempted to parse delimited(multispace0, map(ident, Res::Good), multispace0), and either succeeded (#2, #3), or failed (#1, #4, #5) and moved onto the second branch of the alt, but in either case the alt succeeds and we get an Ok.

But I do not understand #6 and #7. When the first branch of the alt fails, instead of moving onto the second, they fail outright, resulting in an Err. Doesn't this contradict the fact that alt is supposed to try the next branch in the case of failure?

The only thing that seems different about #6 and #7 is that they succeed at parsing some ident before failing, whereas the Ok(Bad) ones never succeed at parsing an ident. But that shouldn't affect whether alt tries the next branch, should it?

EDIT:

If I “un-factor-out” the chars, then it works:

fn brace_expr2(input: &str) -> IResult<&str, Res<'_>> {
    alt((
        delimited(
            char('{'),
            delimited(multispace0, map(ident, Res::Good), multispace0),
            char('}'),
        ),
        delimited(
            char('{'),
            map(opt(is_not("}")), |s| Res::Bad(s.unwrap_or_default())),
            char('}'),
        ),
    ))(input)
}

fn main() {
    println!("1. {:?}", brace_expr2("{}"));
    println!("2. {:?}", brace_expr2("{a}"));
    println!("3. {:?}", brace_expr2("{ a }"));
    println!("4. {:?}", brace_expr2("{?a}"));
    println!("5. {:?}", brace_expr2("{ ?a }"));
    println!("6. {:?}", brace_expr2("{a?}"));
    println!("7. {:?}", brace_expr2("{ a? }"));
}
1. Ok(("", Bad("")))
2. Ok(("", Good("a")))
3. Ok(("", Good("a")))
4. Ok(("", Bad("?a")))
5. Ok(("", Bad(" ?a ")))
6. Ok(("", Bad("a?")))
7. Ok(("", Bad(" a? ")))

I still don't understand why alt doesn't work in the first case, but does in the second case.


Solution

  • That's a fun one!

    Your first alt-arm eats up, for example 6 and 7 "a" resp " a" and not, then expects the character "}". But it finds "?" resp "? ".

    The second alt-arm is never reached in 6 and 7.

    To solve this, in the manner you provided your solution, you need to throw an error when ident encounters anything to it's disliking. I guess peek() is an order to see if "}" comes a long, meaning it's really good.

    But, IMHO, I thin take_while might be better fit.