I want to parse recursive enum representing arithmetic expressions using Nom. I have the following code.
#[derive(Debug, PartialEq, Eq)]
enum ArithExp {
Const(i32),
Add(Rc<ArithExp>, Rc<ArithExp>),
}
#[allow(dead_code)]
fn parse_ae_const(input: &str) -> IResult<&str, ArithExp> {
let (input, (numb, _)) = pair(digit1, multispace0)(input)?;
println!("ok: {}", numb);
Ok((input, ArithExp::Const(numb.parse::<i32>().unwrap())))
}
#[allow(dead_code)]
fn parse_ae_add(input: &str) -> IResult<&str, ArithExp> {
let (input, (exp1, exp2)) = terminated(
separated_pair(parse_arithexp, tag(" + "), parse_arithexp),
multispace0,
)(input)?;
Ok((input, ArithExp::Add(Rc::new(exp1), Rc::new(exp2))))
}
#[allow(dead_code)]
fn parse_arithexp(input: &str) -> IResult<&str, ArithExp> {
alt((
parse_ae_add,
parse_ae_const,
))(input)
}
#[test]
fn arithexp1() {
assert_eq!(parse_ae_const("3"), Ok(("", ArithExp::Const(3))));
}
#[test]
fn arithexp2() {
assert_eq!(parse_arithexp("3"), Ok(("", ArithExp::Const(3))));
}
arithexp1
passes just fine, arithexp2
causes stack overflow on the other hand:
thread 'parser::tests::arithexp2' has overflowed its stack
fatal runtime error: stack overflow
It looks that parse_ae_add
causes a problem what may indicate that I'm parsing recursive enum incorrectly but I have literally no idea how to fix it.
As kmdreko's comment outlines, this is because parse_arithexp
calls parse_ae_add
, which immediately calls parse_arithexp
on the same input. This is called left-recursion, and is something that cannot be directly evaluated by recursive descent parsers (like this one). Instead, you must add a 'base case': instead of trying to parse ae_add -> ae_add " + " ae_add
directly, you can parse ae_add -> ae_const " + " ae_add
and achieve the same thing:
#[allow(dead_code)]
fn parse_ae_add(input: &str) -> IResult<&str, ArithExp> {
let (input, (exp1, exp2)) = terminated(
separated_pair(parse_ae_const, tag("+ "), parse_arithexp),
multispace0,
)(input)?;
Ok((input, ArithExp::Add(Rc::new(exp1), Rc::new(exp2))))
}
(Note: tag(" + ")
was changed to tag("+ ")
, because the parse_ae_const
parser consumes any whitespace after it.)