Tags: rust, parser-generator, nom

How to distinguish between minus sign and negative number in nom?


Using the parser combinator library nom, how can I write a parser that tells apart the two uses of the minus sign in the terms 1-2 and 1*-2?

In the first example, I expect the tokens 1, - and 2. In the second, the minus sign marks the number as negative, so the expected tokens are 1, * and -2, not 1, *, - and 2.

How can I make nom stateful, with user-defined states such as expect_literal: bool?


Solution

  • The best solution I have found so far is to use nom_locate with a span type that carries a user-defined state:

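    // Assumes nom 5 and nom_locate 1.x, where LocatedSpanEx exists; nom_locate 2.0
    // and later replaced it with LocatedSpan<&'a str, LexState>.
    use nom::{
        branch::alt,
        bytes::complete::{tag, take_while},
        combinator::{map, opt},
        error::ErrorKind,
        IResult,
    };
    use nom_locate::LocatedSpanEx;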
    
    #[derive(Clone, PartialEq, Debug)]
    struct LexState {
        pub accept_literal: bool,
    }
    
    type Span<'a> = LocatedSpanEx<&'a str, LexState>;
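This works because each parser returns the remaining span, and slicing a span keeps its extra field, so a flag set by one parser is visible to the next. A minimal std-only sketch of that idea follows; the Span type here is hypothetical and only mimics the fragment and extra fields, it is not nom_locate's implementation.

```rust
// Hypothetical user-defined lexer state (mirrors LexState above).
#[derive(Clone, Copy, PartialEq, Debug)]
struct LexState {
    accept_literal: bool,
}

// Hypothetical stand-in for a located span: the unparsed input plus
// an `extra` payload that travels along with it.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Span<'a> {
    fragment: &'a str,
    extra: LexState,
}

impl<'a> Span<'a> {
    fn new(input: &'a str, extra: LexState) -> Self {
        Span { fragment: input, extra }
    }

    /// Split off the first `n` bytes (must be a char boundary) and return
    /// (rest, taken). Both halves keep the current state, which is how the
    /// flag survives from one parser to the next.
    fn take_split(self, n: usize) -> (Span<'a>, Span<'a>) {
        let (head, tail) = self.fragment.split_at(n);
        (
            Span { fragment: tail, extra: self.extra },
            Span { fragment: head, extra: self.extra },
        )
    }
}
```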
    

    You can then modify the state with a helper such as

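    // Store `value` in the state of the remaining input after a successful
    // parse; errors are passed through unchanged.
    fn set_accept_literal(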
        value: bool,
        code: IResult<Span, TokenPayload>,
    ) -> IResult<Span, TokenPayload> {
        match code {
            Ok(mut span) => {
                span.0.extra.accept_literal = value;
                Ok(span)
            }
            _ => code,
        }
    }
    

    where TokenPayload is an enum representing my token content.
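The same helper can be written more compactly with Result::map, which only touches the Ok case. Below is a std-only sketch so it runs without nom; Span, LexState and IResult are hypothetical stand-ins, and the error type is simplified to the span itself.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
struct LexState {
    accept_literal: bool,
}

#[derive(Clone, Copy, PartialEq, Debug)]
struct Span<'a> {
    fragment: &'a str,
    extra: LexState,
}

// Simplified stand-in for nom's IResult: Ok((remaining input, output)),
// with the failing span as the error.
type IResult<'a, O> = Result<(Span<'a>, O), Span<'a>>;

/// Overwrite the flag in the remaining input of a successful parse;
/// an error is returned untouched.
fn set_accept_literal<'a, O>(value: bool, res: IResult<'a, O>) -> IResult<'a, O> {
    res.map(|(mut rest, out)| {
        rest.extra.accept_literal = value;
        (rest, out)
    })
}
```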

    Now you can write the operator parser:

    // After an operator, a (possibly negative) literal may follow.
    fn mathematical_operators(code: Span) -> IResult<Span, TokenPayload> {
        set_accept_literal(
            true,
            alt((
                map(tag("*"), |_| TokenPayload::Multiply),
                map(tag("/"), |_| TokenPayload::Divide),
                map(tag("+"), |_| TokenPayload::Add),
                map(tag("-"), |_| TokenPayload::Subtract),
                map(tag("%"), |_| TokenPayload::Remainder),
            ))(code),
        )
    }
    

    And the integer parser as:

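    fn parse_integer(code: Span) -> IResult<Span, TokenPayload> {
        let chars = "1234567890";
        // Optional leading minus sign
        let (code, sign) = opt(tag("-"))(code)?;
        let sign = sign.is_some();
        // A '-' directly after a literal is the subtraction operator, so fail
        // here and let the operator parser handle it.
        if sign && !code.extra.accept_literal {
            return Err(nom::Err::Error((code, ErrorKind::IsNot)));
        }
        let (code, slice) = take_while(move |c| chars.contains(c))(code)?;
        // Note: the magnitude is parsed without the sign, so i32::MIN
        // ("-2147483648") is rejected because 2147483648 overflows i32.
        match slice.fragment.parse::<i32>() {
            // After a literal, the next '-' must be an operator.
            Ok(value) => set_accept_literal(
                false,
                Ok((code, TokenPayload::Int32(if sign { -value } else { value }))),
            ),
            // No digits, or out of i32 range: not an integer literal
            Err(_) => Err(nom::Err::Error((code, ErrorKind::Tag))),
        }
    }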
    

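    This might not win a beauty contest, but it works. The remaining pieces should be simple; just make sure the integer parser is tried before the operator parser (otherwise every - is lexed as Subtract) and that the initial state has accept_literal: true, so that a leading negative number such as -1 is accepted.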
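For illustration, here is a minimal std-only sketch of those remaining pieces, without nom or nom_locate. The tokenize function and Token enum are hypothetical (Token stands in for TokenPayload), and a plain accept_literal variable replaces LexState. It tries the integer rule before the operator rule, starts with accept_literal = true, and does not skip whitespace.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Token {
    Int32(i32),
    Multiply,
    Divide,
    Add,
    Subtract,
    Remainder,
}

/// Hypothetical tokenizer: returns the tokens, or the byte offset of the
/// first input it cannot lex.
fn tokenize(input: &str) -> Result<Vec<Token>, usize> {
    let mut rest = input;
    // A literal is allowed at the start, so "-1+2" lexes as -1, +, 2.
    let mut accept_literal = true;
    let mut tokens = Vec::new();
    while !rest.is_empty() {
        // 1) Integer first; a leading '-' counts as a sign only when a
        //    literal is expected.
        let sign = if accept_literal && rest.starts_with('-') { 1 } else { 0 };
        let digits = rest[sign..].bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits > 0 {
            let end = sign + digits;
            // Sign and digits are parsed together, so i32::MIN also fits.
            let value = rest[..end]
                .parse::<i32>()
                .map_err(|_| input.len() - rest.len())?;
            tokens.push(Token::Int32(value));
            rest = &rest[end..];
            accept_literal = false; // after a literal, '-' is an operator
            continue;
        }
        // 2) Operator; afterwards a (possibly negative) literal may follow.
        let op = match rest.as_bytes()[0] {
            b'*' => Token::Multiply,
            b'/' => Token::Divide,
            b'+' => Token::Add,
            b'-' => Token::Subtract,
            b'%' => Token::Remainder,
            _ => return Err(input.len() - rest.len()),
        };
        tokens.push(op);
        rest = &rest[1..];
        accept_literal = true;
    }
    Ok(tokens)
}
```

With this, tokenize("1-2") yields Int32(1), Subtract, Int32(2), while tokenize("1*-2") yields Int32(1), Multiply, Int32(-2), matching the tokens expected in the question.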