Parsing single-quoted string with escaped quotes with Nom 5


I'm new to Rust and Nom and I'm trying to parse a (single) quoted string which may contain escaped quotes, e.g. 'foo\' 🤖 bar' or 'λx → x', '' or ' '.

I found the escaped! macro, whose documentation says:

The first argument matches the normal characters (it must not accept the control character), the second argument is the control character (like \ in most languages), the third argument matches the escaped characters

Since I want to match anything but a backslash in the matcher for “normal characters”, I tried using take_till!:

    named!(till_backslash<&str, &str>, take_till!(|ch| ch == '\\'));
    named!(esc<&str, &str>, escaped!(call!(till_backslash), '\\', one_of!("'n\\")));

    let (input, _) = nom::character::complete::char('\'')(input)?;
    let (input, value) = esc(input)?;
    let (input, _) = nom::character::complete::char('\'')(input)?;

    // … use `value`

However, when trying to parse 'x', this returns Err(Incomplete(Size(1))). When searching for this, people generally recommend using CompleteStr, but that's not in Nom 5. What's the correct way to approach this problem?


Solution

  • When operating in so-called streaming mode, nom may return Incomplete to indicate that it can't decide yet and needs more data. Nom 4 introduced CompleteStr; together with CompleteByteSlice, it was the complete-input counterpart of &str and &[u8]. Parsers that take them as input work in complete mode.

    Both types are gone in nom 5. There, macro-based parsers always work in streaming mode, as you've observed. Combinators that behave differently in streaming and complete mode come in two versions, in separate sub-modules such as nom::bytes::streaming and nom::bytes::complete.

    For all these gory details you may want to check out this blog post, especially the section Streaming VS complete parsers.
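
To make the streaming/complete distinction concrete, here is a toy model written with the standard library only. The `Outcome` type and both functions are hypothetical illustrations and are not nom's API or implementation; they only mimic how a "take until backslash" parser might react when the buffer ends before any backslash shows up.

```rust
// Toy model only: these names are hypothetical and are NOT part of nom.
#[derive(Debug, PartialEq)]
enum Outcome<'a> {
    /// Parser finished: (remaining input, matched prefix).
    Done(&'a str, &'a str),
    /// End of buffer reached without seeing the stop character.
    Incomplete,
}

/// Streaming flavour: the end of the buffer may not be the end of the data,
/// so the parser asks for more instead of deciding.
fn take_till_streaming(input: &str, stop: char) -> Outcome<'_> {
    match input.find(stop) {
        Some(i) => Outcome::Done(&input[i..], &input[..i]),
        None => Outcome::Incomplete,
    }
}

/// Complete flavour: the buffer is all there is, so end of input ends the match.
fn take_till_complete(input: &str, stop: char) -> Outcome<'_> {
    let i = input.find(stop).unwrap_or(input.len());
    Outcome::Done(&input[i..], &input[..i])
}

fn main() {
    // After the opening quote of 'x' has been consumed, the rest is "x'".
    assert_eq!(take_till_streaming("x'", '\\'), Outcome::Incomplete);
    assert_eq!(take_till_complete("x'", '\\'), Outcome::Done("", "x'"));
    // When a backslash is present, both flavours agree.
    assert_eq!(take_till_streaming("a\\b", '\\'), Outcome::Done("\\b", "a"));
    assert_eq!(take_till_complete("a\\b", '\\'), Outcome::Done("\\b", "a"));
}
```

In this model the streaming variant gives up on `x'` because no backslash ever arrives, which is the same shape of failure as the Incomplete error in the question.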

    Also, the function combinators are preferred over the macro ones in nom 5. Here is one way to do it:

    //# nom = "5.0.1"
    use nom::{
        branch::alt,
        bytes::complete::{escaped, tag},
        character::complete::none_of,
        sequence::delimited,
        IResult,
    };
    
    fn main() {
        let (_, res) = parse_quoted(r#"'foo\' 🤖 bar'"#).unwrap();
        assert_eq!(res, r#"foo\' 🤖 bar"#);
        let (_, res) = parse_quoted("'λx → x'").unwrap();
        assert_eq!(res, "λx → x");
        let (_, res) = parse_quoted("'  '").unwrap();
        assert_eq!(res, "  ");
        let (_, res) = parse_quoted("''").unwrap();
        assert_eq!(res, "");
    }
    
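    fn parse_quoted(input: &str) -> IResult<&str, &str> {
        // Body: runs of characters other than `\` and `'`, plus `\'` escapes.
        let esc = escaped(none_of("\\\'"), '\\', tag("'"));
        // Fall back to an empty match so that '' parses as well.
        let esc_or_empty = alt((esc, tag("")));
        // Strip the surrounding quotes; escapes stay in the returned slice.
        let res = delimited(tag("'"), esc_or_empty, tag("'"))(input)?;

        Ok(res)
    }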
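
Note that parse_quoted returns a slice of the input, so the backslashes are still in the result (the first assertion expects `foo\' 🤖 bar`). If the unescaped text is needed, a post-processing step along these lines might do. This is a minimal sketch using only the standard library; `unescape_single_quoted` is a hypothetical helper, not something nom provides, and it assumes that a backslash simply makes the next character literal.

```rust
/// Hypothetical helper (not part of nom): turns the raw slice returned by
/// `parse_quoted` into an owned string with the escaping backslashes removed.
fn unescape_single_quoted(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            // Assumption: a backslash makes the following character literal.
            // A trailing lone backslash is kept as-is.
            match chars.next() {
                Some(next) => out.push(next),
                None => out.push('\\'),
            }
        } else {
            out.push(c);
        }
    }
    out
}

fn main() {
    assert_eq!(unescape_single_quoted(r"foo\' 🤖 bar"), "foo' 🤖 bar");
    assert_eq!(unescape_single_quoted("λx → x"), "λx → x");
    assert_eq!(unescape_single_quoted(""), "");
}
```

Because unescaping has to drop characters, the helper returns an owned String rather than a borrowed slice.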