I am not sure if I'm thinking about this the wrong way. Maybe there is a simpler solution.
In nom, I want to parse C-style single-line comments. Each line that I parse could theoretically contain a "// some comment" on the right side. I wrote a parser that can parse these comments:
pub fn parse_single_line_comments(i: &str) -> IResult<&str, &str> {
    recognize(pair(tag("//"), is_not("\n\r")))(i)
}
It works when a comment is present. Unfortunately, if there is no comment, it returns an error. I would like it to return an empty string instead (or later I could return an Option, which would be more elegant). While learning nom I have run into this problem quite often: I want to replace an error with a custom Ok variant. But I am never sure whether I did it the "right" way, i.e. the idiomatic way for nom/Rust. It always felt ugly, because I was matching on the return value of the parsing function. Think of it like this:
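pub fn parse_single_line_comments(i: &str) -> IResult<&str, &str> {
    // Annotate the result so the compiler can infer nom's error type
    let result: IResult<&str, &str> = recognize(pair(tag("//"), is_not("\n\r")))(i);
    match result {
        Ok((rest, comment)) => Ok((rest, comment)),
        _ => Ok((i, "")),
    }
}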
It looks kind of strange to me. There should be a better way to do this, right?
You already hinted at it yourself. You could use opt to parse zero-or-one line comments, or many0 to parse zero-to-many. Then combine that with preceded, and you can easily discard zero-to-many comments (and whitespace).
Let's consider a simple parse_ident parser for identifiers, which looks like this:
use nom::bytes::complete::take_while1;
use nom::{AsChar, IResult};
fn parse_ident(input: &str) -> IResult<&str, &str> {
    take_while1(|c: char| c.is_alpha() || (c == '_'))(input)
}
Now, again, let's say we want to skip zero-to-many whitespace and comments beforehand. First we can define our line comment parser (which you already did):
fn parse_single_line_comment(input: &str) -> IResult<&str, &str> {
    recognize(pair(tag("//"), is_not("\n\r")))(input)
}
Now we'll change parse_ident to use preceded and many0 to skip zero-to-many line comments. Additionally, we can throw in multispace1 to skip whitespace as well. On its own multispace1 requires at least one whitespace character, but wrapped in many0 it covers zero-to-many:
use nom::branch::alt;
use nom::bytes::complete::{is_not, tag, take_while1};
use nom::character::complete::multispace1;
use nom::combinator::recognize;
use nom::multi::many0;
use nom::sequence::{pair, preceded};
use nom::{AsChar, IResult};
fn parse_ident(input: &str) -> IResult<&str, &str> {
    preceded(
        // Parsers to skip anything that is ignored
        many0(alt((
            parse_single_line_comment,
            multispace1,
        ))),
        // Identifier parsing
        take_while1(|c: char| c.is_alpha() || (c == '_')),
    )(input)
}
Which now allows us to successfully parse the following:
assert_eq!(
    parse_ident("identifier"),
    Ok(("", "identifier"))
);
assert_eq!(
    parse_ident(" identifier"),
    Ok(("", "identifier"))
);
assert_eq!(
    parse_ident("// Comment\n identifier"),
    Ok(("", "identifier"))
);
assert_eq!(
    parse_ident("// Comment\n// Comment\n identifier"),
    Ok(("", "identifier"))
);
Depending on what you're parsing, you'll need to sprinkle that preceded into various parsers. We can reduce the duplicated code a bit by introducing our own skip_ignored parser:
fn skip_ignored<'a, F>(parser: F) -> impl FnMut(&'a str) -> IResult<&'a str, &'a str>
where
    F: FnMut(&'a str) -> IResult<&'a str, &'a str>,
{
    preceded(
        many0(alt((
            parse_single_line_comment,
            multispace1,
        ))),
        parser,
    )
}
fn parse_ident(input: &str) -> IResult<&str, &str> {
    skip_ignored(
        take_while1(|c: char| c.is_alpha() || (c == '_')),
    )(input)
}
Whether there are easier ways to do this depends heavily on your data. But as long as you simply want to discard whitespace and comments, it's relatively straightforward.
Since you also asked about custom errors: you can define your own enum as you otherwise would, and then implement ParseError for it:
use nom::error::{ErrorKind, ParseError};
#[derive(Debug)]
pub enum MyParseError<'a> {
    IdentTooLong,
    Nom(&'a str, ErrorKind),
}
impl<'a> ParseError<&'a str> for MyParseError<'a> {
    fn from_error_kind(input: &'a str, kind: ErrorKind) -> Self {
        Self::Nom(input, kind)
    }
    fn append(_: &'a str, _: ErrorKind, other: Self) -> Self {
        other
    }
}
Using it could look like this:
use nom::bytes::complete::take_while1;
use nom::{AsChar, IResult};
fn parse_ident<'a>(input: &'a str) -> IResult<&'a str, &'a str, MyParseError<'a>> {
    let (input, ident) = take_while1(|c: char| c.is_alpha() || (c == '_'))(input)?;
    // Return error if identifier is longer than 10 bytes
    if ident.len() > 10 {
        Err(nom::Err::Failure(MyParseError::IdentTooLong))
    } else {
        Ok((input, ident))
    }
}
fn main() {
    println!("{:?}", parse_ident(""));
    // Err(Error(Nom("", TakeWhile1)))
    println!("{:?}", parse_ident("hello"));
    // Ok(("", "hello"))
    println!("{:?}", parse_ident("this_is_a_very_long_name"));
    // Err(Failure(IdentTooLong))
}
There's also FromExternalError, which works hand in hand with map_res. This is useful if, say, you want to call str::parse() and be able to easily map its error into your MyParseError.