I have a struct that holds a list of words and a list of numbers:
struct NumbersAndWords {
    pub words: Vec<String>,
    // ideally numbers would be Vec<i32>
    pub numbers: Vec<String>,
}

impl NumbersAndWords {
    pub fn insert_number(&mut self, num: &str) {
        self.numbers.push(String::from(num));
    }

    pub fn insert_word(&mut self, word: &str) {
        self.words.push(String::from(word));
    }
}
Now I want to parse a piece of text that holds words and numbers, e.g. "monkey 10 apple tree 20". I want to use the alt combinator to parse either a word using alpha1 or a number using digit1. Depending on which parser matched, I want to add the parsed result to the respective field of NumbersAndWords. Right now both fields are vectors of Strings. Ideally numbers would be Vec<i32>, but that is too difficult for me right now.
The parser function I tried to write looks like this, but it does not work because two borrows appear, and unfortunately I am unable to resolve the problem:
fn parser<'a>(i: &'a str, numbers_and_words: &mut NumbersAndWords) -> IResult<&'a str, ??? > {
    alt((
        map(alpha1, |word| numbers_and_words.insert_word(word)),
        map(digit1, |num| numbers_and_words.insert_number(num)),
    ))(i)
}
Is there a way this can be solved? Maybe even a way nom would prefer? Would it be better to use an enum and match on the value of alt(...) somehow?
I am unsure whether using alt and map together is such a great idea. Is there a more or less "elegant" way of solving this? Ideally one that also parses num to i32, with the field numbers being a Vec<i32> (but then the return type of this alt(...) parser would be very complicated).
I think there might be 2 or even 3 problems mixed together that confuse me.
First things first: if you want to parse an i32, you can simply replace digit1 with the i32 parser, as in nom::character::complete::i32.
Collecting into a struct like that can lead to more annoyances in the future, e.g. if you plan on adding more variants, or even just reusing parsers.
It's more common to have enums, and then parse into a variant of that enum based on context. Consider parsing a (programming) language: you could have an enum Expr and fn parse_expr(), and then also an enum Stmt and fn parse_stmt().
In your case, you can make an enum Token and fn parse_token():
use nom::branch::alt;
use nom::character::complete::{alpha1, i32};
use nom::combinator::map;
use nom::IResult;

#[derive(PartialEq, Clone, Debug)]
enum Token<'a> {
    Word(&'a str),
    Number(i32),
}

fn parse_token<'a>(i: &'a str) -> IResult<&'a str, Token<'a>> {
    alt((
        map(alpha1, Token::Word),
        map(i32, Token::Number),
    ))(i)
}
That makes it easier on the calling end to consume Tokens into something else. Maybe you want to collect into a Vec<Token>, or you can map()/match each token and build your NumbersAndWords struct.
Implementing fn parse_tokens() that simply collects all Tokens into a Vec<Token> would look like this:
use nom::multi::many0;

fn parse_tokens<'a>(i: &'a str) -> IResult<&'a str, Vec<Token<'a>>> {
    many0(parse_token)(i)
}

fn main() {
    let input = "Hello123World456";
    let (input, tokens) = parse_tokens(input).unwrap();
    assert_eq!(
        tokens.as_slice(),
        &[
            Token::Word("Hello"),
            Token::Number(123),
            Token::Word("World"),
            Token::Number(456),
        ]
    );
}
If you still want to collect into NumbersAndWords, that could look like this:
fn parse_numbers_and_words<'a>(
    i: &'a str,
    numbers_and_words: &mut NumbersAndWords,
) -> IResult<&'a str, ()> {
    let (input, tokens) = parse_tokens(i)?;
    for token in tokens {
        match token {
            Token::Word(word) => numbers_and_words.insert_word(word),
            // TODO: `num` is a `i32` so change your `insert_number()` to take `i32`
            Token::Number(num) => numbers_and_words.insert_number(num),
        }
    }
    Ok((input, ()))
}
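The TODO above could be resolved roughly like this. It's a minimal, std-only sketch: it assumes the numbers field becomes Vec<i32> and that insert_number() takes an i32. collect_tokens() is a hypothetical helper name, and the tokens are passed in already parsed rather than coming from nom.

```rust
// Same shape as the `Token` enum above.
#[derive(PartialEq, Clone, Debug)]
enum Token<'a> {
    Word(&'a str),
    Number(i32),
}

#[derive(Default, Debug)]
struct NumbersAndWords {
    pub words: Vec<String>,
    // Assumption: the field is changed from Vec<String> to Vec<i32>.
    pub numbers: Vec<i32>,
}

impl NumbersAndWords {
    pub fn insert_word(&mut self, word: &str) {
        self.words.push(String::from(word));
    }

    // Assumption: takes an `i32` instead of a `&str`.
    pub fn insert_number(&mut self, num: i32) {
        self.numbers.push(num);
    }
}

// Hypothetical helper: sorts already-parsed tokens into the struct.
fn collect_tokens(tokens: &[Token], numbers_and_words: &mut NumbersAndWords) {
    for token in tokens {
        match token {
            Token::Word(word) => numbers_and_words.insert_word(word),
            Token::Number(num) => numbers_and_words.insert_number(*num),
        }
    }
}
```

With that change the match arm for Token::Number compiles without any conversion back to a string.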
Note that parse_tokens() (via many0()) allocates and constructs a Vec<Token>. To avoid allocating, you could loop parse_token() instead.
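Looping could look something like the sketch below. To keep it std-only, parse_token() here is a hand-written stand-in that returns an Option instead of nom's IResult, and parse_into() is a hypothetical name. The point is only the shape of the loop: each token goes straight into its vector, with no intermediate Vec<Token>.

```rust
#[derive(PartialEq, Clone, Debug)]
enum Token<'a> {
    Word(&'a str),
    Number(i32),
}

// Hypothetical stand-in for the nom-based `parse_token()`: returns the
// remaining input and one token, or `None` if nothing matches.
fn parse_token<'a>(i: &'a str) -> Option<(&'a str, Token<'a>)> {
    let first = i.chars().next()?;
    if first.is_ascii_alphabetic() {
        let end = i
            .find(|c: char| !c.is_ascii_alphabetic())
            .unwrap_or(i.len());
        Some((&i[end..], Token::Word(&i[..end])))
    } else if first.is_ascii_digit() {
        let end = i.find(|c: char| !c.is_ascii_digit()).unwrap_or(i.len());
        // Gives up (returns `None`) if the digits do not fit into an i32.
        let num: i32 = i[..end].parse().ok()?;
        Some((&i[end..], Token::Number(num)))
    } else {
        None
    }
}

// Pushes each token directly into the matching vector and returns the
// part of the input that could not be parsed.
fn parse_into<'a>(
    mut i: &'a str,
    words: &mut Vec<String>,
    numbers: &mut Vec<i32>,
) -> &'a str {
    while let Some((rest, token)) = parse_token(i) {
        match token {
            Token::Word(word) => words.push(String::from(word)),
            Token::Number(num) => numbers.push(num),
        }
        i = rest;
    }
    i
}
```

With the real nom parser the loop condition would match on Ok((rest, token)) instead of Some((rest, token)), since IResult is a Result.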
I mentioned it on your previous question, but check out the json.rs example; it gives a nice starting point when it comes to parsing into multiple variants.
For completeness' sake: if you truly want your fn parser() to take numbers_and_words: &mut NumbersAndWords, then the easiest workaround is to access .words and .numbers directly, which gets around the borrowing issue:
// Assumes `numbers` is now a `Vec<i32>`.
fn parser<'a>(i: &'a str, numbers_and_words: &mut NumbersAndWords) -> IResult<&'a str, ()> {
    alt((
        map(alpha1, |word| {
            numbers_and_words.words.push(String::from(word));
        }),
        map(i32, |num| {
            numbers_and_words.numbers.push(num);
        }),
    ))(i)
}
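Why this helps: a closure that calls insert_word() or insert_number() needs a mutable borrow of the whole struct, while a closure that only touches .words or .numbers needs just that one field. As far as I know, closures only capture individual fields on their own since the 2021 edition, so the std-only sketch below splits the borrow explicitly and should not depend on the edition. either() is a hypothetical stand-in for alt((map(..), map(..))) and not a nom function.

```rust
#[derive(Default)]
struct NumbersAndWords {
    pub words: Vec<String>,
    pub numbers: Vec<i32>,
}

// Hypothetical stand-in for `alt((map(..), map(..)))`: both closures have
// to exist at the same time, but only one of them is called.
fn either<F, G>(input: &str, mut on_word: F, mut on_number: G)
where
    F: FnMut(&str),
    G: FnMut(i32),
{
    match input.parse::<i32>() {
        Ok(num) => on_number(num),
        Err(_) => on_word(input),
    }
}

fn insert(input: &str, numbers_and_words: &mut NumbersAndWords) {
    // Borrow the two fields separately. The borrow checker accepts this
    // because the fields are disjoint.
    let words = &mut numbers_and_words.words;
    let numbers = &mut numbers_and_words.numbers;

    either(
        input,
        |word| words.push(String::from(word)),
        |num| numbers.push(num),
    );
}
```

Replacing the two closure bodies with calls to methods on numbers_and_words itself would bring back the original error, because each closure would then need the whole struct mutably.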
Assuming you already thought of that and your example is just a minimal example, I'm guessing your actual issue is that your insert_*() methods are more complex, so you can't simply replace:
numbers_and_words.insert_word(word)
with:
numbers_and_words.words.push(String::from(word))
If that is the case, then this is a perfect example of when the newtype idiom applies.
In short, instead of words: Vec<String>, you introduce a new type struct Words(Vec<String>) and change your field to words: Words. This allows you to implement methods on Words, so you can still access .words directly. (Same for numbers: Numbers.)
This gets around the borrowing issues while still letting you implement any methods you want. You just implement them on Words (and Numbers) instead of on NumbersAndWords.
struct Words(Vec<String>);

impl Words {
    pub fn insert(&mut self, word: impl Into<String>) {
        self.0.push(word.into());
    }
}

struct Numbers(Vec<i32>);

impl Numbers {
    pub fn insert(&mut self, num: i32) {
        self.0.push(num);
    }
}

struct NumbersAndWords {
    pub words: Words,
    pub numbers: Numbers,
}

fn parser<'a>(i: &'a str, numbers_and_words: &mut NumbersAndWords) -> IResult<&'a str, ()> {
    alt((
        map(alpha1, |word| numbers_and_words.words.insert(word)),
        map(i32, |num| numbers_and_words.numbers.insert(num)),
    ))(i)
}
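To show what "more complex" might mean, here is a std-only sketch in which Words::insert() does a bit more than a plain push(). The duplicate check is a made-up example, and insert_pair() is a hypothetical caller that holds both fields mutably at once, roughly the situation the two closures in parser() are in.

```rust
// Newtype around the word list. The duplicate check is a made-up example
// of logic that a plain `push()` could not replace.
#[derive(Default)]
struct Words(Vec<String>);

impl Words {
    pub fn insert(&mut self, word: impl Into<String>) {
        let word = word.into();
        if !self.0.contains(&word) {
            self.0.push(word);
        }
    }
}

#[derive(Default)]
struct Numbers(Vec<i32>);

impl Numbers {
    pub fn insert(&mut self, num: i32) {
        self.0.push(num);
    }
}

#[derive(Default)]
struct NumbersAndWords {
    pub words: Words,
    pub numbers: Numbers,
}

// Hypothetical caller: both fields are borrowed mutably at the same time,
// which is fine because each method only needs its own field.
fn insert_pair(numbers_and_words: &mut NumbersAndWords, word: &str, num: i32) {
    let words = &mut numbers_and_words.words;
    let numbers = &mut numbers_and_words.numbers;
    words.insert(word);
    numbers.insert(num);
}
```

The extra logic lives on Words, so nothing ever has to borrow NumbersAndWords as a whole just to insert a word.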