To get better with Rust, I've decided to implement a simple lexer that analyzes some documents line by line.
As I have to iterate at least two times over the lines of the trait BufRead, I am cloning the lines of my BufRead, but I get the following error:
error[E0271]: type mismatch resolving `<std::io::Lines<T> as std::iter::Iterator>::Item == &_`
  --> <anon>:18:23
   |
18 |     let lines = lines.cloned();
   |                       ^^^^^^ expected enum `std::result::Result`, found reference
   |
   = note: expected type `std::result::Result<std::string::String, std::io::Error>`
   = note:    found type `&_`
I understand what the error is, but based on the following code, how can I tell the compiler what the Item of the Iterator should be so it can correctly cast the type?
use std::fmt::Write;
use std::io::{BufRead, BufReader, Lines, Read};

pub struct DocumentMetadata {
    language: String,
    // ...
}

pub fn analyze<T: BufRead>(document: T) -> Result<DocumentMetadata, ()> {
    let lines = document.lines();
    let language = guess_language(&lines);
    // Do more lexical analysis based on document language
    Ok(DocumentMetadata {
        language: language,
        // ...
    })
}

fn guess_language<T: BufRead>(lines: &Lines<T>) -> String {
    let lines = lines.cloned();
    for line in lines {
        let line = line.unwrap();
        // Try to guess language
    }
    "en".to_string()
}

#[test]
fn it_guesses_document_language() {
    let mut document = String::new();
    writeln!(&mut document, "# language: en").unwrap();
    let document = BufReader::new(document.as_str().as_bytes());
    match analyze(document) {
        Ok(metadata) => assert_eq!("en".to_string(), metadata.language),
        Err(_) => panic!(),
    }
}
For unit testing purposes, I am building a buffer from a String, but in normal usage I read it from a File.
Review the Iterator::cloned definition:
fn cloned<'a, T>(self) -> Cloned<Self>
    where Self: Iterator<Item=&'a T>,
          T: 'a + Clone
And the implementation of Iterator for io::Lines:
impl<B: BufRead> Iterator for Lines<B> {
    type Item = Result<String>;
}
You cannot use cloned because the iterator item is not a reference. You cannot "tell" the compiler otherwise; that's not how types work.
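To make the difference concrete, here is a minimal sketch contrasting the two cases. The helper names clone_all and read_all are made up for illustration and are not part of the question's code: cloned is accepted on a slice iterator because it yields &String, while Lines yields owned io::Result<String> values that can only be moved or collected.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: `cloned` compiles here because `slice::Iter`
/// yields `&String`, which satisfies `Iterator<Item = &'a T>` with `T: Clone`.
fn clone_all(lines: &[String]) -> Vec<String> {
    lines.iter().cloned().collect()
}

/// Hypothetical helper: `Lines` yields owned `io::Result<String>` items,
/// so there is no reference to clone; the items are collected directly.
/// Collecting into `io::Result<Vec<_>>` stops at the first read error.
fn read_all<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    reader.lines().collect()
}
```

Calling lines().cloned() inside read_all would reproduce the E0271 error from the question, because the Item type is fixed by the Iterator implementation and not chosen by the caller.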
> As I have to iterate at least two times over the lines of the trait BufRead, I am cloning the lines of my BufRead
That doesn't really make sense. Cloning the lines of the reader wouldn't save anything. In fact, it would probably just make things worse. You'd be creating the strings once, not using them except for cloning them, then creating them a third time when you iterate again.
If you wish to avoid recreating all the strings, collect all the strings into a Vec or other collection and then iterate over that multiple times:
pub fn analyze<T: BufRead>(document: T) -> Result<DocumentMetadata, ()> {
    let lines: Result<Vec<_>, _> = document.lines().collect();
    let lines = lines.unwrap();
    let language = guess_language(&lines);
    // Do more lexical analysis based on document language
    Ok(DocumentMetadata {
        language: language,
        // ...
    })
}

fn guess_language<'a, I>(lines: I) -> String
    where I: IntoIterator<Item = &'a String>,
{
    for line in lines {
        // Try to guess language
    }
    "en".to_string()
}
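For completeness, here is a self-contained sketch of the same collect-once, iterate-many-times approach with two passes over the buffered lines. The "# language:" header parsing, the count_non_empty pass, and the (String, usize) return type are assumptions made for illustration only. It also propagates the io::Error with ? instead of calling unwrap.

```rust
use std::io::{self, BufRead};

/// Hypothetical first pass: looks for a `# language: xx` header line.
fn guess_language<'a, I>(lines: I) -> String
where
    I: IntoIterator<Item = &'a String>,
{
    for line in lines {
        if let Some(rest) = line.strip_prefix("# language:") {
            return rest.trim().to_string();
        }
    }
    "en".to_string() // assumed default when no header is found
}

/// Hypothetical second pass: counts the lines that are not blank.
fn count_non_empty(lines: &[String]) -> usize {
    lines.iter().filter(|line| !line.trim().is_empty()).count()
}

/// Reads the document once, then iterates over the collected lines twice.
fn analyze<T: BufRead>(document: T) -> io::Result<(String, usize)> {
    // The first read error, if any, is returned to the caller.
    let lines = document.lines().collect::<io::Result<Vec<String>>>()?;
    let language = guess_language(&lines);
    let non_empty = count_non_empty(&lines);
    Ok((language, non_empty))
}
```

Both passes borrow the Vec, so no string is allocated more than once. The trade-off is that the whole document is held in memory at the same time.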