I am new to Rust and still fighting with the borrow-checker.
I am using this question as my general learning opportunity. So, I would appreciate an explanation that is applicable generally to similar situation instead of a response that simply applies to this particular example.
I want to implement a wrapper over RowIterator. I want to build a struct that holds the RowIterator
. On next()
and has_next()
, I will add the desired behavior that's not meaningful to the context of this question.
Here is how my wrapper code looks like -
use std::{fs::File, path::Path};
use parquet::{
file::{reader::FileReader, serialized_reader::SerializedFileReader},
record::reader::RowIter,
};
pub struct PIterator<'a> {
row_iterator: RowIter<'a>,
}
impl<'a> PIterator<'a> {
pub fn new(file_path: String) -> PIterator<'a> {
let path = Path::new(&file_path);
let file = File::open(path).unwrap();
let reader = SerializedFileReader::new(file).unwrap();
let row_iter = reader.get_row_iter(None); // reader is borrowed here
PIterator { // cannot return value referencing local variable `reader`...
row_iterator: row_iter.unwrap(),
}
}
pub fn has_next(&self) -> bool {
// TODO:
false
}
pub fn next(&self) -> Option<String> {
// TODO:
None
}
}
I get the concept that when a method is invoked on a mutable reference, the object may mutate leaving the references dangling. So, the ownership transfer must happen.
What I cannot comprehend is - who is the "borrower" here? Is it the method "new"? How we determine that?
Also, in this situation, what is the idiomatic way of wrapping the iterator?
In this particular example I'd tell you to just make a function that creates a RowIter
for you as it seems you don't actually need the iterator adapter, but let's assume you need to have it for some reason.
Your problem is that in PIterator
you're trying to refer to something that is going to be dropped. You make a RowIter
that refers to the reader
and then try to return it even though reader
is going to be dropped at the end of new
.
Instead of referring to it, transfer ownership of it. To create an iterator that owns it's data, not just refers to it, you idiomatically use the into_iter
function from the IntoIter
trait. Once you do that you can remove the lifetime from your struct.
Then, put your next
function into an Iterator
impl block so that the language knows it is meant to be Iterator::next
and not a function whose name just so happens to be next
. As for has_next
, you don't have to implement that yourself, just use the existing peekable
adapter.
Some other minor adjustments I made are (more) proper error handling instead of unwrap
, and &Path
instead of needlessly owned String
as the argument to new
(If you already have a Path
you'd have to needlessly convert it back to String just to call new
...).
use std::{fs::File, io, path::Path};
use parquet::{file::serialized_reader::SerializedFileReader, record::reader::RowIter};
use thiserror::Error;
pub struct PIterator {
row_iterator: RowIter<'static>,
}
#[derive(Debug, Error)]
pub enum PIteratorCreationError {
#[error(transparent)]
IOError(#[from] io::Error),
#[error(transparent)]
ParquetError(#[from] parquet::errors::ParquetError),
}
impl PIterator {
pub fn new(file_path: &Path) -> Result<Self, PIteratorCreationError> {
let file = File::open(file_path)?;
let reader = SerializedFileReader::new(file)?;
let row_iter = reader.into_iter();
Ok(PIterator { row_iterator: row_iter })
}
}
impl Iterator for PIterator {
type Item = <RowIter<'static> as Iterator>::Item;
fn next(&mut self) -> Option<Self::Item> {
self.row_iterator.next()
}
}
fn main() {
let p_iterator = match PIterator::new(Path::new("Some filename...")) {
Ok(iterator) => iterator,
Err(creation_error) => match creation_error {
PIteratorCreationError::IOError(io_error) => {
// handle io error here somehow
eprintln!("IO error: {io_error}");
return;
}
PIteratorCreationError::ParquetError(parquet_error) => {
// handle parquet error here somehow
eprintln!("Parquet error: {parquet_error}");
return;
}
},
};
let mut iterator = p_iterator.peekable();
let has_next: bool = iterator.peek().is_some();
}