How do I box two objects whose lifetimes are linked?


I'm using the parquet library and want a function that creates a record iterator and returns it behind my own trait, RecordIterator. It looks like this:

    fn blah(dataset_info: DatasetInfo, file: File) -> Result<Box<dyn RecordIterator<Item=Record>>, Box<dyn Error>> {
        let reader = SerializedFileReader::new(file).unwrap();
        let iter = (&reader).get_row_iter(None).unwrap();
        let column_type = *dataset_info.column_type.clone();
        let iterator = ParquetRecordIterator {
            iterator: iter,
            column_type: column_type,
            i: 0,
        };
        Ok(Box::new(iterator))
    }

The problem is that iter borrows from the SerializedFileReader it was created from, so its lifetime is tied to the local variable reader. Returning the ParquetRecordIterator (which implements the RecordIterator trait) therefore fails to compile with:

cannot return value referencing local variable `reader` [E0515]

Ideally I would not break the abstraction here, so how would you suggest implementing this function? Effectively, I would like to break the lifetime link between the reader and the iterator, but I'm not sure how best to do that, or whether a different parquet API would avoid the problem.

I've tried wrapping the file reader in a Box, hoping that the lifetime would not be tied to an object on the heap, but unfortunately that doesn't work.

I've NOT tried any libraries to deal with self-referential structs because it doesn't seem like they're recommended, so I was hoping for a standard way to solve this.

I've NOT tried looking into other parquet libraries.


Solution

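  • This is a hard problem in Rust, since you're essentially trying to create a self-referential struct: the returned value would have to hold both the reader and an iterator that borrows from it.

    Fortunately, you don't need to solve this problem in general, because the library you're using directly supports this use case: impl IntoIterator for SerializedFileReader<File>. Calling into_iter() takes the reader by value, so the resulting iterator owns it and nothing borrows from a local variable. So: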

        fn blah(dataset_info: DatasetInfo, file: File) -> Result<Box<dyn RecordIterator<Item=Record>>, Box<dyn Error>> {
            let reader = SerializedFileReader::new(file).unwrap();
            // into_iter() consumes `reader`, so `iter` owns it and borrows nothing local.
            let iter = reader.into_iter();
            let column_type = *dataset_info.column_type.clone();
            let iterator = ParquetRecordIterator {
                iterator: iter,
                column_type: column_type,
                i: 0,
            };
            Ok(Box::new(iterator))
        }
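
    To see why consuming the reader sidesteps the error, here is a minimal sketch of the same pattern using only the standard library. Every name in it (Reader, BorrowedRows, OwnedRows, open_rows) is hypothetical and stands in for the parquet types; it is not the parquet crate's implementation, only an illustration of a borrowing iterator versus an owning one.

```rust
// Hypothetical stand-in for a file reader that owns its data.
struct Reader {
    rows: Vec<String>,
}

// Borrowing iterator: its lifetime is tied to the `Reader` it came from.
struct BorrowedRows<'a> {
    reader: &'a Reader,
    pos: usize,
}

impl<'a> Iterator for BorrowedRows<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let reader: &'a Reader = self.reader;
        let row = reader.rows.get(self.pos)?;
        self.pos += 1;
        Some(row.as_str())
    }
}

// Owning iterator: it holds the `Reader` itself, so it has no lifetime parameter.
struct OwnedRows {
    reader: Reader,
    pos: usize,
}

impl Iterator for OwnedRows {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let row = self.reader.rows.get(self.pos)?.clone();
        self.pos += 1;
        Some(row)
    }
}

impl Reader {
    // Analogous in spirit to a `get_row_iter`-style method: borrows `self`.
    fn rows(&self) -> BorrowedRows<'_> {
        BorrowedRows { reader: self, pos: 0 }
    }
}

impl IntoIterator for Reader {
    type Item = String;
    type IntoIter = OwnedRows;

    // Takes `self` by value: the reader moves into the iterator.
    fn into_iter(self) -> OwnedRows {
        OwnedRows { reader: self, pos: 0 }
    }
}

// Returning a boxed iterator works because nothing borrows from a local.
fn open_rows(rows: Vec<String>) -> Box<dyn Iterator<Item = String>> {
    let reader = Reader { rows };
    // `Box::new(reader.rows())` would fail here with E0515,
    // because `BorrowedRows` references the local `reader`.
    Box::new(reader.into_iter())
}
```

    Calling open_rows(vec!["a".to_string(), "b".to_string()]).collect::<Vec<_>>() yields the two strings back, while the borrowing rows() method can only be used for as long as the Reader itself is still in scope.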