
Unbounded memory usage using pyo3::types::PyIterator


I have a simple #[pymethods] impl that uses a &pyo3::types::PyIterator in the most naive way:

/// Construct a `Foo` from an iterator
#[staticmethod]
fn from_iter(iter: &pyo3::types::PyIterator) -> PyResult<Self> {
    let mut foo = Self::new()?;
    for obj in iter {
        foo.bar(obj?)?;
    }
    Ok(foo)
}

I've noticed that memory usage grows without bound while the iterator is being consumed. The PyO3 documentation on memory management seems to describe exactly this situation, although I am not sure I understand the problem correctly:

As far as I can see, since we enter the function with a &'a PyIterator, we already hold the GIL, and 'a is bound to the GIL. Because the PyIterator yields &'a PyAny during iteration, and those objects must stay valid for at least 'a, the iterated-over objects are not destroyed after each iteration of the loop. Memory usage therefore grows until the function returns and everything is collected in one fell swoop.

What's the correct strategy to have each obj destroyed while looping? The documentation points to using unsafe, and I am unsure whether the simple code above actually needs it.


Solution

  • Answering my own question:

    The solution is, indeed, to use unsafe as described in PyO3's documentation on memory management. The unsafety arises because we need to destroy the objects that are being iterated over, while the interpreter has no way to determine whether the Rust side is secretly holding on to them.

    fn from_iter(py: Python, mut iter: &pyo3::types::PyIterator) -> PyResult<Self> {
        let mut foo = Self::new()?;
        // Explicit `loop` instead of a `for` loop, so that the `pool`
        // is active *before* `obj` is returned from the `PyIterator`
        loop {
            // SAFETY: `obj` is only derived inside the iteration and is
            // never observed outside of or after it
            let pool = unsafe { py.new_pool() };
            match iter.next() {
                Some(obj) => {
                    foo.bar(obj?)?;
                }
                None => {
                    break;
                }
            }
            drop(pool); // Explicit for clarity
        }
        Ok(foo)
    }
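
    The bookkeeping behind this can be sketched without PyO3 at all. What follows is a toy model in plain Rust, not PyO3's actual implementation: `ToyPool`, `OWNED`, and `yield_object` are made-up names that only mimic the idea of a pool releasing everything registered since it was created. It compares the peak number of live objects with one pool for the whole call against a fresh pool per iteration.

```rust
use std::cell::RefCell;

// Toy stand-in for a thread-local list of owned references.
// None of these names are PyO3 API; they only mimic the bookkeeping.
thread_local! {
    static OWNED: RefCell<Vec<Vec<u8>>> = RefCell::new(Vec::new());
}

/// Hypothetical pool guard: remembers how many objects were registered
/// when it was created and releases everything newer when dropped.
struct ToyPool {
    start: usize,
}

impl ToyPool {
    fn new() -> Self {
        ToyPool {
            start: OWNED.with(|o| o.borrow().len()),
        }
    }
}

impl Drop for ToyPool {
    fn drop(&mut self) {
        OWNED.with(|o| o.borrow_mut().truncate(self.start));
    }
}

/// Simulates the iterator yielding an object: it is registered in the
/// thread-local list and stays alive until an enclosing pool is dropped.
/// Returns the number of objects currently alive.
fn yield_object() -> usize {
    OWNED.with(|o| {
        let mut owned = o.borrow_mut();
        owned.push(vec![0u8; 16]);
        owned.len()
    })
}

/// One pool for the whole call, like entering the function with the GIL held.
fn peak_with_outer_pool_only(n: usize) -> usize {
    let _outer = ToyPool::new();
    let mut peak = 0;
    for _ in 0..n {
        peak = peak.max(yield_object());
    }
    peak
}

/// A fresh pool per iteration, created before the object is yielded.
fn peak_with_pool_per_iteration(n: usize) -> usize {
    let _outer = ToyPool::new();
    let mut peak = 0;
    for _ in 0..n {
        let pool = ToyPool::new();
        peak = peak.max(yield_object());
        drop(pool); // releases the object yielded in this iteration
    }
    peak
}

fn main() {
    println!("outer pool only:    peak = {}", peak_with_outer_pool_only(1000));
    println!("pool per iteration: peak = {}", peak_with_pool_per_iteration(1000));
}
```

    In this model the first variant peaks at 1000 live objects and the second at 1, which mirrors why the pool has to be created inside the loop and before `next()` is called.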