
Call a function that uses `&mut self.0` in a loop (E0499)


I'm looking for a way around the lack-of-Polonius problem in this specific case. As far as I can tell, the existing answers don't apply here.

I have two structures, SourceBytes<S> and SourceChars<S>. The former stands on its own, while the latter is tightly coupled to it. Both should be constructible from any S: Iterator<Item = u8>.

This is what the definition looks like for each:

#[derive(Clone, Debug)]
pub struct SourceBytes<S>
where
    S: Iterator<Item = u8>,
{
    iter: S,
    buffer: Vec<S::Item>,
}

#[derive(Clone, Debug)]
pub struct SourceChars<S>(S)
where
    S: Iterator<Item = u8>;

The purpose of SourceBytes<S> is to abstract over S so that each S::Item can be buffered and read immutably without taking/popping it from the iterator. Its Iterator implementation looks like this:

impl<S> Iterator for SourceBytes<S>
where
    S: Iterator<Item = u8>,
{
    type Item = S::Item;

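    fn next(&mut self) -> Option<Self::Item> {
        // Hand out buffered bytes oldest-first, matching the order that
        // `buffer` exposes them in (`pop` would return the newest byte).
        if self.buffer.is_empty() {
            self.iter.next()
        } else {
            Some(self.buffer.remove(0))
        }
    }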
}

This works fine, and the buffer is handled like so:

impl<S> SourceBytes<S>
where
    S: Iterator<Item = u8>,
{
    // pub fn new<I>(iter: I) -> Self
    // where
    //     I: IntoIterator<Item = S::Item, IntoIter = S>,
    // {
    //     Self {
    //         iter: iter.into_iter(),
    //         buffer: Vec::new(),
    //     }
    // }

    fn buffer(&mut self, count: usize) -> Option<&[u8]> {
        if self.buffer.len() < count {
            self.buffer
                .extend(self.iter.by_ref().take(count - self.buffer.len()));
        }
        self.buffer.get(0..count)
    }
}

Each call to SourceBytes<S>::buffer takes as many items from S as it still needs and pushes them onto buffer. Each call to <SourceBytes<S> as Iterator>::next first takes from self.buffer, and only then from self.iter (whose type is S).
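To make the interplay between buffer and next concrete, here is a minimal self-contained sketch. The `new` constructor and the `peek_then_read` helper are assumptions added for illustration, and `next` is written to drain the buffer oldest-first so that it agrees with the slice that `buffer` returns.

```rust
/// Same shape as `SourceBytes<S>` in the question.
pub struct SourceBytes<S>
where
    S: Iterator<Item = u8>,
{
    iter: S,
    buffer: Vec<u8>,
}

impl<S> SourceBytes<S>
where
    S: Iterator<Item = u8>,
{
    /// Assumed constructor (the question has it commented out).
    pub fn new(iter: S) -> Self {
        Self { iter, buffer: Vec::new() }
    }

    /// Peek at the next `count` bytes without consuming them.
    pub fn buffer(&mut self, count: usize) -> Option<&[u8]> {
        if self.buffer.len() < count {
            let missing = count - self.buffer.len();
            self.buffer.extend(self.iter.by_ref().take(missing));
        }
        // `None` if the source ran out before `count` bytes were available.
        self.buffer.get(0..count)
    }
}

impl<S> Iterator for SourceBytes<S>
where
    S: Iterator<Item = u8>,
{
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        // Assumption: buffered bytes are consumed oldest-first.
        if self.buffer.is_empty() {
            self.iter.next()
        } else {
            Some(self.buffer.remove(0))
        }
    }
}

/// Hypothetical helper: peek at three bytes, then read everything.
pub fn peek_then_read() -> (Vec<u8>, Vec<u8>) {
    let mut src = SourceBytes::new("abcdef".bytes());
    let peeked = src.buffer(3).map(|b| b.to_vec()).unwrap_or_default();
    let read: Vec<u8> = src.collect();
    (peeked, read)
}
```

Peeking does not advance the iterator: the bytes seen through `buffer` are returned again by `next`. `Vec::remove(0)` is linear in the buffer length; a `VecDeque` might be a better fit if the buffer grows large.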

Now, the purpose of SourceChars<S> is to provide an Iterator interface that reads bytes from self.0 (which is S) until they form a valid UTF-8 char, and then returns it:

impl<S> Iterator for SourceChars<S>
where
    S: Iterator<Item = u8>,
{
    type Item = char;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buf = [0; 4];
        // A single character can be at most 4 bytes.
        for (i, byte) in self.0.by_ref().take(4).enumerate() {
            buf[i] = byte;
            if let Ok(slice) = std::str::from_utf8(&buf[..=i]) {
                return slice.chars().next();
            }
        }
        None
    }
}

This also works fine.
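As a quick check of the decoding loop, the sketch below repeats the implementation above in self-contained form and runs it over multi-byte input. The `decode` and `first_of_invalid` helpers are hypothetical names added only for this example.

```rust
pub struct SourceChars<S>(S)
where
    S: Iterator<Item = u8>;

impl<S> Iterator for SourceChars<S>
where
    S: Iterator<Item = u8>,
{
    type Item = char;

    fn next(&mut self) -> Option<char> {
        let mut buf = [0u8; 4];
        // A single character can be at most 4 bytes.
        for (i, byte) in self.0.by_ref().take(4).enumerate() {
            buf[i] = byte;
            // An incomplete sequence fails validation, so keep reading.
            if let Ok(slice) = std::str::from_utf8(&buf[..=i]) {
                return slice.chars().next();
            }
        }
        None
    }
}

/// Hypothetical helper: decode a string's bytes back into a `String`.
pub fn decode(input: &str) -> String {
    SourceChars(input.bytes()).collect()
}

/// Hypothetical helper: first item produced for an invalid byte sequence.
pub fn first_of_invalid() -> Option<char> {
    SourceChars(vec![0xFFu8, b'a'].into_iter()).next()
}
```

Characters of one to four bytes round-trip through `decode`. Note that, as written, an invalid sequence makes `next` return `None` after consuming up to four bytes, which ends the iteration rather than skipping the bad bytes.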

Now, I also wish to provide an impl for SourceChars<&mut SourceBytes<S>>, so that SourceChars can rely on the buffer provided by self.0 (which, in this circumstance, is &mut SourceBytes<S>).

impl<S> SourceChars<&mut SourceBytes<S>>
where
    S: Iterator<Item = u8>,
{
    fn buffer(&mut self, count: usize) -> Option<&str> {
        // let mut src = self.0.by_ref();
        for byte_count in 0.. {
            let Some(buf) = self.0.buffer(byte_count) else {
                return None;
            };
            if let Ok(slice) = std::str::from_utf8(buf) {
                if slice.chars().count() >= count {
                    return Some(slice);
                }
            }
        }
        unreachable!()
    }
}

SourceChars<&mut SourceBytes<S>>::buffer relies on SourceBytes<S>::buffer to do the actual buffering; SourceChars only acts as a wrapper that changes the interpretation of the iterator S from bytes to chars.

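The problem is that *self.0 cannot be borrowed mutably more than once at a time. Because one path through the loop returns the borrowed slice, the compiler requires the mutable borrow taken by self.0.buffer(byte_count) to last as long as the returned reference, so it is still considered live in the next iteration of the loop.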

How can I implement this in such a way that SourceChars relies on SourceBytes::buffer without running into this compiler error?

error[E0499]: cannot borrow `*self.0` as mutable more than once at a time
   --> src/parser/iter.rs:122:29
    |
119 |     fn buffer(&mut self, count: usize) -> Option<&str> {
    |               - let's call the lifetime of this reference `'1`
...
122 |             let Some(buf) = self.0.buffer(byte_count) else {
    |                             ^^^^^^ `*self.0` was mutably borrowed here in the previous iteration of the loop
...
127 |                     return Some(slice);
    |                            ----------- returning this value requires that `*self.0` is borrowed for `'1`

Solution

  • One option that I previously tried was the crate polonius-the-crab, but it caused more problems with the usage of the API and made the trait bounds difficult to get right.

    Because of this inconvenience, I ended up using an unsafe pointer cast so that the lifetime of buf no longer depends on the &mut SourceBytes borrow taken inside the loop.

    impl<S> Buffered for SourceChars<&mut S>
    where
        for<'a> S: Iterator<Item = u8> + Buffered<ItemSlice<'a> = &'a [u8]> + 'a,
    {
        type ItemSlice<'items> = &'items str where Self: 'items;
    
        // Allowed specifically here because the current borrow checker
        // rejects this otherwise valid pattern.
        #[allow(unsafe_code)]
        fn buffer(&mut self, count: usize) -> Option<Self::ItemSlice<'_>> {
            for byte_count in 0.. {
                let buf = self.0.buffer(byte_count)?;
                // SAFETY:
                //
                // This unsafe pointer cast is here because of a limitation
                // in the borrow checker. In the future, when Polonius becomes
                // the default borrow checker, this unsafe code can be removed.
                //
                // The byte slice is re-derived through a raw pointer so that
                // its lifetime is no longer tied to the borrow taken in this
                // iteration; it is bounded by the return value instead, which
                // borrows `self`. The old slice is never used after the next
                // call to `self.0.buffer`.
                //
                // This is referred to as the "polonius problem",
                // or more accurately, the "lack-of-polonius problem".
                //
                // <https://github.com/rust-lang/rust/issues/54663>
                let buf: *const [u8] = buf;
                let buf: &[u8] = unsafe { &*buf };
    
                if let Ok(slice) = std::str::from_utf8(buf) {
                    if slice.chars().count() >= count {
                        return Some(slice);
                    }
                }
            }
            unreachable!()
        }
    }
    

    Additionally, here are the tests that show usage of the API. Using the polonius-the-crab crate failed to solve some lifetime issues that I ran across while implementing these tests.

    #[cfg(test)]
    mod tests {
        use super::{Buffered, SourceBytes, SourceChars};
    
        #[test]
        fn test_source_chars() {
            let source = "abcdefg";
            let chars = SourceChars::new(source.bytes());
            assert_eq!(source, chars.collect::<String>());
        }
    
        #[test]
        fn test_source_chars_buffer() {
            let source = "abcdefg";
            let mut bytes = SourceBytes::new(source.bytes());
            let mut chars = SourceChars::new(&mut bytes);
            // Ensure that the `buffer` function works.
            assert_eq!(&source[0..3], chars.buffer(3).unwrap());
            // Ensure that the characters are taken from the buffer,
            // and that `buffer` correctly preserves them.
            assert_eq!(&source[0..4], chars.by_ref().take(4).collect::<String>());
            // Ensure that the iterator has been advanced.
            assert_eq!(&source[4..7], chars.buffer(3).unwrap());
        }
    }
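As a possible safe alternative to the pointer cast above, the loop can be restricted to finding the required byte count, with the borrow that escapes taken once after the loop. The sketch below is written against the inherent impl from the question rather than the Buffered trait (whose definition is not shown); `new`, the unbounded `SourceChars<S>`, and the `buffered_prefix` helper are assumptions for illustration.

```rust
pub struct SourceBytes<S: Iterator<Item = u8>> {
    iter: S,
    buffer: Vec<u8>,
}

impl<S: Iterator<Item = u8>> SourceBytes<S> {
    /// Assumed constructor.
    pub fn new(iter: S) -> Self {
        Self { iter, buffer: Vec::new() }
    }

    pub fn buffer(&mut self, count: usize) -> Option<&[u8]> {
        if self.buffer.len() < count {
            let missing = count - self.buffer.len();
            self.buffer.extend(self.iter.by_ref().take(missing));
        }
        self.buffer.get(0..count)
    }
}

/// Trait bound omitted here to keep the sketch short.
pub struct SourceChars<S>(S);

impl<S: Iterator<Item = u8>> SourceChars<&mut SourceBytes<S>> {
    pub fn buffer(&mut self, count: usize) -> Option<&str> {
        let mut byte_count = 0;
        loop {
            // Only a `bool` leaves this iteration, so the mutable borrow
            // of `*self.0` ends before the next call.
            let bytes = self.0.buffer(byte_count)?;
            let enough = std::str::from_utf8(bytes)
                .map(|s| s.chars().count() >= count)
                .unwrap_or(false);
            if enough {
                break;
            }
            byte_count += 1;
        }
        // The single borrow that escapes is taken outside the loop.
        let bytes = self.0.buffer(byte_count)?;
        std::str::from_utf8(bytes).ok()
    }
}

/// Hypothetical helper: the first `count` chars of `input`, if available.
pub fn buffered_prefix(input: &str, count: usize) -> Option<String> {
    let mut bytes = SourceBytes::new(input.bytes());
    let mut chars = SourceChars(&mut bytes);
    let prefix = chars.buffer(count).map(str::to_owned);
    prefix
}
```

The trade-off is one extra call to `SourceBytes::buffer` and one extra UTF-8 validation on the final slice, in exchange for no unsafe code. Whether this fits the trait-based version depends on the Buffered bounds, which this sketch does not attempt to reproduce.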