I'm looking for a way around the lack-of-polonius problem in this specific circumstance. The other answers seem inapplicable, as far as I can understand at the moment.
I have two structures, SourceBytes<S> and SourceChars. The former is decoupled, but the second is heavily coupled to the former. SourceBytes<S> should be constructed from any S: Iterator<Item = u8>, and SourceChars should be constructed from the same, S: Iterator<Item = u8>.
This is what the definition looks like for each:
#[derive(Clone, Debug)]
pub struct SourceBytes<S>
where
    S: Iterator<Item = u8>,
{
    iter: S,
    buffer: Vec<S::Item>,
}

#[derive(Clone, Debug)]
pub struct SourceChars<S>(S)
where
    S: Iterator<Item = u8>;
The purpose of SourceBytes<S> is to abstract over S so that each S::Item can be buffered, and be read immutably without taking/popping the item from the iterator. That looks like this:
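impl<S> Iterator for SourceBytes<S>
where
    S: Iterator<Item = u8>,
{
    type Item = S::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // Hand back buffered bytes oldest-first; `Vec::pop` would return
        // the most recently buffered byte and reverse their order.
        if self.buffer.is_empty() {
            self.iter.next()
        } else {
            Some(self.buffer.remove(0))
        }
    }
}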
This works fine, and the buffer is handled like so:
impl<S> SourceBytes<S>
where
    S: Iterator<Item = u8>,
{
    // pub fn new<I>(iter: I) -> Self
    // where
    //     I: IntoIterator<Item = S::Item, IntoIter = S>,
    // {
    //     Self {
    //         iter: iter.into_iter(),
    //         buffer: Vec::new(),
    //     }
    // }

    fn buffer(&mut self, count: usize) -> Option<&[u8]> {
        if self.buffer.len() < count {
            self.buffer
                .extend(self.iter.by_ref().take(count - self.buffer.len()));
        }
        self.buffer.get(0..count)
    }
}
Each time SourceBytes<S>::buffer is called, items are taken from S and pushed onto buffer. Each time <SourceBytes as Iterator>::next is called, it first takes from self.buffer and, once that is empty, from self.iter (whose type is S).
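To make the intended behaviour concrete, here is a minimal, self-contained sketch of peeking followed by iteration. It assumes buffered bytes should come back oldest-first, and it takes the iterator directly in new; the helper peek_then_drain is a hypothetical name used only for this illustration.

```rust
/// Buffers bytes from `iter` so they can be inspected before being consumed.
pub struct SourceBytes<S: Iterator<Item = u8>> {
    iter: S,
    buffer: Vec<u8>,
}

impl<S: Iterator<Item = u8>> SourceBytes<S> {
    pub fn new(iter: S) -> Self {
        Self { iter, buffer: Vec::new() }
    }

    /// Tops the buffer up to `count` bytes and returns them without consuming
    /// them. Returns `None` if the source runs out before `count` bytes.
    pub fn buffer(&mut self, count: usize) -> Option<&[u8]> {
        if self.buffer.len() < count {
            let missing = count - self.buffer.len();
            self.buffer.extend(self.iter.by_ref().take(missing));
        }
        self.buffer.get(0..count)
    }
}

impl<S: Iterator<Item = u8>> Iterator for SourceBytes<S> {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        // Assumption: buffered bytes are returned oldest-first.
        if self.buffer.is_empty() {
            self.iter.next()
        } else {
            Some(self.buffer.remove(0))
        }
    }
}

/// Hypothetical helper: peek at the first three bytes, then drain everything.
pub fn peek_then_drain(input: &str) -> (Vec<u8>, Vec<u8>) {
    let mut bytes = SourceBytes::new(input.bytes());
    let peeked = bytes.buffer(3).map(|s| s.to_vec()).unwrap_or_default();
    let drained: Vec<u8> = bytes.collect();
    (peeked, drained)
}
```

Peeking does not consume anything here: the drained bytes still start with the ones that were peeked.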
Now, the purpose of SourceChars<S> is to provide an Iterator interface that reads bytes from self.0 (which is S) until it finds a valid UTF-8 char, and then returns it:
impl<S> Iterator for SourceChars<S>
where
    S: Iterator<Item = u8>,
{
    type Item = char;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buf = [0; 4];
        // A single character can be at most 4 bytes.
        for (i, byte) in self.0.by_ref().take(4).enumerate() {
            buf[i] = byte;
            if let Ok(slice) = std::str::from_utf8(&buf[..=i]) {
                return slice.chars().next();
            }
        }
        None
    }
}
This also works fine.
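As a quick illustration of the decoding loop, the sketch below feeds it characters of one to four bytes. It repeats the iterator from above in a self-contained form; decode is a hypothetical helper name, not part of my actual API.

```rust
/// Decodes UTF-8 characters from any byte iterator.
pub struct SourceChars<S: Iterator<Item = u8>>(S);

impl<S: Iterator<Item = u8>> Iterator for SourceChars<S> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        let mut buf = [0u8; 4];
        // Grow the candidate slice one byte at a time until it is valid UTF-8.
        for (i, byte) in self.0.by_ref().take(4).enumerate() {
            buf[i] = byte;
            if let Ok(s) = std::str::from_utf8(&buf[..=i]) {
                return s.chars().next();
            }
        }
        // Either the source is exhausted or four bytes did not form a char.
        None
    }
}

/// Hypothetical helper: round-trip a string through the byte-wise decoder.
pub fn decode(input: &str) -> String {
    SourceChars(input.bytes()).collect()
}
```

A proper prefix of a multi-byte sequence is never valid UTF-8 on its own, so the loop only succeeds once the full character has been read.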
Now, I also wish to provide an impl for SourceChars<&mut SourceBytes<S>>, so that SourceChars can rely on the buffer provided by self.0 (which, in this circumstance, is &mut SourceBytes<S>).
impl<S> SourceChars<&mut SourceBytes<S>>
where
    S: Iterator<Item = u8>,
{
    fn buffer(&mut self, count: usize) -> Option<&str> {
        // let mut src = self.0.by_ref();
        for byte_count in 0.. {
            let Some(buf) = self.0.buffer(byte_count) else {
                return None;
            };
            if let Ok(slice) = std::str::from_utf8(buf) {
                if slice.chars().count() >= count {
                    return Some(slice);
                }
            }
        }
        unreachable!()
    }
}
This SourceChars<&mut SourceBytes<S>>::buffer relies on SourceBytes<S>::buffer to actually buffer the bytes; SourceChars itself only acts as a wrapper that changes the interpretation of the iterator S from bytes to chars.
The problem is that self.0 cannot be borrowed mutably more than once, and the compiler does not appear to release the &mut self.0 borrow between loop iterations, because the early return ties that borrow to the lifetime of the returned slice.
How can I implement this in such a way that SourceChars relies on SourceBytes::buffer without running into this compiler error?
error[E0499]: cannot borrow `*self.0` as mutable more than once at a time
   --> src/parser/iter.rs:122:29
    |
119 |     fn buffer(&mut self, count: usize) -> Option<&str> {
    |               - let's call the lifetime of this reference `'1`
...
122 |             let Some(buf) = self.0.buffer(byte_count) else {
    |                             ^^^^^^ `*self.0` was mutably borrowed here in the previous iteration of the loop
...
127 |                     return Some(slice);
    |                            ----------- returning this value requires that `*self.0` is borrowed for `'1`
One option that I previously tried was the crate polonius-the-crab, but that ended up causing more problems with the usage of the API, in addition to making trait bounds difficult to get right.
Because of this inconvenience, I ended up using an unsafe pointer coercion to reduce the lifetime of the buf to no longer be dependent upon the &mut SourceBytes.
impl<S> Buffered for SourceChars<&mut S>
where
    for<'a> S: Iterator<Item = u8> + Buffered<ItemSlice<'a> = &'a [u8]> + 'a,
{
    type ItemSlice<'items> = &'items str where Self: 'items;

    // Allowed specifically here because the borrow checker is incorrect.
    #[allow(unsafe_code)]
    fn buffer(&mut self, count: usize) -> Option<Self::ItemSlice<'_>> {
        for byte_count in 0.. {
            let buf = self.0.buffer(byte_count)?;

            // SAFETY:
            //
            // This unsafe pointer coercion is here because of a limitation
            // in the borrow checker. In the future, when Polonius is merged as
            // the de-facto borrow checker, this unsafe code can be removed.
            //
            // The lifetime of the byte slice is shortened to the lifetime of
            // the return value, which lives as long as `self` does.
            //
            // This is referred to as the "polonius problem",
            // or more accurately, the "lack-of-polonius problem".
            //
            // <https://github.com/rust-lang/rust/issues/54663>
            let buf: *const [u8] = buf;
            let buf: &[u8] = unsafe { &*buf };

            if let Ok(slice) = std::str::from_utf8(buf) {
                if slice.chars().count() >= count {
                    return Some(slice);
                }
            }
        }
        unreachable!()
    }
}
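For comparison, here is a sketch of a safe variant that I believe sidesteps the error: the loop only works out how many bytes are needed and lets each borrow end within its iteration, and a single final call to buffer produces the slice that is returned. It is written against the simpler inherent-method version from earlier rather than the Buffered trait, the constructor and the peek_chars helper are assumptions made for the example, and it costs one extra buffer call plus a second UTF-8 validation.

```rust
pub struct SourceBytes<S: Iterator<Item = u8>> {
    iter: S,
    buffer: Vec<u8>,
}

impl<S: Iterator<Item = u8>> SourceBytes<S> {
    pub fn new(iter: S) -> Self {
        Self { iter, buffer: Vec::new() }
    }

    pub fn buffer(&mut self, count: usize) -> Option<&[u8]> {
        if self.buffer.len() < count {
            let missing = count - self.buffer.len();
            self.buffer.extend(self.iter.by_ref().take(missing));
        }
        self.buffer.get(0..count)
    }
}

pub struct SourceChars<S>(S);

impl<'a, S: Iterator<Item = u8>> SourceChars<&'a mut SourceBytes<S>> {
    pub fn buffer(&mut self, count: usize) -> Option<&str> {
        // Pass 1: find the byte length. Nothing borrowed inside the loop is
        // returned from it, so each mutable borrow ends with its iteration.
        let mut byte_count = 0;
        loop {
            let buf = self.0.buffer(byte_count)?;
            if let Ok(s) = std::str::from_utf8(buf) {
                if s.chars().count() >= count {
                    break;
                }
            }
            byte_count += 1;
        }
        // Pass 2: borrow once more, outside the loop, and return that slice.
        let buf = self.0.buffer(byte_count)?;
        std::str::from_utf8(buf).ok()
    }
}

/// Hypothetical helper: peek at the first `count` chars as an owned String.
pub fn peek_chars(input: &str, count: usize) -> Option<String> {
    let mut bytes = SourceBytes::new(input.bytes());
    let mut chars = SourceChars(&mut bytes);
    let peeked = chars.buffer(count).map(|s| s.to_owned());
    peeked
}
```

The early exits inside the loop only ever return None, which carries no borrow, so the compiler no longer has to extend a loop-body borrow to the lifetime of the return value.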
Additionally, here are the tests that show usage of the API. Using the polonius-the-crab crate failed to solve some lifetime issues that I ran across while implementing these tests.
#[cfg(test)]
mod tests {
    use super::{Buffered, SourceBytes, SourceChars};

    #[test]
    fn test_source_chars() {
        let source = "abcdefg";
        let chars = SourceChars::new(source.bytes());
        assert_eq!(source, chars.collect::<String>());
    }

    #[test]
    fn test_source_chars_buffer() {
        let source = "abcdefg";
        let mut bytes = SourceBytes::new(source.bytes());
        let mut chars = SourceChars::new(&mut bytes);
        // Ensure that the `buffer` function works.
        assert_eq!(&source[0..3], chars.buffer(3).unwrap());
        // Ensure that the characters are taken from the buffer,
        // and that `buffer` correctly preserves them.
        assert_eq!(&source[0..4], chars.by_ref().take(4).collect::<String>());
        // Ensure that the iterator has been advanced.
        assert_eq!(&source[4..7], chars.buffer(3).unwrap());
    }
}