Search code examples
phpgenerator

How to iterate an SplObjectContainer inside nested yield statements while preserving outer iterations?


I have a code base where a really common operation findAll is implemented on a SplObjectStorage backed implementation. Sometimes it is required to findAll (looping over every entry in the collection) multiple times in a nested manner - but there seems to be no easy way to do it without manually tracked the state in each stack-frame to reset the outer iteration back to its original state - or keep a seperate iteration count.

This is the same when using its iterator implementation but note that if i had just used an array you can actually loop over them in a stack safe way that does not interfere with the ongoing iteration - this is not the case when manually moving its internal iteration pointer.

<?php

$objects = new \SplObjectStorage();
$objects->attach((object)[0]);
$objects->attach((object)[1]);
$objects->attach((object)[2]);
$objects->attach((object)[3]);
$objects->attach((object)[4]);

$array = [ 0, 1, 2, 3, 4 ];

function iterate($objects) {
    $objects->rewind();
    
    while ($objects->valid()) {
        $object = $objects->current();
        
        yield ((array)$object)[0];
        
        $objects->next();
    }
}

function iterate2($objects) {
    foreach ($objects as $object) {
        yield ((array)$object)[0];
    }
}

function iterate3(array &$array) {
    foreach ($array as $v) {
        yield $v;
    }
}

function iterate4(array &$array) {
    reset($array);
    
    while (current($array) !== false) {
        
        yield current($array);
        
        next($array);
    }
}

echo "iterate:\n";

foreach (iterate($objects) as $a) {
    foreach (iterate($objects) as $b) {
        echo "$a -> $b\n";
    }
}

echo "iterate2:\n";


foreach (iterate2($objects) as $a) {
    foreach (iterate2($objects) as $b) {
        echo "$a -> $b\n";
    }
}

echo "iterate3:\n";

foreach (iterate3($array) as $a) {
    foreach (iterate3($array) as $b) {
        echo "$a -> $b\n";
    }
}

echo "iterate4:\n";


foreach (iterate4($array) as $a) {
    foreach (iterate4($array) as $b) {
        echo "$a -> $b\n";
    }
}

Outputs:

iterate:
0 -> 0
0 -> 1
0 -> 2
0 -> 3
0 -> 4
iterate2:
0 -> 0
0 -> 1
0 -> 2
0 -> 3
0 -> 4
iterate3:
0 -> 0
0 -> 1
0 -> 2
0 -> 3
0 -> 4
1 -> 0
1 -> 1
1 -> 2
1 -> 3
1 -> 4
2 -> 0
2 -> 1
2 -> 2
2 -> 3
2 -> 4
3 -> 0
3 -> 1
3 -> 2
3 -> 3
3 -> 4
4 -> 0
4 -> 1
4 -> 2
4 -> 3
4 -> 4
iterate4:
0 -> 0
0 -> 1
0 -> 2
0 -> 3
0 -> 4

Basically the last inner iteration moves the iteration to the last element stopping the outer iteration, so without tracking the last iteration manually and reseting to that there is no way to recover.

Obviously a work around would be to manually track the iteration by first consuming the entire SplObjectStorage and then manually iterating through it with ->offsetGet, but that approach feels lacking in several ways.

Here are some of the ways i would attempt to solve it:

Cloning the storage

function iterate($objects) {
    $objects = clone $objects;
    $objects->rewind();
    
    while ($objects->valid()) {
        $object = $objects->current();
        
        yield ((array)$object)[0];
        
        $objects->next();
    }
}

I am not happy with this approach for several reasons, but it is very clean, i would prefer something more performant.

Saving the previous key and correcting with next()

function iterate($objects) {
    $objects->rewind();
    
    while ($objects->valid()) {
        $object = $objects->current();
        
        $before = $objects->key();
        yield ((array)$object)[0];
        
        
        if ($objects->key() > $before) {
            $objects->rewind();
        }
        
        if ($objects->key() < $before) {
            while ($objects->key() !== $before)
                $objects->next();
        }
        
        $objects->next();
    }
}

This is better in some ways but worse in others, where both of them fall short is when the yield modifies the original container and neither of these fit when using foreach on the storage, and i can't really see if there even is a logical way to piece that together.

What would a "correct" or clean way to approach this issue?


Solution

  • For future reference i ended up with a combination of the two approaches with some important fixes and adjustments.

    It should be noted that a lot of libraries such as php-ds polyfills make use of plain arrays as their storage backing, which behave normally when accessed in a nested yield context, and some iterator implementations such as SplQueue/SplDoubleLinkedList actually have a totally different underlying way to calculating next - see ext/spl/spl_dllist.c#L834.

    But the simple approach of keeping an internal iteration pointer on the class instance and incrementing it normally as with the SplObjectStorage seen here ext/spl/spl_observer.c#L749 all seem to experience this issue.

    The "re-seek after yield" approach where we depend on an ordered key such as an integer index can be generelised as so:

    readonly class KeySeekableIterator implements \SeekableIterator {
        public function __construct(private \Iterator $iterator) {}
        
        public function key(): int { return $this->iterator->key(); }
        public function current(): mixed { return $this->iterator->current(); }
        public function next(): void { $this->iterator->next(); }
        public function rewind(): void { $this->iterator->rewind(); }
        public function valid(): bool { return $this->iterator->valid(); }
        
        public function seek(int $offset): void {
            if ($offset < 0)
                throw new \InvalidArgumentException("Offset needs to be >= 0");
            
            if ($this->key() > $offset) {
                $this->rewind();
            }
            
            while ($this->key() < $offset) {
                $this->next();
                
                if (!$this->valid())
                    throw new \OutOfBoundsException();
            }
        }
        
    }
    
    readonly class ContinuousIterator implements \IteratorAggregate {
        private \SeekableIterator $iterator;
        
        public static function asSeekableIterator(\Iterator $iterator): \SeekableIterator {
            if ($iterator instanceof \SeekableIterator)
                return $iterator;
            
            return new KeySeekableIterator($iterator);
        }
        
        public function __construct(\Iterator $iterator) {
            $this->iterator = self::asSeekableIterator($iterator);
        }
        
        public function getIterator(): \Traversable {
            foreach ($this->iterator as $key => $value) {   
                try {
                    yield $key => $value;
                } finally {
                    $this->iterator->seek($key);
                }
            }
        }
    }
    

    And the much simplier approach that @Sammitch also endorsed of simply cloning the iterator every iteration could be incapsulated as such:

    readonly class ImmutableIterator implements \IteratorAggregate {
        public function __construct(private \Iterator $iterator) {
        }
        
        public function getIterator(): \Traversable {
            yield from clone $this->iterator;
        }
    }
    

    In my own codebase i ended up extending the function with some flags to choose what method to use, but the advantage of these wrapper classes are they fit in nicely with the rest of the standard iterator library and can be combined seemlessly with something like FilterIterator etc.

    Here are some additional tests you can use

    function val($obj) {
        if ($obj instanceof \stdClass) {
            return ((array)$obj)[0];
        }
        
        return $obj;
    }
    
    function test(iterable $iterator) {
        foreach ($iterator as $a) {
            foreach ($iterator as $b) {
                echo val($a) . " -> " . val($b) . PHP_EOL;
            }
        }
    }
    
    /*
    // SplObjectStorage
    $objects = new \SplObjectStorage();
    $objects->attach((object)[0]);
    $objects->attach((object)[1]);
    $objects->attach((object)[2]);
    $objects->attach((object)[3]);
    $objects->attach((object)[4]);
    */
    
    /*
    // SplQueue / SplDoubleLinkedList
    $objects = new \SplQueue();
    $objects[] = 0;
    $objects[] = 1;
    $objects[] = 2;
    $objects[] = 3;
    $objects[] = 4;
    */
    
    // ArrayIterator
    $objects = new \ArrayIterator([0,1,2,3,4]);
    
    echo "Plain:\n";
    test($objects);
    
    
    echo "Immutable:\n";
    $iter = new ImmutableIterator($objects);
    test($iter);
    
    echo "Continuous:\n";
    $iter = new ContinuousIterator($objects);
    test($iter);