Search code examples
c++iteratormoduloterminationdata-partitioning

Using an iterator to Divide an Array into Parts with Unequal Size


I have an array which I need to divide up into 3-element sub-arrays. I wanted to do this with iterators, but I end up iterating past the end of the array and segfaulting even though I don't dereference the iterator. given: auto foo = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }; I'm doing:

auto bar = cbegin(foo);

for (auto it = next(bar, 3); it < foo.end(); bar = it, it = next(bar, 3)) {
    for_each(bar, it, [](const auto& i) { cout << i << endl; });
}

for_each(bar, cend(foo), [](const auto& i) { cout << i << endl; });

Now I can solve this by defining a finish iterator:

auto bar = cbegin(foo);
auto finish = next(cend(foo), -(size(foo) % 3));

for (auto it = next(bar, 3); it != finish; bar = it, it = next(bar, 3)) {
    for_each(bar, it, [](const auto& i) { cout << i << endl; });
}

for_each(bar, finish, [](const auto& i) { cout << i << endl; });
for_each(finish, cend(foo), [](const auto& i) { cout << i << endl; });

But this seems unnecessary when I don't dereference the iterator. Why can't I do the first version?


Solution

  • The reason this is prohibited is covered well at your other question Are iterators past the "one past-the-end" iterator undefined behavior? so I'll just address improved solutions.

    For random-access iterators (which you must have if you are using <), there's no need whatsoever for the expensive modulo operation.

    The salient points are that:

    • it + stride fails when it nears the end
    • end() - stride fails if the container contains too few elements
    • end() - it is always legal

    From there, it's simple algebraic manipulation to change it + stride < end() into a legal form (subtract it from both sides).

    The final result, which I have used many times:

    for( auto it = c.cbegin(), end = c.cend(); end - it >= stride; it += stride )
    

    The compiler is free to optimize that back to comparison to a precomputed end - stride * sizeof(*it) if the memory model is flat -- the limitations of C++ behavior don't apply to the primitive operations which the compiler translates C++ into.

    You may of course use std::distance(it, end) if you prefer to use the named functions instead of operators, but that will only be efficient for random-access iterators.

    For use with forward iterators, you should use something that combines the increment and termination conditions like

    struct less_preferred { size_t value; less_preferred(size_t v) : value(v){} };
    
    template<typename Iterator>
    bool try_advance( Iterator& it, less_preferred step, Iterator end )
    {
         while (step.value--) {
             if (it == end) return false;
             ++it;
         }
         return true;
    }
    

    With this additional overload, you'll get efficient behavior for random-access iterators:

    template<typename RandomIterator>
    auto try_advance( RandomIterator& it, size_t stride, RandomIterator end )
         -> decltype(end - it < stride) // SFINAE
    {
         if (end - it < stride) return false;
         it += stride;
         return true;
    }