Search code examples
swiftcollectionsiterationlanguage-lawyerfor-in-loop

Iterating with for .. in on a changing collection


I'm experimenting with iteration on an array using a for .. in .. loop. My question is related to the case where the collection is changed within the loop body.

It seems that the iteration is safe, even if the list shrinks in the meantime. The for iteration variables successively take the values of the (indexes and) elements that were in the array at the start of the loop, despite the changes made on the flow. Example:

var slist = [ "AA", "BC", "DE", "FG" ]

for (i, st) in slist.enumerated() {   // for st in slist gives a similar result
    print ("Index \(i): \(st)")
    if st == "AA" {    // at one iteration change completely the list
        print (" --> check 0: \(slist[0]), and 2: \(slist[2])")
        slist.append ("KLM") 
        slist.insert(st+"XX", at:0)   // shift the elements in the array
        slist[2]="bc"                 // replace some elements to come
        print (" --> check again 0: \(slist[0]), and 2: \(slist[2])")
        slist.remove(at:3)
        slist.remove(at:3)
        slist.remove(at:1)            // makes list shorter
    }
}
print (slist)

This works very well, the iteration being made on the values [ "AA", "BC", "DE", "FG" ] even if after the first iteration the array is completely changed to ["AAXX", "bc", "KLM"]

I wanted to know if I can safely rely on this behavior. Unfortunately, the language guide does not tell anything about iterating on a collection when the collection is modified. And the for .. in section doesn't address this question either. So:

  1. Can I safely rely on a guarantee about this iteration behavior provided in the language specifications ?
  2. Or am I simply lucky with the current version of Swift 5.4? In this case, is there any clue in the language specification that one cannot take it for granted? And is there a performance overhead for this iteration behavior (e.g. some copy) compared to indexed iteration?

Solution

  • The documentation for IteratorProtocol says "whenever you use a for-in loop with an array, set, or any other collection or sequence, you’re using that type’s iterator." So, we are guaranteed that a for in loop is going to be using .makeIterator() and .next() which is defined most generally on Sequence and IteratorProtocol respectively.

    The documentation for Sequence says that "the Sequence protocol makes no requirement on conforming types regarding whether they will be destructively consumed by iteration." As a consequence, this means that an iterator for a Sequence is not required to make a copy, and so I do not think that modifying a sequence while iterating over it is, in general, safe.

    This same caveat does not occur in the documentation for Collection, but I also don't think there is any guarantee that the iterator makes a copy, and so I do not think that modifying a collection while iterating over it is, in general, safe.

    But, most collection types in Swift are structs with value semantics or copy-on-write semantics. I'm not really sure where the documentation for this is, but this link does say that "in Swift, Array, String, and Dictionary are all value types... You don’t need to do anything special — such as making an explicit copy — to prevent other code from modifying that data behind your back." In particular, this means that for Array, .makeIterator() cannot hold a reference to your array because the iterator for Array does not have to "do anything special" to prevent other code (i.e. your code) from modifying the data it holds.

    We can explore this in more detail. The Iterator type of Array is defined as type IndexingIterator<Array<Element>>. The documentation IndexingIterator says that it is the default implementation of the iterator for collections, so we can assume that most collections will use this. We can see in the source code for IndexingIterator that it holds a copy of its collection

    @frozen
    public struct IndexingIterator<Elements: Collection> {
      @usableFromInline
      internal let _elements: Elements
      @usableFromInline
      internal var _position: Elements.Index
    
      @inlinable
      @inline(__always)
      /// Creates an iterator over the given collection.
      public /// @testable
      init(_elements: Elements) {
        self._elements = _elements
        self._position = _elements.startIndex
      }
      ...
    }
    

    and that the default .makeIterator() simply creates this copy.

    extension Collection where Iterator == IndexingIterator<Self> {
      /// Returns an iterator over the elements of the collection.
      @inlinable // trivial-implementation
      @inline(__always)
      public __consuming func makeIterator() -> IndexingIterator<Self> {
        return IndexingIterator(_elements: self)
      }
    }
    

    Although you might not want to trust this source code, the documentation for library evolution claims that "the @inlinable attribute is a promise from the library developer that the current definition of a function will remain correct when used with future versions of the library" and the @frozen also means that the members of IndexingIterator cannot change.

    Altogether, this means that any collection type with value semantics and an IndexingIterator as its Iterator must make a copy when using using for in loops (at least until the next ABI break, which should be a long-way off). Even then, I don't think Apple is likely to change this behavior.

    In Conclusion

    I don't know of any place that it is explicitly spelled out in the docs "you can modify an array while you iterate over it, and the iteration will proceed as if you made a copy" but that's also the kind of language that probably shouldn't be written down as such code could definitely confuse a beginner.

    However, there is enough documentation lying around which says that a for in loop just calls .makeIterator() and that for any collection with value semantics and the default iterator type (for example, Array), .makeIterator() makes a copy and so cannot be influenced by code inside the loop. Further, because Array and some other types like Set and Dictionary are copy-on-write, modifying these collections inside a loop will have a one-time copy penalty as the body of the loop will not have a unique reference to its storage (because the iterator will have a reference too). This is the exact same penalty that modifying the collection outside the loop will have if you don’t have a unique reference to the storage.

    Without these assumptions, you aren't guaranteed safety, but you might have it anyway in some circumstances.

    Edit:

    I just realized we can create some cases where this is unsafe for sequences.

    import Foundation
    
    /// This is clearly fine and works as expected.
    print("Test normal")
    for _ in 0...10 {
        let x: NSMutableArray = [0,1,2,3]
        for i in x {
            print(i)
        }
    }
    
    /// This is also okay. Reassigning `x` does not mutate the reference that the iterator holds.
    print("Test reassignment")
    for _ in 0...10 {
        var x: NSMutableArray = [0,1,2,3]
        for i in x {
            x = []
            print(i)
        }
    }
    
    /// This crashes. The iterator assumes that the last index it used is still valid, but after removing the objects, there are no valid indices.
    print("Test removal")
    for _ in 0...10 {
        let x: NSMutableArray = [0,1,2,3]
        for i in x {
            x.removeAllObjects()
            print(i)
        }
    }
    
    /// This also crashes. `.enumerated()` gets a reference to `x` which it expects will not be modified behind its back.
    print("Test removal enumerated")
    for _ in 0...10 {
        let x: NSMutableArray = [0,1,2,3]
        for i in x.enumerated() {
            x.removeAllObjects()
            print(i)
        }
    }
    

    The fact that this is an NSMutableArray is important because this type has reference semantics. Since NSMutableArray conforms to Sequence, we know that mutating a sequence while iterating over it is not safe, even when using .enumerated().