A common idiom in CPython to ensure thread safety for iteration is using tuple()
.
For example - tuple(dict.items())
is guaranteed to be threadsafe in CPython even if items are removed by a different thread.
This is because the interpreter does not run the eval loop and does not release the GIL while running these C functions. I've tested it, and it works great.
However, tuple(reversed(dict.items()))
doesn't seem to be threadsafe, and I can't understand why. It's not running any Python function, it isn't explicitly releasing the GIL. Why am I still getting errors if I delete keys from the dict
while it runs on a different thread?
Modifying the size of a dict
while iterating over it will always give an error. The reason why tuple(d.items())
is thread-safe, is because both retrieval of the iterator and iteration happen in the same C function.
d.items()
creates a dict_items
object but there's no iterator yet. That's why changes in dictionary size are still reflected:
>>> d = {'a': 1, 'b': 2}
>>> view = d.items()
>>> del d['a']
>>> list(view)
[('b', 2)]
However once the iterator has been retrieved the dict size must not change anymore:
>>> d = {'a': 1, 'b': 2}
>>> iterator = iter(d.items())
>>> del d['a']
>>> list(iterator)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
And this is what happens when using reversed
: it creates a reverse iterator of the dict items. It's the iterator part that causes the trouble, because once the iterator has been created, the underlying dict must not change in size:
>>> d = {'a': 1, 'b': 2}
>>> r = reversed(d.items())
>>> r # Note the iterator here.
<dict_reverseitemiterator object at 0x7fb3a4aa24f0>
>>> del d['a']
>>> list(r)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
So in the specific example tuple(reversed(dict.items()))
an iterator reversed(dict.items())
of the original dict is created which is then iterated over by tuple
. This iterator requires however that the dict does not change in size. It's like tuple(iter(dict.items()))
just in reverse order.
Regarding the GIL switch, the eval loop runs when it's taking the result of reversed()
, after creating the iterator, and sending it to tuple()
for iteration. See the following dissasembly:
>>> dis.dis("tuple({}.items())")
1 0 LOAD_NAME 0 (tuple)
2 BUILD_MAP 0
4 LOAD_METHOD 1 (items)
6 CALL_METHOD 0
8 CALL_FUNCTION 1
10 RETURN_VALUE
>>> dis.dis("tuple(reversed({}.items()))")
1 0 LOAD_NAME 0 (tuple)
2 LOAD_NAME 1 (reversed)
4 BUILD_MAP 0
6 LOAD_METHOD 2 (items)
8 CALL_METHOD 0
10 CALL_FUNCTION 1
12 CALL_FUNCTION 1
14 RETURN_VALUE