Difference between iterating with tp_iternext and PyIter_Next


When I write a C function that works on an iterable, I first create an iterator and then loop over it:

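iterator = PyObject_GetIter(sequence);
if (iterator == NULL) {
    return NULL;
}
while (( item = PyIter_Next(iterator) )) {
    ...
    Py_DECREF(item);  /* PyIter_Next returns a new reference */
}
Py_DECREF(iterator);
if (PyErr_Occurred()) {
    return NULL;  /* the loop ended because of an error, not exhaustion */
}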

This works fine, but I've also seen some functions call tp_iternext directly:

iterator = PyObject_GetIter(sequence); // ....

iternext = *Py_TYPE(iterator)->tp_iternext;
while (( item = iternext(iterator) )) {
    ...
}

The second approach seems faster, though I have only one data point: my Windows computer with the MSVC compiler.

Is it just a coincidence that the iternext approach is faster, or is there a significant difference between the two?

I have read the Python documentation for both PyIter_Next and tp_iternext, but it is not clear to me when and why one should be preferred.


Solution

  • The source code for PyIter_Next shows that it simply retrieves the tp_iternext slot, calls it, and, if the call returns NULL, clears a StopIteration exception that may have been set.

    If you call tp_iternext directly, you have to check for and clear this StopIteration yourself once the iterator is exhausted.


    By the way: the documentation of tp_iternext also says:

    iternextfunc PyTypeObject.tp_iternext

    An optional pointer to a function that returns the next item in an iterator. When the iterator is exhausted, it must return NULL; a StopIteration exception may or may not be set. When another error occurs, it must return NULL too. Its presence signals that the instances of this type are iterators.

    PyIter_Next's documentation carries no such caveat: when the iterator is exhausted, it returns NULL with no exception set.

    So PyIter_Next is the simple and safe way to iterate over an iterator. You can call tp_iternext directly, but then you must be careful not to leave a StopIteration exception set once the iterator is exhausted.