python python-typing abc structural-typing

collections.abc.Iterable doesn't allow runtime structural checks according to the Iterable API


Existing Approaches to Structural Subtyping

Abstract classes defined in collections.abc module are slightly more advanced since they implement a custom __subclasshook__() method that allows runtime structural checks without explicit registration:

from collections.abc import Iterable

class MyIterable:
    def __iter__(self):
        return []

assert isinstance(MyIterable(), Iterable)

But the Python glossary entry for "iterable" says:

An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements Sequence semantics.

"or with a __getitem__()"

So I would expect this code to run without an AssertionError:

from collections.abc import Iterable

class MyIterable:
    def __getitem__(self, item):
        return []

assert isinstance(MyIterable(), Iterable)

But it doesn't:

Traceback (most recent call last):
  File "file.py", line 7, in <module>
    assert isinstance(MyIterable(), Iterable)
AssertionError

So why, if an iterable may implement either __iter__ or __getitem__, does __getitem__ alone not work when we check whether an object is an Iterable?

I also tested with Mypy:

from collections.abc import Iterable

class MyIterable1:
    def __iter__(self):
        return []

class MyIterable2:
    def __getitem__(self, item):
        return []

def foo(bar: Iterable):
    ...

foo(MyIterable1())
foo(MyIterable2())

Type check result:

$ mypy .\scratch_443.py
test_file.py:15: error: Argument 1 to "foo" has incompatible type "MyIterable2"; expected "Iterable[Any]"
Found 1 error in 1 file (checked 1 source file)

Solution

  • While you did cite most of the relevant passages, I would like to add a little bit of additional context and another perspective.

    The problem lies (as it often does) in the definitions, of which there are two in this case.


    The abstract base Iterable

    collections.abc.Iterable is not flawed; it just relies on a narrower definition of the term. In that definition, if a class implements the __iter__ method, it is considered iterable, plain and simple. Mind you, this does not (and cannot) impose any constraints on what happens inside that method or what it returns.
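To make that narrow definition concrete, here is a minimal sketch of how such a structural check can be written with __subclasshook__. The class HasIter is a hypothetical stand-in written for illustration, not the actual source of collections.abc.Iterable, and WithIter and WithGetItem are made-up test classes.

```python
from abc import ABC, abstractmethod


class HasIter(ABC):
    """Hypothetical stand-in for collections.abc.Iterable (not the real source)."""

    @abstractmethod
    def __iter__(self):
        ...

    @classmethod
    def __subclasshook__(cls, C):
        if cls is HasIter:
            # Only the presence of __iter__ somewhere in the MRO is checked;
            # what the method does or returns is never inspected.
            for base in C.__mro__:
                if "__iter__" in base.__dict__:
                    if base.__dict__["__iter__"] is None:
                        return NotImplemented
                    return True
        return NotImplemented


class WithIter:
    def __iter__(self):
        return iter(())


class WithGetItem:
    def __getitem__(self, index):
        raise IndexError(index)


has_iter = isinstance(WithIter(), HasIter)        # True: __iter__ is present
has_getitem = isinstance(WithGetItem(), HasIter)  # False: __getitem__ is ignored
```

A hook of this shape can only answer "is there a method with this name?", which is why __getitem__ never enters the picture.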

    One of the consequences of this is that technically the method could return something silly, like an integer for example, even though we would reasonably expect the __iter__ method to always return an iterator (i.e. something implementing the __next__ method).

    Case in point:

    from collections.abc import Iterable
    
    class Foo:
        def __iter__(self) -> int:
            return 1
    
    assert isinstance(Foo(), Iterable)  # passes
    iter(Foo())  # TypeError: iter() returned non-iterator of type 'int'
    

    The error is only raised inside the iter function, which presumably checks for the existence of __next__ in the __dict__ of the class (!) of the returned object, not whether that attribute is actually callable. So even this gets past iter():

    class NotReallyAnIterator:
        __next__ = None
    
    class Foo:
        def __iter__(self) -> NotReallyAnIterator:
            return NotReallyAnIterator()
    
    it = iter(Foo())  # passes
    next(it)  # TypeError: 'NoneType' object is not callable
    

    This last point is tangential, but still relevant to the discussion IMO.


    The loose term "iterable"

    The term "iterable" is defined more broadly in the glossary as an object whose class corresponds to the aforementioned Iterable protocol or, as you quoted,

    with a __getitem__() method that implements Sequence semantics.

    The last portion of that sentence ("that implements Sequence semantics") is important to understanding the problem at hand. The glossary unfortunately does not expand on it, but the documentation for the built-in iter(), which is (as the docs tell us) the only reliable way of checking whether an object is iterable, offers the following clarification. It says the argument

    must be a collection object which supports the iterable protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0).
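As a small illustration of that sequence protocol, the sketch below uses a hypothetical class Squares that has no __iter__ at all, only a __getitem__ accepting integers from 0 upwards and raising IndexError past the end. Iteration works, yet the ABC check does not recognize it.

```python
from collections.abc import Iterable


class Squares:
    """Hypothetical class with no __iter__, only an integer-indexed __getitem__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if not 0 <= index < self.n:
            raise IndexError(index)  # signals the end of the sequence to iter()
        return index * index


squares = Squares(4)
as_list = list(squares)                     # [0, 1, 4, 9]
passes_abc = isinstance(squares, Iterable)  # False
```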

    This qualification is important because simply having the __getitem__ method does not make a class a Sequence. It is a necessary but not sufficient requirement: the Mapping protocol, for example, also requires __getitem__ to be implemented, yet neither of the two is a subclass of the other.

    __getitem__ merely allows subscripting an instance with a key (i.e. using square brackets [key] on it), whereas the sequence protocol requires an accepted key to be an integer (or slice).
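The relationship between the two ABCs can be checked directly. In this short sketch, OnlyGetItem is a hypothetical class added for illustration; it shows that a bare __getitem__ is recognized as neither a Sequence nor a Mapping.

```python
from collections.abc import Mapping, Sequence

# Neither ABC derives from the other ...
seq_is_map = issubclass(Sequence, Mapping)  # False
map_is_seq = issubclass(Mapping, Sequence)  # False

# ... although both declare __getitem__ themselves.
both_have_getitem = all(
    "__getitem__" in abc.__dict__ for abc in (Sequence, Mapping)
)


class OnlyGetItem:  # hypothetical example class
    def __getitem__(self, key):
        return key


# A bare __getitem__ is not enough to count as either one.
looks_like_sequence = isinstance(OnlyGetItem(), Sequence)  # False
looks_like_mapping = isinstance(OnlyGetItem(), Mapping)    # False
```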

    Why is this relevant?

    Because while we can know whether an object's class implements __getitem__, it is impossible to know from the outside how it implements it. A Sequence subtype should raise an error if we try to call its __getitem__ with a string, for example. But how can we know that it does? Only by calling it.

    And since it is specifically the sequence protocol (and not just any __getitem__ method) that makes an object "iterable" in this broader sense when __iter__ is absent, there is no way to determine from the class alone whether it should be considered iterable.

    To top this all off, consider the following example:

    class Bar:
        def __getitem__(self, key: str) -> str:
            return key.upper()
    
    it = iter(Bar())  # passes
    print(next(it))  # AttributeError: 'int' object has no attribute 'upper'
    

    I would argue that Bar is a perfectly valid (albeit not very useful) example of a subscriptable class. An instance even passes the iter() check! Yet should it be considered an iterable? Both the documentation and common sense say no.


    Conclusion

    Determining whether or not something is "iterable" comes down to what you mean by the term. And I would argue that (if anything) the documentation's suggestion that iter() is reliable in this regard is misleading. The simple subclass check with the ABC Iterable is not sufficient if you consider the sequence protocol to also be a reasonable version of an iterable.

    IMHO, the only actually reliable way of determining if an object is iterable is to chain a next() call with an iter() call, which in practice amounts to a plain for-loop. If that raises an error, the object is not iterable.
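That idea can be sketched as a small helper. The name can_iterate is hypothetical, and the approach has limits worth stating: it only probes the first item, and it consumes that item if the object is a one-shot iterator.

```python
def can_iterate(obj):
    """Hypothetical helper: try to fetch one item the way a for-loop would."""
    try:
        next(iter(obj))
    except StopIteration:
        return True   # iteration works, the object is just empty
    except Exception:
        return False  # iter() itself or the first next() call failed
    return True


class Bar:
    # Subscriptable with strings only, so the integer 0 passed by iter() breaks it.
    def __getitem__(self, key: str) -> str:
        return key.upper()


results = {
    "list": can_iterate([1, 2, 3]),  # True
    "empty list": can_iterate([]),   # True
    "int": can_iterate(42),          # False
    "Bar": can_iterate(Bar()),       # False
}
```

Catching Exception broadly is a deliberate choice here, since a broken __getitem__ can fail with any exception type, not just TypeError.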

    Final example:

    from __future__ import annotations
    
    class RealIter:
        def __iter__(self) -> RealIter:
            print(f"called {self.__class__.__name__}.__iter__")
            return self
    
        def __next__(self) -> str:
            print(f"called {self.__class__.__name__}.__next__")
            return "Hi, mom!"
    
    class SeqIter:
        def __getitem__(self, key: int) -> str:
            print(f"called {self.__class__.__name__}.__getitem__({key})")
            return "Hi, mom!"
    
    for item in RealIter():
        print(item)
        break
    
    for item in SeqIter():
        print(item)
        break
    

    Output:

    called RealIter.__iter__
    called RealIter.__next__
    Hi, mom!
    called SeqIter.__getitem__(0)
    Hi, mom!
    
