
Python's PEP 484 type annotation for Generator Expression


What is the correct type annotation for a function that returns a generator expression?

e.g.:

def foo():
    return (x*x for x in range(10))

I can't figure out if this is -> Iterator[int], -> Iterable[int], -> Generator[int, None, None], or something else.

If there should be one-- and preferably only one --obvious way to do it, then what is the obvious way here?


Solution

  • Quick note: your function is a "regular function which returns a generator", not a "generator function". To understand the distinction, read this answer.

    For your foo, I suggest using -> Iterator[int].
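    A minimal sketch of foo with that annotation. The only addition is the typing import. Callers can use the result with next or in a for loop, which is exactly what Iterator[int] promises.

```python
from typing import Iterator


def foo() -> Iterator[int]:
    # A regular function that returns a generator expression;
    # the annotation promises only the Iterator interface.
    return (x * x for x in range(10))


squares = foo()
first = next(squares)  # 0
rest = list(squares)   # the remaining nine squares, 1 through 81
```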

    Explanation

    It boils down to what kind of interface you want.

    First, familiarize yourself with this page in the Python documentation, which defines the hierarchy of the most important Python types.

    You can see there that these expressions return True:

    import typing as t
    issubclass(t.Iterator, t.Iterable)
    issubclass(t.Generator, t.Iterator)
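    The same hierarchy can also be checked on actual objects with the abstract base classes in collections.abc, which the typing aliases mirror. This sketch confirms that a generator expression (what foo returns) is an instance of all three, while a plain list iterator is an Iterator but not a Generator.

```python
from collections.abc import Generator, Iterable, Iterator

gen_obj = (x * x for x in range(10))  # the kind of object foo() returns
list_it = iter([0, 1, 4])             # a plain iterator, not a generator

# A generator expression produces a full Generator object...
assert isinstance(gen_obj, Generator)
assert isinstance(gen_obj, Iterator)
assert isinstance(gen_obj, Iterable)

# ...whereas a list iterator satisfies Iterator but not Generator,
# because it has no send/throw/close methods.
assert isinstance(list_it, Iterator)
assert not isinstance(list_it, Generator)
```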
    

    You should also notice on the same page that Generator has methods that Iterator doesn't have: send, throw and close (documentation). They let you do more with a generator than a single pass of iteration. For examples of what is possible, see this question: What is the purpose of the "send" function on Python generators?
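    To make send and close concrete, here is a minimal sketch of a generator that actually consumes the values sent to it; running_total is a hypothetical example, not taken from the linked question. Because it receives ints, an accurate annotation is Generator[int, int, None] (yield type, send type, return type).

```python
from typing import Generator


def running_total() -> Generator[int, int, None]:
    # Yields the current total; each value passed via send() is added to it.
    # Callers must prime it with next() and then use send(), not next().
    total = 0
    while True:
        received = yield total
        total += received


acc = running_total()
start = next(acc)      # prime the generator: yields 0
after_5 = acc.send(5)  # 5
after_8 = acc.send(3)  # 8

acc.close()  # finish the generator early
try:
    next(acc)
    stopped = False
except StopIteration:
    stopped = True  # a closed generator is exhausted
```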

    Now, back to choosing an interface. If you want others to use the result of your generator function as a generator, for example:

    from typing import Generator

    def gen(limit: int) -> Generator[int, None, None]:
        for x in range(limit):
            yield x
    
    g = gen(3)
    next(g)  # => 0
    g.send(10)  # => 1
    

    Then you should specify -> Generator[int, None, None].

    But notice that the example above is pointless. You can in fact call send, but it doesn't change the execution, because gen does nothing with the sent value (there is nothing like x = yield). Knowing that, you can restrict what users of gen are promised and annotate it as -> Iterator[int]. This sets up a contract with users: "my function returns an iterator of integers, and you should use it as such". If you later change the implementation to, e.g.

    from typing import Iterator

    def gen(limit: int) -> Iterator[int]:
        # No longer a generator function: returns a plain list iterator
        return iter(list(range(limit)))
    

    anyone who treated the returned object as a Generator (because they peeked at the implementation) would find their code broken. However, you shouldn't be bothered by that, because they used it differently from the way specified in your contract. This kind of breakage is not your responsibility.
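    A sketch of how the Iterator[int] contract protects callers who respect it: code that only uses next and iteration works with both implementations, while code that calls send breaks on the second. The names gen_v1, gen_v2 and consume are hypothetical, chosen for illustration.

```python
from typing import Iterator


def gen_v1(limit: int) -> Iterator[int]:
    # Original implementation: a generator function.
    for x in range(limit):
        yield x


def gen_v2(limit: int) -> Iterator[int]:
    # Later implementation: a plain list iterator.
    return iter(list(range(limit)))


def consume(it: Iterator[int]) -> list[int]:
    # Uses only the Iterator interface: next() plus iteration.
    head = next(it)
    return [head, *it]


results = [consume(gen_v1(3)), consume(gen_v2(3))]  # both [0, 1, 2]

# A caller that relied on send() (outside the contract) now fails;
# a type checker would also flag this call against Iterator[int].
try:
    gen_v2(3).send(None)
    send_broke = False
except AttributeError:
    send_broke = True
```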

    Put simply, if you end up with Generator[Something, None, None] (two Nones), consider Iterable[Something] or Iterator[Something] instead.
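    Applied to a generator function, the shortcut keeps the body unchanged and only narrows the annotation. Type checkers accept Iterator[int] or Iterable[int] as the return type of a generator function; the second function name below is hypothetical.

```python
from typing import Iterable, Iterator


def gen(limit: int) -> Iterator[int]:
    # Same body as the Generator[int, None, None] version;
    # callers are promised only next() and iteration.
    for x in range(limit):
        yield x


def gen_iterable(limit: int) -> Iterable[int]:
    # Even narrower: callers may only loop over the result.
    for x in range(limit):
        yield x


values = list(gen(3))                 # [0, 1, 2]
also = [x for x in gen_iterable(3)]   # [0, 1, 2]
```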

    The same reasoning applies to Iterator vs Iterable. If you want users to call only iter on your object (and thus use it only in an iteration context, e.g. [x for x in g]), use Iterable. If you want them to be able to call both next and iter on it, use Iterator.
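    A small sketch of the difference: a range is Iterable but not an Iterator, so it can be looped over repeatedly but does not support next, while the iterator obtained from it supports next and is exhausted after one pass. evens_iterable and evens_iterator are hypothetical helpers.

```python
from typing import Iterable, Iterator


def evens_iterable(limit: int) -> Iterable[int]:
    # A range can be looped over any number of times.
    return range(0, limit, 2)


def evens_iterator(limit: int) -> Iterator[int]:
    # iter() turns it into a one-shot iterator that also supports next().
    return iter(range(0, limit, 2))


r = evens_iterable(6)
twice = [list(r), list(r)]  # [[0, 2, 4], [0, 2, 4]]

it = evens_iterator(6)
first = next(it)            # 0
remaining = list(it)        # [2, 4]
exhausted = list(it)        # [] (already consumed)

try:
    next(r)  # a range is not an iterator
    range_is_iterator = True
except TypeError:
    range_is_iterator = False
```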

    Note

    This line of thought applies mostly to annotating return values. For parameters, choose the type according to the interface (read: methods/functions) you need to use on that object inside your function.
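    A sketch of the parameter side, using a hypothetical sum_of_squares: the function only loops over its argument, so Iterable[int] is enough, and lists, tuples and generators all qualify.

```python
from typing import Iterable


def sum_of_squares(values: Iterable[int]) -> int:
    # Only iteration is needed, so Iterable is the narrowest fitting type.
    return sum(v * v for v in values)


from_list = sum_of_squares([1, 2, 3])              # 14
from_tuple = sum_of_squares((1, 2, 3))             # 14
from_gen = sum_of_squares(x for x in range(1, 4))  # 14
```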