python, parsing, type-inference, static-typing

Why is all static typing not inferred?


As Python supports type annotations, it enables a static typing discipline. When working with the AST produced by the ast module, it strikes me that, given such a discipline, all types could be inferred and there should be no need for type annotations. Given a static typing pragma (perhaps a comment at the top of the code file), an additional layer of logic in the parser could traverse the AST to determine the types of all variables.

For example, take this snippet of code from the Mypy website:

import re
import sys
from typing import Dict

d = {}  # type: Dict[str, int]

with open(sys.argv[1]) as f:
    for s in f:
        for word in re.sub(r'\W', ' ', s).split():
            d[word] = d.get(word, 0) + 1

The dict d and its keys and values are typed with a comment, but the type could be inferred from the loop that follows: s is a str because iterating over a text file yields strings, so word is a str too; and the dict values are int because that is what the expression d.get(word, 0) + 1 evaluates to.
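To illustrate, here is the same word-count loop with the types such a hypothetical inference pass could derive written out explicitly (an in-memory list of lines stands in for the file, so the sketch is self-contained):

```python
import re

# Stand-in for the lines read from the file in the original snippet.
lines = ["the cat and the hat", "the cat sat"]

# What an inference pass would conclude: iterating a text file (or this
# list) yields str, re.sub returns str, and d.get(word, 0) + 1 evaluates
# to int -- hence d is a dict[str, int].
d: dict[str, int] = {}
for s in lines:
    for word in re.sub(r"\W", " ", s).split():
        d[word] = d.get(word, 0) + 1

print(d["the"])  # "the" appears three times across the two lines
```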

Is it the case that performing such analysis of the code is generally too expensive for static typing to be inferred, or am I missing something else?

Please note that this question does not relate to the discussion regarding dynamic versus static typing, or optional typing. My point is about type inference when the programmer agrees to static typing.


Solution

  • The problem is that type annotations are optional. Indeed, the re module has no type annotations, not even in Python 3.8 it would seem. Sure, the analyser could introspect the Python code to see what's going on. However, some code (like the re module) eventually dips into the C API (in CPython), and at that point the analyser has no way to figure out what the type signature of a function is. As humans, we can read the documentation and know that re.sub always returns an instance of str, but automated tools have no way of knowing unless they are provided with supplementary type information.
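    This is easy to observe at runtime: re.sub is a plain Python function with no annotations of its own, so an analyser introspecting the live object gets nothing back (checkers like mypy instead read the separately maintained typeshed stubs):

    ```python
    import re

    # The standard-library source defines re.sub without annotations,
    # so runtime introspection yields an empty annotations dict.
    print(re.sub.__annotations__)  # {}

    # The return type is documented as str, but nothing on the object
    # itself says so -- a tool can only find out by running the call.
    result = re.sub(r"\W", " ", "hello, world!")
    print(type(result).__name__)  # str
    ```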

    Then you have the problem that some operations return type unions, e.g. the ** operator, which can return an int, a float, or a complex depending on the types and values of its operands:

    >>> 3 ** 2
    9
    >>> 3 ** -2
    0.1111111111111111
    >>> 2 ** 0.5
    1.4142135623730951
    >>> (-1) ** 0.5
    (6.123233995736766e-17+1j) # should really just be 1j
    

    This means that, given:

    def f(x: int, y: int):
        z = x ** y
    

    z could only be inferred as some overly general type such as object (the nearest common base of int, float and complex), which is likely not what is desired. By giving the variable an explicit type annotation, we let mypy type-check the assignment of x ** y to z, and any future operations on z can then safely assume that z has its declared type.
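    A quick runtime check confirms that the result type of ** depends on the operands' values, not just their types, and shows the annotated version of f (a sketch only; the exact type a checker infers for x ** y depends on the typeshed stubs in use):

    ```python
    # Same static types, different result types depending on the values:
    print(type(3 ** 2).__name__)         # int
    print(type(3 ** -2).__name__)        # float
    print(type((-1.0) ** 0.5).__name__)  # complex

    def f(x: int, y: int) -> int:
        # z is declared int, so every later use of z is checked
        # against int rather than some broad union or object.
        z: int = x ** y
        return z

    print(f(2, 3))  # 8
    ```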