Inspect vs AST argument specification compatibility


I have a string that contains python code - more specifically a python function definition (using traditional function definition, not lambdas). For example:

code = "def f(p,t,h, *, foo: str = 'foo'): return True"

I need to get the signature of the defined function because I need to assert that some properties are being met. As far as I know there are two approaches I could take: one involving the ast module, and one involving the inspect module.

Using AST module

import ast
root = ast.parse(code)
print(root.body[0].args.__dict__)

This will print:

{'posonlyargs': [],
 'args': [<ast.arg at 0x7f87e06cd4c0>,
  <ast.arg at 0x7f87e06cd7c0>,
  <ast.arg at 0x7f8808074100>],
 'vararg': None,
 'kwonlyargs': [<ast.arg at 0x7f87e023d4c0>],
 'kw_defaults': [<ast.Constant at 0x7f87e023d6a0>],
 'kwarg': None,
 'defaults': []}

Using inspect

First execute the code to obtain a function object and then inspect its signature (I know exec is dangerous and I should not use it, don't worry).

import inspect
results = {}
exec(code, {}, results)
f = results['f']
print(inspect.getfullargspec(f))

This will print:

FullArgSpec(args=['p', 't', 'h'], varargs=None, varkw=None, defaults=None, kwonlyargs=['foo'], kwonlydefaults={'foo': 'foo'}, annotations={'foo': <class 'str'>})

I notice that some things are the same; for example, the 'args' attribute of both objects seems to represent the same thing (or at least that's what I would expect). For other things it's not so clear: using inspect we get an attribute named varkw that is not present in the ast.arguments instance (although I think it may correspond to the kwarg attribute?).

The question is: Is there a mapping between the two classes? Is all the info present in one retained in the other? If so, which attributes correspond to which?


Solution

  • The two approaches work with very different inputs: one should be seen as structured instructions, the other as a reflection of the result of those instructions having been executed.

    • The AST approach looks, presumably, at the very limited context of just the function definition. This can only tell you things about the static interpretation of that text. E.g. any keyword argument defaults or dynamic annotations are not going to be accurate without more context. Moreover, the default values and annotations are expressions that are executed by Python when the function is created, and could even be simplified during compilation.

    • The inspect approach works on the current 'live' representation of the function object as loaded by the interpreter. Function objects are not static entities; various aspects of the object are actually mutable, so the signature generated could change depending on what other code has run and whether that code has acted on the function or on objects the function references.

    To give an example, consider:

    from random import randint, choice
    
    annotation = choice((str, int))
    
    def foo(bar, baz = randint(0, 42), arg: annotation = None):
        pass
    
    foo.__defaults__ = choice((foo.__defaults__, foo.__defaults__[::-1]))
    

    Can you tell me what the default values for baz and arg will be from the source code alone? Or what type the arg argument is annotated with?

    You can't, because the AST reflects text as processed using Python's grammar rules. The inspect.getfullargspec() named tuple reflects Python objects as created by the Python interpreter, after executing bytecode generated from the AST.
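The same point can be shown deterministically. This is only a minimal sketch: the names `LIMIT` and `g` are made up for illustration, and `ast.unparse()` assumes Python 3.9 or newer.

```python
import ast
import inspect

# Hypothetical source; LIMIT and g are illustration-only names.
source = "LIMIT = 10\ndef g(x, n=LIMIT * 2): pass\n"

# Static view: the default is still an unevaluated expression node.
func_node = ast.parse(source).body[1]
static_default = ast.unparse(func_node.args.defaults[0])  # Python 3.9+

# Runtime view: executing the source evaluates the expression once.
namespace = {}
exec(source, namespace)  # only run source you trust
g = namespace["g"]
runtime_default = inspect.getfullargspec(g).defaults

# Function objects are mutable, so the runtime view can change later.
g.__defaults__ = (99,)
mutated_default = inspect.getfullargspec(g).defaults

print(static_default)   # LIMIT * 2
print(runtime_default)  # (20,)
print(mutated_default)  # (99,)
```

The AST never changes once parsed, while the inspect result follows whatever the function object currently holds.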

    Put differently: the inspect.getfullargspec() result looks at function objects several stages further along the path from writing code to calling the function. If an analogy helps here: imagine being given detailed instructions on how to walk from a train station to a major museum in a big, busy (pre-Covid-19) city. The detailed instructions can't account for any street vendors, road works, or inviting cafés along the way, so your mobile phone GPS may not entirely agree with the instructions about what path you actually took once you reach the museum. The detailed instructions are the AST; the inspect.getfullargspec() return value is the actual path travelled.

    That said, disregarding dynamic content, the inspect signature object should give you the same information as what you can get from the AST. The AST shows you what primitive objects and variables the Python interpreter will use to create the function object, and the inspect.getfullargspec() result only looks at the same pieces of information, but after the function was created.
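As for which attributes correspond to which, one possible mapping is sketched below. The helper name `argspec_from_ast` is hypothetical, not part of either module, and it stores defaults and annotations as source text (via `ast.unparse()`, Python 3.9+) rather than as evaluated objects. The correspondence assumed here is: `posonlyargs + args` to `args`, `vararg` to `varargs`, `kwarg` to `varkw`, `defaults` to `defaults`, `kwonlyargs` to `kwonlyargs`, `kw_defaults` to `kwonlydefaults`, and the per-argument `annotation` fields plus `FunctionDef.returns` to `annotations`.

```python
import ast
import inspect


def argspec_from_ast(func_node):
    """Hypothetical helper: build a FullArgSpec from a FunctionDef node.

    Defaults and annotations are kept as source text, not evaluated values.
    """
    a = func_node.args
    positional = a.posonlyargs + a.args  # getfullargspec() merges these too
    every_arg = positional + a.kwonlyargs + [x for x in (a.vararg, a.kwarg) if x]

    annotations = {x.arg: ast.unparse(x.annotation) for x in every_arg if x.annotation}
    if func_node.returns is not None:
        annotations["return"] = ast.unparse(func_node.returns)

    # kw_defaults lines up with kwonlyargs; None marks "no default".
    kwonlydefaults = {
        x.arg: ast.unparse(d)
        for x, d in zip(a.kwonlyargs, a.kw_defaults)
        if d is not None
    }
    return inspect.FullArgSpec(
        args=[x.arg for x in positional],
        varargs=a.vararg.arg if a.vararg else None,
        varkw=a.kwarg.arg if a.kwarg else None,
        defaults=tuple(ast.unparse(d) for d in a.defaults) or None,
        kwonlyargs=[x.arg for x in a.kwonlyargs],
        kwonlydefaults=kwonlydefaults or None,
        annotations=annotations,
    )


code = "def f(p,t,h, *, foo: str = 'foo'): return True"
spec = argspec_from_ast(ast.parse(code).body[0])
print(spec)
```

For the function from the question, every field lines up with the inspect result, except that `kwonlydefaults` and `annotations` hold the strings `"'foo'"` and `'str'` instead of the objects those expressions evaluate to.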

    You only printed the __dict__ of the ast.arguments object; use the ast.dump() function to see a bit more. You have a tree of objects reflecting the source, so you need to look at the full depth to see what else is contained.

    The dump below includes the nested fields of each node, so you can see the annotations too:

    >>> import ast
    >>> code = "def f(p,t,h, *, foo: str = 'foo'): return True"
    >>> module = ast.parse(code)
    >>> print(ast.dump(module.body[0].args, indent=4))
    arguments(
        posonlyargs=[],
        args=[
            arg(arg='p'),
            arg(arg='t'),
            arg(arg='h')],
        kwonlyargs=[
            arg(
                arg='foo',
                annotation=Name(id='str', ctx=Load()))],
        kw_defaults=[
            Constant(value='foo')],
        defaults=[])
    

    (the indent option to ast.dump() is new in Python 3.9).

    You can see that all the same information is contained. The annotations are stored on the individual ast.arg objects, so you'd have to walk the tree to collect them into a dictionary:

    >>> annotations = {}
    >>> for node in ast.walk(module.body[0].args):
    ...     if isinstance(node, ast.arg) and node.annotation:
    ...         ann = node.annotation  # could be any valid Python expression!
    ...         annotations[node.arg] = ann.id if isinstance(ann, ast.Name) else ast.unparse(ann)  # ast.unparse(): Python 3.9+
    ...
    >>> annotations
    {'foo': 'str'}
    

    To further illustrate, this is the AST for the contrived, random-valued function definition I gave earlier (module now holds the parse of that example's source, where the def is the third statement):

    >>> print(ast.dump(module.body[2].args, indent=4))
    arguments(
        posonlyargs=[],
        args=[
            arg(arg='bar'),
            arg(arg='baz'),
            arg(
                arg='arg',
                annotation=Name(id='annotation', ctx=Load()))],
        kwonlyargs=[],
        kw_defaults=[],
        defaults=[
            Call(
                func=Name(id='randint', ctx=Load()),
                args=[
                    Constant(value=0),
                    Constant(value=42)],
                keywords=[]),
            Constant(value=None)])
    

    and the inspect.getfullargspec() result (well, one of the possible results):

    >>> inspect.getfullargspec(foo)
    FullArgSpec(args=['bar', 'baz', 'arg'], varargs=None, varkw=None, defaults=(None, 16), kwonlyargs=[], kwonlydefaults=None, annotations={'arg': <class 'int'>})
    

    There are obvious differences:

    • The arg annotation is the name annotation in the AST, because the expression that references the annotation variable has not been executed. In the getfullargspec() result, random.choice() happened to pick the int class, but it could just as well have been <class 'str'>.
    • The default for baz is a call expression in the AST; the Python source code for randint(0, 42) is reflected, not the result of calling that function. The inspect result shows that in my case 16 was picked, and that the __defaults__ assignment happened to reverse the tuple.
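When every default is a plain literal, the two views can be checked against each other. The sketch below uses `ast.literal_eval()` on the default nodes and compares the result with `inspect.getfullargspec()`. The function names `h` and `k` are made up for illustration, and this is only one possible way to do the comparison.

```python
import ast
import inspect

# Hypothetical function whose defaults are all literals.
source = "def h(a, b=3, c=('x', None), *, d=[1, 2]): pass"

node = ast.parse(source).body[0]
# literal_eval only accepts literal nodes; a Call or Name raises ValueError.
static_defaults = tuple(ast.literal_eval(d) for d in node.args.defaults)
static_kwdefaults = {
    arg.arg: ast.literal_eval(d)
    for arg, d in zip(node.args.kwonlyargs, node.args.kw_defaults)
    if d is not None
}

namespace = {}
exec(source, namespace)  # only run source you trust
spec = inspect.getfullargspec(namespace["h"])

print(static_defaults == spec.defaults)          # True
print(static_kwdefaults == spec.kwonlydefaults)  # True

# A dynamic default such as randint(0, 42) cannot be recovered statically.
dynamic = ast.parse("def k(n=randint(0, 42)): pass").body[0].args.defaults[0]
try:
    ast.literal_eval(dynamic)
    is_literal = True
except ValueError:
    is_literal = False
print(is_literal)  # False
```

The `ValueError` branch marks the point where the static view stops being sufficient and you would need the live function object.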