python, python-3.x, inspect

How to deal with limitations of "inspect.getsource" - or how to get ONLY the source of a function?


I have been playing with the inspect module from Python's standard library.

The following examples work just fine (assuming that inspect has been imported):

def foo(x, y):
    return x - y
print(inspect.getsource(foo))

... will print def foo(x, y):\n    return x - y\n and ...

bar = lambda x, y: x / y
print(inspect.getsource(bar))

... will print bar = lambda x, y: x / y\n. So far so good. Things become a little odd in the following examples, however:

print(inspect.getsource(lambda x, y: x / y))

... will print print(inspect.getsource(lambda x, y: x / y)) and ...

baz = [2, 3, lambda x, y: x / y, 5]
print(inspect.getsource(baz[2]))

... will print baz = [2, 3, lambda x, y: x / y, 5].

The pattern seems to be that getsource returns all relevant source code lines, regardless of context. Everything else on those lines, in my case anything other than the desired function source / definition, is included as well. Is there an alternative approach that would allow extracting something that represents a function's source code, and only its source code, preferably in some anonymous fashion?


EDIT (1)

def foo(x, y):
    return x - y
bar = [1, 2, foo, 4]
print(inspect.getsource(bar[2]))

... will print def foo(x, y):\n    return x - y\n.


Solution

  • Unfortunately, that's not possible with inspect, and it's unlikely to work without parsing (and compiling) the source code again. inspect's getsource function is rather limited: it calls getsourcelines, which in turn calls findsource, and findsource essentially unwraps your object until it ends up at a PyCodeObject.

    At that point, we're dealing with compiled bytecode. All that's left from the original source are fragments and hints, such as co_firstlineno:

    /* Bytecode object */
    typedef struct {
        /* ... other fields omitted ... */
        int co_firstlineno;         /* first source line number */
        PyObject *co_code;          /* instruction opcodes */
        /* ... other fields omitted ... */
    } PyCodeObject;
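
At the Python level, the same fields are available as attributes of a function's `__code__` object. The sketch below compiles a made-up snippet from a string (the filename `<example>` is arbitrary) so that the line numbers are predictable. The `inspect.unwrap` call roughly mirrors the first step of getsourcelines; for an undecorated function it simply returns the function itself.

```python
import inspect

# Made-up module source: foo starts on line 3.
SOURCE = "x = 1\n\ndef foo(x, y):\n    return x - y\n"
namespace = {}
exec(compile(SOURCE, "<example>", "exec"), namespace)

# Follow any __wrapped__ chain, then take the code object, as inspect ends up doing.
code = inspect.unwrap(namespace["foo"]).__code__

print(code.co_firstlineno)  # 3
print(type(code.co_code))   # <class 'bytes'>
print(code.co_filename)     # <example>
```

Only a file name and a first line number survive; nothing in these attributes says where on that line the function starts.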
    

    By the way, similar to the PyCodeObject, a PyFrameObject also contains only an f_lineno but no column, which explains why tracebacks (at least before Python 3.11) include only the file name and the line number: the column isn't compiled into the bytecode.
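
A small sketch of what a frame actually records: it raises inside a hypothetical `fail()` function and inspects the innermost traceback entry. It only illustrates that the frame and the traceback entry pointing at it carry a line number (`f_lineno`, `tb_lineno`), not a column.

```python
import sys


def fail():
    # Hypothetical function that always raises, only to obtain a traceback.
    return 1 / 0


try:
    fail()
except ZeroDivisionError:
    tb = sys.exc_info()[2]

# Walk to the innermost traceback entry, i.e. the frame that raised.
while tb.tb_next is not None:
    tb = tb.tb_next
frame = tb.tb_frame

# Both objects expose a line number; neither has a column attribute.
print(frame.f_code.co_name, tb.tb_lineno, frame.f_lineno)
```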

    As the bytecode contains nothing more specific than the (first) line, it's not possible to get the exact source location from inspect, or from any other library that only uses the (public) bytecode information, without further parsing. The same holds for any other option that relies only on the bytecode, such as pickle.

    inspect uses the public information (co_firstlineno) and then just searches for a suitable beginning of a function and the end of the surrounding block. So inspect gets close: it finds a block, but not necessarily the correct one, and with the information available it cannot tell which one is correct. It tokenizes from the start of the line rather than from the lambda we are interested in, and even if it started at the right lambda, it would not know which source region corresponds to the code object.
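
The tokenizer itself does see where each lambda starts. This sketch uses the standard `tokenize` and `io` modules to list the `(row, column)` start of every `lambda` keyword on a line holding three lambdas; the catch is that nothing in a lambda's code object says which of these positions belongs to it.

```python
import io
import tokenize

LINE = "plus, minus, mult = lambda x: x + 1, lambda y: y - 1, lambda z: z * 5\n"

# Collect the (row, column) where each `lambda` keyword starts.
starts = [
    tok.start
    for tok in tokenize.generate_tokens(io.StringIO(LINE).readline)
    if tok.type == tokenize.NAME and tok.string == "lambda"
]
print(starts)  # [(1, 20), (1, 37), (1, 54)]
```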

    Let's say we have

    plus, minus, mult = lambda x: x + 1, lambda y: y - 1, lambda z: z * 5
    

    and we want just minus. As the bytecode does not contain a co_firstcolumn, we only have the full line available. We could parse all lambdas, but we still don't know which lambda fits our co_code. We would need to compile them again and check whether their bytecode fits the original one.
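
To see the problem concretely, the sketch below defines the three lambdas from a string and compares their code objects: all three report the same `co_firstlineno`, while their `co_code` differs, and that difference is exactly what a recompile-and-compare approach would have to rely on.

```python
SOURCE = "plus, minus, mult = lambda x: x + 1, lambda y: y - 1, lambda z: z * 5\n"
namespace = {}
exec(compile(SOURCE, "<example>", "exec"), namespace)

codes = {name: namespace[name].__code__ for name in ("plus", "minus", "mult")}

# Every lambda reports the same first line, so the line alone cannot tell them apart.
first_lines = {name: code.co_firstlineno for name, code in codes.items()}
print(first_lines)  # {'plus': 1, 'minus': 1, 'mult': 1}

# Their bytecode differs, which is what a recompile-and-compare approach relies on.
print(codes["plus"].co_code != codes["minus"].co_code)  # True
```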

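    In the end, we have to do exactly that: parse the source again and find the correct PyCodeObject. It would be a lot easier if the code object recorded at least a starting column: the AST already stores column offsets (col_offset, and since Python 3.8 also end_col_offset), so we could simply pick the node that starts at that position. As it stands, either inspect needs a big patch, or the bytecode needs to include the starting column of the compiled object. (Python 3.11 later added per-instruction column information to code objects, exposed via co_positions(); see PEP 657.)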
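
A minimal sketch of that "parse again and compare bytecode" idea, assuming Python 3.8+ (for `ast.get_source_segment`) and that you already have the full source text of the defining file as a string (for example by reading the file named in `func.__code__.co_filename`). `lambda_source` and `_same_code` are hypothetical helper names, not part of inspect. The sketch only considers `lambda` nodes that start on `co_firstlineno`, recompiles each candidate on its own, and keeps the one whose bytecode, constants and names match. It will not match lambdas that close over variables of an enclosing function, because such a lambda compiles to different bytecode when recompiled in isolation.

```python
import ast
import types


def _same_code(a, b):
    # Compare only position-independent parts; in CPython, code objects' own ==
    # also compares co_firstlineno, which differs between original and candidate.
    return (a.co_code == b.co_code
            and a.co_consts == b.co_consts
            and a.co_names == b.co_names
            and a.co_varnames == b.co_varnames)


def lambda_source(func, source):
    """Return the exact source text of the lambda `func`, located in `source`.

    Hypothetical helper. Assumes `source` is the full text of the file that
    defined `func` and that the lambda does not use closure variables.
    """
    target = func.__code__
    matches = []
    for node in ast.walk(ast.parse(source)):
        # Only lambdas starting on the line recorded in the code object.
        if not (isinstance(node, ast.Lambda) and node.lineno == target.co_firstlineno):
            continue
        segment = ast.get_source_segment(source, node)
        # Recompile the candidate alone; its code object lands in co_consts.
        wrapper = compile("(" + segment + ")", "<candidate>", "eval")
        if any(isinstance(c, types.CodeType) and _same_code(c, target)
               for c in wrapper.co_consts):
            matches.append(segment)
    if len(matches) != 1:
        raise LookupError(f"expected one matching lambda, found {len(matches)}")
    return matches[0]


SOURCE = "plus, minus, mult = lambda x: x + 1, lambda y: y - 1, lambda z: z * 5\n"
namespace = {}
exec(compile(SOURCE, "<example>", "exec"), namespace)
print(lambda_source(namespace["minus"], SOURCE))  # lambda y: y - 1

BAZ_SOURCE = "baz = [2, 3, lambda x, y: x / y, 5]\n"
baz_namespace = {}
exec(compile(BAZ_SOURCE, "<baz>", "exec"), baz_namespace)
print(lambda_source(baz_namespace["baz"][2], BAZ_SOURCE))  # lambda x, y: x / y
```

Two textually identical lambdas on the same line produce identical bytecode, so the sketch raises LookupError instead of guessing; that ambiguity is inherent to the line-only information discussed above.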