
Get Intermediate Value in Python?


I'm trying to write a Python sys.excepthook which, in addition to printing out the stack trace for the code as you wrote it, also prints out the repr for each evaluated value.

For example, if I ran the following code:

def greeting():
    return 'Hello'

def name():
    return

greeting() + name()

Instead of just printing out:

Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
        greeting() + name()
TypeError: cannot concatenate 'str' and 'NoneType' objects

It would also print out 'Hello' + None so I can immediately see which value was invalid and know the right area of the code to look in (obviously this is a very simple example).

I know that the CPU needs to store these intermediate values in some temporary registers... I suspect that internally Python has to do something similar and I'm hoping that there's some way I can access those temporary values, possibly through the inspect module or something similar.


Solution

  • By the time sys.excepthook() is called, those intermediate values are already gone, so you can't retrieve them. Python does store the intermediate results of sub-expressions somewhere, but you can't access that 'somewhere' directly while the code runs, and nothing in it is kept around once an exception occurs.

    In CPython, the standard Python implementation, that 'somewhere' is the stack attached to the current frame of execution (each active function has one). Python code is compiled to bytecode, an evaluation loop then executes that bytecode, and individual bytecode instructions operate on that stack.

    You can use the dis.dis() function to see what bytecode is used for your example expression:

    >>> import dis
    >>> dis.dis("greeting() + name()")
      1           0 LOAD_NAME                0 (greeting)
                  2 CALL_FUNCTION            0
                  4 LOAD_NAME                1 (name)
                  6 CALL_FUNCTION            0
                  8 BINARY_ADD
                 10 RETURN_VALUE
    

    then look up what those bytecode instructions do:

    • LOAD_NAME 0 finds the object named greeting and puts that on the top of the stack (TOS).
    • CALL_FUNCTION 0 removes 0 elements from the stack to be the arguments for a call, then takes the next object from the stack to be the callable object, calls that object with the arguments, and puts the result as the new TOS.
    • BINARY_ADD takes the top two elements from the stack, adds them up, and puts the result back on TOS.
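
The opcode names in the listing above appear to come from a CPython release before 3.11; CPython 3.11 and later emit different names (for example BINARY_OP instead of BINARY_ADD). A minimal sketch for listing what your own interpreter emits, using dis.get_instructions(); the helper name opcode_names is made up for illustration:

```python
import dis


def opcode_names(source):
    """Return the opcode names CPython emits for a source string.

    `opcode_names` is a hypothetical helper, not part of the dis module.
    """
    return [ins.opname for ins in dis.get_instructions(source)]


names = opcode_names("greeting() + name()")
print(names)  # the exact names depend on the CPython version
```

Whatever the names are on your version, the shape stays the same: two load-and-call sequences followed by one binary operation.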

    So together, LOAD_NAME and CALL_FUNCTION execute a call to a named object. After both calls, the top two stack entries hold the two results, with the name() result on top of the greeting() result. The BINARY_ADD instruction then replaces those two results on the stack with the result of adding them together.

    You don't have access to that stack from within Python, because it is the very act of executing Python bytecode that makes Python work in the first place. Any code that could access the stack would have to deal with the fact that the stack is currently being used to execute that very code!
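
To make the stack behaviour concrete, here is a toy interpreter for just the four instructions in the listing. It is a simplified sketch, not how CPython is implemented: the run() function, the (opname, argument) tuples and the trace list are all invented for illustration.

```python
def run(instructions, namespace, trace):
    """Interpret a tiny subset of the bytecode above on a plain list.

    After each instruction that completes, a snapshot of the stack is
    appended to `trace` so the intermediate values can be inspected.
    """
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_NAME":
            stack.append(namespace[arg])
        elif opname == "CALL_FUNCTION":
            # arg is the number of positional arguments to pop
            args = [stack.pop() for _ in range(arg)][::-1]
            func = stack.pop()
            stack.append(func(*args))
        elif opname == "BINARY_ADD":
            # Simplification: both operands leave the stack before the add
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)  # may raise; nothing is pushed then
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError("unsupported instruction: " + opname)
        trace.append((opname, list(stack)))


def greeting():
    return 'Hello'


def name():
    return None


PROGRAM = [
    ("LOAD_NAME", "greeting"),
    ("CALL_FUNCTION", 0),
    ("LOAD_NAME", "name"),
    ("CALL_FUNCTION", 0),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

trace = []
try:
    run(PROGRAM, {"greeting": greeting, "name": name}, trace)
except TypeError as exc:
    print("failed:", exc)
    print("last snapshot:", trace[-1])
```

The last snapshot holds 'Hello' and None, which is exactly the information the question asks for. The toy version can show it only because it copies the stack after every step, something the real evaluation loop does not do.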

    But you have a bigger problem. If you look at the CPython source code, you can search for instruction names in the evaluation loop in ceval.c. When you look at the BINARY_ADD implementation, you can see that the right operand is popped off the stack and the slot that held the left operand is overwritten with the result, even when that result signals an error:

    TARGET(BINARY_ADD) {
        PyObject *right = POP();
        PyObject *left = TOP();
        PyObject *sum;
        // code to set sum as the result of adding left to right
        SET_TOP(sum);
        if (sum == NULL)
            goto error;
        DISPATCH();
    }
    

    If BINARY_ADD fails with an exception, sum == NULL is true and goto error is executed to unwind the call stack and propagate the exception to the nearest enclosing try block or, failing that, to the sys.excepthook() function. At that point, the intermediary results are gone from the stack. The local right and left pointers in the above block are also long gone (C uses block scope, so once goto error leaves the block those variables no longer exist).
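
What does survive is anything bound to a name: the traceback handed to sys.excepthook() still references each frame, and each frame exposes its local variables through f_locals. The following is a rough sketch of a hook built on that. It does not recover unnamed intermediate values; format_frame_locals, verbose_excepthook and combine are hypothetical names chosen for this example.

```python
import sys
import traceback


def format_frame_locals(tb):
    """Return one line per traceback frame listing its named locals."""
    lines = []
    while tb is not None:
        frame = tb.tb_frame
        items = ", ".join(
            f"{key}={value!r}" for key, value in sorted(frame.f_locals.items())
        )
        lines.append(f"{frame.f_code.co_name}: {items}")
        tb = tb.tb_next
    return lines


def verbose_excepthook(exc_type, exc, tb):
    """Print the normal traceback, then the locals of every frame."""
    traceback.print_exception(exc_type, exc, tb)
    for line in format_frame_locals(tb):
        print(line, file=sys.stderr)


# To install the hook: sys.excepthook = verbose_excepthook


def combine():
    # Binding the operands to names keeps them alive in the frame
    left = "Hello"
    right = None
    return left + right


try:
    combine()
except TypeError:
    report = format_frame_locals(sys.exc_info()[2])
    print(report[-1])
```

The catch is that the operands only show up because combine() assigns them to left and right first. For an expression written as greeting() + name(), the two results never get a name, so the frame has nothing to show for them.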