python debugging subprocess dynamic-analysis

Analyzing execution of a Python program from another Python program

I want to write a Python program that analyzes the execution of other arbitrary Python programs.

For example, suppose I have a Python script called main.py that calls a function func a certain number of times. I want to create another script called analyzer.py that can "look inside" main.py while it's running and record how many times func was called. I also want to record the list of input arguments passed to func, and the return value of func each time it was called.

I cannot modify the source code of main.py or func in any way. Ideally analyzer.py would work for any Python program, and for any function.

The best way I have found to accomplish this is to have analyzer.py run main.py as a subprocess using pdb.

import subprocess
script = "main.py"
process = subprocess.Popen(['python', '-m', 'pdb', script], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

I can then send pdb commands to the program through the process's stdin and read the output from its stdout.

To retrieve the input parameters and return values of func, I need to:

1. Find the line number of the first line of func by analyzing its file
2. Send a breakpoint command for this file/lineno
3. Send continue command
4. Import pickle, serialize locals(), and print to stdout (to get input parameters)
5. Send return command (go to end of function)
6. Serialize __return__ and print to stdout
7. Send continue command
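The first step could be sketched with the standard ast module rather than hand-parsing the file. This is only an illustration: the helper name first_body_lineno is hypothetical, and it assumes the breakpoint should go on the first statement after any docstring, since a docstring line does not execute on each call.

```python
import ast


def first_body_lineno(source, func_name):
    """Return the line of the first executable statement in func_name, or None.

    Hypothetical helper for illustration; it returns the first match found,
    so same-named functions in different scopes are not distinguished.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == func_name:
            body = node.body
            # Skip a leading docstring when other statements follow it.
            if (
                len(body) > 1
                and isinstance(body[0], ast.Expr)
                and isinstance(body[0].value, ast.Constant)
                and isinstance(body[0].value.value, str)
            ):
                body = body[1:]
            return body[0].lineno
    return None


# Stand-in for the contents of main.py.
source = '''\
def func(a, b, c):
    """Add three values."""
    return a + b + c
'''

lineno = first_body_lineno(source, "func")
print(lineno)  # 3
```

The result would then go into the pdb breakpoint command, for example "break main.py:3".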

I'm wondering if there is a better way to accomplish this.

Solution

Instead of controlling pdb with pipes, you can install your own trace function with sys.settrace before doing import main, so that main.py runs in the same process as analyzer.py. (You can also start it with importlib.import_module("main"), runpy.run_module(), or runpy.run_path().)

For instance,

import sys


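def trace(frame, event, arg):
    # "call" fires each time a new function frame is entered;
    # at that point frame.f_locals holds the arguments.
    if event == "call":
        print(frame.f_code.co_name, frame.f_locals)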


sys.settrace(trace)

# (this is where you'd `import main` to cede control to it)

def func(a, b, c):
    return a + b + c


func(1, 2, 3)
func("a", "b", "c")

prints out

func {'a': 1, 'b': 2, 'c': 3}
func {'a': 'a', 'b': 'b', 'c': 'c'}
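The trace function above only prints arguments, while the question also asks for the call count and return values. A minimal sketch of one way to get those, assuming functions are matched by name only: the global trace function returns a local trace function for matching frames, and that local function receives a "return" event whose arg is the return value. The names make_recorder, global_trace, local_trace, and calls are hypothetical.

```python
import sys


def make_recorder(target_name):
    """Return (trace_function, calls) for functions named target_name."""
    calls = []

    def global_trace(frame, event, arg):
        # Called with "call" for every new frame; ignore everything else.
        if event != "call" or frame.f_code.co_name != target_name:
            return None  # no line-level tracing inside other functions
        # Snapshot the arguments now, before the body can rebind them.
        record = {"args": dict(frame.f_locals), "returned": None}
        calls.append(record)

        def local_trace(frame, event, arg):
            # For "return", arg is the value being returned. The event also
            # fires with arg=None when an exception leaves the function.
            if event == "return":
                record["returned"] = arg
            return local_trace

        return local_trace

    return global_trace, calls


def func(a, b, c):
    return a + b + c


tracer, calls = make_recorder("func")
sys.settrace(tracer)
try:
    # In analyzer.py this is where you would run the target instead,
    # e.g. runpy.run_path("main.py", run_name="__main__").
    func(1, 2, 3)
    func("a", "b", "c")
finally:
    sys.settrace(None)  # always uninstall the tracer

print(len(calls))
for call in calls:
    print(call)
```

Because each call gets its own record and its own local trace closure, recursive or nested calls to func keep their arguments and return values paired correctly. Matching on co_name alone will also pick up unrelated functions with the same name; comparing frame.f_code.co_filename as well would narrow that down.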