Tags: python, debugging, subprocess, dynamic-analysis

Analyzing execution of a Python program from another Python program


I want to write a Python program that analyzes the execution of other arbitrary Python programs.

For example, suppose I have a Python script called main.py that calls a function func a certain number of times. I want to create another script called analyzer.py that can "look inside" main.py while it's running and record how many times func was called. I also want to record the list of input arguments passed to func, and the return value of func each time it was called.

I cannot modify the source code of main.py or func in any way. Ideally, analyzer.py would work for any Python program and for any function.

The best way I have found to accomplish this is to have analyzer.py run main.py as a subprocess using pdb.

import subprocess

script = "main.py"
process = subprocess.Popen(['python', '-m', 'pdb', script], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

I can then send pdb commands to the program through the process's stdin and read the results from its stdout.

To retrieve the input parameters and return values of func, I need to:

  1. Find the line number of the first line of func by analyzing its file
  2. Send a breakpoint command for this file/lineno
  3. Send continue command
  4. Import pickle, serialize locals(), and print to stdout (to get input parameters)
  5. Send return command (go to end of function)
  6. Serialize __return__ and print to stdout
  7. Send continue command
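Step 1 above can be done without a regex by parsing the file with the standard ast module. This is only a sketch: first_body_line is a hypothetical helper name, and it assumes func is defined with an ordinary def somewhere in the file. It returns the first statement of the body rather than the def line, because a breakpoint on the def line is hit when the function is defined, not when it is called.

```python
import ast


def first_body_line(source: str, func_name: str) -> int:
    """Return the line number of the first real statement inside func_name.

    Hypothetical helper for illustration; it takes the first matching
    definition found anywhere in the source.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_def and node.name == func_name:
            body = node.body
            # Skip a leading docstring, which is not a useful breakpoint target.
            if (
                len(body) > 1
                and isinstance(body[0], ast.Expr)
                and isinstance(body[0].value, ast.Constant)
                and isinstance(body[0].value.value, str)
            ):
                body = body[1:]
            return body[0].lineno
    raise LookupError(f"no function named {func_name!r} found")


# Stand-in for the contents of main.py (an assumption for this example).
SAMPLE = "def func(a, b, c):\n    return a + b + c\n"

lineno = first_body_line(SAMPLE, "func")
# pdb accepts breakpoints in the form "break filename:lineno".
command = f"break main.py:{lineno}\n"
print(command, end="")  # break main.py:2
```

The resulting command string is what would be written to the pdb process's stdin in step 2.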

I'm wondering if there is a better way to accomplish this.


Solution

  • Instead of driving pdb through pipes, you can install your own trace function with sys.settrace before doing import main. (You can also load the program with importlib.import_module("main"), runpy.run_module() or runpy.run_path().)

    For instance,

    import sys
    
    
    def trace(frame, event, arg):
        # "call" fires when a new frame is entered; f_locals then holds the arguments
        if event == "call":
            print(frame.f_code.co_name, frame.f_locals)
    
    
    sys.settrace(trace)
    
    # (this is where you'd `import main` to cede control to it)
    
    def func(a, b, c):
        return a + b + c
    
    
    func(1, 2, 3)
    func("a", "b", "c")
    

    prints out

    func {'a': 1, 'b': 2, 'c': 3}
    func {'a': 'a', 'b': 'b', 'c': 'c'}
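The question also asks for the call count and the return values. One way to get them, sketched below, is to return the trace function from the "call" event so that Python also delivers the "return" event for that frame, with the return value passed as arg. The names record_calls, target_name and run are made up for this example, and matching on co_name alone will also match same-named functions in other modules.

```python
import sys


def record_calls(target_name, run):
    """Run the callable `run` and record every call to functions named target_name."""
    records = []
    open_calls = {}  # id(frame) -> record for calls that have not returned yet

    def tracer(frame, event, arg):
        if frame.f_code.co_name != target_name:
            return None  # no local tracing inside other functions
        if event == "call":
            record = {"args": dict(frame.f_locals), "returned": None}
            open_calls[id(frame)] = record
            records.append(record)
        elif event == "return":
            # arg is the return value (None if an exception is propagating).
            record = open_calls.pop(id(frame), None)
            if record is not None:
                record["returned"] = arg
        return tracer  # keep receiving events for this frame

    sys.settrace(tracer)
    try:
        run()
    finally:
        sys.settrace(None)  # always uninstall the tracer
    return records


def func(a, b, c):
    return a + b + c


def main():
    func(1, 2, 3)
    func("a", "b", "c")


for record in record_calls("func", main):
    print(record)
# {'args': {'a': 1, 'b': 2, 'c': 3}, 'returned': 6}
# {'args': {'a': 'a', 'b': 'b', 'c': 'c'}, 'returned': 'abc'}
```

The number of calls is then len(records). To analyze an unmodified script, the second argument could be something like lambda: runpy.run_path("main.py", run_name="__main__"), which, unlike a plain import, also runs code guarded by if __name__ == "__main__".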