
Run cProfile over a list of files in Python


I've read other Stack Overflow posts about how to do this and have been tinkering with code for a long time now, getting nowhere. I've also watched YouTube videos about profiling and tried the examples in the cProfile documentation, but they don't seem to cover iteration. No one seems to discuss using cProfile on files in an iterable object.

So, as of now, here is what I have that still doesn't work. What doesn't work? Well, cProfile.py uses recursion and doesn't get to the next file to iterate over.

I've tried a recursive function, a while loop, and a for loop, and it doesn't matter. As soon as __iter__ and __next__ are picked up by cProfile.py, cProfile.py seems to run into an infinite loop. I have to use 32-bit Python, so this code will run over 1 or 2 files in the list, over and over again, until Python throws a MemoryError.

I want to run some code that creates a list of files, iterates over the list, and runs cProfile.run() or calls a Python command over each of them.

I have to raise the recursion limit to keep my code from hitting a RecursionError, but then it just keeps running until it reaches the larger limit. I don't want to do this at all. In fact, it will process 2 files in the list at the same time and never move on. I've tried adding command line arguments, which still doesn't work, because the problem seems to come from within cProfile.py and how I am using it.

from subprocess import call
from glob import glob
from sys import argv, setrecursionlimit

setrecursionlimit(10000)
files = glob('**/*.py', recursive=True)

def run_cProfile(file):
    call(['python', '-m', 'cProfile', '-s', 'ncalls', file])

for file in files:
    if file == argv[0]:
        continue
    print('Processing file: {}'.format(file))
    run_cProfile(file)

The output prints what you would expect from cProfile, but it just does it on the same file in the list until I either get a MemoryError or RecursionError.

The big picture is that I'm writing a command line program that will run different external profilers on the files passed in as command line arguments, parse the results and save the data in a flat file for analysis. I don't want to have to modify any code for these profilers to run and generate a report. That may be a separate project.

Your help will be greatly appreciated.

Thank you!


Solution

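  • The condition if file == argv[0]: will not stop you from calling the same script again, because it compares the fully qualified path of the starting script with the paths found by glob, which are relative to the current directory. Each profiled copy of the starting script then runs the same loop and launches the profiler on itself again, which is why it never moves on. E.g.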

    print(argv[0])
    print(file)
    
    /home/yourlogin/startscript.py
    startscript.py
    

    You may want to change it to:

    from os import path
    
    # argv[0] may be a full path, so compare file names only
    startname = path.basename(argv[0])
    
    for file in files:
        if file == startname:
            continue
        print('Processing file: {}'.format(file))
        run_cProfile(file)
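
    Comparing file names only matches when the starting script sits directly in the current directory. If it lives in a subdirectory, glob returns something like sub/startscript.py and the check misses it again. A minimal sketch of a sturdier check, assuming it is acceptable to resolve both paths before comparing them; the helper names canonical, is_same_file, profile_targets and main are made up for this example, and sys.executable is used so the child process runs under the same interpreter:

```python
import os
import subprocess
import sys
from glob import glob


def canonical(p):
    # Resolve relative paths and symlinks, and fold case where the OS does.
    return os.path.normcase(os.path.realpath(p))


def is_same_file(candidate, script):
    """Return True if both paths name the same file, however they are spelled."""
    return canonical(candidate) == canonical(script)


def profile_targets(pattern='**/*.py', script=None):
    """List files matching pattern, leaving out the starting script itself."""
    script = sys.argv[0] if script is None else script
    return [f for f in glob(pattern, recursive=True)
            if not is_same_file(f, script)]


def run_cprofile(file):
    # Same command as in the question, but with the current interpreter.
    return subprocess.call(
        [sys.executable, '-m', 'cProfile', '-s', 'ncalls', file])


def main():
    for file in profile_targets():
        print('Processing file: {}'.format(file))
        run_cprofile(file)
```

    Calling main() at the bottom of the starting script runs the loop. With the self-check working, setrecursionlimit should no longer be needed, since the runaway behaviour came from the script profiling itself rather than from cProfile.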