I'm currently doing some work at university that requires generating benchmarks for multiple short C programs. I've written a Python script to automate this process. Until now I've been using the time module and calculating each benchmark like this:
start = time.time()
successful = run_program(path)
end = time.time()
runtime = end - start
where the run_program function just uses the subprocess module to run the C program:
def run_program(path):
    p = subprocess.Popen(path, shell=True, stdout=subprocess.PIPE)
    p.communicate()
    if p.returncode > 1:
        return False
    return True
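For reference, the two snippets above combine into the following runnable sketch. One change from my original: time.perf_counter() instead of time.time(), since it is a monotonic clock intended for interval timing (it still measures elapsed wall-clock time, though), and exit-code handling is simplified to treat any non-zero status as failure:

```python
import subprocess
import time

def run_program(path):
    # shell=True so the path string is interpreted by the shell, as above
    p = subprocess.Popen(path, shell=True, stdout=subprocess.PIPE)
    p.communicate()
    return p.returncode == 0

def benchmark(path):
    # perf_counter() is monotonic and unaffected by system clock
    # adjustments -- but it is still elapsed (wall-clock) time
    start = time.perf_counter()
    successful = run_program(path)
    runtime = time.perf_counter() - start
    return runtime, successful
```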
However, I've recently discovered that this measures elapsed (wall-clock) time and not CPU time, i.e. this sort of measurement is sensitive to noise from the OS. Similar questions on SO suggest that the timeit module is better for timing, so I've adapted the run method as follows:
def run_program(path):
    command = ('p = subprocess.Popen(\'time ' + path + '\', shell=True, '
               'stdout=subprocess.PIPE, stderr=subprocess.PIPE); '
               'out, err = p.communicate()')
    result = timeit.Timer(command, setup='import subprocess').repeat(1, 10)
    return numpy.median(result)
But from looking at the timeit documentation, it seems the module is only meant for small snippets of Python code passed in as a string, so I'm not sure it's giving me accurate results for this computation. My question is: will timeit measure the CPU time for every step of the process it runs, or only the CPU time for the actual Python code (i.e. the subprocess calls)? Is this an accurate way to benchmark a set of C programs?
timeit measures time spent in the Python process in which it runs (by default its timer is a wall clock, time.perf_counter()). CPU time used by external processes will not be "credited" to those measurements: a CPU-time clock such as time.process_time() only counts work done by the parent process, so your C program's execution is invisible to it.
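To see this, here is a minimal sketch (assuming a Unix-like system, where os.times() reports the accumulated CPU time of children once they have been waited on). It runs a CPU-bound child process and compares the parent's process_time() against the children's CPU time from os.times():

```python
import os
import subprocess
import sys
import time

# CPU-bound child: a busy loop in a second Python interpreter
child = [sys.executable, '-c', 'sum(range(10**7))']

before = os.times()              # includes accumulated children CPU time
start_cpu = time.process_time()  # CPU time of *this* process only
subprocess.run(child, check=True)
parent_cpu = time.process_time() - start_cpu
after = os.times()

child_cpu = ((after.children_user - before.children_user)
             + (after.children_system - before.children_system))

# parent_cpu stays tiny; the busy loop shows up only in child_cpu
print(f"parent CPU: {parent_cpu:.3f}s  child CPU: {child_cpu:.3f}s")
```

On Unix you could equally read resource.getrusage(resource.RUSAGE_CHILDREN); on Windows the children fields of os.times() are always zero, so this approach does not carry over.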