Search code examples
pythonperformancetensorflowkeras

invalid results of process_time() when measuring model.fit() performance


I use the snippet below to measure and output the time spent during model fitting.

perf_counter_train_begin = time.perf_counter()
process_time_train_begin = time.process_time()

model.fit(data, ...)

perf_counter_train = time.perf_counter() - perf_counter_train_begin
process_time_train = time.process_time() - process_time_train_begin

print(f"System Time: {perf_counter_train}; Process Time: {process_time_train}")

It is expected that the system time (acquired from time.perf_counter()) might take much greater values than the process time (from time.process_time()) due to various factors like system calls, process scheduling and so on. On the other hand, when I run my neural network training script, I get results like this:

System Time: 51.13854772000013; Process Time: 115.725974476

Judging by my clock, the system time is measured correctly, and the process time is bogus. What am I doing wrong here?


Solution

  • The documentation of the various timing functions can in fact be a bit confusing, here is an explanation for both functions (skip to the end for a summary of the differences):

    time.process_time

    time.process_time() → float Return the value (in fractional seconds) of the sum of the system and user CPU time of the current process. It does not include time elapsed during sleep. It is process-wide by definition. The reference point of the returned value is undefined, so that only the difference between the results of two calls is valid.

    Use process_time_ns() to avoid the precision loss caused by the float type.

    New in version 3.3.

    • "Return the value (in fractional seconds) of the sum of the system and user CPU time of the current process": This means that time.process_time() returns the amount of CPU time used by the current process and all its child processes, expressed in seconds as a floating point number. This includes all the CPU time of the threads spawned by the process, each one measured individually and then they are all added up.

    • "It does not include time elapsed during sleep": This means that the time spent sleeping or waiting for I/O operations is not included in the value returned by time.process_time(). Only the time spent executing code on the CPU is included.

    • "It is process-wide by definition": it only includes time taken by this process (and its children, as mentioned before).

    • "The reference point of the returned value is undefined, so that only the difference between the results of two calls is valid": This means that the absolute value returned by time.process_time() has no specific meaning, since the "zero" point in time is undefined. Instead, the value returned by two successive calls to time.process_time() can be subtracted to get the elapsed CPU time between the two calls, which is the only meaningful measure.

    time.perf_counter

    time.perf_counter() → float Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration. It does include time elapsed during sleep and is system-wide. The reference point of the returned value is undefined, so that only the difference between the results of two calls is valid.

    Use _counter_ns() to avoid the precision loss caused by the float type. New in version 3.3. Changed in version 3.10: On Windows, the function is now system-wide.

    • "Return the value (in fractional seconds) of a performance counter": This means that time.perf_counter() returns the value of a performance counter, which is a clock that is used to measure elapsed time with high accuracy and precision.
    • "It does include time elapsed during sleep and is system-wide": This means that the time elapsed during sleep is included and system-wide means that it also takes into account the time that other programs take while the main program is paused (such as waiting to receive a message from another running program).
    • "The reference point of the returned value is undefined, so that only the difference between the results of two calls is valid": This means that the absolute value returned by time.perf_counter() has no well-defined meaning, since the "zero" point in time is undefined. Instead, the value returned by two successive calls to time.perf_counter() can be subtracted to get the elapsed time between the two calls, which is the only meaningful measure. (This point is the same as for time.process_time).

    Summary of the differences

    • time.process_time() measures the total CPU time used by the current process (counting the time taken by a process using N threads N times, because the time taken by each thread will be added to the final result), while time.perf_counter() is aligned with real-life (wall clock) time. In general we expect time.process_time() to be longer then time.perf_counter(), as you also have seen in your example. Furthermore, the time.process_time() will generally be longer than the real life time, as more threads can work concurrently at the same time, and all of their computation time will be added up, it will only be equal to the wall time for a single-threaded application.
    • time.process_time() does not include time elapsed during sleep, while time.perf_counter() does include it.