c, time, execv

Time the duration of a program called by execv


I am making a C program that uses fork and execv to run other programs in parallel.

I can't time the execution of the program called by execv, because the new process dies as soon as that program finishes running. A further complication is that the parent process can't simply block in waitpid until the child finishes, because I need the parent to do other work in the meantime.

So my question is: is there a way to measure the duration of the execv call without the use of an auxiliary fork, pthread or text file?

Thank you in advance


Solution

  • Your parent process knows when it issued the fork() system call. That's not exactly the moment that the execv'd process starts running, since the execv() system call takes some amount of time, but it's not totally unreasonable to include that time in the tally. If you accept that limitation, you can just record the start time as the time at which you called fork().
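A minimal sketch of that first step, assuming you take the timestamp with clock_gettime() and CLOCK_MONOTONIC (my choice, not something the question requires). The helper name spawn_timed is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: note the time, then fork and execv.
 * Returns the child's PID to the parent, or -1 if fork() failed. */
pid_t spawn_timed(const char *path, char *const argv[], struct timespec *start)
{
    /* CLOCK_MONOTONIC is not affected by wall-clock adjustments. */
    clock_gettime(CLOCK_MONOTONIC, start);

    pid_t pid = fork();
    if (pid == 0) {
        execv(path, argv);
        _exit(127);   /* only reached if execv() failed */
    }
    return pid;       /* parent: child's PID, or -1 on error */
}
```

The parent keeps *start in a local variable until it knows where to store it, which matters for the race described further down.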

    When the child terminates, the parent will receive a SIGCHLD signal. The default action for SIGCHLD is to ignore it, but you probably want to change that anyway. If you attach a signal handler to SIGCHLD, then in that handler you can call waitpid (with the WNOHANG option) in a loop until you've collected every pending child-termination notification. For each notification, you record the time of the notification as that process's end time. (Again, if the system is under heavy load, the signal might lag behind the termination, making your measurement inaccurate. But most of the time, it will be accurate.)
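Here is roughly what such a handler could look like. It is only a sketch: reaped_count and last_end_time are placeholder bookkeeping I made up, and a real program would store the end time per child.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Placeholder bookkeeping (hypothetical names). */
static volatile sig_atomic_t reaped_count = 0;
static struct timespec last_end_time;

static void on_sigchld(int signo)
{
    (void)signo;
    int saved_errno = errno;   /* waitpid() may overwrite errno */
    int status;
    pid_t pid;

    /* Several children can exit before the handler runs once, so keep
     * calling waitpid() until no terminated child is left. */
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        clock_gettime(CLOCK_MONOTONIC, &last_end_time);  /* async-signal-safe */
        reaped_count++;
    }
    errno = saved_errno;
}

/* Returns 0 on success, -1 on failure (like sigaction() itself). */
int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    /* Restart interrupted system calls; ignore stop/continue events. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

Call install_sigchld_handler() once, before the first fork(), so that no termination can slip by unhandled.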

    Clearly, the parent needs to track more than one child process. So you'll need to use the child's PID to index these values.

    Now you have a start time and an end time for each child process.
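With both timestamps in hand, the duration is a subtraction. The sketch below also decodes the status from waitpid() with the standard macros; format_report is a hypothetical helper, and it assumes both times came from the same clock.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Seconds between two timestamps taken from the same clock. */
double elapsed_seconds(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Hypothetical report helper. Call it from the main program, never from
 * the signal handler. Returns what snprintf() returns. */
int format_report(char *buf, size_t len, pid_t pid,
                  const struct timespec *start, const struct timespec *end,
                  int status)
{
    double secs = elapsed_seconds(start, end);

    if (WIFEXITED(status))        /* normal exit: extract the exit code */
        return snprintf(buf, len, "pid %ld: %.3f s, exit code %d",
                        (long)pid, secs, WEXITSTATUS(status));
    if (WIFSIGNALED(status))      /* terminated by a signal */
        return snprintf(buf, len, "pid %ld: %.3f s, killed by signal %d",
                        (long)pid, secs, WTERMSIG(status));
    return snprintf(buf, len, "pid %ld: %.3f s, status 0x%x",
                    (long)pid, secs, (unsigned)status);
}
```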

    There's a small problem, though. You cannot attach the start time to the child process's PID until the fork() call returns to the parent. But it's entirely possible that fork() will return to the child first, that the child will call execv(), and that the execv()'d program will terminate, all before fork() returns to the parent. (Honest. It happens.)

    So it is possible for the SIGCHLD handler to receive a notification of the termination of a process whose start time has not yet been recorded.

    This is easy to fix, but when you do so you need to take into account the fact that signal handlers cannot allocate memory (malloc() is not async-signal-safe). So if you're recording the start and end time information in dynamically allocated storage, you need to have allocated that storage before the signal handler runs.

    So the code will look something like this:

    1. Allocate storage for a new process times table entry
       (PID / start time / end time / status result). Set all
       fields to 0 to indicate that the entry is available.
    2. Record the current time as start_time (a local variable,
       not the table entry).
    3. Call fork().
    4. (Still in the parent). Using an atomic compare-and-swap
       (or equivalent), set the PID of the table entry created
       in step 1 to the child's PID. If the entry was 0 (and is
       now the PID) or if the entry was already the PID, then
       continue to step 6.
    5. If the entry has some other non-zero PID, find an empty entry
       in the table and return to step 4.
    6. Now record the start time in the table entry. If the table entry
       already has an end time recorded, then the signal handler already
       ran and you know how long it took and what its return status is.
       (This is the case where the child terminated before you got to
       step 4.) You can now report this information.
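The parent-side steps 4 to 6 might be sketched like this. I am assuming a fixed-size static table in place of the allocation in step 1, and the newer GCC/Clang builtin __atomic_compare_exchange_n. The names child_entry, table, done and register_child are all made up for illustration.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

#define MAX_CHILDREN 64            /* assumed fixed capacity */

struct child_entry {
    pid_t pid;                     /* 0 means the slot is free */
    struct timespec start;         /* written by the parent */
    struct timespec end;           /* written by the SIGCHLD handler */
    int status;                    /* written by the SIGCHLD handler */
    volatile sig_atomic_t done;    /* nonzero once end/status are valid */
};

/* Static storage: nothing is allocated later, and every field starts at 0. */
static struct child_entry table[MAX_CHILDREN];

/* Steps 4-6, run in the parent right after fork() returned `pid`.
 * `start` is the time noted just before the fork (step 2).
 * Returns the child's entry, or NULL if the table is full. */
struct child_entry *register_child(pid_t pid, const struct timespec *start)
{
    for (size_t i = 0; i < MAX_CHILDREN; i++) {
        pid_t expected = 0;
        /* Store pid if the slot holds 0. On failure, `expected` is
         * overwritten with the value the slot holds now. */
        int claimed = __atomic_compare_exchange_n(&table[i].pid, &expected, pid,
                                                  0, __ATOMIC_SEQ_CST,
                                                  __ATOMIC_SEQ_CST);
        if (claimed || expected == pid) {
            table[i].start = *start;      /* step 6 */
            return &table[i];
        }
        /* The slot belongs to another child: try the next one (step 5). */
    }
    return NULL;
}
```

After the call, a nonzero done field in the returned entry means the handler got there first, so the child has already finished and can be reported immediately.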
    

    In the SIGCHLD signal handler, you need to do something like this:

    For each successful call to waitpid():
    1. Find the entry in the child process information table whose PID
       corresponds to the PID returned by waitpid(). If you find one,
       skip to step 4.
    2. Find an empty entry in the child process information table.
       Note that the signal handler cannot be interrupted by the main
       program, so locking is not required here.
    3. Claim that entry by setting its PID field to the PID returned by
       waitpid() above.
    4. Now that you have an entry, record the end time and return status
       information in the table entry. If the table entry existed
       previously, you need to put the entry on a notification queue
       so that the main process can notify the user. (You cannot call
       printf in a signal handler either.) If the table entry didn't
       exist before, then the main process will notice by itself.
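The handler side could be sketched as follows, again under my own assumptions: a fixed-size static table, and a pair of flags (notify, notify_pending) standing in for a real notification queue. All names here are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

#define MAX_CHILDREN 64                /* assumed fixed capacity */

struct child_entry {
    pid_t pid;                         /* 0 means the slot is free */
    struct timespec start, end;
    int status;
    volatile sig_atomic_t done;        /* end/status are valid */
    volatile sig_atomic_t notify;      /* main program should report this one */
};

static struct child_entry table[MAX_CHILDREN];
static volatile sig_atomic_t notify_pending;   /* stand-in for a queue */

/* Steps 1-4 for one PID returned by waitpid(). Only async-signal-safe
 * calls are used. Plain reads and writes suffice because the main
 * program cannot interrupt the handler. */
static void record_exit(pid_t pid, int status)
{
    struct child_entry *e = NULL;
    int existed = 0;

    for (size_t i = 0; i < MAX_CHILDREN && !e; i++)    /* step 1 */
        if (table[i].pid == pid) { e = &table[i]; existed = 1; }

    for (size_t i = 0; i < MAX_CHILDREN && !e; i++)    /* steps 2-3 */
        if (table[i].pid == 0) { e = &table[i]; e->pid = pid; }

    if (!e)
        return;   /* table full: this sketch simply drops the record */

    clock_gettime(CLOCK_MONOTONIC, &e->end);           /* step 4 */
    e->status = status;
    e->done = 1;
    if (existed) {            /* parent registered it earlier: tell main */
        e->notify = 1;
        notify_pending = 1;
    }
}

void on_sigchld(int signo)
{
    (void)signo;
    int saved_errno = errno;
    int status;
    pid_t pid;
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
        record_exit(pid, status);
    errno = saved_errno;
}
```

The main program would check notify_pending in its work loop, clear it, and then report every entry whose notify flag is set.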
    

    You might have to draw some diagrams to convince yourself that the above algorithm is correct and has no race conditions. Good luck.

    Also, if you haven't done any of these things before, you'll want to do some reading :-)

    • waitpid(). Pay particular attention to the macros used to extract status information.

    • sigaction(). How to assign a handler function to a signal. If that's still Greek to you, start with signal(7) or the relevant chapter in your Unix programming textbook.

    • Race conditions (from Wikipedia)

    • Compare and Swap (on Wikipedia). (Don't use their sample code; it doesn't work. GCC has a built-in extension that implements atomic compare-and-swap on any architecture that has a way of supporting it. I know that section of the GCC manual is marked legacy and that you are supposed to use the more complicated __atomic functions described in the following section, but in this case the defaults are fine. If you do use __atomic_compare_exchange_n, kudos.)
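For the compare-and-swap item, this is a small sketch of the slot-claiming test using the legacy GCC builtin __sync_val_compare_and_swap (also available in Clang). The function name claim_slot is made up for illustration.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Try to claim a table slot for `pid`. Returns true if the slot now
 * holds `pid`: either it was free (0) and we stored pid, or it already
 * held pid because the signal handler claimed it first. */
bool claim_slot(pid_t *slot, pid_t pid)
{
    /* Atomically: if *slot == 0, set *slot = pid. Either way, the
     * builtin returns the value *slot held before the operation. */
    pid_t seen = __sync_val_compare_and_swap(slot, (pid_t)0, pid);
    return seen == 0 || seen == pid;
}
```

A false result means the slot holds some other child's PID, so the caller should move on to another slot.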