What are the reasons for elapsed time to greatly exceed user+kernel time?

I see a lot of threads asking about situations with elapsed time (wall time) being less than user+kernel time, and I understand how multi-threading can cause this situation. However, when timing an execution of some MPI code via:

$ time mpirun -n 4 ./a.out

I'm seeing elapsed times that range from 4-5 minutes, user times of about 40 seconds, and kernel times of about 40 seconds. I'm thinking that barrier synchronization between processes could be part of the cause, or perhaps time only getting information about a single MPI process, but I'm still not able to rationalize exactly what is causing my readings. Can anyone explain that?

Thanks very much.

Solution

For many processes, I expect wall clock time to greatly exceed total CPU time. Few processes are CPU-bound, so they spend a lot of their time waiting. Fortunately, wait time is no longer charged to the process, so it shows up only in the wall clock (real) figure and not in user or sys. Things that cause waits:

I/O of any sort (disk, network, interprocess pipes, etc.).
Resource synchronization between processes.
Time slices allocated to other processes.
Memory swapping (not too common these days).
Interrupts of pretty well any other sort.
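
As a rough illustration of the point above, here is a minimal sketch that times a command which does nothing but wait. It assumes bash is installed and uses bash's TIMEFORMAT variable to print the three figures on one line; the variable names (report, real, user, sys) are just for this example.

```shell
# Ask bash to time a command that only waits.
# TIMEFORMAT="%R %U %S" makes bash print: real user sys (in seconds).
report=$(bash -c 'TIMEFORMAT="%R %U %S"; { time sleep 1; } 2>&1')

# Split the single report line into three variables.
read -r real user sys <<EOF
$report
EOF

echo "real=${real}s user=${user}s sys=${sys}s"
```

The real figure should come out at a little over one second, while user and sys stay close to zero, because sleeping costs almost no CPU time.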

Even heavy-duty statistical software is likely to do some I/O, which will cause its CPU time to be less than its wall clock time.

An extreme example is copying a large file from one partition on a disk to another partition on the same disk. This can take a lot of wall time with very little CPU time. If you are able to use ionice to lower the copy's I/O priority, you can make the wall time even higher if the disk is at all busy with other work.
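
A sketch of how you might try the copy experiment yourself, assuming bash, mktemp, dd, cp and cmp are available. The 64 MiB size and the temporary file names are arbitrary choices for illustration. A file this small will probably be served from the page cache and finish quickly, so the gap between real and user+sys only becomes dramatic with a file larger than free memory or on a busy disk.

```shell
# Create a scratch source file and an empty destination.
src=$(mktemp)
dst=$(mktemp)

# 64 MiB of zeros as a stand-in for the "large file".
dd if=/dev/zero of="$src" bs=1048576 count=64 2>/dev/null

# Time the copy; bash prints real, user and sys on one line.
bash -c 'TIMEFORMAT="real=%R user=%U sys=%S"; time cp "$1" "$2"' _ "$src" "$dst"

# Check that the copy is identical, then clean up.
copied=no
cmp -s "$src" "$dst" && copied=yes
rm -f "$src" "$dst"
```

On Linux, prefixing the cp with ionice -c 3 should put the copy in the idle I/O class. Whether that actually stretches the wall time depends on the I/O scheduler in use and on how busy the disk is.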

The following command will likely show significantly higher real (wall clock) time than user and sys time combined, because bash simply blocks until you type a line and press Enter.

time bash -c "read ans"