parallel-processing mpi hpc

Would communication and the interconnect have any impact on a compute-bound application running on multiple nodes?


I have a compute-bound application, which I have run on multiple nodes (4 nodes and 8 nodes). I'm wondering whether communication between the nodes could have any effect on the run time. If so, how is that possible? As far as I understand, a compute-bound application depends only on the computing capability of the system.

Also, can I treat the number of CPUs in my system as its computing capability?

Any help would be appreciated.

Updated:

To see whether the application is memory-bound or compute-bound, I ran it on 1 node with different numbers of cores. For that application (NPB-LU), the run time decreased linearly as the number of cores increased, so I concluded that it is probably compute-bound (I had no other way to determine this).

I then predicted the run time of the application with a model that takes into account the latency (in my case, the message time) at different connection levels, such as inter-socket and inter-node. The predicted times differ depending on the latency of the connection level, even though the application appeared to be compute-bound.

[Figure: communication model]

[Figure: performance model]

n: grid size, p: number of cores, m: total Mop/s, f: Mop/s per core


Solution

  • Imagine you have a horse that drinks water at, say, 1 liter per minute.

    To give the horse water, you have a well to draw from. Imagine you can pump up to 1.5 liters per minute.

    In this situation, your water consumption is horse-bound.

    Now suppose you have two horses, each drinking 1 liter per minute. The total demand of 2 liters per minute exceeds the 1.5 liters per minute the well can supply, so your water consumption is no longer horse-bound but well-bound.

    Your application's behavior can change depending on the environment. To determine what is happening in your application, I recommend profiling it. There are many tools, such as gprof, perf, and PAPI, that let you observe your application's behavior more closely.

    You can then determine experimentally very interesting metrics, such as instructions per clock cycle, which can give you a better understanding of your app's behavior.
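As a rough illustration of the horse and well analogy, the sketch below first computes which resource limits throughput, then applies the same idea to a toy run-time model. The function names and all numbers are hypothetical. The second function assumes perfect compute scaling and a fixed cost per message, which is a deliberate simplification and not the model from the question.

```python
def limiting_resource(demand_per_consumer, consumers, supply):
    """Horse/well analogy: return the achieved rate and what limits it."""
    demand = demand_per_consumer * consumers
    if demand <= supply:
        return demand, "horse-bound"  # the consumers set the pace
    return supply, "well-bound"       # the supply sets the pace


def comm_fraction(compute_s_one_core, cores, messages, latency_s):
    """Fraction of run time spent communicating in a simple additive model.

    Assumptions (illustrative only): compute time scales as 1/cores,
    and every message costs the same fixed latency.
    """
    compute = compute_s_one_core / cores
    comm = messages * latency_s
    return comm / (compute + comm)


# One horse: demand (1.0) is below the well's supply (1.5).
print(limiting_resource(1.0, 1, 1.5))
# Two horses: demand (2.0) exceeds the supply, so the well limits.
print(limiting_resource(1.0, 2, 1.5))

# Hypothetical workload: 100 s of compute on one core, 10,000 messages.
for cores in (1, 4, 16, 64):
    share = comm_fraction(100.0, cores, 10_000, 1e-4)
    print(f"{cores:3d} cores: {share:.1%} of run time in communication")
```

In this toy model the communication share grows as cores are added, because the compute time shrinks while the message cost does not. That is one way an application that looks compute-bound on a single node can become sensitive to interconnect latency at larger scale; profiling the real application is still needed to confirm whether this happens in practice.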