I have an MPI implementation of IDW2 (inverse distance weighted) gridding on a set of sparsely sampled points. I have divided the work up as follows:
Node 0 takes each data point and sends it, round-robin, to the worker nodes 1...N-2 with the following code:
int nodes_in_play = NNodes - 2;   /* ranks 1..NNodes-2 do the IDW work; rank NNodes-1 writes the output */
for (int i = 0; i < data_size; i++)
{
    int dest = (i % nodes_in_play) + 1;   /* round-robin over the worker ranks */
    //printf("Point %d of %d going to %d\n", i+1, data_size, dest);
    Error = MPI_Send(las_points[i], 3, MPI_DOUBLE, dest, PIPE_MSG, MPI_COMM_WORLD);
    if (Error != MPI_SUCCESS) break;
}
Nodes 1...N-2 perform the IDW-based estimates:
for (int i = 0; i <= data_size - nodes_in_play; i += nodes_in_play)
{
    Error = MPI_Recv(test_point, 3, MPI_DOUBLE, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    if (status.MPI_TAG == END_MSG) break;
    ... IDW2 code
    Error = MPI_Send(&zdiff, 1, MPI_DOUBLE, NNodes-1, PIPE_MSG, MPI_COMM_WORLD);   /* result goes to the collector rank */
}
Node N-1 receives the results and serializes them to the output file.
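For reference, a minimal sketch of what that collector rank could look like, assuming it expects exactly data_size results (one zdiff value per input point); the file name and output format are placeholders, not the actual code:

/* Collector (rank NNodes-1): gather one zdiff per point and write it out.
   Assumes exactly data_size results arrive; they arrive in whatever order the workers finish. */
double zdiff;
MPI_Status status;
FILE *out = fopen("idw_out.txt", "w");   /* hypothetical output file */
for (int i = 0; i < data_size; i++)
{
    MPI_Recv(&zdiff, 1, MPI_DOUBLE, MPI_ANY_SOURCE, PIPE_MSG, MPI_COMM_WORLD, &status);
    fprintf(out, "%f\n", zdiff);
}
fclose(out);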
This works fine for 3 nodes, but with more nodes the worker loop count no longer matches the number of points each worker actually receives, because of the tricky loop boundaries. For example, with 8 points and 3 workers the condition i <= data_size - nodes_in_play lets the loop run only twice per worker, so only 6 of the 8 points are ever received, and the overall run gets stuck. What would be a simple way to run the receive / process / send tasks in the in-between nodes? I am looking for a nifty for-loop line.
What I have done:
Against my better judgement I have added a while(1) loop in the intermediate nodes, with an exit condition when a message with the END_MSG tag is received. Node 0 sends an END_MSG message to every intermediate node once all the points have been sent off.
The while loop with an end flag is an ugly solution, but it works, so I will stick with it for now.
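For completeness, here is a minimal sketch of that sentinel scheme, reusing the PIPE_MSG / END_MSG tags and buffers from the snippets above; everything else is illustrative rather than the actual code:

/* Rank 0: after distributing all the points, send a termination message to every worker.
   The payload is a dummy; only the END_MSG tag matters. */
double dummy[3] = {0};
for (int dest = 1; dest <= nodes_in_play; dest++)
    MPI_Send(dummy, 3, MPI_DOUBLE, dest, END_MSG, MPI_COMM_WORLD);

/* Ranks 1..NNodes-2: process points until the END_MSG sentinel arrives. */
while (1)
{
    MPI_Recv(test_point, 3, MPI_DOUBLE, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    if (status.MPI_TAG == END_MSG) break;
    /* ... IDW2 estimate for test_point ... */
    MPI_Send(&zdiff, 1, MPI_DOUBLE, NNodes-1, PIPE_MSG, MPI_COMM_WORLD);
}

The collector rank still needs its own way to know when all results have arrived (a fixed count or a similar sentinel); that part is not shown here.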