fork parent-child nested-loops child-process

fork in nested loops

can someone explain How many child processes do this program create? the answer is 127, but I couldn't understand how they got it.

int create(int num){
int i;
for(i=0;i<num;i++)
fork();
}

int main() {
int i;
fork();
for(i=0;i<4;i++)
create(i);
return 0; 
}

Solution

This really sounds like it's a homework problem for a class on operating systems, but it's an interesting problem so I'll answer it for you. First off, let's look at the code as follows. Functionally, it's the same thing, but it'll make things a little easier to digest. Also, to start, let's ignore that initial fork() call. We'll count how many there are if it weren't there, and then if we add it back in, we'll have the same amount of processes, times two.

int main() {
    int i, j;
    // fork();
    for (i = 0; i < 4; i++) {
      for (j = 0; j < i; j++) {
        fork();
      }
    }
}

Now this is partly a math problem, and partly a programming problem. First, we need to understand what happens when we call fork(). When you create a child process, the child inherits it's own copy of all of the parent's variables at the variables' current values at the time at which the fork() call was made. So that means that now, the parent and the child have copies of the exact same variables with the exact same values, but they can modify those variables independently, and they won't effect each other. So then in the following simple example,

int main() {
    int i = 0, pid = fork();

    if (pid == 0) {
      i = 1;
    }

    if (pid > 0) {
      i = 2;
    }
}

In the parents world, i gets the value 2, and in the child's world i gets the value 1, and these are now separate variables we're talking about so the parent can have what it wants, the child can have what it wants, they don't conflict, and everybody's happy.

Now, to answer your problem, we have to keep this in mind. So let's see how many processes we have first without the initial fork() call. Well now, the parent itself will spawn 6 child process. For each of those processes, the variables (i,j) will have values (1,0), (2,0), (2,1), (3,0), (3,1), and (3,2), respectively.

So the last child spawned at (3,2) will exit the loop, and won't spawn any more children. The child spawned at (3,1) will then continue the for loops, increment j, spawn another process, and then both children will see (i,j) at (3,2), exit the for loops, and then die. Then we had another child spawned by the parent at (3,0). Well now this child will continue through the for loops, spawn a child at (3,1) and (3,2), and then die, and then this new child spawned at (3,1) will spawn another child and then they'll die. I think we can see this is starting to get pretty complex, so we can represent this situation with the following graph.

Each vertex of the graph represents a process and the vertex labeled p is the parent process. The ordered pair on each of the edges represents the values of (i,j) at the time at which the child process was spawned. Notice how we can group the processes. In that first group, we have 1 process, the next, we have 2, the next 4, then 8, and we should see now how things are going. The next group will have 16 processes, and the next group will have 32. Therefore, if we count all the processes we have, including the parent, we have 64 process. Make sense so far?

Well now let's put that initial fork() call back in. That will cause the exact same situation that we just described to happen twice, which would give us 128 process in total, including the parent, which means that we have spawned 127 children.

So yeah, half math problem, half programming problem. Let me know your questions.

You could rewrite the first loop to be for (i = 1; i <= n; i++). Then I'm pretty sure we could say that in general, your parent process will spawn $2(\sum_{k=1}^{N} 2^{k}) - 1$ children, where $N=\frac{n(n+1)}{2}$