
child process seems to get stuck in sleep in a while loop


I have a C program that forks a child process at some point in a loop. The child process waits for the parent process to finish its job (some numerical calculations). If things go wrong, the parent process aborts, and the child process should continue from the state it was in when it was forked and retry the calculation with some modifications. Otherwise, the parent keeps running, and the child process should be killed.

The parent and child processes communicate through a memory-mapped file that holds a single meaningful byte: a character that indicates the status of the parent process.

The file is created and mapped like this:

    char child_flag[]="W";
    
    fp1 = fopen( "child_interface.dat","wb");
    // the interface file has two bytes, but only one is meaningful to the program
    fwrite(child_flag, 1, sizeof(child_flag), fp1); 
    fclose(fp1);
    printf("child_interface.dat created\n");
    
    if(mmap_child_flag() ==0) {
        printf("memory map of parent-child interface successful.\n");
        fflush(stdout);
    }

The wait loop in the child process looks like this:

    child_pid = fork();
    if (child_pid ==0) { /* child process, wait for parent process to finish*/

        mmap_child_flag();

        while(child_file[0]=='W' ){  //Child waits
            usleep(100000);
        }
        if(child_file[0]=='R'){ // run child process (as a new parent process)
            child_file[0]='W';
            goto label2;
        }
        if(child_file[0]=='K'){ //Kill child process
            exit(0);
        }
    }

The problem is that the child process seems to get stuck in the sleep while loop, even when the parent process has set the status to 'K' (checked in the file that is memory mapped). This code has been run on several Linux-based supercomputers, and the behavior is very inconsistent. On some platforms it runs smoothly, but on others it constantly gets stuck in the while loop. Sometimes, if I add some statements inside the while loop after the usleep call, it then runs just fine.

However, I'm not sure whether the sleeping while loop is the root cause of this problem. My guess is that, because the process has almost nothing to do except check a byte in memory, the system lets it sleep all the time and somehow "forgets" to let it check the memory. Can such a thing happen on Linux?

This is the function that does the actual mapping:

    /* Memory map for parent-child processes interface */
    int mmap_child_flag()
    {
        int fd_child;
        struct stat st_child;

        // open files
        if ((fd_child = open("child_interface.dat", O_RDWR)) == -1){
            perror("open child_interface.dat");
            exit(1);
        }
        // stat
        if (stat("child_interface.dat", &st_child) == -1){
            perror("stat of child_interface.dat");
            exit(1);
        }
        // map, child_file is global char array
        child_file = mmap(0, st_child.st_size, PROT_WRITE, MAP_SHARED, fd_child, 0);
        if (child_file == (char *)(-1)) {
            perror("mmap child_interface.dat");
            exit(1);
        }
        return 0;
    }

Solution

  • The problem is that the child process seems to get stuck in the sleep while loop, even when the parent process has set the status to 'K' (checked in the file that is memory mapped).

    There are several odd things about your program, with one of them being that you are using shared memory for this task at all. See below for a better approach.

    Issues with the current approach

    As to the question as it stands, however, you have a synchronization problem. The contents of the mapped memory are being changed from outside the child process, but you've given the compiler no reason to suspect that that might be the case. The compiler can therefore assume that if the wait loop condition is satisfied when it is first evaluated, then it will be satisfied on every subsequent evaluation, too, and it may read the byte only once.

    For a more complicated interaction, you might need to set up a process-shared mutex or similar to guard access to the shared memory, but for this, it would probably be sufficient to declare child_file as a pointer to volatile char.

    A better approach

    You want the child to wait for a one- or maybe two-byte instruction from the parent. You presently do this by polling the contents of a shared memory segment, but that's complex to set up and use, as you discovered. It would be a lot easier to use a pipe to convey the needed information from parent to child:

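    • setup: Before forking, declare an array of two ints for the file descriptors and pass it to pipe().
    • child use: The child performs a blocking read() on the read end of the pipe.
    • parent use: write() the message to the write end of the pipe when ready, then close it. Or just close it without writing, which the child sees as end-of-file.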

    Note that the pipe itself then provides adequate synchronization, and that there is no need for a wait loop. Note also that the child can detect the case that the parent dies without sending any message, which your shared memory approach does not support.