c++ posix semaphore unlink

Why is my semaphore on a forked process not being released?


I have a problem with a POSIX semaphore that is used to release a forked process. The child is started by a fork followed by an exec of another instance of the same program. Sometimes the child is released and other times it is not.

It is a POSIX named semaphore in shared memory, and the weird thing is that it works only some of the time. I checked the other answers out there, but none of them helped.

void init()
{
    ...
    sem_unlink(sem_name.c_str());

    if (parent_process)
    {
        // With O_CREAT, sem_open requires a mode and an initial value.
        sem_t* semaphore = sem_open(sem_name.c_str(), O_CREAT | O_RDWR, 0600, 0);
        if (SEM_FAILED == semaphore)
        {
            display_error();
        }
        sem_close(semaphore);
    }

    child_pid = fork();

    if (child_pid == -1)
    {
        display_error();
    }
    else if (child_pid == 0)
    {
        int ret = execve(program_name, args, env);
        if (ret == -1)
        {
            display_error();
        }
    }
    else
    {
        // rest of code
    }
    ...
}

I had the child process wait to be released in another class that has this function:

void wait_until_released()
{
    if (!parent_process)
    {
        // With O_CREAT, sem_open requires a mode and an initial value.
        sem_t* semaphore = sem_open(sem_name.c_str(), O_CREAT | O_RDWR, 0600, 0);
        if (SEM_FAILED == semaphore)
        {
            display_error();
        }

        sem_wait(semaphore);

        sem_close(semaphore);
        sem_unlink(sem_name.c_str());  // sem_unlink takes the name, not the sem_t*
    }
}

The post was done in another location in the code:

void release_child()
{
    // With O_CREAT, sem_open requires a mode and an initial value.
    sem_t* semaphore = sem_open(sem_name.c_str(), O_CREAT | O_RDWR, 0600, 0);
    if (SEM_FAILED == semaphore)
    {
        display_error();
    }

    if (sem_post(semaphore) != 0)
    {
        display_error();
    }

    sem_close(semaphore);
    sem_unlink(sem_name.c_str());  // sem_unlink takes the name, not the sem_t*
}

Solution

  • The root cause was that the forked process called sem_unlink on the POSIX semaphore before waiting on it. sem_unlink removes the semaphore's name immediately, and the semaphore itself is destroyed once every process that has it open has called sem_close. In essence, this prevented my child process from using the instance the parent had created, so it could not be released at all.

    It worked only some of the time because the code assumes that the child is already waiting by the time release_child is called. That is not guaranteed. If release_child runs before the child has called sem_wait, the semaphore is removed completely and the child creates its own semaphore, which never gets posted to.

    By moving the sem_unlink call in init() to after the if (parent_process) check, so that only the parent runs it, I prevented the child process from removing the semaphore before waiting on it. Also, by removing the O_CREAT flag from the sem_open calls in release_child and wait_until_released, and removing the sem_unlink call from release_child, I prevented the child from creating its own semaphore.

    I wanted to record the behavior I was seeing, though, because that is what really caused me problems. In the middle of debugging and fixing this, I learned that if the parent creates the semaphore but doesn't close it, the child still calls sem_unlink and creates its own version with the same name. This led me to believe that the original semaphore was still there but that sem_post and/or sem_wait were not working.

    So be careful about where your post, wait, close, and unlink calls happen when you work with semaphores, especially when forked processes are involved!
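
The behavior described above can be reproduced and avoided with a small sketch. This is not the original program: the function names unlink_detaches_name and run_handshake and the semaphore name in the usage note are hypothetical, and the child here waits directly after fork instead of calling execve. The first function shows that after sem_unlink, a sem_open with O_CREAT produces a different semaphore, so a post on the old one is never seen. The second shows one possible ordering in which only the parent creates and unlinks, and the child opens without O_CREAT.

```cpp
#include <cerrno>       // errno, EINTR
#include <fcntl.h>      // O_CREAT, O_EXCL
#include <semaphore.h>  // sem_open, sem_wait, sem_trywait, sem_post, sem_close, sem_unlink
#include <signal.h>     // kill, SIGKILL
#include <string>
#include <sys/wait.h>   // waitpid, WIFEXITED, WEXITSTATUS
#include <unistd.h>     // fork, _exit

// Demonstrates the pitfall: once the name is unlinked, O_CREAT makes a NEW
// semaphore, and a post on the old one is invisible through the new one.
bool unlink_detaches_name(const std::string& name)
{
    sem_unlink(name.c_str());  // clear any stale name from an earlier run
    sem_t* first = sem_open(name.c_str(), O_CREAT | O_EXCL, 0600, 0);
    if (first == SEM_FAILED) return false;

    sem_unlink(name.c_str());  // name is gone; `first` stays usable until closed
    // O_EXCL succeeds here only because the name no longer exists.
    sem_t* second = sem_open(name.c_str(), O_CREAT | O_EXCL, 0600, 0);
    if (second == SEM_FAILED) { sem_close(first); return false; }

    sem_post(first);
    bool got_first  = (sem_trywait(first) == 0);   // the post is seen here
    bool got_second = (sem_trywait(second) == 0);  // but not here

    sem_close(first);
    sem_close(second);
    sem_unlink(name.c_str());
    return got_first && !got_second;
}

// Child side: open the existing semaphore (no O_CREAT, no unlink) and wait.
bool wait_until_released(const std::string& name)
{
    sem_t* sem = sem_open(name.c_str(), 0);
    if (sem == SEM_FAILED) return false;
    int rc;
    do { rc = sem_wait(sem); } while (rc == -1 && errno == EINTR);
    sem_close(sem);
    return rc == 0;
}

// Parent creates, forks, posts, and is the only one that unlinks.
// Returns true if the child was released and exited cleanly.
bool run_handshake(const std::string& name)
{
    sem_unlink(name.c_str());  // parent only: remove a stale semaphore
    sem_t* sem = sem_open(name.c_str(), O_CREAT | O_EXCL, 0600, 0);
    if (sem == SEM_FAILED) return false;

    pid_t pid = fork();
    if (pid == -1) { sem_close(sem); sem_unlink(name.c_str()); return false; }
    if (pid == 0) {
        // The original code calls execve here; this sketch waits directly.
        _exit(wait_until_released(name) ? 0 : 1);
    }

    // The post may happen before the child waits; the count is kept either way.
    bool posted = (sem_post(sem) == 0);
    if (!posted) kill(pid, SIGKILL);  // do not leave the child blocked forever

    int status = 0;
    while (waitpid(pid, &status, 0) == -1 && errno == EINTR) {}

    sem_close(sem);
    sem_unlink(name.c_str());  // unlink only after the child is done
    return posted && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling run_handshake("/demo_release_sem") from main should return true whether the parent posts before or after the child reaches sem_wait, because the semaphore keeps its count until someone waits on it. On older glibc versions the program may need to be linked with -pthread.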