
Using POSIX semaphores between Linux kernel namespaces


I'm working on a C application using Linux namespaces, and one thing that's come up is the need to signal the child namespace from the parent using a semaphore (or something similar). Here's what I'm trying to do at the moment:

#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/mount.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <string.h>
#include <semaphore.h> 

//A stack for the container
#define STACK_SIZE (1024 * 1024)
static char stack[STACK_SIZE];

//The semaphore
sem_t semaphore;

int child(void* arg){
    int semval;

    //Print the semaphore state as read from the new namespace
    for(int i=0;i<6;i++){
        sem_getvalue(&semaphore, &semval);
        printf("Semaphore state: %d.\n", semval);
        sleep(1);
    }

    return 1;
}

int main(){
    //Init a shared POSIX semaphore with the value zero
    sem_init(&semaphore, 1, 0); 

    //Create the child namespace
    pid_t pid = clone(child, stack+STACK_SIZE, CLONE_NEWNET | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);

    //Wait, then post the semaphore
    sleep(3);
    printf("Posting semaphore\n");
    sem_post(&semaphore);

    //Wait for it to return
    waitpid(pid, NULL, 0);

    return 0;
}

As I understand it, this should start child in a new namespace, where it prints the value of the semaphore a few times. We should see the value change when the parent process in the parent namespace posts it, but we don't:

Semaphore state: 0.
Semaphore state: 0.
Semaphore state: 0.
Posting semaphore
Semaphore state: 0.
Semaphore state: 0.
Semaphore state: 0.

The semaphore is initialized as shared, and removing CLONE_NEWIPC from the clone call doesn't fix this either (as I understand it, that only deals with isolating SysV IPC anyway, not this POSIX semaphore).

Another quirk of this is that if we initialize the semaphore to a different value (e.g. sem_init(&semaphore, 1, 3);), the child namespace will read that initial value:

Semaphore state: 3.
Semaphore state: 3.
Semaphore state: 3.
Posting semaphore
Semaphore state: 3.
Semaphore state: 3.
Semaphore state: 3.

As such, it's seemingly not totally failing to access the semaphore - it's just failing to see when it's posted. How are you supposed to do this? Is there some special shared memory trick I need to set up to get it to work between namespaces, or am I just doing something wrong here?


Solution

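  • The sem_* functions assume the semaphore's memory is actually shared between the processes/threads that use it. Passing pshared = 1 to sem_init does not make the memory shared; it only tells the library that the semaphore will live in shared memory.

    When you clone (without CLONE_VM) or fork, the child gets its own copy of the parent's memory, including the semaphore. That is why the child sees the initial value but never sees the parent's sem_post.

    To fix, add CLONE_VM to your flags argument [preferred].

    Or, ensure that the semaphore is in shared memory (e.g. mmap with MAP_SHARED, shmget, etc.) before doing the clone.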