C: mq_open() doesn't connect if it's called before the message queue is open


I have two processes. Each creates its own message queue and then tries to connect to the other's. However, for some reason this only works one way.

Process one has the following:

struct mq_attr attr;
  int flags = O_RDWR | O_CREAT;
  attr.mq_flags = 0;
  attr.mq_maxmsg = 3; // ***
  attr.mq_msgsize = sizeof(cache_request);
  attr.mq_curmsgs = 0;

  mqd_t fd, fd2;
  mq_unlink("/mq_one");
  fd2 = mq_open("/mq_two", flags,(S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH),&attr );
  while((fd = mq_open("/mq_one", O_RDWR)) == -1){
    printf("Couldnt connect to message queue in cache\n");
    sleep(2);
  }
  mq_close(fd2);
  mq_unlink("/mq_two");


  printf("connected to message queue.\n");

Process two has the following:

    mqd_t fd, fd2;
    //mq_unlink("/mq_one");
    struct mq_attr attr;
    int flags = O_RDWR | O_CREAT;
    attr.mq_flags = 0;
    attr.mq_maxmsg = 3; // ***
    attr.mq_msgsize = sizeof(cache_request);
    attr.mq_curmsgs = 0;

    fd = mq_open("/mq_one", flags, (S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH), &attr);
    printf("opened message queue /mq_one.\n");
    if (fd < 0) {
        printf("   Error %d (%s) on server mq_open.\n", errno, strerror(errno));
        mq_close(fd);
        mq_unlink("/mq_one");
        exit(1);
    }
    while ((fd2 = mq_open("/mq_two", O_RDWR)) == -1) {
        printf("waiting on webproxy...\n");
        sleep(2);
    }
    mq_close(fd2);
    mq_unlink("/mq_two");

Essentially, each process opens (creates) its own message queue and then loops, trying to connect to the other's. The problem is that this only works if I start process one before process two, and not vice versa. If I start process two first, then when I start process one, process two exits its loop and continues running, but process one stays in its loop, even though it should see the first message queue (/mq_one). I can't figure out why this is.


Solution

  • You have to deal with two issues:

      * the timing of the removal of /mq_one in process P1
      * leftover queues that persist from a previous run

    When you start P2 before P1, P2 creates /mq_one, but P1 then removes it with mq_unlink("/mq_one"). The unlink removes the name immediately, so although P2 still holds a valid descriptor, nobody can open /mq_one by name any more. From that point on P1 waits (forever) for /mq_one, because P2 never attempts to create /mq_one again.

    Consider a different strategy: each program (P1, P2) removes only the queue that it created, and only when it exits.

    This should allow the programs to work correctly regardless of timing (who starts first), and regardless of state (is there a leftover from the previous run).
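
The strategy above could be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: the helper names create_own_queue, open_peer_queue and release_own_queue are hypothetical, as are the retry parameters, and the message size is passed in because cache_request is not shown in the question. On older glibc versions you may need to link with -lrt.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create the queue this process owns, or reuse a
 * leftover one from a previous run. There is deliberately no mq_unlink()
 * here, so a peer that started earlier is never disturbed. */
static mqd_t create_own_queue(const char *name, long msgsize)
{
    assert(name != NULL && name[0] == '/');

    struct mq_attr attr = {0};
    attr.mq_maxmsg = 3;        /* same limit as in the question */
    attr.mq_msgsize = msgsize; /* e.g. sizeof(cache_request) */

    /* Without O_EXCL, an existing queue is simply opened and attr is ignored. */
    return mq_open(name, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR, &attr);
}

/* Hypothetical helper: poll until the peer has created its queue.
 * Gives up after max_tries attempts or on any error other than ENOENT. */
static mqd_t open_peer_queue(const char *name, int max_tries, unsigned delay_s)
{
    for (int i = 0; i < max_tries; i++) {
        mqd_t q = mq_open(name, O_RDWR);
        if (q != (mqd_t)-1)
            return q;
        if (errno != ENOENT)
            break;             /* a real error, not "peer not started yet" */
        if (i + 1 < max_tries)
            sleep(delay_s);
    }
    return (mqd_t)-1;
}

/* Hypothetical helper: on exit, close the descriptor and remove only the
 * queue that this process created. */
static void release_own_queue(mqd_t q, const char *name)
{
    mq_close(q);
    mq_unlink(name);
}
```

With these helpers, process one would call create_own_queue("/mq_two", sizeof(cache_request)), then open_peer_queue("/mq_one", ...), and at exit call release_own_queue() on /mq_two and a plain mq_close() on its /mq_one descriptor. Process two would do the same with the names swapped. Because neither side unlinks the other's queue, the start order no longer matters. A reused leftover queue may still hold stale messages, so draining it at startup may be worth considering.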