Tags: linux, select, multiprocessing, fifo, epoll

Fail to wake up from epoll_wait when other process closes fifo


I'm seeing different epoll and select behavior in two different binaries and was hoping for some debugging help. In the following, epoll_wait and select will be used interchangeably.

I have two processes, one writer and one reader, that communicate over a fifo. The reader performs an epoll_wait to be notified of writes. I would also like to know when the writer closes the fifo, and it appears that epoll_wait should notify me of this as well. The following toy program, which behaves as expected, illustrates what I'm trying to accomplish:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(int argc, char** argv)
{
  const char* filename = "tempfile";
  char buf[1024];
  memset(buf, 0, sizeof(buf));

  struct stat statbuf;
  if (!stat(filename, &statbuf))
    unlink(filename);

  mkfifo(filename, S_IRUSR | S_IWUSR);

  pid_t pid = fork();
  if (!pid) {
    int fd = open(filename, O_WRONLY);
    printf("Opened %d for writing\n", fd);
    sleep(3);
    close(fd);
  } else {
    int fd = open(filename, O_RDONLY);
    printf("Opened %d for reading\n", fd);

    static const int MAX_LENGTH = 1;
    struct epoll_event init;
    struct epoll_event evs[MAX_LENGTH];
    int efd = epoll_create(MAX_LENGTH);

    int i;
    for (i = 0; i < MAX_LENGTH; ++i) {
        init.data.u64 = 0;
        init.data.fd = fd;
        /* Assign rather than OR: init is uninitialized, so |= would keep stack garbage. */
        init.events = EPOLLIN | EPOLLPRI | EPOLLHUP;
        epoll_ctl(efd, EPOLL_CTL_ADD, fd, &init);
    }

    while (1) {
      int nfds = epoll_wait(efd, evs, MAX_LENGTH, -1);
      printf("%d fds ready\n", nfds);
      /* Leave room for a terminating NUL so buf can be printed with %s. */
      int nread = read(fd, buf, sizeof(buf) - 1);
      if (nread < 0) {
        perror("read");
        exit(1);
      } else if (!nread) {
        printf("Child %d closed the pipe\n", pid);
        break;
      }
      buf[nread] = '\0';
      printf("Reading: %s\n", buf);
    }
  }
  return 0;
}

However, when I do this with another reader (whose code I'm not at liberty to post, but which makes the exact same calls; the toy program is modeled on it), the reader does not wake up when the writer closes the fifo. The toy reader also gives the desired semantics with select, whereas the real reader fails with select as well.

What might account for the different behavior of the two? For any provided hypotheses, how can I verify them? I'm running Linux 2.6.38.8.


Solution

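  • strace is a great tool to confirm that the system calls are invoked correctly (i.e. that the parameters are passed correctly and the calls don't return any unexpected errors). Running both readers under strace -f and comparing their epoll_ctl, epoll_wait and close calls should show where they diverge.

    In addition, I would recommend using lsof to check that no other process still has that FIFO open. The reader only sees end-of-file (and EPOLLHUP) once every write end has been closed, so a write descriptor inherited by a forked child, or the reader itself having opened the FIFO with O_RDWR, would keep it from waking up.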
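To see why a leftover descriptor matters, here is a minimal sketch (not taken from the real reader, which was not posted). The helper name hangup_after_close, the /tmp path and the 200 ms timeout are all assumptions made for illustration. It opens both ends of a FIFO in one process, optionally keeps a dup() of the write descriptor open to stand in for a copy held elsewhere, closes the "real" writer, and reports whether epoll_wait delivers EPOLLHUP.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkdtemp under strict -std modes */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the question): returns 1 if a FIFO reader
 * gets EPOLLHUP within 200 ms after the writer calls close(), 0 if it does
 * not, and -1 if the setup fails. When keep_stray is nonzero, a duplicate
 * of the write descriptor stays open, standing in for a copy that some
 * other process still holds.
 */
int hangup_after_close(int keep_stray)
{
    char dir[] = "/tmp/fifo-demo-XXXXXX";  /* assumed scratch location */
    char path[64];
    int result = -1;

    if (!mkdtemp(dir))
        return -1;
    snprintf(path, sizeof(path), "%s/fifo", dir);
    if (mkfifo(path, S_IRUSR | S_IWUSR) < 0) {
        rmdir(dir);
        return -1;
    }

    /* O_NONBLOCK lets the read side open before any writer exists, and
     * makes the write side fail instead of hanging if the reader is missing. */
    int rfd = open(path, O_RDONLY | O_NONBLOCK);
    int wfd = open(path, O_WRONLY | O_NONBLOCK);
    int stray = keep_stray ? dup(wfd) : -1;
    int efd = epoll_create1(0);

    if (rfd >= 0 && wfd >= 0 && efd >= 0 && (!keep_stray || stray >= 0)) {
        struct epoll_event ev = { 0 };
        ev.events = EPOLLIN;  /* EPOLLHUP is always reported; no need to request it */
        ev.data.fd = rfd;
        if (epoll_ctl(efd, EPOLL_CTL_ADD, rfd, &ev) == 0) {
            close(wfd);       /* the "real" writer goes away */
            wfd = -1;

            struct epoll_event out;
            int n = epoll_wait(efd, &out, 1, 200);
            if (n >= 0)
                result = (n == 1 && (out.events & EPOLLHUP)) ? 1 : 0;
        }
    }

    /* Clean up descriptors and the temporary FIFO. */
    if (stray >= 0) close(stray);
    if (wfd >= 0) close(wfd);
    if (rfd >= 0) close(rfd);
    if (efd >= 0) close(efd);
    unlink(path);
    rmdir(dir);
    return result;
}
```

On Linux I would expect hangup_after_close(0) to return 1 and hangup_after_close(1) to return 0: with the stray descriptor still open, epoll_wait simply times out. If the real reader behaves like the second case, lsof on the FIFO (or a look at /proc/<pid>/fd for the processes involved) should show who is still holding the write end.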