Tags: c, macos, pipe, posix, named-pipes

Why is a FIFO pipe on macOS ~8x slower than an anonymous pipe?


On an M1 Max, I have created a FIFO named pipe with mkfifo and am testing write/read performance with a simple C program and pv. The program writes 65536 bytes at a time to stdout. When doing ./writer | pv > /dev/null, I get ~8 GiB/s. When doing ./writer >> mypipe and pv mypipe > /dev/null, I get ~1 GiB/s. In both cases, if I count the number of writes performed, I see roughly the same 8x factor. I've yet to test this on Linux, and I have not found any fcntl I can run on macOS/Darwin that changes the buffer size of the FIFO pipe.

What I'd like to know is:

  • Why is the named pipe approach I'm doing slower?
  • Can it be made faster with some system level configuration or file control?

This is the C program:

#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main() {
    const int size = 65536;
    char buf[size];
    memset(buf, 0, size);
    while (1) {
        if (write(1, buf, size) != size) {
            fprintf(stderr, "bad\n");
        }
    }
    return 0;
}

I've verified that the most I can write to an anonymous pipe before it blocks is 65536 bytes (M=0; while printf A; do >&2 printf "\r$((++M)) B"; done | sleep 999)
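
For completeness, here is roughly the same capacity check in C (my own sketch, not part of the benchmark itself; it assumes stdout is the write end of the pipe being probed and that a non-blocking write failing with EAGAIN means the kernel buffer is full):

/* probe.c - count how many bytes fit into the pipe on stdout before a
 * write would block. Rough sketch; error handling kept minimal. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Put the write end (stdout) into non-blocking mode. */
    int flags = fcntl(1, F_GETFL, 0);
    if (flags == -1 || fcntl(1, F_SETFL, flags | O_NONBLOCK) == -1) {
        perror("fcntl");
        return 1;
    }
    long total = 0;
    char byte = 'A';
    for (;;) {
        ssize_t n = write(1, &byte, 1);
        if (n == 1) {
            total++;
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break; /* the kernel buffer is full */
        } else {
            perror("write");
            return 1;
        }
    }
    fprintf(stderr, "%ld bytes fit before blocking\n", total);
    return 0;
}

Running ./probe | sleep 999 should report roughly the same 65536 as the shell loop above; pointing it at the FIFO (e.g. sleep 999 < mypipe & ./probe >> mypipe) should show how much smaller the named pipe's buffer is, though I'd treat the exact numbers as machine- and version-specific.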


Solution

  • On macOS (14.4.1), FIFO pipes are ultimately AF_UNIX sockets, and their limits are dictated by https://github.com/apple-oss-distributions/xnu/blob/94d3b452840153a99b38a3a9659680b2a006908e/bsd/kern/uipc_proto.c#L84-L92 and https://github.com/apple-oss-distributions/xnu/blob/94d3b452840153a99b38a3a9659680b2a006908e/bsd/kern/uipc_usrreq.c#L920-L933, where, at the time of this writing, the send and receive buffer size of a streaming socket (SOCK_STREAM) is #define PIPSIZ 8192. Anonymous pipes are a different mechanism, and from testing they appear to have a buffer size of at least 65536, which is 8x PIPSIZ.

    From running dtrace on the writer side, it appears each write syscall does return the full 65536; the slowdown, I believe, comes from the read side, where the buffer is constrained to 8192. So we end up paying the cost of context switching on the reader side: pulling 8192 bytes out of the kernel in read, returning to userspace, and calling read again, roughly eight reads for every write. So while the write side is in the kernel writing those 65536 bytes (which, mind you, saves us some context switching, so that's nice), the read side has no choice but to keep switching back and forth (a small reader that prints the sizes returned by read is sketched at the end of this answer).

    In terms of how to increase this number or change some flags, I haven't been able to find a way, since the soreceive method that uipc calls doesn't seem to accept any user-set IO flags. As a workaround, you could introduce some forking, with a pipe pair created in-process and shared between the reader and writer (see the second sketch at the end of this answer), but that may be too contrived.

    Notes:

    • The XNU code wasn't that easy for me to read and I may be wrong about my assumption of what kind of vnode file is being created and which functions are called, but the empirical data supports my theory so far.
    • I am also not sure why soreceive returns immediately when the writer still has data to push into the buffer, but it may have to do with locking the data structure down, and possibly timing/multi-threading as well.
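
    If you want to see the 8 KiB ceiling from userspace rather than from the XNU source, a throwaway reader like the one below (my own sketch; the name readstats and the 64 KiB buffer are arbitrary) can stand in for pv: it reads with a 64 KiB buffer and periodically reports the average and maximum number of bytes each read returns. Run ./writer | ./readstats for the anonymous pipe, and ./writer >> mypipe together with ./readstats < mypipe for the FIFO; if the analysis above is right, the FIFO runs should top out around 8192 bytes per read, while the anonymous pipe can return the full 65536.

/* readstats.c - read stdin with a large buffer and report how many
 * bytes each read() actually returns. Rough sketch. */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    static char buf[65536];
    long reads = 0;
    long long bytes = 0;
    long biggest = 0;
    for (;;) {
        ssize_t n = read(0, buf, sizeof buf);
        if (n <= 0)
            break; /* EOF or error */
        reads++;
        bytes += n;
        if (n > biggest)
            biggest = n;
        if (reads % 1000000 == 0)
            fprintf(stderr, "reads=%ld avg=%lld max=%ld\n",
                    reads, bytes / reads, biggest);
    }
    if (reads > 0)
        fprintf(stderr, "reads=%ld avg=%lld max=%ld\n",
                reads, bytes / reads, biggest);
    return 0;
}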
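
    And for the forking workaround, the shape I have in mind is roughly the following (hypothetical sketch, error handling trimmed): the pipe comes from pipe(2) inside one process and is shared with a forked child, so it gets the anonymous pipe's larger buffer instead of the FIFO's socket buffers. In a real setup the child would presumably dup2 the read end onto stdin and exec the consumer instead of draining the data itself.

/* forkpipe.c - anonymous pipe created in-process and shared via fork. */
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) == -1) {
        perror("pipe");
        return 1;
    }
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        /* Child: the reader. Just drains the pipe here; a real program
         * would exec the consumer after dup2'ing fds[0] onto stdin. */
        close(fds[1]);
        char buf[65536];
        while (read(fds[0], buf, sizeof buf) > 0)
            ;
        _exit(0);
    }
    /* Parent: the writer, same loop as the original program but bounded
     * so the demo terminates. */
    close(fds[0]);
    char buf[65536];
    memset(buf, 0, sizeof buf);
    for (int i = 0; i < 100000; i++) {
        if (write(fds[1], buf, sizeof buf) != (ssize_t)sizeof buf)
            fprintf(stderr, "bad\n");
    }
    close(fds[1]);
    waitpid(pid, NULL, 0);
    return 0;
}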