I am writing some code to roughly measure the cost of a context switch. The basic idea comes from the textbook OSTEP, and based on that idea I wrote the code below:
#define _GNU_SOURCE
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <sched.h>
#include <unistd.h>
#include <time.h>
#define TIMES 1000
#define BILLION 10e9
int main(int argc, char *argv[]) {
    int pipefd_1[2], pipefd_2[2];
    struct timespec start, stop;
    clockid_t clk_id = CLOCK_REALTIME;
    // so the child and parent processes run on the same CPU
    cpu_set_t set;
    int parentCPU, childCPU;
    char testChar = 'a'; /* used for the test */
    if (argc != 3) {
        fprintf(stderr, "Usage: %s parent-cpu child-cpu\n",
                argv[0]);
        exit(EXIT_FAILURE);
    }
    parentCPU = atoi(argv[1]);
    childCPU = atoi(argv[2]);
    CPU_ZERO(&set);
    if (pipe(pipefd_1) == -1) {
        perror("pipe");
        exit(EXIT_FAILURE);
    }
    if (pipe(pipefd_2) == -1) {
        perror("pipe");
        exit(EXIT_FAILURE);
    }
    switch (fork()) {
    case -1: /* error */
        perror("fork");
        exit(EXIT_FAILURE);
    case 0: /* child process */
        CPU_SET(childCPU, &set);
        if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
            perror("set cpu for child process");
            exit(EXIT_FAILURE);
        }
        char readChar_c;
        close(pipefd_1[0]); /* close unused read end */
        close(pipefd_2[1]); /* close unused write end */
        for (int i = 0; i < TIMES; ++i) {
            while (read(pipefd_2[0], &readChar_c, 1) <= 0) {} /* read from the second pipe */
            write(pipefd_1[1], &readChar_c, 1); /* write to the first pipe */
        }
        close(pipefd_2[0]);
        close(pipefd_1[1]);
        exit(EXIT_SUCCESS);
    default: /* parent process */
        CPU_SET(parentCPU, &set);
        if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
            perror("set cpu for parent process");
            exit(EXIT_FAILURE);
        }
        char readChar_p;
        close(pipefd_2[0]); /* close unused read end */
        close(pipefd_1[1]); /* close unused write end */
        clock_gettime(clk_id, &start);
        for (int i = 0; i < TIMES; ++i) {
            write(pipefd_2[1], &testChar, 1); /* write to the second pipe */
            while (read(pipefd_1[0], &readChar_p, 1) <= 0) {} /* read from the first pipe */
        }
        clock_gettime(clk_id, &stop);
        close(pipefd_2[1]);
        close(pipefd_1[0]);
        printf("the average cost of context switching is: %lf nsec\n",
               ((stop.tv_sec - start.tv_sec) * BILLION
                + stop.tv_nsec - start.tv_nsec) / TIMES);
    }
    exit(EXIT_SUCCESS);
}
But I still have some questions about this approach.
I've read others' code, and they just use read(pipefd_2[0], NULL, 0) and write(pipefd_1[1], NULL, 0) to perform the read and write operations. What I'm not sure about: if one process has not written any data to pipe1 yet, and the other process tries to read from pipe1, will a context switch occur in this situation, or will the read function just return 0?
Since a context switch happens when you read from the pipe, the precise cost of a context switch should be measured from the moment of leaving one process to the moment of entering the other, and should not include the time spent executing instructions in the second process. So I think calculating the cost this way may not be precise enough. Is this acceptable because the execution time is negligible compared to the context switch itself?
Thanks for your help!
#define BILLION 1e9 //not 10e9
The code is OK. read() does not return 0 if there's no data in the pipe; it blocks. That's why the ping-pong you're doing effectively measures the cost of context switches (plus I/O overhead).
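For instance, here is a minimal sketch (separate from your benchmark) that makes the blocking behavior visible: the child delays its write so the pipe stays empty for about two seconds, and the parent's read() simply waits instead of returning 0.

/* Sketch: read() on an empty pipe blocks rather than returning 0. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>

int main(void) {
    int fd[2];
    char c;
    if (pipe(fd) == -1) { perror("pipe"); exit(EXIT_FAILURE); }
    if (fork() == 0) {          /* child: write only after a delay */
        close(fd[0]);
        sleep(2);               /* pipe stays empty for ~2 s */
        write(fd[1], "x", 1);
        close(fd[1]);
        _exit(EXIT_SUCCESS);
    }
    close(fd[1]);               /* parent: keep only the read end */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    ssize_t n = read(fd[0], &c, 1);  /* blocks ~2 s, then returns 1 */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("read returned %zd after %ld s\n", n, (long)(t1.tv_sec - t0.tv_sec));
    close(fd[0]);
    return 0;
}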
read() returns 0 for the read end of a pipe only when all OS-counted references to the corresponding write end (created via the dup* functions or by fork() in conjunction with fd inheritance) are closed.
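A minimal sketch of that EOF rule: both processes must close their copies of the write end before the final read() can return 0. If the parent below forgot close(fd[1]), the loop would block forever instead of hitting end of file.

/* Sketch: read() on a pipe returns 0 only once every descriptor
 * referring to the write end (including the copy the child
 * inherited across fork()) has been closed. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    int fd[2];
    char buf[8];
    if (pipe(fd) == -1) { perror("pipe"); exit(EXIT_FAILURE); }
    if (fork() == 0) {          /* child: last holder of a write end */
        close(fd[0]);
        write(fd[1], "hi", 2);
        close(fd[1]);           /* last write end closed -> EOF for readers */
        _exit(EXIT_SUCCESS);
    }
    close(fd[1]);               /* parent must close its own copy too */
    ssize_t n;
    while ((n = read(fd[0], buf, sizeof buf)) > 0)
        printf("got %zd byte(s)\n", n);
    printf("read returned %zd: end of file\n", n);  /* prints 0 */
    close(fd[0]);
    return 0;
}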
You're effectively measuring context switches plus the pipe's I/O overhead. You can measure the approximate I/O overhead of the pipe separately by adapting the code to use just one pipe on a system with at least two cores (so there's almost no context switch per I/O call) and making one process a permanent reader and the other a permanent writer (https://pastebin.com/cGDWFdgQ). I'm getting about 2 * 0.55 µs of I/O overhead and about 5.5 µs for the whole thing, so about 4.4 µs per context switch. A sketch of that separate measurement follows.
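The linked pastebin has the original; this is only my own approximation of the same idea, and the CPU numbers 0 and 1 are assumptions you'd adjust for your machine. With the writer and reader pinned to different cores, both run concurrently, so elapsed time divided by the iteration count approximates the pipe's per-transfer I/O cost without context switches.

/* Sketch: approximate the pipe's I/O overhead alone by using one
 * pipe, a permanent writer on one core, and a permanent reader on
 * another, so almost no context switch is needed per call. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sched.h>
#include <unistd.h>
#include <time.h>

#define ITERS 1000000
#define NSEC_PER_SEC 1e9

static void pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        exit(EXIT_FAILURE);
    }
}

int main(void) {
    int fd[2];
    char c = 'a';
    if (pipe(fd) == -1) { perror("pipe"); exit(EXIT_FAILURE); }
    if (fork() == 0) {          /* child: permanent writer on CPU 0 (assumed) */
        pin_to_cpu(0);
        close(fd[0]);
        for (int i = 0; i < ITERS; ++i)
            write(fd[1], &c, 1);
        close(fd[1]);
        _exit(EXIT_SUCCESS);
    }
    pin_to_cpu(1);              /* parent: permanent reader on CPU 1 (assumed) */
    close(fd[1]);
    struct timespec start, stop;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < ITERS; ++i)
        while (read(fd[0], &c, 1) <= 0) {}
    clock_gettime(CLOCK_MONOTONIC, &stop);
    close(fd[0]);
    printf("approx. pipe I/O overhead per read/write pair: %lf nsec\n",
           ((stop.tv_sec - start.tv_sec) * NSEC_PER_SEC
            + stop.tv_nsec - start.tv_nsec) / ITERS);
    return 0;
}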