Communicating across child processes with a pipe


I have been tasked with creating my own shell in C. I am to use fork(), pipe(), exec(), and wait() to achieve this. I have a good start, but the more I read about pipes, the more confused I get. Every example of piping to a child process looks like this:

parent calls pipe(), fork() a child, close respective ends of pipe, communicate!

I completely understand this. I have implemented it before. My problem is with how simple the example is. In creating a shell, I need two children to communicate with each other through a pipe in order to run a command like "cat file | grep hello". I can imagine a few ways of doing this. This was my first idea:

child1 writes to pipe and exits, child2 is created and reads from the pipe

This doesn't seem to work. It could just be that my code is flawed, but I suspect my understanding of pipes and file descriptors is insufficient. I figured that since pipe() was called in main() and fd[] is a file-scope variable, this strategy should work. The Linux manual states "At the time of fork() both memory spaces have the same content." Surely my child processes can access the pipe through the same file descriptors.

Is there a flaw in my understanding? I could try to make the processes run concurrently like so:

both children run concurrently, main waits for both to finish

But I'm not sure why this would behave differently.

Question: If a process writes to a pipe, but there is no immediate second process to read that data, does the data get lost?

Most examples online show that each process needs to close the end of the pipe that it is not using. However, occasionally I see an example that closes both ends of the pipe in both processes:

close(fd[1]);
dup2(fd[0], STDIN_FILENO);
close(fd[0]);

As best I can tell, dup2 duplicates the file descriptor, making 2 open file descriptors to the same file. If I don't close BOTH, then execvp() continues to expect input and never exits. This means that when I am done with the reading, I should close(stdin).

Question: With 2 children communicating over a pipe, does the main process need anything with the pipe, such as close(fd[0])?


Solution

    If a process writes to a pipe, but there is no immediate second process to read that data, does the data get lost?

    No.

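    By default, writing to a pipe is a blocking operation: write() blocks the calling process until the pipe has enough room for the requested data. Data that has already been written stays in the pipe's kernel buffer until some process reads it, provided at least one descriptor for the read end is still open somewhere.

    The responsibility is on the reading side to drain the pipe to make room, or to close its end of the pipe to signal that it no longer wishes to receive data. Once every read end is closed, a further write fails with SIGPIPE (or EPIPE if that signal is ignored).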

    With 2 children communicating over a pipe, does the main process need anything with the pipe, such as close(fd[0])?

    Each process involved will have its own copy of the file descriptors.

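    As such, the parent process should close both ends of the pipe (after both forks), since it has no reason to hold onto those file descriptors. This matters for correctness: the reader sees end-of-file only once every copy of the write end has been closed, so a parent that keeps fd[1] open leaves the reading child waiting for input forever. Leaked descriptors also count against the parent's limit (ulimit -n).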


    Your understanding of dup2 appears to be correct.

    close(fd[1]);
    dup2(fd[0], STDIN_FILENO);
    close(fd[0]);
    

    Both original pipe descriptors can be closed because, after dup2, the file descriptor usually associated with stdin refers to the same open file description as the read end of the pipe, which makes fd[0] redundant. The write end is closed because this process only reads.

    stdin is of course closed when the replacement process image (exec*) exits.


    Your second example, where two processes are forked and run concurrently, is the correct approach.

    In your typical shell, piped commands run concurrently. Otherwise, as stated earlier, the writer may fill the pipe and block before completing its task.

    Generally, the parent waits for both processes to finish.


    Here's a toy example. Run as ./program FILE STRING to emulate cat FILE | grep STRING.

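    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    
    int main(int argc, char **argv) {
        if (argc != 3) {
            fprintf(stderr, "usage: %s FILE STRING\n", argv[0]);
            return EXIT_FAILURE;
        }
    
        int fds[2];
    
        if (pipe(fds) == -1) {
            perror("pipe");
            return EXIT_FAILURE;
        }
    
        /* Left side of the pipeline: cat FILE, with stdout sent into the pipe. */
        pid_t left = fork();
    
        if (left == -1) {
            perror("fork");
            return EXIT_FAILURE;
        }
    
        if (0 == left) {
            close(fds[0]);
            dup2(fds[1], STDOUT_FILENO);
            close(fds[1]);
            execlp("cat", "cat", argv[1], (char *) NULL);
            perror("execlp cat");   /* reached only if exec failed */
            _exit(127);
        }
    
        /* Right side of the pipeline: grep STRING, with stdin taken from the pipe. */
        pid_t right = fork();
    
        if (0 == right) {
            close(fds[1]);
            dup2(fds[0], STDIN_FILENO);
            close(fds[0]);
            execlp("grep", "grep", argv[2], (char *) NULL);
            perror("execlp grep");  /* reached only if exec failed */
            _exit(127);
        }
    
        /* The parent closes both ends; otherwise grep never sees end-of-file. */
        close(fds[0]);
        close(fds[1]);
    
        if (right == -1)
            perror("fork");
    
        waitpid(left, NULL, 0);
        if (right > 0)
            waitpid(right, NULL, 0);
    
        return right == -1 ? EXIT_FAILURE : EXIT_SUCCESS;
    }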