
What's the purpose of a Boost pipe, and why is it important?


Apologies if this question is overly broad. I'm new to C++ and trying to understand the different stream types and why they matter (or don't matter).

I'm learning by coding a simple program that launches a child process and processes its output. I'm following the Boost.Process synchronous IO example: https://www.boost.org/doc/libs/1_75_0/doc/html/boost_process/tutorial.html#boost_process.tutorial.io.

One of the examples can be reduced to this:

#include <boost/process.hpp>

using namespace std;
using namespace boost::process;

int main(int argc, char *argv[]) {
  opstream in;
  ipstream out;

  child c("c++filt", std_out > out, std_in < in);

  in << "_ZN5boost7process8tutorialE" << endl;

  in.pipe().close(); // This will help c++filt quit, so we don't hang at wait() forever

  c.wait();
  return 0;
}

My question is:

Why do we have to use a Boost opstream? Can I use istringstream instead (apart from the fact that it doesn't compile)? Can I make it compile with istringstream?

The Boost documentation says:

Boost.process provides the pipestream (ipstream, opstream, pstream) to wrap around the pipe and provide an implementation of the std::istream, std::ostream and std::iostream interface.

Does it matter that it is a pipe, i.e. does the pipe have significant implications here?


Solution

    What Are Processes, How Do They Talk?

    Programs interact with their environment in various ways. One set of channels is the standard input, output and error streams.

    These are often tied to a terminal or files by a shell (cmd.exe, sh, bash etc).

    Now if programs interact with each other, like:

    ls | rev
    

    to list files and send the output to another program (rev, which reverses each line), this is implemented with pipes. Pipes are an operating system feature, not a Boost idea. All major operating systems have them.

    Fun fact: the | operator used in most shells to indicate this type of output/input redirection between processes is called the pipe symbol.

    What Is A Pipe, Then?

    Pipes are basically "magic" file descriptors that refer to an "IO channel" rather than a file. A pipe has two ends: one party writes to one end, the other party reads from the other.
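
    To make the "two ends" idea concrete, here is a minimal sketch that uses the operating system's pipe directly, without Boost. It assumes a POSIX system (pipe, read, write from <unistd.h>); the function name round_trip_through_pipe is made up for this illustration, and the message is assumed to be small enough to fit into the pipe's kernel buffer.

```cpp
#include <unistd.h>   // pipe, read, write, close (POSIX)

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration: sends `message` through an anonymous
// pipe and returns whatever arrives at the read end.
// Assumes the message fits into the pipe's kernel buffer.
std::string round_trip_through_pipe(const std::string& message)
{
    int fds[2]; // fds[0] is the read end, fds[1] is the write end
    if (::pipe(fds) != 0) {
        throw std::runtime_error("pipe() failed");
    }

    // One party writes to one end...
    ssize_t written = ::write(fds[1], message.data(), message.size());
    ::close(fds[1]); // closing the write end signals end-of-file to the reader
    if (written != static_cast<ssize_t>(message.size())) {
        ::close(fds[0]);
        throw std::runtime_error("write() failed or was incomplete");
    }

    // ...and the other party reads from the other end until end-of-file.
    std::string received;
    char buffer[256];
    ssize_t n;
    while ((n = ::read(fds[0], buffer, sizeof buffer)) > 0) {
        received.append(buffer, static_cast<std::size_t>(n));
    }
    ::close(fds[0]);
    return received;
}
```

    Calling round_trip_through_pipe("hello\n") hands back the same bytes. Here both ends live in one process; with a child process, each side holds one end, and closing the write end is what tells the reader that no more input is coming.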

    Why?

    Two reasons come to mind right away:

    • Files require disk IO and syncing, making them slow

      Another fun fact: MS-DOS implemented pipes in terms of temporary files (on disk) for a very long time:

      MS-DOS 2.0 introduced the ability to pipe the output of one program as the input of another. Since MS-DOS was a single-tasking operating system, this was simulated by redirecting the first program’s output to a temporary file and running it to completion, then running the second program with its input redirected from that temporary file. Now all of a sudden, MS-DOS needed a location to create temporary files! For whatever reason, the authors of MS-DOS chose to use the TEMP variable to control where these temporary files were created.

    • The pipe enables asynchronous IO. This can be important in case processes have two-way (full duplex) IO going on.

    Okay, Do I Care?

    Yes, no, maybe.

    You mostly don't. The ipstream/opstream classes implement the std::istream/std::ostream interfaces (as the documentation you quoted says), so if you have a function that expects those:

    void simulate_input(std::ostream& os)
    {
        for (int i = 0; i < 10; ++i) {
            os << "_ZN5boost7process8tutorialE" << std::endl;
        }
    }
    

    You can use it unchanged in your sample (with namespace bp = boost::process):

    bp::opstream in;
    bp::ipstream out;
    
    bp::child c("c++filt", bp::std_out > out, bp::std_in < in);
    
    simulate_input(in);
    in.close();
    
    c.wait();
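
    The flip side, as a small sketch: because simulate_input only depends on std::ostream, it can equally write into an in-memory std::ostringstream, which is handy for checking what would be sent without starting a child process. A string stream is not connected to any process, though, which is why it cannot stand in for the pipe stream in the child constructor. The helper name capture_simulated_input is made up for this illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Same function as above: it only knows about the std::ostream interface.
void simulate_input(std::ostream& os)
{
    for (int i = 0; i < 10; ++i) {
        os << "_ZN5boost7process8tutorialE" << std::endl;
    }
}

// Hypothetical helper: captures what simulate_input would send to the child.
std::string capture_simulated_input()
{
    std::ostringstream buffer; // in-memory stream, not connected to any process
    simulate_input(buffer);
    return buffer.str();
}
```

    The returned string holds the ten mangled names, one per line, exactly as they would be written to the child's standard input.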
    

    When You Definitely Need It

    You need it in full-duplex situations, where it is easy to induce a deadlock: both programs end up waiting on the other end because they do their IO synchronously.
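
    The underlying reason is that a pipe has a finite kernel buffer: once it is full, a synchronous writer blocks until the other side reads. The sketch below makes that limit visible by filling a pipe that nobody reads from. It assumes a POSIX system; the function name bytes_until_pipe_is_full is made up for this illustration, and the write end is switched to non-blocking mode only so that the demonstration reports the limit instead of hanging.

```cpp
#include <fcntl.h>    // fcntl, F_GETFL, F_SETFL, O_NONBLOCK (POSIX)
#include <unistd.h>   // pipe, write, close (POSIX)

#include <cerrno>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: counts how many bytes fit into a pipe nobody reads from.
std::size_t bytes_until_pipe_is_full()
{
    int fds[2]; // fds[0] is the read end, fds[1] is the write end
    if (::pipe(fds) != 0) {
        throw std::runtime_error("pipe() failed");
    }

    // Non-blocking mode: a full pipe makes write() fail with EAGAIN
    // instead of blocking forever.
    int flags = ::fcntl(fds[1], F_GETFL);
    if (flags == -1 || ::fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        ::close(fds[0]);
        ::close(fds[1]);
        throw std::runtime_error("fcntl() failed");
    }

    const char chunk[1024] = {};
    std::size_t total = 0;
    for (;;) {
        ssize_t n = ::write(fds[1], chunk, sizeof chunk);
        if (n > 0) {
            total += static_cast<std::size_t>(n);
            continue;
        }
        if (n == -1 && errno == EINTR) {
            continue; // interrupted by a signal, try again
        }
        // Typically errno == EAGAIN here: the kernel buffer is full.
        // A blocking writer would be stuck at this point until someone reads.
        break;
    }

    ::close(fds[0]);
    ::close(fds[1]);
    return total;
}
```

    The exact number is system dependent. If the parent is blocked writing the child's input while the child is blocked writing output that the parent is not yet reading, neither side makes progress.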

    You can find examples + solution here: