Stdio, cin and cout: Programs for use in unix pipes (like grep, sort, etc)

I want to write programs that behave like Unix utilities. In particular, I want to use them in pipelines, e.g.:

grep foo myfile | ./MyTransformation [--args] | cut -f2 | ...

Three aspects make me wonder how to handle I/O:

  1. According to sources like the Useless Use of Cat Award, it is good to support both reading from stdin and reading from a file (at the beginning of a pipeline). How is this best accomplished? I'm used to using <getopt.h> / <cgetopt> for parsing arguments. I could check whether there is a file argument besides my options and read from it; if not, read from stdin. That would mean that stdin is ignored if an input file is supplied. Is this desirable?

  2. According to this question, C++ synchronizes cout and cin with stdio by default and hence does not buffer well. This leads to a huge decrease in performance. A solution is to disable synchronization: cin.sync_with_stdio(false);. Should a program meant for use in pipes always disable synchronization with stdio for cin and cout? Or should it avoid cin and cout altogether and use its own form of buffered I/O?

  3. Since cout will be used for program output (unless an output file is specified), status messages (verbose output like % done) have to go somewhere else. cerr/stderr seems like the obvious choice. However, status messages are not errors.

In summary, I wonder about the I/O handling of such programs in C++. Can cin and cout be used despite the problems described above? Should I/O be handled differently, for example by reading from and writing to buffered files, with stdin and stdout as the default files? What would be the recommended way to implement such behavior?


  • The standard idiom if there are no options is:

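    #include <fstream>
    #include <iostream>
    #include <string>

    int returnCode = 0;
    char const* programName = "";       // set from argv[0] in main
    void process( std::istream& in );   // the actual filter, defined elsewhere

    void processFile( std::string const& filename )
    {
        if ( filename == "-" ) {
            process( std::cin );
        } else {
            std::ifstream in( filename.c_str() );
            if ( !in.is_open() ) {
                std::cerr << programName << ": cannot open " << filename << std::endl;
                returnCode = 1;
            } else {
                process( in );
            }
        }
    }

    int main( int argc, char** argv )
    {
        programName = argv[0];
        if ( argc == 1 ) {
            processFile( "-" );
        } else {
            for ( int i = 1; i != argc; ++ i ) {
                processFile( argv[i] );
            }
        }
        std::cout.flush();              // surface write errors before checking
        return std::cout ? returnCode : 2;
    }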

    There are many variants, however. I found myself doing this so often that I wrote a MultiFileInputStream class whose (template) constructor takes a pair of iterators; it then executes more or less the same code as the above. (All of the significant code is, as usual, in the corresponding streambuf.) Similarly, I have a class to parse out the options (which looks like an immutable std::vector<std::string> once the options have been parsed). So the above would become:

    int main( int argc, char** argv )
    {
        CommandLine& args = CommandLine::instance();
        args.parse( argc, argv );
        MultiFileInputStream src( args.begin(), args.end() );
        process( src );
        return ProgramStatus::instance().returnCode();
    }

    (ProgramStatus is another useful class: it handles error output and the return code, and it flushes std::cout and adjusts the error code when you call returnCode() on it.)

    I'm sure that anyone writing Unix filter programs has developed similar classes.
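
    To illustrate the idea of putting the significant code in the streambuf, here is a minimal sketch of a stream buffer that reads a list of files back to back. It is not the MultiFileInputStream mentioned above: the name ConcatFileBuf, the failures() counter and the 4096-byte buffer are made up for this example, and it does not handle the "-" convention for stdin.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical sketch: a streambuf that serves the contents of several
// files one after another, skipping (and counting) files it cannot open.
class ConcatFileBuf : public std::streambuf
{
public:
    // Template constructor taking a pair of iterators over file names.
    template <typename Iter>
    ConcatFileBuf( Iter begin, Iter end )
        : names_( begin, end ), next_( 0 ), failures_( 0 )
    {
    }

    // Number of files that could not be opened so far.
    int failures() const { return failures_; }

protected:
    // Called by the istream whenever the get area is exhausted.
    int_type underflow() override
    {
        for ( ;; ) {
            if ( current_.is_open() ) {
                std::streamsize n = current_.sgetn( buffer_, sizeof buffer_ );
                if ( n > 0 ) {
                    setg( buffer_, buffer_, buffer_ + n );
                    return traits_type::to_int_type( *gptr() );
                }
                current_.close();       // this file is done, move on
            }
            if ( next_ == names_.size() ) {
                return traits_type::eof();
            }
            if ( current_.open( names_[next_++].c_str(), std::ios_base::in ) == nullptr ) {
                ++ failures_;           // caller decides how to report this
            }
        }
    }

private:
    std::vector<std::string> names_;
    std::size_t next_;
    int failures_;
    std::filebuf current_;
    char buffer_[4096];
};
```

    Wrapping it as std::istream in( &buf ); lets process( in ) stay unchanged; a non-zero failures() would then be folded into the return code.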

    With regards to question 2: sync_with_stdio is a static member of std::ios_base, so you can call it without an object: std::ios_base::sync_with_stdio( false );. I find this less misleading, since the call affects all of the standard iostream objects. If I/O turns out to be a bottleneck, by all means do it, but most of the time I don't bother. It's rare for such programs to need any sort of optimization. (Note that once you have called sync_with_stdio( false ), you should not use any C-style I/O on the standard streams, since the two are no longer kept in step. But I can't see any reason to use C-style I/O anyway.)

    With regards to question 3: error messages go to std::cerr, always. You also want to be sure to return a non-zero return code, even if the error wasn't fatal. Something like:

    myprog file1 > tmp && mv tmp file1

    is all too common, and if you hit some problem and don't generate the output, it's a disaster unless you return a non-zero exit code. (That's why I always flush and then check the status of std::cout. A long, long time ago, a user of my program did the above with a very large file, and the disk was full. It wasn't quite as full afterwards. Since then: always flush std::cout, and check that it worked, before returning OK.)