
std::ofstream - no buffering string longer than 1023 (instant flush)


When I change the size of the ofstream buffer with pubsetbuf(...), everything works fine, except when I write a single string longer than 1023 characters to the ofstream (as in the code below). Is this correct behavior, or did I do something wrong?

#include <fstream>
#include <string>
#include <vector>
#include <unistd.h> // sleep()

int main(){
    std::vector<char> rawBuf;
    std::ofstream stream;

    rawBuf.resize(20000);
    stream.rdbuf()->pubsetbuf(&rawBuf[0], 20000);

    stream.open("file.txt", std::ios_base::app);

    std::string data(1499, 'b');

    for(int i = 0; i < 10; i++)
    {
        // A 1024-character string is written immediately; a 1023-character one is buffered as expected.
        stream << data.substr(0, 1024) << "\n";
        sleep(1);
    }
    stream.flush();
    stream.close();

    return 0;
}

With a 1024-character string, strace ./program shows something like this:

writev(3, [{iov_base=NULL, iov_len=0}, {iov_base="bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"..., iov_len=1024}], 2) = 1024
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcf3889ac0) = 0
writev(3, [{iov_base="\n", iov_len=1}, {iov_base="bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"..., iov_len=1024}], 2) = 1025
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcf3889ac0) = 0
... and so on 10x

With a 1023-character string, everything seems OK:

nanosleep({tv_sec=1, tv_nsec=0}, 0x7fff8e13a980) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7fff8e13a980) = 0
... 10x

and then:

write(3, "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"..., 10240) = 10240

Why is there a single write here, but not in the earlier case?

edit:

gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)

Solution

  • Per [filebuf.virtuals]/12:

    basic_streambuf<charT, traits>* setbuf(char_type* s, streamsize n) override;
    

    Effects: If setbuf(0, 0) is called on a stream before any I/O has occurred on that stream, the stream becomes unbuffered. Otherwise the results are implementation-defined. “Unbuffered” means that pbase() and pptr() always return null and output to the file should appear as soon as possible.

    “Implementation-defined” covers “works fine”, “there is only a single write”, and anything else, so both behaviors you observed are conforming. In fact, here's what the libstdc++ 7.3.0 documentation says:

    First, are you sure that you understand buffering? Particularly the fact that C++ may not, in fact, have anything to do with it?

    The rules for buffering can be a little odd, but they aren't any different from those of C. (Maybe that's why they can be a bit odd.) Many people think that writing a newline to an output stream automatically flushes the output buffer. This is true only when the output stream is, in fact, a terminal and not a file or some other device -- and that may not even be true since C++ says nothing about files nor terminals. All of that is system-dependent. (The "newline-buffer-flushing only occurring on terminals" thing is mostly true on Unix systems, though.)

    Some people also believe that sending endl down an output stream only writes a newline. This is incorrect; after a newline is written, the buffer is also flushed. Perhaps this is the effect you want when writing to a screen -- get the text out as soon as possible, etc -- but the buffering is largely wasted when doing this to a file:

    output << "a line of text" << endl;
    output << some_data_variable << endl;
    output << "another line of text" << endl; 
    

    The proper thing to do in this case is to just write the data out and let the libraries and the system worry about the buffering. If you need a newline, just write a newline:

    output << "a line of text\n"
     << some_data_variable << '\n'
     << "another line of text\n"; 
    

    I have also joined the output statements into a single statement. You could make the code prettier by moving the single newline to the start of the quoted text on the last line, for example.

    If you do need to flush the buffer above, you can send an endl if you also need a newline, or just flush the buffer yourself:

    output << ...... << flush;    // can use std::flush manipulator
    output.flush();               // or call a member fn 
    

    On the other hand, there are times when writing to a file should be like writing to standard error; no buffering should be done because the data needs to appear quickly (a prime example is a log file for security-related information). The way to do this is just to turn off the buffering before any I/O operations at all have been done (note that opening counts as an I/O operation):

    std::ofstream    os;
    std::ifstream    is;
    int   i;
    
    os.rdbuf()->pubsetbuf(0,0);
    is.rdbuf()->pubsetbuf(0,0);
    
    os.open("/foo/bar/baz");
    is.open("/qux/quux/quuux");
    ...
    os << "this data is written immediately\n";
    is >> i;   // and this will probably cause a disk read 
    

    Since all aspects of buffering are handled by a streambuf-derived member, it is necessary to get at that member with rdbuf(). Then the public version of setbuf can be called. The arguments are the same as those for the Standard C I/O Library function (a buffer area followed by its size).

    A great deal of this is implementation-dependent. For example, streambuf does not specify any actions for its own setbuf()-ish functions; the classes derived from streambuf each define behavior that "makes sense" for that class: an argument of (0,0) turns off buffering for filebuf but does nothing at all for its siblings stringbuf and strstreambuf, and specifying anything other than (0,0) has varying effects. User-defined classes derived from streambuf can do whatever they want. (For filebuf and arguments for (p,s) other than zeros, libstdc++ does what you'd expect: the first s bytes of p are used as a buffer, which you must allocate and deallocate.)

    A last reminder: there are usually more buffers involved than just those at the language/library level. Kernel buffers, disk buffers, and the like will also have an effect. Inspecting and changing those are system-dependent.
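
The point in the quoted documentation about endl flushing on every call can be checked with a small experiment. The following is a minimal sketch, not anything taken from libstdc++: CountingBuf, count_flushes_with_endl and count_flushes_with_newline are hypothetical names invented for this illustration. The buffer is an in-memory stand-in rather than a filebuf, so it only shows how often the stream asks its buffer to flush, not when a real file is written.

```cpp
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical helper (not part of any library): stores characters in a
// std::string and counts how often the stream asks it to flush.
class CountingBuf : public std::streambuf {
public:
    int flushes() const { return flushes_; }
    const std::string& contents() const { return data_; }

protected:
    // No put area is set up, so every character arrives through overflow().
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            data_.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }

    // Reached via pubsync(), which std::flush, std::endl and
    // ostream::flush() all call.
    int sync() override {
        ++flushes_;
        return 0;
    }

private:
    std::string data_;
    int flushes_ = 0;
};

// Ends every line with std::endl: newline plus a flush request each time.
int count_flushes_with_endl(int lines) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i)
        out << "a line of text" << std::endl;
    return buf.flushes();
}

// Ends every line with '\n' and flushes once, explicitly, at the end.
int count_flushes_with_newline(int lines) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i)
        out << "a line of text" << '\n';
    out.flush();
    return buf.flushes();
}
```

Calling count_flushes_with_endl(3) from main returns 3, while count_flushes_with_newline(3) returns 1: each endl reaches sync(), whereas '\n' leaves the decision about when to flush to the buffer. With a real filebuf, the moment data actually reaches the file still depends on the implementation, as the strace output in the question shows.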