I'm timing the difference between various ways to print text to standard output. I'm testing cout
, printf
, and ostringstream
using both \n
and std::endl
. I expected std::endl
to make a difference with cout
(and it did), but I didn't expect it to slow down output with ostringstream
. I thought using std::endl
would just write a \n
to the stream and it would still only get flushed once. What's going on here? Here's all my code:
// cout.cpp
#include <iostream>
using namespace std;
int main() {
for (int i = 0; i < 10000000; i++) {
cout << "Hello World!\n";
}
return 0;
}
// printf.cpp
#include <stdio.h>
int main() {
for (int i = 0; i < 10000000; i++) {
printf("Hello World!\n");
}
return 0;
}
// stream.cpp
#include <iostream>
#include <sstream>
using namespace std;
int main () {
ostringstream ss;
for (int i = 0; i < 10000000; i++) {
ss << "stream" << endl;
}
cout << ss.str();
}
// streamn.cpp
#include <iostream>
#include <sstream>
using namespace std;
int main () {
ostringstream ss;
for (int i = 0; i < 10000000; i++) {
ss << "stream\n";
}
cout << ss.str();
}
And here's my Makefile
SHELL:=/bin/bash
all: cout.cpp printf.cpp
g++ cout.cpp -o cout.out
g++ printf.cpp -o printf.out
g++ stream.cpp -o stream.out
g++ streamn.cpp -o streamn.out
time:
time ./cout.out > output.txt
time ./printf.out > output.txt
time ./stream.out > output.txt
time ./streamn.out > output.txt
Here's what I get when I run make
followed by make time
time ./cout.out > output.txt
real 0m1.771s
user 0m0.616s
sys 0m0.148s
time ./printf.out > output.txt
real 0m2.411s
user 0m0.392s
sys 0m0.172s
time ./stream.out > output.txt
real 0m2.048s
user 0m0.632s
sys 0m0.220s
time ./streamn.out > output.txt
real 0m1.742s
user 0m0.404s
sys 0m0.200s
These results are consistent.
std::endl
triggers a flush of the stream, which slows down printing a lot. See http://en.cppreference.com/w/cpp/io/manip/endl
It is often recommended to not use std::endl
unless you really want the stream to be flushed. If this is really important to you, depends on your use case.
Regarding why flush
has a performance impact even on a ostringstream (where no flushing should happen): It seems that an implementation is required to at least construct the sentry objects. Those need to check good
and tie
of the ostream
. The call to pubsync
should be able to be optimized out. This is based on my reading of libcpp and libstdc++.
After some more reading the interesting question seems to be this: Is an implementation of basic_ostringstream::flush
really required to construct the sentry object? If not, this seems like a "quality of implementation" issues to me. But I actually think it needs to because even a basic_stringbug
can change to have its badbit
set.