If I have a std::string
:
std::string s{"hello"};
and a loop that modifies it in-place, like this:
for (auto &c: s)
c = std::toupper(c);
I can replace it with the equivalent transform
:
std::transform(s.begin(), s.end(), s.begin(),
[](unsigned char c) -> unsigned char
{ return std::toupper(c); });
and these snippets generate identical assembly. They also have similar performance.
However, if I have a std::vector<std::string>
:
std::vector<std::string> v {"hello", "how", "are", "you"};
and modify it in-place like this:
for (auto & s : v)
for (auto &c: s)
c = std::toupper(c);
the equivalent transform should be:
std::transform(std::begin(v), std::end(v), std::begin(v),
[](auto s) {
std::transform(std::begin(s), std::end(s), std::begin(s),
[](unsigned char c) -> unsigned char { return std::toupper(c); });
return s;
});
However, the transform
version generates over half as much assembly, and performs correspondingly poorly, which is surprising to me.
Is std::transform
not a zero-cost abstraction in this case, or am I just using it incorrectly?
Pass and return everything by reference. Otherwise, you make multiple copies of the string. Notice the change: [](auto& s) -> std::string& {
std::transform(std::begin(v), std::end(v), std::begin(v),
[](auto& s) -> std::string& {
std::transform(std::begin(s), std::end(s), std::begin(s),
[](unsigned char c) -> unsigned char { return std::toupper(c); });
return s;
});
I added two new quickbench functions to your link. One which takes the input string as pass by reference. And the other that also returns by reference. That is:
static void Transform2(benchmark::State& state) {
// Code before the loop is not measured
std::vector<std::string> v {"hello", "how", "are", "you"};
for (auto _ : state) {
std::transform(std::begin(v), std::end(v), std::begin(v),
[](auto& s) {
std::transform(std::begin(s), std::end(s), std::begin(s),
[](unsigned char c) -> unsigned char { return std::toupper(c); });
return s;
});
}
}
BENCHMARK(Transform2);
static void Transform3(benchmark::State& state) {
// Code before the loop is not measured
std::vector<std::string> v {"hello", "how", "are", "you"};
for (auto _ : state) {
std::transform(std::begin(v), std::end(v), std::begin(v),
[](auto& s) -> std::string& {
std::transform(std::begin(s), std::end(s), std::begin(s),
[](unsigned char c) -> unsigned char { return std::toupper(c); });
return s;
});
}
}
BENCHMARK(Transform3);
Depending on how lucky I get when I run the benchmark, Transform3 is nearly (and sometimes equal) in perf to the InPlace test implementation.