Search code examples
c++performancestl-algorithm

Is a nested std::transform inefficient?


If I have a std::string:

std::string s{"hello"};

and a loop that modifies it in-place, like this:

for (auto &c: s)
  c = std::toupper(c);

I can replace it with the equivalent transform:

std::transform(s.begin(), s.end(), s.begin(),
               [](unsigned char c) -> unsigned char 
               { return std::toupper(c); });

and these snippets generate identical assembly. They also have similar performance.

However, if I have a std::vector<std::string>:

std::vector<std::string> v {"hello", "how", "are", "you"};

and modify it in-place like this:

for (auto & s : v)
  for (auto &c: s)
    c = std::toupper(c);

the equivalent transform should be:

std::transform(std::begin(v), std::end(v), std::begin(v),
  [](auto s) {
    std::transform(std::begin(s), std::end(s), std::begin(s), 
      [](unsigned char c) -> unsigned char { return std::toupper(c); });
    return s;
});

However, the transform version generates over half as much assembly, and performs correspondingly poorly, which is surprising to me.

Is std::transform not a zero-cost abstraction in this case, or am I just using it incorrectly?


Solution

  • Pass and return everything by reference. Otherwise, you make multiple copies of the string. Notice the change: [](auto& s) -> std::string& {

    std::transform(std::begin(v), std::end(v), std::begin(v),
      [](auto& s) -> std::string& {
        std::transform(std::begin(s), std::end(s), std::begin(s), 
          [](unsigned char c) -> unsigned char { return std::toupper(c); });
        return s;
    });
    

    I added two new quickbench functions to your link. One which takes the input string as pass by reference. And the other that also returns by reference. That is:

    static void Transform2(benchmark::State& state) {
      // Code before the loop is not measured
    
      std::vector<std::string> v {"hello", "how", "are", "you"};
      for (auto _ : state) {
        std::transform(std::begin(v), std::end(v), std::begin(v),
        [](auto& s) {
          std::transform(std::begin(s), std::end(s), std::begin(s), 
            [](unsigned char c) -> unsigned char { return std::toupper(c); });
          return s;
        });
    
      }
    }
    BENCHMARK(Transform2);
    
    
    static void Transform3(benchmark::State& state) {
      // Code before the loop is not measured
    
      std::vector<std::string> v {"hello", "how", "are", "you"};
      for (auto _ : state) {
        std::transform(std::begin(v), std::end(v), std::begin(v),
        [](auto& s) -> std::string& {
          std::transform(std::begin(s), std::end(s), std::begin(s), 
            [](unsigned char c) -> unsigned char { return std::toupper(c); });
          return s;
        });
    
      }
    }
    BENCHMARK(Transform3);
    

    Depending on how lucky I get when I run the benchmark, Transform3 is nearly (and sometimes equal) in perf to the InPlace test implementation.

    enter image description here