
Why is std::copy faster than the std::string constructor?


I tried the following code to compare std::copy with std::string's constructor.

#include <algorithm>  // std::copy
#include <chrono>
#include <cstdint>    // uint8_t
#include <iostream>
#include <string>     // std::string
#include <vector>

void construct_test() {
  std::vector<uint8_t> raw_data;
  for (int i = 0; i < 1000 * 1024; i++) {
    raw_data.push_back(i % 256);
  }

  auto start = std::chrono::high_resolution_clock::now();
  std::string target_data;
  target_data = std::string(raw_data.begin(), raw_data.end());
  auto finish = std::chrono::high_resolution_clock::now();
  std::cout << "construct: " << std::chrono::duration_cast<std::chrono::microseconds>(finish -
                                                                     start)
                   .count()
            << "us" << std::endl;
}

void copy_test() {
  std::vector<uint8_t> raw_data;
  for (int i = 0; i < 1000 * 1024; i++) {
    raw_data.push_back(i % 256);
  }

  auto start = std::chrono::high_resolution_clock::now();
  std::string target_data;
  target_data.resize(raw_data.size());
  std::copy(raw_data.begin(), raw_data.end(), target_data.begin());
  auto finish = std::chrono::high_resolution_clock::now();
  std::cout << "copy: " << std::chrono::duration_cast<std::chrono::microseconds>(finish -
                                                                     start)
                   .count()
            << "us" << std::endl;
}

int main() {
  construct_test();
  copy_test();

  return 0;
}

I got this result:

construct: 6245us
copy: 1087us

std::copy is 6x faster!

Is this expected? If so, what's the reason?
I searched for ways to convert a vector to a string, but none of the answers I found mentioned std::copy. Should I use it? Are there any drawbacks?


Solution

  • As commenters have pointed out, your testing methodology is deeply flawed. In general, you have to run operations many times (possibly millions or billions) to get meaningful results. Otherwise, factors such as the order in which the benchmarks run and OS scheduling can give you drastically different outcomes.

    • @463035818_is_not_an_ai has pointed out that you should use steady_clock over high_resolution_clock for benchmarks (although this is unlikely to have major impact in this case)
    • @VLL has pointed out that simply changing the order of construct_test() and copy_test() makes one function run faster than the other.

    You can use google/benchmark (used by QuickBench) to get more meaningful results.
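If pulling in google/benchmark is not an option, the two points above (repeat the measurement, use steady_clock) can be sketched with the standard library alone. This is a rough illustration, not a replacement for a real benchmarking framework: `median_time_us` and `make_raw_data` are hypothetical helper names, and the `volatile` sink is only a weak guard against the optimizer discarding the work.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: builds the same kind of input as the question.
std::vector<std::uint8_t> make_raw_data(std::size_t n) {
  std::vector<std::uint8_t> raw(n);
  for (std::size_t i = 0; i < n; ++i) {
    raw[i] = static_cast<std::uint8_t>(i % 256);
  }
  return raw;
}

// Hypothetical helper: calls fn() `runs` times and returns the median
// duration in microseconds. fn must return a std::string.
template <typename Fn>
double median_time_us(Fn&& fn, int runs) {
  assert(runs > 0);
  static volatile char sink = 0;  // weak guard against dead-code elimination
  std::vector<double> samples;
  samples.reserve(static_cast<std::size_t>(runs));
  for (int i = 0; i < runs; ++i) {
    auto start = std::chrono::steady_clock::now();
    std::string result = fn();
    auto finish = std::chrono::steady_clock::now();
    sink = result.empty() ? '\0' : result.back();
    samples.push_back(
        std::chrono::duration<double, std::micro>(finish - start).count());
  }
  // The median is less sensitive to scheduling outliers than a single run.
  auto mid = samples.begin() + runs / 2;
  std::nth_element(samples.begin(), mid, samples.end());
  return *mid;
}
```

A call might look like `median_time_us([&] { return std::string(raw.begin(), raw.end()); }, 1000)`, where `raw` comes from `make_raw_data(1000 * 1024)`. Timing each variant this way, and in both orders, should at least show how much a single-shot measurement can vary.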

    Besides the two methods you have used, there are at least two more ways to create/overwrite strings:

    // BenchmarkInit
    std::string target_data = std::string(raw_data.begin(), raw_data.end());
    
    // BenchmarkAssignmentOp
    std::string target_data;
    target_data = std::string(raw_data.begin(), raw_data.end());
    
    // BenchmarkAssign
    std::string target_data;
    target_data.assign(raw_data.begin(), raw_data.end());
    
    // BenchmarkCopy
    std::string target_data;
    target_data.resize(raw_data.size());
    std::copy(raw_data.begin(), raw_data.end(), target_data.begin());
    

    We get the following benchmark results for clang 15, libstdc++, -O3:

    [Image: benchmark results chart]

    • Using the std::string constructor is best, whether or not there's an unnecessary default initialization first. The first two methods (BenchmarkInit and BenchmarkAssignmentOp) use std::memcpy internally, which should be the fastest way to copy memory.
    • std::copy is slower, likely because .resize() has to zero the memory first, and the copy isn't optimized to a memcpy call but to vectorized memory operations instead.
    • .assign is dramatically slower, likely because there is less partial loop unrolling compared to std::copy, so there is a lot of overhead besides just copying memory.
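Since the four variants differ only in speed, it can help to confirm that they really produce the same string before comparing them. The sketch below wraps each one in a function; the names `via_init`, `via_assignment_op`, `via_assign`, `via_copy`, and `all_methods_agree` are hypothetical and chosen only to mirror the benchmark labels above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// BenchmarkInit: construct directly from the iterator range.
std::string via_init(const Bytes& raw) {
  return std::string(raw.begin(), raw.end());
}

// BenchmarkAssignmentOp: default-construct, then move-assign a temporary.
std::string via_assignment_op(const Bytes& raw) {
  std::string s;
  s = std::string(raw.begin(), raw.end());
  return s;
}

// BenchmarkAssign: default-construct, then assign from the range.
std::string via_assign(const Bytes& raw) {
  std::string s;
  s.assign(raw.begin(), raw.end());
  return s;
}

// BenchmarkCopy: resize (which zero-fills), then overwrite with std::copy.
std::string via_copy(const Bytes& raw) {
  std::string s;
  s.resize(raw.size());
  std::copy(raw.begin(), raw.end(), s.begin());
  return s;
}

// Returns true if all four approaches yield identical strings.
bool all_methods_agree(const Bytes& raw) {
  const std::string expected = via_init(raw);
  return via_assignment_op(raw) == expected &&
         via_assign(raw) == expected &&
         via_copy(raw) == expected;
}
```

Because the results are identical, the choice between them comes down to readability and measured performance on your own compiler and standard library.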

    Even with proper benchmarking, you can see unexpected and dramatic differences, and you can only make sense of things when looking at assembly.

    Update

    I've opened a bug report and this performance issue has been patched by @JonathanWakely.