I am using a C library that uses various fixed-size unsigned char arrays, with no null terminator, as strings. I've been converting them to std::string using the following function:
#include <string>

// Copy `width` bytes into a std::string, then strip trailing space padding.
auto uchar_to_stdstring(const unsigned char* input_array, int width) -> std::string {
    std::string temp_string(reinterpret_cast<const char*>(input_array), width);
    temp_string.erase(temp_string.find_last_not_of(' ') + 1);
    return temp_string;
}
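As a minimal usage sketch of the function above, assume a hypothetical 8-byte, space-padded field such as a C library might hand back. The `Record` struct and `record_name` helper are illustrative names, not part of any real library. The sketch shows the trailing-space trimming and the fact that the caller must supply the width by hand.

```cpp
#include <string>

// Same approach as uchar_to_stdstring above, repeated so this block compiles on its own.
auto uchar_to_stdstring(const unsigned char* input_array, int width) -> std::string {
    // Copy exactly `width` bytes; no null terminator is required.
    std::string temp_string(reinterpret_cast<const char*>(input_array), width);
    // Drop trailing space padding. If the field is all spaces, find_last_not_of
    // returns npos, npos + 1 wraps to 0, and the whole string is erased.
    temp_string.erase(temp_string.find_last_not_of(' ') + 1);
    return temp_string;
}

// Hypothetical fixed-size, space-padded field (not from any real C library).
struct Record {
    unsigned char name[8];
};

// Hypothetical helper: the width must be passed explicitly. sizeof works here
// only because r.name is still an array and has not yet decayed to a pointer.
inline std::string record_name(const Record& r) {
    return uchar_to_stdstring(r.name, static_cast<int>(sizeof r.name));
}
```

Once `r.name` has been passed on as a bare pointer, its size is lost, which is the bookkeeping the question is trying to eliminate.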
This works fine, apart from the use of reinterpret_cast, the need to pass the array size, and the fact that I'm decaying an array into a pointer. I'm trying to avoid all of these issues by using std::span.
The function that uses std::span looks like this:
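#include <span>
#include <sstream>
#include <string>

auto ucharspan_to_stdstring(const std::span<unsigned char>& input_array) -> std::string {
    std::stringstream temp_ss;
    // Stream each byte; operator<< writes an unsigned char as a character.
    for (const auto& input_arr_char : input_array) {
        temp_ss << input_arr_char;
    }
    return temp_ss.str();
}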
The function works well and makes everything else simpler, since I no longer have to track the C array's size. However, benchmarking with nanobench shows that the new function is many times slower than the classic reinterpret_cast method (roughly 47x with gcc and 118x with clang in the results below). My assumption is that the for loop in the std::span-based function is the bottleneck.
My question: is there a more efficient way to convert a fixed-size C array of unsigned chars, held in a std::span, to a std::string?
Edit:
gcc benchmark (-O3 -DNDEBUG -std=gnu++20, nanobench, minEpochIterations=54552558, warmup=100, doNotOptimizeAway):
relative | ns/op | op/s | err% | instructions/op | branches/op | branch miss% | total (s) | benchmark (uchar[] to std::string) |
---|---|---|---|---|---|---|---|---|
100.0% | 5.39 | 185,410,438.12 | 0.3% | 80.00 | 20.00 | 0.0% | 3.56 | uchar |
2.1% | 253.06 | 3,951,678.30 | 0.6% | 4,445.00 | 768.00 | 0.0% | 167.74 | ucharspan |
1,244.0% | 0.43 | 2,306,562,499.69 | 0.2% | 9.00 | 1.00 | 0.0% | 0.29 | ucharspan_barry |
72.8% | 7.41 | 134,914,127.56 | 1.3% | 99.00 | 22.00 | 0.0% | 4.89 | uchar_bsv |
clang benchmark (-O3 -DNDEBUG -std=gnu++20, nanobench, minEpochIterations=54552558, warmup=100, doNotOptimizeAway):
relative | ns/op | op/s | err% | instructions/op | branches/op | branch miss% | total (s) | benchmark (uchar[] to std::string) |
---|---|---|---|---|---|---|---|---|
100.0% | 2.13 | 468,495,014.11 | 0.2% | 14.00 | 1.00 | 0.0% | 1.42 | uchar |
0.8% | 251.74 | 3,972,418.54 | 0.2% | 4,477.00 | 767.00 | 0.0% | 166.30 | ucharspan |
144.4% | 1.48 | 676,329,668.07 | 0.1% | 7.00 | 0.00 | 95.8% | 0.98 | ucharspan_barry |
34.5% | 6.19 | 161,592,563.70 | 0.1% | 80.00 | 24.00 | 0.0% | 4.08 | uchar_bsv |
(uchar_bsv in the benchmarks is the same as ucharspan_barry, but with a std::basic_string_view<unsigned char const> parameter instead of std::span<unsigned char const>.)
You want:
#include <span>
#include <string>

auto ucharspan_to_stdstring(std::span<unsigned char const> input_array) -> std::string {
    return std::string(input_array.begin(), input_array.end());
}
std::string, like other standard library containers, is constructible from an appropriate iterator pair, and this is such a pair. Since these are random-access iterators, the constructor can compute the length up front, so it does a single allocation followed by a single pass over the bytes.
Note that I changed span<T> const& to span<T const>, for two reasons. First, you're not mutating the contents of the span, so the element type should be const, just as you took a T const*, not a T*. Second, you should take spans by value because they're cheap to copy (unless you specifically need the identity of the span object, which you don't here).
It may be better still to do a reinterpret_cast so that you can use the (char const*, size_t) constructor, which ensures a single memcpy for the eventual write. But you'd have to time it to see whether it's worthwhile.