I wrote several simple C++ functions to convert byte sequences to string representations.
It seemed straightforward and I am sure my logic is right, but when I printed the strings the output was garbage:
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
using std::vector;
typedef vector<uint8_t> bytes;
using std::string;
using std::cout;
using namespace std::literals;
string DIGITS = "0123456789abcdef"s;
static inline string hexlify(bytes arr) {
    string repr = ""s;
    for (auto& chr : arr) {
        repr += " " + DIGITS[(chr & 240) >> 4] + DIGITS[chr & 15];
    }
    repr.erase(0, 1);
    return repr;
}
bytes text = {
    84, 111, 32, 98, 101, 32,
    111, 114, 32, 110, 111, 116,
    32, 116, 111, 32, 98, 101
}; // To be or not to be
int main() {
    cout << hexlify(text);
}
2♠
÷82♠
÷82♠
÷82♠
÷
Why is this happening?
I know my logic is right; the following is a direct translation to Python:
digits = "0123456789abcdef"
def bytes_string(data):
    s = ""
    for i in data:
        s += " " + digits[(i & 240) >> 4] + digits[i & 15]
    return s[1:]
And it works:
>>> bytes_string(b"To be or not to be")
'54 6f 20 62 65 20 6f 72 20 6e 6f 74 20 74 6f 20 62 65'
But why doesn't it work in C++?
I am using Visual Studio 2022 V17.9.7, compiler flags:
/permissive- /ifcOutput "hexlify_test\x64\Release\" /GS /GL /W3 /Gy /Zc:wchar_t /Zi /Gm- /O2 /sdl /Fd"hexlify_test\x64\Release\vc143.pdb" /Zc:inline /fp:precise /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /std:c17 /Gd /Oi /MD /std:c++20 /FC /Fa"hexlify_test\x64\Release\" /EHsc /nologo /Fo"hexlify_test\x64\Release\" /Ot /Fp"hexlify_test\x64\Release\hexlify_test.pch" /diagnostics:column
Update: with the fix applied, targeting C++20, I still get garbage output in Debug mode, and switching to Release mode makes it go away. Before the fix, I compiled in Release mode and the problem was there too.
As noted in comments (also here), the problem is in the string concatenation. The following expression does not concatenate anything:
" " + DIGITS[(chr & 240) >> 4]
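When you extract a character from the string DIGITS, it has type char, a dedicated type for single characters. For historical reasons (compatibility with C), the + operator interprets the string literal " " as a pointer (const char*) and the digit character as an integer, and performs pointer arithmetic instead of concatenation. The resulting pointer lies well outside the two-byte literal, so appending it to repr reads whatever memory happens to be there. That is undefined behavior, which is why you see garbage.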
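To make the type difference concrete, here is a small compile-time check. It is only a sketch and assumes a C++17 or later compiler; the character 'a' stands in for any char value such as a digit taken from DIGITS. Because decltype does not evaluate its operand, no out-of-bounds pointer is ever formed. Some compilers may warn about adding a char to a string literal, which is exactly the mistake being illustrated.

```cpp
#include <string>
#include <type_traits>

using namespace std::literals;

// A plain string literal decays to const char*, so adding a char
// is pointer arithmetic and the result is still a pointer.
static_assert(std::is_same_v<decltype(" " + 'a'), const char*>);

// With the s suffix the left operand is a std::string, so
// operator+ performs real concatenation and yields a std::string.
static_assert(std::is_same_v<decltype(" "s + 'a'), std::string>);
```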
To do concatenation, use a string literal of type std::string, like you did elsewhere in your code:
" "s + DIGITS[(chr & 240) >> 4]
Here, operator+ sees the types std::string and char, so it concatenates as intended.
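For reference, a minimal sketch of the whole function with the concatenation fixed might look like the following. The name hexlify_fixed is hypothetical, and appending each character separately with += is just one of several ways to avoid the pointer arithmetic. It takes the vector by const reference, which is an optional change from the original signature.

```cpp
#include <cstdint>
#include <string>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Hypothetical name; same logic as hexlify in the question,
// but every append operates on a std::string, never on a const char*.
std::string hexlify_fixed(const bytes& arr) {
    static const std::string digits = "0123456789abcdef";
    std::string repr;
    for (std::uint8_t chr : arr) {
        repr += ' ';               // separator before each byte
        repr += digits[chr >> 4];  // high nibble
        repr += digits[chr & 15];  // low nibble
    }
    if (!repr.empty()) {
        repr.erase(0, 1);          // drop the leading separator
    }
    return repr;
}
```

Called with the bytes of "To be or not to be", this should produce the same text as the Python version in the question.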
A common C++ idiom for building a string from many pieces is a string stream.
#include <sstream>
...
std::ostringstream stream; // "output string stream"
stream << " " << DIGITS[...] << DIGITS[...];
...
return stream.str();
The ostringstream class is designed for building strings incrementally. After the code finishes all the concatenations, it converts the stream to a regular std::string, which is general-purpose.
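Filling in the elided parts, a complete stream-based version could be sketched as follows. The name hexlify_stream and the first flag used to place separators are assumptions made for illustration, not part of the original code.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical name; builds the hex text with an output string stream.
std::string hexlify_stream(const std::vector<std::uint8_t>& arr) {
    static const char digits[] = "0123456789abcdef";
    std::ostringstream stream;
    bool first = true;
    for (std::uint8_t chr : arr) {
        if (!first) {
            stream << ' ';  // separator between bytes, none before the first
        }
        first = false;
        // Each operand is a char, so operator<< writes it as a character.
        stream << digits[chr >> 4] << digits[chr & 15];
    }
    return stream.str();    // convert the accumulated text to std::string
}
```

Since operator<< is overloaded for char, there is no implicit pointer arithmetic to trip over here.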