Tags: c++, string, type-conversion, c++20

C++ function to convert a sequence of bytes to its string representation produces garbage output


I wrote several simple C++ functions to convert byte sequences to string representations.

It seemed straightforward and I was sure my logic was right, until I printed the strings and found the output to be garbage:

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

using std::vector;
typedef vector<uint8_t> bytes;
using std::string;
using std::cout;
using namespace std::literals;

string DIGITS = "0123456789abcdef"s;

static inline string hexlify(bytes arr) {
    string repr = ""s;
    for (auto& chr : arr) {
        repr += " " + DIGITS[(chr & 240) >> 4] + DIGITS[chr & 15];
    }
    repr.erase(0, 1);
    return repr;
}

bytes text = {
    84, 111, 32, 98, 101, 32,
    111, 114, 32, 110, 111, 116,
    32, 116, 111, 32, 98, 101
}; // To be or not to be

int main() {
    cout << hexlify(text);
}

Output:

2♠
÷82♠
÷82♠
÷82♠
÷

Why is this happening?

I know my logic is right, because the following is a direct translation to Python:

digits = "0123456789abcdef"
def bytes_string(data):
    s = ""
    for i in data:
        s += " " + digits[(i & 240) >> 4] + digits[i & 15]
    return s[1:]

And it works:

>>> bytes_string(b"To be or not to be")
'54 6f 20 62 65 20 6f 72 20 6e 6f 74 20 74 6f 20 62 65'

So why doesn't it work in C++?

I am using Visual Studio 2022 v17.9.7 with the following compiler flags:

/permissive- /ifcOutput "hexlify_test\x64\Release\" /GS /GL /W3 /Gy /Zc:wchar_t /Zi /Gm- /O2 /sdl /Fd"hexlify_test\x64\Release\vc143.pdb" /Zc:inline /fp:precise /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /std:c17 /Gd /Oi /MD /std:c++20 /FC /Fa"hexlify_test\x64\Release\" /EHsc /nologo /Fo"hexlify_test\x64\Release\" /Ot /Fp"hexlify_test\x64\Release\hexlify_test.pch" /diagnostics:column 

I just found out that, with the fix applied, the garbage output still appears, but only in Debug mode (targeting C++20); switching to Release mode makes it go away. Before the fix, the problem also appeared in Release mode.


Solution

  • As noted in the comments, the problem is in the string concatenation. The following expression does not concatenate anything:

    " " + DIGITS[(chr & 240) >> 4]
    

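    When you extract a character from the string DIGITS, it has type char, a dedicated type for single characters. For historical reasons (compatibility with C), a string literal such as " " is an array of const char that decays to a pointer, and char is an integer type, so + here performs pointer arithmetic instead of concatenation. The resulting pointer lies outside the literal, so the behavior is undefined, and repr += then appends whatever bytes that pointer happens to reach.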

    To do concatenation, use a string literal of type std::string, like you did elsewhere in your code:

    " "s + DIGITS[(chr & 240) >> 4]
    

    Here, operator+ sees the operand types std::string and char, for which a concatenating overload exists, so it works as intended.
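
The difference between the two expressions can be checked at compile time. The following is a minimal sketch, assuming C++17 or later; `hex_pair` is a hypothetical helper introduced only for illustration, and DIGITS is declared here as a const std::string rather than taken verbatim from the question. The static_assert lines inspect only the expression types (decltype does not evaluate its operand), so no invalid pointer is ever formed.

```cpp
#include <string>
#include <type_traits>

using namespace std::literals;

// Same digit table as in the question, declared const for this sketch.
const std::string DIGITS = "0123456789abcdef";

// A plain literal decays to const char*, and adding a char to it is pointer
// arithmetic, so the result is still a pointer, not a string.
static_assert(std::is_same_v<decltype(" " + DIGITS[5]), const char*>);

// With the s suffix the left operand is a std::string, so operator+
// performs real concatenation and yields a std::string.
static_assert(std::is_same_v<decltype(" "s + DIGITS[5]), std::string>);

// Hypothetical helper: formats one byte as a space followed by two hex digits.
std::string hex_pair(unsigned char byte) {
    return " "s + DIGITS[byte >> 4] + DIGITS[byte & 15];
}
```

Only the leftmost operand needs to be a std::string: once the first + has produced a std::string, every later + in the chain concatenates as well.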


    Another common C++ idiom for building a string from many pieces is a string stream:

    #include <sstream>
    ...
    std::ostringstream stream; // "output string stream"
    stream << " " << DIGITS[...] << DIGITS[...];
    ...
    return stream.str();
    

    std::ostringstream is designed for building a string incrementally. Once all the pieces have been written, str() returns the accumulated contents as a regular, general-purpose std::string.
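
Putting the string-stream fragment together, a complete version of the function might look like the sketch below. It is one possible arrangement, not the only correct one: the `first` flag (an assumption of this sketch) replaces the question's erase(0, 1) step, the digit table is a local char array instead of the global DIGITS string, and the parameter is taken by const reference to avoid copying the vector.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Returns the bytes as space-separated, two-digit lowercase hex values.
std::string hexlify(const std::vector<std::uint8_t>& data) {
    static constexpr char digits[] = "0123456789abcdef";
    std::ostringstream stream;
    bool first = true;  // no separator before the first byte
    for (std::uint8_t byte : data) {
        if (!first) {
            stream << ' ';
        }
        first = false;
        // Each << appends a single char; no pointer arithmetic is involved.
        stream << digits[byte >> 4] << digits[byte & 15];
    }
    return stream.str();
}
```

Called with the bytes of "To be or not to be" from the question, this should return the same string that the Python version prints.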