I have a string of hex digits which I'm decoding into raw bytes:
>>> word_str = "4954640000005200000005a7a90fb36ecd3fa2ca7ec48ca36004acef63f77157ab2f53e3f768ecd9e18547b8c22e21d01bfb6b3de325a27b8fb3acef63f77157ab2f53e3f768ecd9e185b7330fb7c95782fc3d67e7c3a66728dad8b59848c7670c94b29b54d2379e2e7a"
>>> hex_str = word_str.decode('hex')
>>> hex_str = "ITd\x00\x00\x00R\x00\x00\x00\x05\xa7\xa9\x0f\xb3n\xcd?\xa2\xca~\xc4\x8c\xa3`\x04\xac\xefc\xf7qW\xab/S\xe3\xf7h\xec\xd9\xe1\x85G\xb8\xc2.!\xd0\x1b\xfbk=\xe3%\xa2{\x8f\xb3\xac\xefc\xf7qW\xab/S\xe3\xf7h\xec\xd9\xe1\x85\xb73\x0f\xb7\xc9W\x82\xfc=g\xe7\xc3\xa6g(\xda\xd8\xb5\x98H\xc7g\x0c\x94\xb2\x9bT\xd27\x9e.z"
Looking at an ASCII table, I suppose that it takes two digits at a time and converts them to the corresponding ASCII character, like
49 -> I
54 -> T
64 -> d
00 -> \x00
00 -> \x00
But at some point this rule breaks
52 -> \x00R (00 and 52)
Then it proceeds to take two digits at a time again
00 -> \x00
00 -> \x00
00 -> \x00
05 -> \x05
a7 -> \xa7
a9 -> \xa9
0f -> \x0f
Here it takes two pairs (b3 and 6e) at the same time instead of one, and it doesn't convert b3 to the corresponding character from the extended ASCII table
b36e -> \xb3n
Here cd becomes \xcd?
...
cd -> \xcd?
My goal is to implement the same thing (variable.decode('hex')) in C++, but first I need to understand what's going on. Which algorithm is being used here?
What you're asking about is the representation of the string for printing it in a human-readable format. The string itself contains the values of each byte in the original hex string (each byte being derived from two original digits).
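To make the "two digits per byte" step concrete, here is a minimal sketch of the decoding written out by hand. It assumes Python 3, where the Python 2 call `word_str.decode('hex')` is no longer available (`bytes.fromhex` is the built-in equivalent); the function name `hex_to_bytes` is made up for illustration.

```python
def hex_to_bytes(hex_text):
    """Decode a string of hex digits into raw bytes, two digits per byte."""
    if len(hex_text) % 2 != 0:
        raise ValueError("hex string must have an even number of digits")
    out = bytearray()
    for i in range(0, len(hex_text), 2):
        # int(..., 16) parses one pair of hex digits into a value 0..255
        out.append(int(hex_text[i:i + 2], 16))
    return bytes(out)


decoded = hex_to_bytes("4954640000005200")
print(list(decoded))  # numeric value of each decoded byte
```

Nothing here depends on ASCII: every pair of digits becomes exactly one byte value, and the decoded data is just that sequence of values.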
Some of the bytes in your string are characters that aren't printable or aren't representable in ASCII. For those, Python uses an escape code: \x followed by the two original hex digits.
In your example b36e -> \xb3n, Python converts the b3 to \xb3. The next byte, 6e, is ASCII for the lowercase n, and since that's printable, it comes through verbatim. The same goes for \xcd?: the ? is simply the following byte, 3f, shown as its ASCII character. Python is not "taking them two at a time"; each byte is processed separately.
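One way to see that each byte is rendered on its own is to ask Python for the representation of every byte separately. This is a small sketch assuming Python 3 (where the decoded data is a `bytes` object and its representation carries a `b'...'` prefix); the name `per_byte` is just for illustration.

```python
# Four bytes taken from the middle of the string in the question.
data = bytes.fromhex("b36ecd3f")

# Pair each byte's two hex digits with the repr of that single byte.
per_byte = [("%02x" % value, repr(bytes([value]))) for value in data]

for digits, shown in per_byte:
    print(digits, "->", shown)
```

The bytes b3 and cd come out as \x escapes, while 6e and 3f come out as the plain characters n and ?.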
So basically, if you want to "do the same thing" in C++, you would emit every byte between 32 and 126 (inclusive) verbatim, and write anything outside that range using the \x escape.
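The rule above can be sketched as a single loop over the bytes. It is written in Python 3 to match the rest of the thread, and the loop body should carry over to C++ almost line for line. The function name `escape_bytes` is hypothetical. The sketch deliberately omits the extra special cases in Python's own repr, which also escapes the backslash and the enclosing quote character and writes tab, newline, and carriage return as \t, \n, and \r.

```python
def escape_bytes(data):
    """Render bytes as text: printable ASCII verbatim, everything else as \\xNN."""
    parts = []
    for value in data:
        if 32 <= value <= 126:
            # Printable ASCII range: emit the character itself
            parts.append(chr(value))
        else:
            # Anything else: backslash, 'x', and two lowercase hex digits
            parts.append("\\x%02x" % value)
    return "".join(parts)


print(escape_bytes(bytes.fromhex("4954640000005200000005a7a90fb36ecd3f")))
```

Applied to the first 18 bytes of the string from the question, this should reproduce the start of the output shown there.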
I'm not sure you really want to do the same thing in C++ though; perhaps you can explain why you want to generate a Python string representation in C++.