I have a dump file with a lot of lines like this:
$0414$0436$0435$0434$0430$0439
$05DE$05E1$05D3$05E8_$05D4$05D2$0027$05D3$05D9$05D9
I assume the strings above mean "Джедай" (Russian) and "מסדר_הג'דיי" (Hebrew).
How can I decode these strings?
Which encoding is this?
The file contains UTF-16 code units formatted as 16-bit hex strings (4 hex digits each), each prefixed with a $ character. The exception is the ASCII character _ (U+005F) in מסדר_הג'דיי, which has been written to the file as-is instead of being hex-encoded. Oddly, the ASCII character ' (U+0027) in the same string has been hex-encoded, as $0027.
To decode this, you would read the file one character at a time. When you detect a $ character, skip it and hex-decode the next 4 characters into a 16-bit value; otherwise, treat the character as-is as a 16-bit value. Build up a string of these 16-bit values, and you will have a UTF-16 encoded string.
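The steps above could be sketched in Python roughly as follows. This is only an illustration, not a definitive implementation: the function name decode_dump_line is hypothetical, and the sketch assumes that every $ is followed by exactly 4 hex digits and that any literal (unescaped) character fits in a single 16-bit code unit.

```python
import struct


def decode_dump_line(line: str) -> str:
    """Decode one dump line such as '$0414$0436' into a normal string.

    Assumptions (not confirmed by the file's producer): each '$' is
    followed by exactly 4 hex digits, and literal characters are in the
    Basic Multilingual Plane (code point <= 0xFFFF).
    """
    units = []  # collected 16-bit UTF-16 code units
    i = 0
    while i < len(line):
        if line[i] == "$":
            hex_digits = line[i + 1 : i + 5]
            if len(hex_digits) != 4:
                raise ValueError(f"truncated escape at position {i}")
            units.append(int(hex_digits, 16))  # hex-decode 4 characters
            i += 5  # skip '$' plus the 4 hex digits
        else:
            units.append(ord(line[i]))  # literal character, taken as-is
            i += 1
    # Pack the code units as little-endian 16-bit values and decode them
    # as UTF-16, so surrogate pairs (if any) are combined correctly.
    raw = struct.pack(f"<{len(units)}H", *units)
    return raw.decode("utf-16-le")


if __name__ == "__main__":
    print(decode_dump_line("$0414$0436$0435$0434$0430$0439"))
    print(decode_dump_line("$05DE$05E1$05D3$05E8_$05D4$05D2$0027$05D3$05D9$05D9"))
```

To process the whole dump, you could call this function on each line of the file after stripping the trailing newline. Going through an actual UTF-16 decode, rather than calling chr() on each value, means that code points outside the BMP would still come out right if they were written as two escaped surrogates.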