I'm trying to hash a message, but it contains some non-ASCII (Unicode) characters.
Here is what I have so far in NodeJS:
const CryptoJS = require('crypto-js');
let message = '\x1aSmartCash Signed Message:\n\xabCTxIn(COutPoint(7bb8ad134928a003752beb098471af5a66fc5475ff96b5ba4c2e1c4cbac3aa13, 0), scriptSig=)000000000002c5c2ef4afc588492773e6bbb18e4f2374b6dc159ef257667bd881667906410';
let hash = CryptoJS.SHA256(message).toString();
console.log('hash', hash); // 989c004534c6962293c95a9438bdb926c92c1d8b4dec0f4f1e535defa171e5fe
And this is the equivalent Python code:
import hashlib

def to_bytes(something, encoding='utf8'):
    """
    cast string to bytes() like object, but for python2 support it's bytearray copy
    """
    if isinstance(something, bytes):
        return something
    if isinstance(something, str):
        return something.encode(encoding)
    elif isinstance(something, bytearray):
        return bytes(something)
    else:
        raise TypeError("Not a string or bytes like object")

def sha256(x):
    x = to_bytes(x, 'utf8')
    return bytes(hashlib.sha256(x).digest())

def Hash_Sha256(x):
    x = to_bytes(x, 'utf8')
    out = bytes(sha256(x))
    return out

message = b'\x1aSmartCash Signed Message:\n\xabCTxIn(COutPoint(7bb8ad134928a003752beb098471af5a66fc5475ff96b5ba4c2e1c4cbac3aa13, 0), scriptSig=)000000000002c5c2ef4afc588492773e6bbb18e4f2374b6dc159ef257667bd881667906410'
print('hash', Hash_Sha256(message).hex()) # 947270b0b8041a92ba82ef37661b692a4a150532b88de59bf95e965ceb5c07f8
As you can see, the Python output differs from the NodeJS output.
The desired output is the one produced by the Python code: 947270b0b8041a92ba82ef37661b692a4a150532b88de59bf95e965ceb5c07f8
I don't know how to hash these Unicode characters; the problem seems to be the escaped characters (\x1a, \xab). If I remove them, the two hashes are the same.
So how can I hash these characters in NodeJS so that the output equals the hash from the Python code?
message in the Python code is a byte string, i.e. a sequence of bytes. In particular, \xab in the message corresponds to the byte 0xab.
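In the CryptoJS code, message is a string that is implicitly UTF-8 encoded by CryptoJS.SHA256(message). In UTF-8, every character above U+007F is represented by two or more bytes. In particular, \xab (U+00AB) in the message is encoded to the two bytes 0xc2 0xab. \x1a (U+001A), on the other hand, is below U+0080 and is encoded to the single byte 0x1a, exactly as in Python, so it is not the cause of the mismatch.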
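To see this per character, the two encodings can be compared directly. The sketch below uses Node's built-in Buffer rather than CryptoJS (an assumption made only for the illustration; for characters in the range U+0000 to U+00FF, CryptoJS's Utf8 and Latin1 parsers should yield the same bytes):

```javascript
// Compare how single characters are encoded as UTF-8 vs. Latin-1,
// using Node's built-in Buffer (no CryptoJS needed for this check).
const chars = ['\x1a', '\xab'];

const encodings = chars.map((ch) => ({
  // Format the UTF-16 code unit as U+XXXX for display.
  codePoint: 'U+' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'),
  utf8: Buffer.from(ch, 'utf8').toString('hex'),
  latin1: Buffer.from(ch, 'latin1').toString('hex'),
}));

for (const e of encodings) {
  console.log(e.codePoint, 'utf8:', e.utf8, 'latin1:', e.latin1);
}
// U+001A utf8: 1a latin1: 1a
// U+00AB utf8: c2ab latin1: ab
```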
This leads to different byte sequences in the CryptoJS and Python code and therefore to different hashes.
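A small check over the whole message (again using Node's Buffer as a stand-in for CryptoJS's internal encoding) shows that the UTF-8 byte sequence is one byte longer and diverges exactly at the position of \xab:

```javascript
// Encode the whole message both ways with Node's Buffer and find the
// first byte position at which the two sequences disagree.
const message = '\x1aSmartCash Signed Message:\n\xabCTxIn(COutPoint(7bb8ad134928a003752beb098471af5a66fc5475ff96b5ba4c2e1c4cbac3aa13, 0), scriptSig=)000000000002c5c2ef4afc588492773e6bbb18e4f2374b6dc159ef257667bd881667906410';

const utf8Bytes = Buffer.from(message, 'utf8');
const latin1Bytes = Buffer.from(message, 'latin1');

// Buffer is a Uint8Array, so TypedArray.prototype.findIndex is available.
const firstDiff = utf8Bytes.findIndex((byte, i) => byte !== latin1Bytes[i]);

console.log('utf8 length:', utf8Bytes.length, 'latin1 length:', latin1Bytes.length);
console.log('first difference at byte', firstDiff,
  'utf8:', utf8Bytes[firstDiff].toString(16),
  'latin1:', latin1Bytes[firstDiff].toString(16));
// first difference at byte 27 utf8: c2 latin1: ab
```

Every byte after that point is shifted by one, which is enough to produce a completely different SHA-256 digest.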
To get the same bytes as the Python code, the string must instead be parsed with the Latin1 encoder, which maps each character to a single byte:
let hash = CryptoJS.SHA256(CryptoJS.enc.Latin1.parse(message)).toString();
With this, the CryptoJS code returns the same hash as the Python code.
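If CryptoJS is not otherwise needed, the same bytes can probably be hashed with Node's built-in crypto module instead (a swapped-in alternative, not part of the answer above). The helper names toLatin1Bytes and sha256Hex are hypothetical. The guard is there because both Node's 'latin1' encoding and CryptoJS.enc.Latin1.parse silently keep only the low 8 bits of characters above U+00FF, which would hash the wrong bytes without any error:

```javascript
// Sketch: hash the message with Node's built-in crypto module instead of CryptoJS.
const crypto = require('crypto');

// Hypothetical helper: convert a JS string to bytes one-to-one (Latin-1),
// refusing characters above U+00FF instead of silently truncating them.
function toLatin1Bytes(str) {
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 0xff) {
      throw new RangeError(`Character at index ${i} is not representable in Latin-1`);
    }
  }
  return Buffer.from(str, 'latin1');
}

// Hypothetical helper: SHA-256 over the Latin-1 bytes, as a hex string.
function sha256Hex(str) {
  return crypto.createHash('sha256').update(toLatin1Bytes(str)).digest('hex');
}

const message = '\x1aSmartCash Signed Message:\n\xabCTxIn(COutPoint(7bb8ad134928a003752beb098471af5a66fc5475ff96b5ba4c2e1c4cbac3aa13, 0), scriptSig=)000000000002c5c2ef4afc588492773e6bbb18e4f2374b6dc159ef257667bd881667906410';

// Since the same bytes as in the Python code are hashed, this should print
// the same digest as the Python version.
console.log('hash', sha256Hex(message));
```

If the message can legitimately contain characters above U+00FF, Latin-1 is the wrong choice altogether, and both sides would need to agree on a real encoding such as UTF-8.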