I have the following version of code for Python:
import hashlib
msg = 'abc'
print msg
sha256_hash = hashlib.sha256()
sha256_hash.update(msg)
hash_digest = sha256_hash.digest()
print hash_digest
And the corresponding Node.js version:
var crypto = require('crypto');
var msg = 'abc';
var shasum = crypto.createHash('sha256').update(msg);
var hashDigest = shasum.digest();
console.log(hashDigest);
However, the binary output is slightly off for both:
The hex representation is correct though between the two libraries. Am I doing something wrong here?
Your Node code is trying to decode the result of the hash as UTF-8 and failing.
The difference is in how the two languages treat binary data and string types. The final binary output of your two examples is the same. So let's examine the output of both, first in hex:
ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
In Python:
'\xbax\x16\xbf\x8f\x01\xcf\xeaAA@\xde]\xae"#\xb0\x03a\xa3\x96\x17z\x9c\xb4\x10\xffa\xf2\x00\x15\xad'
In Node:
<SlowBuffer ba 78 16 bf 8f 01 cf ea 41 41 40 de 5d ae 22 23 b0 03 61 a3 96 17 7a 9c b4 10 ff 61 f2 00 15 ad>
The core thing to notice is that the result in Python is returned as a string. In Python 2, which your code uses, a plain string is simply an array of single-byte chars (values 0-255). The value in Node, however, is stored as a Buffer, which also represents an array of byte values (0-255). That is the key difference here. Node does not return a string, because strings in Node are not arrays of single-byte characters but arrays of UTF-16 code units. Python 2 supports Unicode through a separate string class, designated by u''.
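As a rough illustration of that distinction on the Node side, the sketch below contrasts a Buffer with a string. It assumes a Node version that provides Buffer.from (the new Buffer constructor used elsewhere in this answer is deprecated in current releases), and the sample values are arbitrary.

```javascript
// A Buffer is a sequence of raw bytes: every element is an integer 0-255.
const bytes = Buffer.from([0xba, 0x78, 0x16, 0xbf]);
const byteValues = Array.from(bytes);
console.log(byteValues); // 186, 120, 22, 191

// A string is a sequence of UTF-16 code units, which can exceed 255.
const text = '\u00ba\u20ac'; // two characters: U+00BA and U+20AC
const codeUnits = [text.charCodeAt(0), text.charCodeAt(1)];
console.log(codeUnits); // 186, 8364

// The same two characters take five bytes once encoded as UTF-8,
// so string length and byte length are different things.
const utf8Length = Buffer.byteLength(text, 'utf8');
console.log(text.length, utf8Length); // 2 5
```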
Now compare how your two examples print the output (shortened for readability):
print '\xbax\x16\xbf\x8f\x01\xcf\xeaAA'
vs
console.log('' +
    new Buffer([0xba, 0x78, 0x16, 0xbf, 0x8f, 0x01, 0xcf, 0xea, 0x41, 0x41]))
The Python code says: write this array of bytes to the terminal. The Node code says something very different: convert this array of bytes into a string, and then write that string to the terminal. But the buffer holds arbitrary binary data, not UTF-8 encoded text, so it cannot be decoded cleanly into a string, and the result is garbled. If you wish to compare how the terminal renders the raw binary values, you need to give equivalent instructions in both languages:
print '\xbax\x16\xbf\x8f\x01\xcf\xeaAA'
vs
process.stdout.write(
    new Buffer([0xba, 0x78, 0x16, 0xbf, 0x8f, 0x01, 0xcf, 0xea, 0x41, 0x41]))
Here process.stdout.write is a way to write binary values to the terminal, rather than strings.
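To see why the string conversion is lossy, here is a minimal sketch using the same ten bytes. It assumes Buffer.from is available; the variable names are only for illustration. Concatenating a Buffer with '' calls its toString(), which decodes as UTF-8 by default.

```javascript
// The first ten bytes of the digest, as raw binary data.
const raw = Buffer.from([0xba, 0x78, 0x16, 0xbf, 0x8f, 0x01, 0xcf, 0xea, 0x41, 0x41]);

// Same effect as '' + raw: decode the bytes as UTF-8 text.
const asText = raw.toString('utf8');

// Bytes that are not valid UTF-8 are replaced with U+FFFD.
const hasReplacement = asText.includes('\ufffd');
console.log(hasReplacement); // true

// Re-encoding the string does not give back the original bytes.
const survivesRoundTrip = Buffer.from(asText, 'utf8').equals(raw);
console.log(survivesRoundTrip); // false

// Hex encoding, by contrast, represents every byte exactly.
const asHex = raw.toString('hex');
console.log(asHex); // ba7816bf8f01cfea4141
```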
Really though, you should just compare the hashes as hex, since hex is already a string representation of the binary value, and it is easier to read than improperly decoded Unicode characters.
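On the Node side, one way to get the hex form is to pass an encoding to digest(). This is a small sketch rather than the only option; the expected value is the hex string quoted earlier in this answer.

```javascript
const crypto = require('crypto');

// Ask the hash for a hex string directly instead of a Buffer.
const hexDigest = crypto.createHash('sha256').update('abc').digest('hex');

// Equivalent: take the Buffer and hex-encode it afterwards.
const viaBuffer = crypto.createHash('sha256').update('abc').digest().toString('hex');

console.log(hexDigest);
console.log(hexDigest === viaBuffer); // true
```

In Python, sha256_hash.hexdigest() returns the same kind of hex string, so the two outputs can be compared directly.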