Conflicting versions of sha256 bit calculation


I'm computing SHA-256 with two different implementations, both run on bit arrays. In Python, I run:

from bitarray import bitarray
from hashlib import sha256

inbits = bitarray([1,0,1,0,1,0,1,0,1,0])
sha = sha256(inbits)

outbits = bitarray()
outbits.frombytes(sha.digest())

The other is a circuit implementation of SHA-256 written in circom. Are there different variants of SHA-256? I ask because the circuit and the Python code give different outputs for the same input.

Output from circom:

[0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0,
 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0,
 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0,
 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0,
 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0,
 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1,
 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1,
 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1,
 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1,
 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1,
 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0,
 1, 1, 0, 0, 1, 0, 1, 1, 0]

and output from python: bitarray('1110111001111011010110111001100001000011011101011100000100111011001101011111000000010101110000001001100011100001100011010111011110001100100010110010111111110011111101010111111110101000101111011010010001011101000001101110101110111011011010111100101101111100')


Solution

  • You cannot pass a bitarray to hashlib and expect it to hash the individual bits. hashlib operates on whole bytes only, so it hashes the bitarray's underlying byte buffer, padding bits included. Proof by code:

    >>> from bitarray import bitarray
    >>> from hashlib import sha256
    >>> inbits = bitarray([0])
    >>> sha256(inbits).hexdigest()
    '148de9c5a7a44d19e56cd9ae1a554bf67847afb0c58f6e12fa29ac7ddfca9940'
    >>> bytes(inbits)
    b'p'
    >>> sha256(b'p').hexdigest()
    '148de9c5a7a44d19e56cd9ae1a554bf67847afb0c58f6e12fa29ac7ddfca9940'
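
The same effect can be shown with the standard library alone. This is a minimal sketch: `pack_bits` is a hypothetical helper (not part of bitarray or hashlib) that packs bits MSB-first and assumes the unused bits of the last byte are zero, which, as the session above shows, a real bitarray buffer does not even guarantee. Once the bits are packed into bytes, the original bit length is lost, so a 10-bit message and a 16-bit message hash identically.

```python
from hashlib import sha256


def pack_bits(bits):
    """Pack 0/1 ints into bytes, MSB first, zero-padding the last byte.

    Hypothetical helper for illustration only.
    """
    padded = list(bits) + [0] * (-len(bits) % 8)
    return bytes(
        int("".join(map(str, padded[i:i + 8])), 2)
        for i in range(0, len(padded), 8)
    )


ten_bits = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
sixteen_bits = ten_bits + [0] * 6  # a different message, 6 bits longer

# Both messages pack to the same two bytes ...
same_bytes = pack_bits(ten_bits) == pack_bits(sixteen_bits)

# ... so a byte-oriented hash cannot tell them apart.
same_digest = (
    sha256(pack_bits(ten_bits)).digest()
    == sha256(pack_bits(sixteen_bits)).digest()
)

print(pack_bits(ten_bits).hex(), same_bytes, same_digest)
```

A bit-oriented SHA-256 pads from the true bit length instead, which is why it disagrees with hashlib on any input that is not a whole number of bytes.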
    

    We know for sure that this is not the expected result because NIST publishes a test vector for a message consisting of a single 0 bit: see 'SHA256ShortMsg.rsp' in https://csrc.nist.gov/CSRC/media/Projects/Cryptographic-Algorithm-Validation-Program/documents/shs/shabittestvectors.zip

    It says the following:

    Len = 1
    Msg = 00
    MD = bd4f9e98beb68c6ead3243b1b4c7fed75fa4feaab1f84795cbd8a98676a2a375
    

    We can reproduce that digest with the sha256bit package (https://pypi.org/project/sha256bit/):

    >>> from sha256bit import Sha256bit
    >>> Sha256bit(inbits,bitlen=len(inbits)).hexdigest()
    'bd4f9e98beb68c6ead3243b1b4c7fed75fa4feaab1f84795cbd8a98676a2a375'
    

    Applied to your original input, this reproduces the circom output bit for bit:

    >>> inbits = bitarray([1,0,1,0,1,0,1,0,1,0])
    >>> Sha256bit(inbits,bitlen=len(inbits)).hexdigest()
    '39e78e40303b445bd9298f30ccb55e810585edce97bf287f970ca8d891fb7996'
    >>> outbits = bitarray()
    >>> outbits.frombytes(Sha256bit(inbits,bitlen=len(inbits)).digest())
    >>> outbits
    bitarray('0011100111100111100011100100000000110000001110110100010001011011110110010010100110001111001100001100110010110101010111101000000100000101100001011110110111001110100101111011111100101000011111111001011100001100101010001101100010010001111110110111100110010110')
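
For readers who want to see where the bit length enters the computation, here is a from-scratch sketch of SHA-256 over a list of bits. It is not the code of the sha256bit package or of the circom circuit. `sha256_bits`, `_primes`, and `_icbrt` are hypothetical names, and the round constants are derived from the first primes (as FIPS 180-4 defines them) instead of being typed in as a table. The only step that differs from a byte-oriented implementation is the padding: the `1` bit is appended directly after the last message bit, and the length field counts bits.

```python
import hashlib
from math import isqrt

MASK = 0xFFFFFFFF


def _primes(count):
    """Return the first `count` primes by trial division."""
    found, n = [], 2
    while len(found) < count:
        if all(n % p for p in found):
            found.append(n)
        n += 1
    return found


def _icbrt(n):
    """Floor of the integer cube root, by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# First 32 fractional bits of the cube/square roots of the first primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
H0 = [isqrt(p << 64) & MASK for p in _primes(8)]


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def sha256_bits(bits):
    """Hex SHA-256 digest of a message given as an iterable of 0/1 values."""
    msg = [int(b) for b in bits]
    length = len(msg)
    # Bit-level padding: a single 1 bit, zeros up to 448 mod 512,
    # then the message length in bits as a 64-bit big-endian integer.
    msg.append(1)
    while len(msg) % 512 != 448:
        msg.append(0)
    msg += [(length >> i) & 1 for i in range(63, -1, -1)]

    h = list(H0)
    for start in range(0, len(msg), 512):
        block = msg[start:start + 512]
        w = [int("".join(map(str, block[i:i + 32])), 2)
             for i in range(0, 512, 32)]
        for t in range(16, 64):
            s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
            s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
            w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
        a, b, c, d, e, f, g, hh = h
        for t in range(64):
            big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
            ch = (e & f) ^ (~e & g)
            t1 = (hh + big_s1 + ch + K[t] + w[t]) & MASK
            big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
            maj = (a & b) ^ (a & c) ^ (b & c)
            t2 = (big_s0 + maj) & MASK
            a, b, c, d, e, f, g, hh = (
                (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    return "".join(f"{x:08x}" for x in h)


# Sanity check on byte-aligned input: must agree with hashlib.
abc_bits = [(byte >> i) & 1 for byte in b"abc" for i in range(7, -1, -1)]
assert sha256_bits(abc_bits) == hashlib.sha256(b"abc").hexdigest()

print(sha256_bits([0]))                             # the single 0 bit
print(sha256_bits([1, 0, 1, 0, 1, 0, 1, 0, 1, 0]))  # the 10-bit input
```

The two printed digests can be compared with the NIST vector and the Sha256bit result quoted above. On byte-aligned input the sketch agrees with hashlib, so the disagreement in the question comes only from how the trailing partial byte is handled.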