Tags: python, base64, encode, sha256

How to base64 encode a SHA256 hex character string


Hi, I need help producing a base64-encoded column. What I have is a column of SHA256 hashes, and I expect to get 44 characters per value, but when I try this in Python:

[base64.b64encode(x.encode('utf-8')).decode() for x in xxx['yyy']]

it returns 88 characters. Can anyone help with this? Basically, I want to reproduce in Python the steps shown in the pictures below. Thanks!

[Three images omitted]


Solution

  • The step in the first image consists of a few substeps:

    • text is entered, and its characters are encoded to bytes using UTF-8
    • SHA256 hashing is applied to that byte string
    • the resulting digest (a sequence of 32 bytes) is rendered in its hexadecimal representation

    So:

    from hashlib import sha256
    
    s = '[email protected]'
    
    h = sha256()
    h.update(s.encode('utf-8'))  # specifying encoding, optional as this is the default
    hex_string = h.digest().hex()
    print(hex_string)
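
As a side note, hashlib also offers a shorter route to the same hex string: pass the bytes to the constructor and call hexdigest(). A minimal sketch, where the sample text 'hello' is only a placeholder and not the input from the question:

```python
from hashlib import sha256

s = 'hello'  # placeholder input, not the address from the question

# step-by-step form, as in the snippet above
h = sha256()
h.update(s.encode('utf-8'))
long_form = h.digest().hex()

# shorter form: the constructor accepts the initial data,
# and hexdigest() renders the digest as hex directly
short_form = sha256(s.encode('utf-8')).hexdigest()

print(short_form == long_form)
```

Both forms should yield the same 64-character hex string, so which one to use is a matter of taste.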
    

    The second image seems to suggest that it takes the hex representation as text and base64-encodes that text, but really it takes the byte string that the hex string represents and encodes those bytes.

    So, starting with the hex string:

    • decode the hex to bytes (reconstructing the digest bytes)
    • encode the bytes using base64 into an ASCII byte string
    • decode that resulting byte string into characters for printing

    from base64 import b64encode
    
    digest_again = bytes.fromhex(hex_string)
    b64bytes = b64encode(digest_again)
    # no real need to specify 'ascii', the relevant code points overlap with UTF-8:
    result = b64bytes.decode('ascii')
    print(result)
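
One way to convince yourself that the base64 text really carries the digest bytes, and not the hex text, is to decode it again and compare. A small sketch, using the hex digest from the output shown further down in this answer:

```python
from base64 import b64decode, b64encode

# hex digest copied from the output listed in this answer
hex_string = 'b4c9a289323b21a01c3e940f150eb9b8c542587f1abfd8f0e1cc1ffc5e475514'

result = b64encode(bytes.fromhex(hex_string)).decode('ascii')

# decoding the base64 text should give back the original 32 digest bytes
round_trip = b64decode(result)
print(len(round_trip))
print(round_trip.hex() == hex_string)
```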
    

    Put together:

    from hashlib import sha256
    from base64 import b64encode
    
    s = '[email protected]'
    
    h = sha256()
    h.update(s.encode())
    print(h.digest().hex())
    
    b64bytes = b64encode(h.digest())
    print(b64bytes.decode())
    

    Output:

    b4c9a289323b21a01c3e940f150eb9b8c542587f1abfd8f0e1cc1ffc5e475514
    tMmiiTI7IaAcPpQPFQ65uMVCWH8av9jw4cwf/F5HVRQ=
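
The 44 and 88 character lengths mentioned in the question follow from how base64 works: every group of 3 input bytes becomes 4 output characters, with the last group padded. A quick check of that arithmetic, where b64_length is a hypothetical helper name introduced only for this illustration:

```python
from base64 import b64encode
from math import ceil

def b64_length(n_bytes: int) -> int:
    """Length of padded base64 output for n_bytes of input (hypothetical helper)."""
    return 4 * ceil(n_bytes / 3)

print(b64_length(32))  # raw SHA256 digest, 32 bytes -> 44
print(b64_length(64))  # hex text of the digest, 64 bytes -> 88

# cross-check the formula against the real encoder
assert len(b64encode(bytes(32))) == b64_length(32)
assert len(b64encode(bytes(64))) == b64_length(64)
```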
    

    Why your code didn't work:

    base64.b64encode('[email protected]'.encode('utf-8')).decode()  # 'utf-8' is the default, so passing it is superfluous
    

    This:

    • encodes the characters '[email protected]' into bytes using UTF-8
    • encodes that byte string using base64
    • decodes the resulting byte string into a character string

    Nowhere does it apply SHA256 hashing, nor does it create a hex representation, if you were expecting that. The end result doesn't match because it is the base64 encoding of the original text's UTF-8 bytes, not the base64 encoding of its SHA256 digest.

    Or perhaps I misunderstood and you already had the hex encoding, but you're putting that in as a string:

    import base64

    x = 'b4c9a289323b21a01c3e940f150eb9b8c542587f1abfd8f0e1cc1ffc5e475514'
    base64.b64encode(x.encode()).decode()
    

    That does indeed result in an 88-character base64 encoding, because you're not encoding the 32 digest bytes, you're encoding the 64 characters of their hex representation. That would have to be this instead:

    import base64

    x = 'b4c9a289323b21a01c3e940f150eb9b8c542587f1abfd8f0e1cc1ffc5e475514'
    base64.b64encode(bytes.fromhex(x)).decode()
    

    ... and perhaps that is the answer you were looking for.
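
To apply this to a whole column, as in the question, the same conversion can be wrapped in a function. A sketch, assuming every value is a 64-character hex string; hex_to_b64 is a hypothetical helper name, and the plain list stands in for the question's xxx['yyy']:

```python
from base64 import b64encode

def hex_to_b64(hex_string: str) -> str:
    """Base64-encode the bytes that a hex string represents (hypothetical helper)."""
    return b64encode(bytes.fromhex(hex_string)).decode('ascii')

# stand-in for the question's column xxx['yyy']
column = ['b4c9a289323b21a01c3e940f150eb9b8c542587f1abfd8f0e1cc1ffc5e475514']

encoded = [hex_to_b64(x) for x in column]
print(encoded[0])       # tMmiiTI7IaAcPpQPFQ65uMVCWH8av9jw4cwf/F5HVRQ=
print(len(encoded[0]))  # 44
```

If the data lives in a pandas Series, xxx['yyy'].map(hex_to_b64) should do the same thing, assuming there are no missing or malformed values in the column.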