python pickle hmac

Getting invalid signature for HMAC authentication of python pickle file


I am trying to use HMAC authentication for reading and writing pickle files.

Sample Data :

import base64
import hashlib
import hmac
from datetime import datetime
import six
import pandas as pd
import pickle
import sys

df1 = pd.DataFrame({'id' : [1,2,3,4,5],
                   'score' : [720, 700, 710, 690, 670]})

df2 = pd.DataFrame({'name' : ['abc', 'pqr', 'xyz'],
                   'address' : ['1st st', '2nd ave', '3rd st'] })

mylist = ['a', 'b', 'c', 'd', 'e']

mydict = {1 : 'p', 2 : 'q', 3 : 'r'}

obj = [df1, df2, mylist, mydict]

Write pickle file using:

data = pickle.dumps(obj)
digest =  hmac.new(b'unique-key-here', data, hashlib.blake2b).hexdigest()
with open('temp.txt', 'wb') as output:
    output.write(bytes(digest, sys.stdin.encoding) + data)

But when I try to read it back using:

with open('temp.txt', 'rb') as f:
    digest = f.readline()
    data = f.read()
    
recomputed = hmac.new(b'unique-key-here', data, hashlib.blake2b).hexdigest()
if not hmac.compare_digest(digest, bytes(recomputed, sys.stdin.encoding)):
    print('Invalid signature')
else:
    print('Signature matching')

I am getting "Invalid signature" as output. Could someone please help me understand where I am going wrong?


Solution

  • Here is some code illustrating what I think is a cleaner way to solve the problem.

    import hashlib
    import hmac
    import io
    import os
    import pickle
    
    sample_obj = {'hello': [os.urandom(50)]}
    
    data = pickle.dumps(sample_obj)
    
    # write it out
    
    my_hmac = hmac.new(b'my_hmac_key', digestmod=hashlib.blake2b)
    my_hmac.update(data)
    mac_result = my_hmac.digest()
    pickle_out = io.BytesIO()
    pickle_out.write(mac_result + data)
    
    # read it in
    
    pickle_in = io.BytesIO(pickle_out.getvalue())
    my_hmac = hmac.new(b'my_hmac_key', digestmod=hashlib.blake2b)
    mac_from_stream = pickle_in.read(my_hmac.digest_size)
    data_from_stream = pickle_in.read()
    my_hmac.update(data_from_stream)
    computed_mac = my_hmac.digest()
    
    # see if they match
    
    print(hmac.compare_digest(computed_mac, mac_from_stream))
    

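    In your version, nothing separates the hex digest from the pickle data, so f.readline() reads past the digest into the pickle bytes (up to the first newline byte, or to the end of the file), and the comparison fails. Here we avoid hexdigest() altogether, which eliminates the unnecessary encoding and decoding. We create the HMAC object first and keep it around so that we can use its digest_size attribute to read exactly that many bytes of MAC from the stream. The io.BytesIO object only stands in for the file I/O part of your code.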
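The same idea can be sketched against a real file instead of io.BytesIO. This is only a minimal sketch: the helper names write_signed and read_signed, the KEY constant, and the file name are made up for illustration and are not part of the original code. It assumes the whole file fits in memory.

```python
import hashlib
import hmac
import os
import pickle
import tempfile

KEY = b'my_hmac_key'  # placeholder key for illustration only


def write_signed(path, obj, key=KEY):
    """Pickle obj and write the raw MAC followed by the pickle bytes."""
    data = pickle.dumps(obj)
    mac = hmac.new(key, data, hashlib.blake2b).digest()
    with open(path, 'wb') as f:
        f.write(mac + data)


def read_signed(path, key=KEY):
    """Check the MAC first and unpickle only if it matches."""
    verifier = hmac.new(key, digestmod=hashlib.blake2b)
    with open(path, 'rb') as f:
        # Read a fixed number of bytes rather than a "line":
        # pickle data is binary and may contain newline bytes anywhere.
        stored_mac = f.read(verifier.digest_size)
        data = f.read()
    verifier.update(data)
    if not hmac.compare_digest(verifier.digest(), stored_mac):
        raise ValueError('Invalid signature')
    return pickle.loads(data)


# Round trip through a temporary file (hypothetical file name).
demo_path = os.path.join(tempfile.mkdtemp(), 'signed_pickle.bin')
demo_obj = {'mylist': ['a', 'b', 'c'], 'mydict': {1: 'p', 2: 'q'}}
write_signed(demo_path, demo_obj)
restored = read_signed(demo_path)
```

Raising an exception before pickle.loads is called means a file whose MAC does not match is never unpickled, which is the usual reason for authenticating pickle data in the first place.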