Tags: python, python-3.x, utf, hashlib, audio-fingerprinting

TypeError: Unicode-objects must be encoded before hashing in Hashlib Function


I have looked at the other Stack Overflow answers to this problem and tried them, but none of them worked. I am posting a link to the repository instead of the code because the code is too large to paste here.

Link to the repository : https://github.com/executable16/audio-fingerprint-identifying-python

Specifically, the error is raised on line 2 of this snippet:

if t_delta >= MIN_HASH_TIME_DELTA and t_delta <= MAX_HASH_TIME_DELTA:
    h = hashlib.sha1("%s|%s|%s" % (str(freq1), str(freq2), str(t_delta)))
    yield (h.hexdigest()[0:FINGERPRINT_REDUCTION], t1)
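In Python 3, hashlib constructors such as hashlib.sha1 accept only bytes-like objects. The expression "%s|%s|%s" % (...) produces a str, so the call fails. The sketch below reproduces the error with made-up values for freq1, freq2 and t_delta (assumptions, not taken from the repository) and shows that encoding the finished string makes the call valid.

```python
import hashlib

# Hypothetical sample values; in the repository they come from spectrogram peaks.
freq1, freq2, t_delta = 440, 880, 12

text = "%s|%s|%s" % (str(freq1), str(freq2), str(t_delta))  # a str, not bytes

raised_type_error = False
try:
    hashlib.sha1(text)  # Python 3 rejects str input here
except TypeError as exc:
    raised_type_error = True
    # The exact wording of the message differs between Python versions.
    print("TypeError:", exc)

# Encoding the finished string to bytes makes the call valid.
digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
print(digest)
```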

I tried calling .encode('utf-8'), but unfortunately it didn't help. Here is what I tried:

if t_delta >= MIN_HASH_TIME_DELTA and t_delta <= MAX_HASH_TIME_DELTA:
    first = str(freq1).encode('utf-8')
    second = str(freq2).encode('utf-8')
    third = str(t_delta).encode('utf-8')
    h = hashlib.sha1("%s|%s|%s" % (first, second, third))
    yield (h.hexdigest()[0:FINGERPRINT_REDUCTION], t1)
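The attempt above still passes a str to hashlib.sha1. Applying a str pattern to bytes objects calls str() on each one, so the b'...' representation ends up inside an ordinary string, and hashlib.sha1 raises the same TypeError. A small check with hypothetical sample values:

```python
first = str(440).encode("utf-8")
second = str(880).encode("utf-8")
third = str(12).encode("utf-8")

# A str pattern calls str() on each bytes object, keeping the b'...' repr.
mixed = "%s|%s|%s" % (first, second, third)
print(type(mixed).__name__, mixed)  # str b'440'|b'880'|b'12'
```

Encoding the pieces is not enough on its own: either the pattern itself must be bytes, or the whole string must be encoded after formatting.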

Full error output:

sqlite - connection opened
 * id=2 channels=2: file_example_MP3_700KB.mp3
   new song, going to analyze..
   fingerprinting channel 1/2
   local_maxima: 664 of frequency & time pairs
Traceback (most recent call last):
  File "collect-fingerprints-of-songs.py", line 54, in <module>
    channel_hashes = set(channel_hashes)
  File "/home/executable/Desktop/audio-fingerprint-identifying-python/libs/fingerprint.py", line 168, in generate_hashes
    h = hashlib.sha1("%s|%s|%s" % (str(freq1), str(freq2), str(t_delta)))
TypeError: Unicode-objects must be encoded before hashing
sqlite - connection has been closed
make: *** [Makefile:19: fingerprint-songs] Error 1

I would appreciate a working solution along with an explanation of why it works.


Solution

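  • This worked: encode each value to bytes and use a bytes format pattern (b"..."), so that hashlib.sha1 receives bytes rather than str. Note that %-formatting on bytes requires Python 3.5 or later.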

    h = hashlib.sha1(b"%s|%s|%s" % (str(freq1).encode('utf-8'), str(freq2).encode('utf-8'), str(t_delta).encode('utf-8')))
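An equivalent and arguably simpler option is to build the key as a str and encode it once. The sketch below uses a hypothetical helper name, pair_hash, and sample values; it checks that the result matches the accepted line, since both hash the same bytes (b"440|880|12" here). That equality matters if fingerprints are already stored in the database.

```python
import hashlib


def pair_hash(freq1, freq2, t_delta):
    """Hypothetical helper: hex SHA-1 of 'freq1|freq2|t_delta'.

    The key is formatted once as str (%s calls str() on each value)
    and encoded once, so hashlib.sha1 receives bytes.
    """
    key = "%s|%s|%s" % (freq1, freq2, t_delta)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


# Same computation as the accepted line, with sample values.
freq1, freq2, t_delta = 440, 880, 12
accepted = hashlib.sha1(b"%s|%s|%s" % (str(freq1).encode('utf-8'),
                                      str(freq2).encode('utf-8'),
                                      str(t_delta).encode('utf-8'))).hexdigest()

print(pair_hash(freq1, freq2, t_delta) == accepted)  # True
```

Truncating with [0:FINGERPRINT_REDUCTION] works the same way on either hex digest.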