I am running this hashlib code and it runs almost all the way:
    def generate_hashes(peaks, fan_value=DEFAULT_FAN_VALUE):
        if PEAK_SORT:
            # sorted() returns a new list, so the result has to be kept
            peaks = sorted(peaks, key=itemgetter(1))
        # bruteforce all peaks
        peaks = list(peaks)
        len_peaks = len(peaks)
        for i in range(len_peaks):
            for j in range(1, fan_value):
                if (i + j) < len(peaks):
                    # take current & next peak frequency value
                    freq1 = peaks[i][IDX_FREQ_I]
                    freq2 = peaks[i + j][IDX_FREQ_I]
                    # take current & next peak time offset
                    t1 = peaks[i][IDX_TIME_J]
                    t2 = peaks[i + j][IDX_TIME_J]
                    # get diff of time offsets
                    t_delta = t2 - t1
                    # check if delta is between min & max
                    if t_delta >= MIN_HASH_TIME_DELTA and t_delta <= MAX_HASH_TIME_DELTA:
                        h = hashlib.sha1(("%s|%s|%s") % (str(freq1), str(freq2), str(t_delta)))
                        yield (h.hexdigest()[0:FINGERPRINT_REDUCTION], t1)
However, it returns this error:
    h = hashlib.sha1(("%s|%s|%s") % (str(freq1), str(freq2), str(t_delta)))
    TypeError: Unicode-objects must be encoded before hashing
I am honestly completely lost and don't know how to fix it. If you guys have any follow up questions regarding details about the code I will try my best to answer. Any feedback would be appreciated.
The answer is in the error message: use encode on your text string before hashing.
    h = hashlib.sha1(("%s|%s|%s" % (str(freq1), str(freq2), str(t_delta))).encode('utf-8'))
This is necessary because hashlib.sha1() operates on raw bytes and therefore requires a bytes object. Normal Python strings (since version 3.0) are sequences of Unicode code points, and a code point does not in general fit into a single byte. An encoding defines how code points are translated to bytes and back. UTF-8 is the most popular encoding because it can represent every Unicode code point while remaining backwards compatible with ASCII.
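To make the str versus bytes distinction concrete, here is a minimal self-contained sketch. The helper name hash_peak_pair and the default reduction of 20 are made up for illustration and are not part of the code in the question; only hashlib.sha1 and str.encode are real library calls.

```python
import hashlib


def hash_peak_pair(freq1, freq2, t_delta, reduction=20):
    """Hash a (freq1, freq2, t_delta) triple.

    Hypothetical helper: the name and the default reduction are assumptions.
    """
    text = "%s|%s|%s" % (freq1, freq2, t_delta)  # str: a sequence of code points
    data = text.encode("utf-8")                  # bytes: what hashlib consumes
    return hashlib.sha1(data).hexdigest()[:reduction]


# Passing a str directly reproduces the TypeError from the question.
try:
    hashlib.sha1("10|20|5")
    raised = False
except TypeError:
    raised = True

digest = hash_peak_pair(10, 20, 5)

# One code point can need several bytes, and the count depends on the encoding.
euro = "\u20ac"  # the euro sign, a single code point
utf8_len = len(euro.encode("utf-8"))
utf16_len = len(euro.encode("utf-16-le"))
```

Because different encodings produce different bytes for the same text, whichever encoding you pick has to be used consistently everywhere the hashes are generated and compared.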