Tags: python, list, loops, hash, hashlib

Strings unwillingly concatenate in while loop in Python


I'm writing a simple program that takes a string from a file and hashes it. For some reason, the loop concatenates the strings, which I don't want. The code works outside of a for loop or while loop, but does funky stuff inside one. Here's my code.

import hashlib

f = open('1000-most-common-passwords.txt', 'r')  # Opens file with all of the strings to compare it to.

unparsed = f.read()
unparsed = unparsed.replace('\n', ' ').split(' ')  # Turns string into list with every new line.
sha1 = hashlib.sha1()
sha1.update(unparsed[0].encode('utf-8'))  # Line 1 is hashed into SHA-1.

This works well. I can substitute any index in unparsed[0] and it selects the string from that line and prints it out hashed. Now I'd like to do this for every line in the text file, so I wrote a simple while loop. Here's how that looks.

i = 0  # Selects the first line.
while i < len(unparsed):  # While i is less than the amount of values in the list, keep going.
    sha1.update(unparsed[i].encode('utf-8'))  # Update the index to the current value in the list.
    print(sha1.hexdigest())
    i += 1

This doesn't give me any errors. On the contrary, the output looks the way I expect. But what it actually does bothers me. Instead of giving me the hash of each value, it gives me the hash of some sort of concatenation of all the previous values. Instead of hashing 123456, it hashes 123456123456 or 123456password. Why does this work outside of a loop but not inside one? Any help is much appreciated.


Solution

  • It looks like you want to hash each line separately. update() does not replace the data in the hash object; it appends to it, so every hexdigest() call covers everything passed in so far. To get one hash per line, create a new hash object for each line:

    from hashlib import sha1

    # Read the file in binary mode so the lines are already bytes and need no encoding:
    with open('1000-most-common-passwords.txt', 'rb') as f:
        for line in f:  # Iterate over the lines in the file
            pw = line.strip()  # Don't include the trailing newline in the hash
            digest = sha1(pw).hexdigest()  # A new hash object for every line
            # Decode for display so the output shows 123456 rather than b'123456'
            print(f'{pw.decode("utf-8")} hashes to {digest}')
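
To see why the original loop misbehaves, here is a minimal sketch that compares one reused hash object against a fresh object per value. The two sample strings stand in for lines of the password file, and the helper name sha1_hex is made up for this illustration; neither comes from the original code.

```python
import hashlib


def sha1_hex(text):
    """Return the SHA-1 hex digest of one string, using a fresh hash object."""
    return hashlib.sha1(text.encode('utf-8')).hexdigest()


# Sample values standing in for lines of the password file (assumed contents).
words = ['123456', 'password']

# Reusing one object: each update() appends to the data already hashed.
running = hashlib.sha1()
running_digests = []
for word in words:
    running.update(word.encode('utf-8'))
    running_digests.append(running.hexdigest())

# The second running digest covers '123456password', not 'password'.
print(running_digests[1] == sha1_hex('123456password'))  # True
print(running_digests[1] == sha1_hex('password'))        # False

# A fresh object per value gives independent digests.
separate_digests = [sha1_hex(word) for word in words]
print(separate_digests)
```

This also means the original while loop can stay in text mode if that is preferred: moving the hashlib.sha1() call inside the loop body should be enough to get one hash per list entry.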