
Printing Dictionary Key if Values were found in CSV List


I'm pretty new to Python, so forgive me if this is a long explanation of a simple problem. I need help understanding how to use a dictionary to find matches from a CSV list and then print the matching key in a report-style output.

Goal: I have a list of clear-text privacy data, such as Social Security Numbers. I need to hash each clear-text value and, at the same time, obfuscate the clear text down to its last 4 digits (XXX-XX-1245). If a hash of my clear text matches a hash I already have in a CSV lookup, I produce a mini report with the demographic information of the person the hash might belong to. Also, because nothing is easy, the mini report needs to print the obfuscated SPI value.
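For illustration, the hashing and masking step described above could be sketched roughly as follows. This is only a sketch under assumptions: MD5 is guessed from the question's tags, and the helper names hash_value and obfuscate_ssn and the sample value are hypothetical, not the actual code in use.

```python
import hashlib
import re


def hash_value(clear_text):
    # Assumption: MD5 is used (taken from the question's tags)
    return hashlib.md5(clear_text.encode("utf-8")).hexdigest()


def obfuscate_ssn(clear_text):
    # Mask the first five digits and keep only the last four
    return re.sub(r"\d{3}-\d{2}-(\d{4})", r"XXX-XX-\1", clear_text)


# Hypothetical clear-text value, for demonstration only
found = ["123-45-1245"]

clear_text_hash = [hash_value(s) for s in found]
obfuscate_xxxxssn = [obfuscate_ssn(s) for s in found]

print(obfuscate_xxxxssn)
```

The two lists stay in the same order, so the value at a given position in one list corresponds to the value at the same position in the other.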

The output should look like this if the hash I just generated matches the hash in column 2 of my spreadsheet:

[email protected] Full Name Another Full Name xxx-xx-1234  location1 location2

Problem: All of the hashing, obfuscation, and matching is done, stored in lists, and works correctly. I need help figuring out how to print the matching key from the dictionary alongside my other columns, without printing the whole set of keys on every pass through the for-loop.

This works outside of my reader:

for i in hashes_ssnxxxx:
    print(i)

but I do not know how to take that value and put it in my print statement inside of the reader.

import csv

clear_text_hash = []  # Hashes of the clear-text values found
obfuscate_xxxxssn = []  # Obfuscated SPI values produced with re.sub

# Zip them into a dictionary to keep the two related
hashes_ssnxxxx = dict(zip(obfuscate_xxxxssn, clear_text_hash))

# Raw string so that \t and \b are not treated as escape sequences
book_of_record = open(r'path\to\bookofrecord.csv', 'rt', encoding='UTF-8')
a1 = csv.reader(book_of_record, delimiter=',')

for row in a1:
    hashes = row[2]
    if hashes in hashes_ssnxxxx.values():
        # Problem: .keys() prints every key, not just the matching one
        print(row[16], row[6], hashes_ssnxxxx.keys(), row[13], row[35], row[18], row[43])

UPDATE [Solved]: Using the list comprehension suggested by @tianhua liao, all it needed was:

if hashes in hashes_ssnxxxx.values():
    obfuscate = [k for k, v in hashes_ssnxxxx.items() if hashes == v]
    print(row[16], obfuscate, row[6], row[13], row[35], row[18], row[43])

Solution

  • Actually, I am not sure what your problem really is. It would help if you could give us some simple examples of hashes_ssnxxxx and hashes.

    Here is a guessed answer. After checking if hashes in hashes_ssnxxxx.values():, you want to print only the related key from hashes_ssnxxxx.keys() instead of all of them.

    You could use a list comprehension to do that simply, like this:

    [key for key, vals in hashes_ssnxxxx.items() if hashes == vals]

    The output of that code is a list. If you want to make it more readable, you could index it with [0] or use ','.join() to print it.
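As a possible alternative to scanning the dictionary with a list comprehension for every row, the mapping could be inverted once so that each hash points straight at its obfuscated value. This is a different technique from the one in the answer, shown only as a sketch: the sample dictionary contents, the in-memory CSV, the column positions, and the names ssn_by_hash and report_lines are all made up for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for the real dictionary
hashes_ssnxxxx = {"XXX-XX-1234": "hash_a", "XXX-XX-5678": "hash_b"}

# Invert once: hash -> obfuscated value, so each lookup is a single dict access
ssn_by_hash = {v: k for k, v in hashes_ssnxxxx.items()}

# Hypothetical in-memory CSV with the hash in column index 2
sample_csv = io.StringIO(
    "a@example.com,Full Name,hash_a,location1\n"
    "b@example.com,Other Name,hash_z,location2\n"
)

report_lines = []
for row in csv.reader(sample_csv):
    obfuscated = ssn_by_hash.get(row[2])  # None when the hash is not known
    if obfuscated is not None:
        report_lines.append(" ".join([row[0], row[1], obfuscated, row[3]]))

for line in report_lines:
    print(line)
```

One thing to watch for: because the original dictionary is keyed by the obfuscated value, two numbers that share the same last four digits would overwrite each other. Building the hash-to-obfuscated mapping directly, for example with dict(zip(clear_text_hash, obfuscate_xxxxssn)), would avoid that.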