python dictionary cryptography dictionary-attack

How to perform XOR of all words in a file


I want to convert all words in a standard dictionary (for example, /usr/share/dict/words on a Unix machine) to integers, find the XOR of every two words in the dictionary (after converting them to integers, of course), and probably store the results in a new file.

Since I am new to Python and the files are large, the program hangs every now and then.

import os
dictionary = open("/usr/share/dict/words","r")
'''a = os.path.getsize("/usr/share/dict/words")
c = fo.read(a)'''
words = dictionary.readlines()

foo = open("word_integer.txt", "a")


for word in words:
    foo.write(word)
    foo.write("\t")
    int_word = int(word.encode('hex'), 16)
    '''print int_word'''
    foo.write(str(int_word))
    foo.write("\n")

foo.close()

Solution

  • First, we need a function to convert a string to an int. I'll make one up, since what you're doing isn't working for me at all (maybe you meant to encode as unicode?):

    def word_to_int(word):
        return sum(ord(i) for i in word.strip())
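
Note that summing code points maps anagrams such as "listen" and "silent" to the same integer. If a reversible mapping closer to the question's hex encoding is wanted, a minimal sketch might look like the following. It assumes Python 3 and UTF-8 text; `word_to_int_bytes` and `int_to_word` are hypothetical names, not part of the original answer.

```python
def word_to_int_bytes(word):
    """Interpret the word's UTF-8 bytes as one big-endian integer."""
    return int.from_bytes(word.strip().encode("utf-8"), "big")


def int_to_word(value):
    """Inverse of word_to_int_bytes (assumes no leading NUL bytes in the word)."""
    length = (value.bit_length() + 7) // 8  # bytes needed to hold the integer
    return value.to_bytes(length, "big").decode("utf-8")
```

Because distinct words give distinct integers here, the XOR of two words can be undone when one of the two words is known, which is not possible with the sum-based version.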
    

    Next, we need to process the files. The following nests two with blocks, so it also works in Python 2.6; from 2.7 onward, the two open calls can be combined into a single with statement:

    with open("/usr/share/dict/words", "r") as dictionary:
        with open("word_integer.txt", "a") as foo:
            while True:
                try:
                    # Read the next two words; an odd trailing word is dropped.
                    w1, w2 = next(dictionary), next(dictionary)
                except StopIteration:
                    print("We've run out of words!")
                    break
                # Write one XOR result per line.
                foo.write(str(word_to_int(w1) ^ word_to_int(w2)) + "\n")
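
The loop above only XORs consecutive pairs (words 1 and 2, words 3 and 4, and so on). If "every two words" means every unordered pair, a sketch using itertools.combinations is shown below. It assumes Python 3 and reuses the made-up sum-of-ord conversion; `xor_all_pairs` is a hypothetical helper name. For n words it writes n(n-1)/2 lines, so a dictionary of 100,000 words would produce roughly 5 billion lines, which may well be why the original program appears to hang. Try it on a small word list first.

```python
import io
from itertools import combinations


def word_to_int(word):
    # Same made-up conversion as in the answer: sum of code points.
    return sum(ord(ch) for ch in word.strip())


def xor_all_pairs(words, out):
    """Write 'w1<TAB>w2<TAB>xor' for every unordered pair of words."""
    # Convert each word once, skipping blank lines.
    values = [(w.strip(), word_to_int(w)) for w in words if w.strip()]
    for (w1, v1), (w2, v2) in combinations(values, 2):
        out.write("%s\t%s\t%d\n" % (w1, w2, v1 ^ v2))


# Small in-memory demo; pass real file objects for actual use.
sample = ["ab\n", "cd\n", "ef\n"]
buf = io.StringIO()
xor_all_pairs(sample, buf)
result = buf.getvalue()
```

Writing each result as it is computed keeps memory use proportional to the number of words rather than the number of pairs.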