
Permuting lists too large for RAM in Python


I have written a program to read a list of words from a text file (one word per line) and combine them to produce every permutation of 3 words before writing an output file of the permutations, again one per line.

import itertools

wordList = open("wordlist.txt", "r").readlines() # import words into list
wordListOut = open("output.txt", "w")

wordList = [item.rstrip() for item in wordList] # strip \n from list items
for item in [x for x in itertools.permutations(wordList, 3)]:
    wordListOut.write("".join("%s %s %s\n" % item))

wordListOut.close()

It seems to do the job, but my concern is that, with the whole word list stored in RAM and itertools.permutations() producing a list of tuples in RAM, even a moderately large wordlist.txt will quickly run out of memory.

It would be better if each permutation were written straight to the output file rather than held in RAM, and depending on the size of wordlist.txt, it could be better not to load the whole thing into RAM either.

Also, how can I avoid adding a trailing \n to the last line of the output file?


Solution

  • for item in [x for x in itertools.permutations(wordList, 3)]:
    

    This line provides no benefit and will only cause problems. permutations() does not produce a list; it returns an iterator that creates each permutation on demand, as it is requested. By wrapping the call in a list comprehension, you build that full list and force every permutation to exist in memory at the same time, which defeats the point of lazy iteration in the first place. You should change the line to be just:

    for item in itertools.permutations(wordList, 3):
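
    With that change, each permutation can be written straight to the output file as it is generated. Here is a minimal sketch of the full streaming version; the tiny demo wordlist.txt written at the top is just an assumption so the example is self-contained, and the leading-newline trick answers the question about the trailing \n:

    ```python
    import itertools

    # Demo setup (assumption, not part of the original program): create a
    # tiny wordlist.txt so this sketch runs on its own.
    with open("wordlist.txt", "w") as f:
        f.write("alpha\nbravo\ncharlie\n")

    # The word list itself still has to be read into memory, because
    # permutations() needs the whole pool up front.
    with open("wordlist.txt") as f:
        word_list = [line.rstrip("\n") for line in f]

    with open("output.txt", "w") as out:
        perms = itertools.permutations(word_list, 3)
        first = next(perms, None)          # empty word list -> nothing to write
        if first is not None:
            out.write("%s %s %s" % first)
        for item in perms:
            # A newline *precedes* every subsequent line, so the file
            # never ends with a trailing "\n".
            out.write("\n%s %s %s" % item)
    ```

    Note that only one permutation tuple is alive at a time; the with blocks also guarantee both files are closed even if an error occurs mid-write.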