
Python itertools: don't load files into memory


I have some fairly big files and I'm trying to get all combinations of their lines with this code:

import itertools
import time

for text1, text2 in itertools.product(open('text1.txt'), open('text2.txt')):
    t3 = text1.strip() + text2.strip()
    time.sleep(1)
    print(t3)

Testing with small files, it worked fine, but with big files nothing happens. I'm guessing it's loading the files into memory. Is there any way to do this so that it doesn't load the whole file into memory?


Solution

  • This is documented:

    Before product() runs, it completely consumes the input iterables, keeping pools of values in memory to generate the products. Accordingly, it is only useful with finite inputs.

    Note, in this particular case, you may be able to do something like:

    with open("text1.txt") as f1, open("text2.txt") as f2:
        for text1 in f1:
            for text2 in f2:
                # do some stuff
                t3 = (text1.strip() + text2.strip())
            f2.seek(0) # reset inner file cursor
    

    This is possible due to the nature of file iterators: you can just seek to the beginning and the iterator is effectively reset (and this is nice and efficient too!). But this won't work with iterables or iterators in general, so itertools.product handles the general case by simply materializing each input iterable into an in-memory pool of values.
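
For inputs that cannot be rewound with seek() but can be re-created cheaply, the same nested-loop idea can be sketched as a small generator. This is only an illustration, not part of itertools: the names lazy_product, make_inner and read_lines are hypothetical, and the sketch assumes the inner input can be produced afresh by calling a zero-argument function.

```python
import itertools


def lazy_product(outer, make_inner):
    """Yield (a, b) pairs in the same order as itertools.product(outer, inner),
    without building in-memory pools.

    outer      -- any iterable; consumed once, one item at a time
    make_inner -- hypothetical zero-argument callable that returns a fresh
                  iterable over the inner values every time it is called
    """
    for a in outer:
        for b in make_inner():
            yield a, b


def read_lines(path):
    """Yield the stripped lines of a text file one at a time.

    The file is closed when the generator is exhausted or garbage collected.
    """
    with open(path) as f:
        for line in f:
            yield line.strip()


def numbers():
    # Stand-in for a large input: a fresh generator on every call.
    yield from range(3)


# Small in-memory demonstration; the order matches itertools.product.
pairs = list(lazy_product("ab", numbers))
assert pairs == list(itertools.product("ab", range(3)))
```

For the files in the question, this could be called as lazy_product(read_lines("text1.txt"), functools.partial(read_lines, "text2.txt")). The trade-off is that the inner input is read again for every item of the outer one, so memory use stays small at the cost of repeated I/O.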