
A concern involving very large arrays


My concern involves huge arrays with shapes like (14!, 14), but I'll ask the question using a much smaller array.

Consider array p holding the 10! permutations of a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. We can create a permutation array of this shape (i.e. (3628800, 10)) in a variety of ways, say:

import itertools
import numpy as np

p = np.array(list(itertools.permutations(range(10))))

QUESTION: I'd like to know if there is any way I could produce, say:

array p1 holding the first 100000 permutations, then

array p2 holding the next 100000 permutations, then

etc..., then

array p37 holding the last 28800 permutations.

I'm not talking about creating the full set of permutations, then subdividing it. What I'd like to know is whether I can actually generate the permutation rows in 'clumps' of suitable size. The actual order of rows in each 'clump' isn't an issue, as long as the full set of 'clumps' holds all permutations without any overlap.

As mentioned earlier, my actual concern is to find a way, in principle, to handle much larger arrays of permutations. I'll worry about the size of the 'clumps', etc., later.
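For reference, the clump counts quoted above follow directly from the factorial; a quick sanity check on the arithmetic (divmod is used here just to illustrate the split):

```python
from math import factorial

n_perms = factorial(10)           # 3628800 permutations in total
chunk = 100000
full, last = divmod(n_perms, chunk)
# 36 full clumps of 100000, then a final clump of 28800 -- i.e. p1..p37
```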


Solution

  • Use itertools.islice in the batched recipe:

    from itertools import islice, permutations
    
    import numpy as np
    
    def batched(iterable, n):
        "Batch data into tuples of length n. The last batch may be shorter."
        # batched('ABCDEFG', 3) --> ABC DEF G
        if n < 1:
            raise ValueError('n must be at least one')
        it = iter(iterable)
        while (batch := tuple(islice(it, n))):
            yield batch
    
    perm = permutations(range(10))
    
    arrays = [np.array(x) for x in batched(perm, 100000)]
    

    If you want to iterate by chunk:

    perm = permutations(range(10))
    
    for x in batched(perm, 100000):
        a = np.array(x)
        print(a)
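To confirm that the chunks jointly cover all permutations with no overlap, here is a quick sanity check on a smaller case (6 elements and a chunk size of 100 are arbitrary choices for illustration):

```python
import numpy as np
from itertools import islice, permutations
from math import factorial

def batched(iterable, n):
    "Batch data into tuples of length n. The last batch may be shorter."
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    while (batch := tuple(islice(it, n))):
        yield batch

seen = set()
sizes = []
for x in batched(permutations(range(6)), 100):
    a = np.array(x)            # each chunk becomes an (<=100, 6) array
    sizes.append(a.shape[0])
    seen.update(x)             # x is already a tuple of permutation tuples

assert sum(sizes) == factorial(6) == 720   # every permutation was generated
assert len(seen) == 720                    # ...and no chunk overlaps another
assert sizes[-1] == 20                     # the final chunk is shorter
```

On Python 3.12+ this recipe is available directly as itertools.batched, so the helper function is not needed there.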