Search code examples
pythonpermutation

how to sample from permutation iterator without creating all possibl permutations (and thus causing MemoryError)


I have a number of characters n. I want to sample(randomly) from all possible permutations of these n characters, say m samples.

what i did so far works up to n=10 chars, but i need it to work for n=40 chars:


n=10 #nr of characters to be permuted. currently works for 10 max

seq_to_permute = np.arange(1,n+1) #creating the "chars" to be permuted

rand_list = list(permutations(seq_to_permute)) #storing all possible permutations in list->throws MemoryError

# need to sample many times from the rand_list
perm_samples = []
for i in range (1000):
    perm_samples.append(rand_list[random.randint(0, len(rand_list)-1)])

When i try the above code with n=40 then i get memory error, as I am storing all possible permutations in a list to sample from it. From reading around I understood that it is not possible to not have memory errors when storing the list of all permutations. So according to this answer (Prevent memory error in itertools.permutation) i created an iterator from permutation module, which as I understand doesnt store all possible permutations in memory. Then tried to sample from there but I am still getting the memory error:


seq_to_permute = np.arange(1,n+1) #creating the "chars" to be permuted
print(seq_to_permute)
# rand_list = list(permutations(seq_to_permute)) throws memory error when u store them all

perm_iterator = itertools.permutations(seq_to_permute)

# need to sample many times from the rand_list
perm_samples = []

perm_samples = (random.sample(list(perm_iterator), 5))

any help on how to randomly sample some permutations of characters without storing all possible permutations would be greatly appreciated


Solution

  • you can predetermine the samples you want to take if you know their total number, from Get the total number of permutations in python

    then construct the list of samples that you want to keep.

    from math import factorial
    import random
    import numpy as np
    import itertools
    def nPr(n, r):
        return int(factorial(n)/factorial(n-r))
    
    n = 10
    
    seq_to_permute = np.arange(1,n+1) #creating the "chars" to be permuted
    print(seq_to_permute)
    
    perm_iterator = itertools.permutations(seq_to_permute)
    
    seq_to_permute_length = nPr(n,n)  # precompute the length using maths
    print(f"searching {seq_to_permute_length} samples")
    
    # precompute the indicies to take
    indicies = []
    for i in range(1000):
        indicies.append(random.randint(0, seq_to_permute_length-1))
    indicies = set(indicies)  # for some performance gain
    
    # take the needed indicies
    picked_samples = []
    for i, number in enumerate(perm_iterator):
        if i in indicies:
            picked_samples.append(number)
    print(picked_samples)
    
    # # faster path https://stackoverflow.com/a/54027494/15649230
    # picked_samples = []
    # for idx in indicies:
    #     elements = []
    #     for j in range(len(seq_to_permute)):
    #         elements.append(elt(len(seq_to_permute),idx,j))
    #     picked_samples.append(seq_to_permute[elements])
    

    since the number of permutations grows as the factorial of n you need until the end of eternity to calculate the permutations on 40 elements, so instead we know that we are randomly and uniformly sampling permutations of 40 elements, so we can just randomly permute the array and be done with it.

    import numpy as np
    
    n = 40
    
    seq_to_permute = np.arange(1,n+1) #creating the "chars" to be permuted
    print(seq_to_permute)
    
    results = []  # store result in set() to avoid duplicates
    while len(results) < 1000:
        results.append(tuple(seq_to_permute[np.random.permutation(n)]))
    print(results)
    

    another way would be to calculate the i'th permutation of an array, by dividing i into n binary numbers and computing the i'th permutation directly, but that's just a longer way to the same result, and would be similar to this answer How to get the ith element of the nth permutation of a sequence directly (without any recursivity)?