Search code examples
pythonout-of-memorypermutation

How to avoid memory error in Python while zipping list with large number of permutations from another list to get unique combinations?


I have the code below to generate all unique combinations between trucks and the permutations of tripares. The way I do it (below) gives me correct results. However it works only when len(tripares) < 10. When len(tripares) > 11 I get a memory error. Is there a way to avoid this memory error?

import itertools
from itertools import cycle

trucks=['A','B','C']
tripares = ['trip1', 'trip2', 'trip3', 'trip4', 'trip5', 'trip6', 'trip7', 'trip8', 'trip9', 'trip10', 'trip11', 'trip12'] 

# Get all permutations of the tripares list
perms = itertools.permutations(tripares,len(tripares))

# Zip the two lists and cycle the trucks list so all trips are matched with a truck
combinations = [list(zip(cycle(trucks), x)) for x in perms]

# Drop duplicates to get unique values
combs_unique = set([tuple(sorted(x)) for x in combinations ])

print(len(combs_unique))
# Should print 34650

Solution

  • If you just want the number of unique combinations you can simply compute it, without generating all those big sequences.

    Let's take your example input. The first truck will have 4 (no of trips / no of trucks) of the 12 trips: it amounts to 12! / 8! / 4! = 495. Then the second truck will have 4 of the remaining 8 trips: 8! / 4! / 4! = 70. The third truck takes the last 4 trips: 4! / 0! / 4! = 1. So the total count is 495 * 70 * 1 = 34650

    You may use a code like this:

    from math import factorial
    
    def count_combos(n,k):
        return factorial(n)//factorial(k)//factorial(n-k)
    
    def count_trips(ntrips, ntrucks):
        total = 1
        tt = ntrips//ntrucks
        for x in range(ntrips, tt, -tt):
            total *= count_combos(x, tt)
        return total
    
    count_trips(12,3)
    34650
    

    EDIT: get the combinations

    If you need the actual combinations, not just their number, you can generate them in your code "manually", one at a time, via recursion: list the combos for all trips for the first truck, append to each of them the list of combos for the remaining trips for the second truck, and so on until a truck has only one possible combination left. This uses a reasonable amount of memory, and also is not so slow as one may expect:

    from itertools import combinations
    
    def list_combos(trips, trucks):
            #print(trips)
            combos = []
            ntrips = len(trips)
            ntrucks = len(trucks)
            tt = ntrips//ntrucks
            if ntrips == tt:
                    return trips
            fs_trips = frozenset(trips)
            for c in combinations(fs_trips, tt):
                    for subc in list_combos2(fs_trips.difference(c), tt):
                            #print(c, subc)
                            combos.append(c + tuple(subc))
            return combos
        
    def list_combos2(fs_trips, tt):
            #print('combos2:', fs_trips)
            combos = []
            ntrips = len(fs_trips)
            if ntrips == tt:
                    #print('returning')
                    return (fs_trips,)
            for c in combinations(fs_trips, tt):
                    for subc in list_combos2(fs_trips.difference(c), tt):
                            #print(c, subc)
                            combos.append(c + tuple(subc))
            return combos
    
    x = list_combos(['t1','t2','t3','t4','t5','t6','t7','t8','t9','t10','t11','t12'], ['a','b','c'])
    len(x)
    34650
    

    I tested it on 18 trips/3 trucks (17153136 combos) and also on 16/4 (63063000 combos). In the latter case it took around 275 seconds on my PC.

    (Note: I'm simply returning a tuple of trips for each combo, without splitting it in ntrucks subtuples with the truck added. This saves some memory and speeds up things a bit; your fuel consumption calculation will simply need to consider that the first ntrips//ntrucks elements belong to the first truck, and so on).

    OPTIMIZATION

    Since you want to extract the best combo, why generate them all and then do the calculations? Starting from the code above it would be easy to compute the consumption of each combo once you have it, compare it with the current best and only keep track of the new best, without even maintaining the giant list.

    This approach may even allow you to do some forward-check, discarding a combo before it is completely defined, and hence dramatically speeding up the execution.