Search code examples
pythonnumpysum

Get combinations from values in array for which the sum doesn't exceed a value


I am trying to obtain all the combinations of values in a list, for which the sum of the corresponding values in another list do not exceed a certain limit. As an example, I have certain jobs that I want to produce in a batch, but that batch has a limited capacity, meaning that only certain combinations of jobs can be put together so that their size don't exceed the capacity of the batch.

Imagining I have jobs defined as job 1, job 3, and job 5, I have the following list:

jobs = [1,3,5]

If job 1 has size 1, job 3 has size 3, and job 5 has size 2, I have the following list:

jobSize = [1,3,2]

If my batch has maximum capacity of 5, this means the desired output is the following:

combinations = [(1),(3),(5),(1,3),(1,5),(3,5)] 
#here I use the name/number of the jobs, and (1,3,5) not included cause total size of 5 is exceeded

From what I have seen, something similar (not exactly the same) is possible to do using combinations from itertools, like explained here Powersets in Python using itertools or here How to get all possible combinations of a list’s elements?, but this is definitely not as efficient as I would like.

I also tried something with itertools.product and np.where, like explained here Python Numpy: All combinations from array or scalar by Corralien, but once again, this is not exactly what I need, and requires a lot of post-processing operations, ending up being too slow.

My idea is that this is possible to do somehow with numpy arrays, because it is the only way I can see this being fast. Does anyone have any idea?


Solution

  • I think the easiest approach is to use a recursive function. You take one job away from your list, and run the function on the remaining sublist. Then, you combine the results from the sublist with the first job accordingly.

    In this way, if a combination is invalid – for example (1,2,3) when max is 5 – it will not show up in the results from the sublist, and so all longer combinations containing (1,2,3) are eliminated.

    This could be improved by:

    • changing from recursive to iterative
    • using multiprocessing. For example instead of taking 1 job – taking 3 first jobs, this gives 6 combinations which would need to be combined with the results from the sublist processing. This could be seperated into processes running at the same time
    • take advantage of the fact that jobs may have the same jobSize. In which case the results are symmetric

    Code:

    import sys
    import time
    from collections import namedtuple
    
    def print_job(self):
        #return f'( /{self.job_name}/ {self.size} )'
        return str(self.job_name)+' ('+str(self.size)+')'
    def print_solution(self):
        return ' '.join([str(job) for job in self.jobs]) # +'\n'
    Job = namedtuple('Job', ['job_name','size'])
    Solution = namedtuple('Solution', ['jobs', 'Size'])
    Job.__repr__ = print_job
    Solution.__repr__ = print_solution
    
    ########### main function #################################
    def find_sets(job_list, max_size):
        first_job = job_list[0]
        new_solutions = []
        if len(job_list) > 1:
            other_solutions = find_sets(job_list[1:], max_size)
            for solution in other_solutions:
                if solution.Size + first_job.size <= max_size:
                    new_solution = Solution(solution.jobs + [first_job], solution.Size + first_job.size)
                    new_solutions.append(new_solution)
        else:
            other_solutions = []
        if first_job.size <= max_size:
                new_solutions.append(Solution([first_job], first_job.size))
        return other_solutions + new_solutions
    
    
    def find_all_job_sets(jobs='ABC', jobSize = (1,3,2), max_size=5, show_progress=True):
        sys.setrecursionlimit(1500)
        if (len(jobs) != len(jobSize)):
            print('Number of jobs and jobSizes must match.')
            exit(1)
        jobList = list(Job(name, size) for name, size in zip(jobs, jobSize))
        start = time.time()
        results = find_sets(jobList, max_size)
        end = time.time()
        print(results)
        print(f'Number of sets: {len(results)}')
        print(f'Time: {int((end - start) * 100) / 100} seconds')
    

     

    find_all_job_sets(jobs='ABCDE', jobSize=[1,2,3,4,5], max_size=5)
    

    Output:

    [E (5), D (4), C (3), C (3) B (2), B (2), D (4) A (1), C (3) A (1), B (2) 
    A (1), A (1)]
    Number of sets: 9
    

    Example:

    find_all_job_sets(jobs=[f'J{i}' for i in range(100)],
                      jobSize=list(range(20,120)), max_size=200)
    

    Output:

    ...
        J2 (22) J1 (21) J0 (20), J8 (28) J6 (26) J5 (25) J4 (24) J3 (23) J2 (22) J1 (21) J0 (20), J7 (27) J6 (26) J5 (25) J4 (24) J3 (23) J2 (22) J1 (21) J0 (20), J6 (26) J5 (25) J4 (24) J3 (23) J2 (22) J1 (21) J0 (20), J5 (25) J4 (24) J3 (23) J2 (22) J1 (21) J0 (20), J4 (24) J3 (23) J2 (22) J1 (21) J0 (20), J3 (23) J2 (22) J1 (21) J0 (20), J2 (22) J1 (21) J0 (20), J1 (21) J0 (20), J0 (20)]
    Number of sets: 1462601
    Time: 3.11 seconds
    

    List of 200 jobs:

    find_all_job_sets(jobs=[f'J{i}' for i in range(200)], 
    jobSize= [12, 13, 14, 15, 16] + list(range(105, 300)), max_size=500)
    

    Output:

        ...
        J5 (105) J4 (16) J3 (15) J2 (14) J1 (13) J0 (12), J6 (106) J5 (105) J4 (16) J3 (15) J2 (14) J1 (13) J0 (12), J5 (105) J4 (16) J3 (15) J2 (14) J1 (13) J0 (12), J4 (16) J3 (15) J2 (14) J1 (13) J0 (12), J3 (15) J2 (14) J1 (13) J0 (12), J2 (14) J1 (13) J0 (12), J1 (13) J0 (12), J0 (12)]
    Number of sets: 3985093
    Time: 7.42 seconds