python algorithm complexity-theory knapsack-problem

Find optimal combination of items with 2 values

I have a list of 500 elements each with 2 values x (float value from 0-Infinity) and a y (float value from 0-1). I need to find the cheapest (lowest sum(x)) combination of 10 of these elements that achieve an average y value less than a given n. I think this is some kind of knapsack problem but I'm unsure of how to fit solutions online to my exact problem and fear that it may be too complicated.

All I have tried so far is trying a solution from ChatGPT but I believe that its complexity is too high. I'm looking for a solution that hopefully can be completed with a high number of elements <1000.

def find_cheapest_combination(elements, k, target_avg):
    def backtrack(start, combo):
        if len(combo) == k:
            # Calculate the average float value for the current combination
            avg_float = sum(element[1] for element in combo) / k
            if avg_float <= target_avg:
                nonlocal best_combo, lowest_avg
                best_combo = combo[:]
                lowest_avg = avg_float
            return

        for i in range(start, len(elements)):
            if len(elements) - i < k - len(combo):
                # Not enough elements remaining to form a valid combination
                break
            combo.append(elements[i])
            backtrack(i + 1, combo)
            combo.pop()

    elements.sort(key=lambda x: x[0])  # Sort by price in ascending order
    best_combo = []
    lowest_avg = float('inf')
    backtrack(0, [])

    return best_combo


elements = [(x, y) for x, y in items]
k = 10
target_avg = 0.07
cheapest_combination = find_cheapest_combination(elements, k, target_avg)
print(cheapest_combination)

Solution

You can try this:

import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def f(x, y, n, m):
    x, y = np.array(x), np.array(y)
    A = np.vstack([np.ones_like(x), y])
    ub = [n, n * m]
    lb = [n, -np.inf]
    res = milp(x, 
               constraints=LinearConstraint(A, ub=ub, lb=lb), 
               integrality=np.ones_like(x),
               bounds=Bounds(lb=0, ub=1)
              )
    if res.success:
        choices = np.nonzero(res.x)[0]
        return {"choices": choices, 
                "x_sum": res.fun, 
                "y_mean": y[choices].mean()}

This function takes as its arguments arrays x and y of x- and y-values, respectively. n is the number of elements to be selected, and m is the value that bounds the mean of selected y-values from above. If a solution to the problem exists, the function returns a dictionary of the form

{“choices”: indices of selected elements,
 “x_sum”: the sum of x-values of selected elements,
 “y_mean”: the mean of y-values of selected elements}

It a solution cannot be found (i.e. no selection of elements gives small enough mean of y-values), the function returns None.

For example:

f(x=[1, 2, 3, 4], y=[1, 10, 2, 1], n=2, m=2)

gives:

{'choices': array([0, 2]), 'x_sum': 4.0, 'y_mean': 1.5}

Explanation. Let x = [a_1,…,a_k], y=[b_1,…,b_k]. The problem of selecting some elements of x with the smallest sum is equivalent to finding a minimum of the function a_1*x_1 + … + a_k*x_k where the value of each variable x_i can be either 1 (which means that a_i is selected) or 0. The variables x_i must satisfy a couple constraints. First, x_1 + … + x_k = n, since we want to select exactly n elements. Second, b_1*x_1 + … + b_k*x_k <= n*m. This condition means that the average of selected b_i’s does not exceed m. In this form, the problem becomes an integer linear program. The code uses scipy.optimize.milp to compute its solution.