Search code examples
pythonpandasoptimizationaggregateknapsack-problem

Pandas aggregate two columns at max


I have a data frame with two columns

df = DataFrame.from_records([
  {"time": 10, "amount": 200},
  {"time": 70, "amount": 1000},
  {"time": 10, "amount": 300},
  {"time": 10, "amount": 100},
])

I want to, given a period of time 80ms, calculate the max amount that is possible, in this case, the output should be 1300 because, in this period, the maximum amount possible is 1300.

Is it possible with Pandas? I thought about using aggregate, but I do not know how to use it


Solution

  • This is a knapsack problem, you can solve it with a dedicated library (e.g., knapsack):

    from knapsack import knapsack
    total, idx = knapsack(df['time'], df['amount']).solve(80)
    
    df.out = df.iloc[idx]
    

    output:

       time  amount
    1    70    1000
    2    10     300
    

    Other examples:

    # with max = 75
       time  amount
    1    70    1000
    
    # with max = 40
       time  amount
    0    10     200
    2    10     300
    3    10     100