
How to get a weighted average of a list whose weights are limited by a variable in Python 3.6


I hope the title makes sense. What I'm trying to achieve is getting the weighted average price of shoes that are available at different prices in different amounts. For example, I have:

list_prices = [12,12.7,13.5,14.3]
list_amounts = [85,100,30,54]
BuyAmount = x

I want to know my weighted average price and the highest price I paid per shoe if I buy x shoes (assuming I buy the cheapest first).

This is what I have now (I use numpy):

    if list_amounts[0] >= BuyAmount:
        avgprice = list_prices[0]
        highprice = list_prices[0]

    elif (sum(list_amounts[0: 2])) >= BuyAmount:
        avgprice = np.average(list_prices[0: 2], weights=[list_amounts[0],BuyAmount - list_amounts[0]])
        highprice = list_prices[1]

    elif (sum(list_amounts[0: 3])) >= BuyAmount:
        avgprice = np.average(list_prices[0: 3], weights=[list_amounts[0],list_amounts[1],BuyAmount - (sum(list_amounts[0: 2]))])
        highprice = list_prices[2]

    elif (sum(list_amounts[0: 4])) >= BuyAmount:
        avgprice = np.average(list_prices[0: 4], weights=[list_amounts[0],list_amounts[1],list_amounts[2],BuyAmount - (sum(list_amounts[0: 3]))])
        highprice = list_prices[3]

    print(avgprice)
    print(highprice)

This code works, but it is probably overly complex and verbose, especially since I want to be able to handle amount and price lists with 20+ items.

What is a better way to do this?


Solution

  • Here's a generic vectorized solution. It uses cumsum to replace the sliced summations, and argmax to find the index that sets the slice limits used in each of the if-branches:

    import numpy as np

    # Use cumsum to replace the sliced summations, i.e. all those
    # `list_amounts[0]`, `sum(list_amounts[0: 2])`, `sum(list_amounts[0: 3])`, etc.
    c = np.cumsum(list_amounts)
    
    # Use argmax to find the first index where the cumulative amount covers
    # BuyAmount. This replaces the last number in the slices
    # list_prices[0: 2], list_prices[0: 3], etc.
    # Note: if BuyAmount exceeds c[-1], no entry is True and argmax returns 0.
    idx = (c >= BuyAmount).argmax()
    
    # Use the slicing limit to get the slice off list_prices needed as the first
    # input to numpy.average
    l = list_prices[:idx+1]
    
    # This step gets us the weights, which have two parts. E.g. for the third
    # if-branch we have:
    # [list_amounts[0], list_amounts[1], BuyAmount - sum(list_amounts[0: 2])]
    # The first part is list_amounts sliced up to `idx`.
    # The second part is BuyAmount minus the cumulative sum just before `idx`.
    # Guard idx == 0, where c[idx-1] would wrap around to the last element.
    filled = c[idx-1] if idx > 0 else 0
    w = np.r_[list_amounts[:idx], BuyAmount - filled]
    
    # Finally, plug-in the two inputs to np.average and get avgprice output.
    avgprice = np.average(l,weights=w)
    
    # Get idx element off list_prices as the highprice output.
    highprice = list_prices[idx]
    

    We can optimize further by removing the concatenation step (with np.r_) and computing avgprice directly:

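    # Amount bought before the last price tier (0 when idx == 0, since
    # c[idx-1] would otherwise wrap around to the last element).
    filled = c[idx-1] if idx > 0 else 0
    slice1_sum = np.multiply(list_prices[:idx], list_amounts[:idx]).sum()
            # or np.dot(list_prices[:idx], list_amounts[:idx])
    slice2_sum = list_prices[idx]*(BuyAmount - filled)
    # The weights always add up to BuyAmount, so divide by it directly.
    avgprice = (slice1_sum+slice2_sum)/BuyAmount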
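
As a minimal sketch that is not part of the original answer, the cumsum/argmax steps above could be wrapped into one reusable function. The function name `weighted_buy_price`, its parameter names, and the choice to raise `ValueError` for orders that are non-positive or larger than the total stock are assumptions made for illustration. It also assumes the prices are already sorted from cheapest to most expensive, as in the question.

```python
import numpy as np


def weighted_buy_price(prices, amounts, buy_amount):
    """Return (average price, highest price) when buying cheapest first.

    Hypothetical helper: assumes `prices` is sorted ascending and lines up
    element by element with `amounts`.
    """
    prices = np.asarray(prices, dtype=float)
    amounts = np.asarray(amounts, dtype=float)

    if buy_amount <= 0:
        raise ValueError("buy_amount must be positive")

    # Running total of stock available up to and including each price tier.
    c = np.cumsum(amounts)
    if buy_amount > c[-1]:
        # Without this check, argmax below would silently return 0.
        raise ValueError("buy_amount exceeds the total amount available")

    # First tier whose cumulative stock covers the order.
    idx = int((c >= buy_amount).argmax())

    # Full amounts for the cheaper tiers, the remainder for the last tier.
    taken = amounts[:idx + 1].copy()
    filled = c[idx - 1] if idx > 0 else 0.0
    taken[idx] = buy_amount - filled

    avgprice = np.average(prices[:idx + 1], weights=taken)
    highprice = prices[idx]
    return avgprice, highprice


list_prices = [12, 12.7, 13.5, 14.3]
list_amounts = [85, 100, 30, 54]

# Buying 200 shoes takes 85 at 12, 100 at 12.7 and 15 at 13.5,
# so the average is about 12.4625 and the highest price paid is 13.5.
avg, high = weighted_buy_price(list_prices, list_amounts, 200)
print(avg, high)
```

The explicit checks are one possible design choice: they make the two cases the plain argmax approach does not handle on its own (an order larger than the stock, and an order filled entirely from the first tier) visible instead of relying on index wraparound.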