Suppose I have this dataframe called 'market'
import pandas as pd

market = pd.DataFrame({'product': ['milk', 'milk', 'milk', 'bread', 'bread'],
                       'frequency': [4, 2, 6, 3, 5],
                       'price_each': [3, 4, 5, 10, 8]})
market
This gives:
product  frequency  price_each
milk     4          3
milk     2          4
milk     6          5
bread    3          10
bread    5          8
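How can I calculate the median of price_each for each product, taking frequency into account (that is, treating each price as if it occurred frequency times)?
What I have tried (it gives the wrong result, because it ignores frequency):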
market.groupby('product')['price_each'].median()
The expected result is:
product  median of price_each
milk     4.5
bread    8
Use numpy.repeat to expand each price by its frequency, then take numpy.median of the expanded values within each group:
import numpy as np

# For each product, repeat every price `frequency` times, then take the median
new_df = market.groupby('product').apply(lambda x: np.median(np.repeat(x['price_each'], x['frequency'])))
print(new_df)
Output:
product
bread 8.0
milk 4.5
dtype: float64
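If the frequencies are large, repeating every value can use a lot of memory. As an alternative, here is a minimal sketch that finds the same median from cumulative frequencies without expanding the data. The helper name weighted_median is hypothetical (it is not a pandas or NumPy function), and the sketch assumes the frequencies are positive integers.

```python
import numpy as np
import pandas as pd

market = pd.DataFrame({'product': ['milk', 'milk', 'milk', 'bread', 'bread'],
                       'frequency': [4, 2, 6, 3, 5],
                       'price_each': [3, 4, 5, 10, 8]})

def weighted_median(values, weights):
    """Median of `values` where each value counts `weights` times.

    Hypothetical helper; assumes positive integer weights.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cum = np.cumsum(weights)          # running count of expanded elements
    total = cum[-1]
    # 0-based positions of the middle element(s) in the expanded, sorted data
    lo_pos, hi_pos = (total - 1) // 2, total // 2
    # The value covering position p is the first one whose cumulative count exceeds p
    lo = values[np.searchsorted(cum, lo_pos, side='right')]
    hi = values[np.searchsorted(cum, hi_pos, side='right')]
    return (lo + hi) / 2

# Iterate over the groups explicitly and collect one median per product
result = pd.Series({name: weighted_median(g['price_each'], g['frequency'])
                    for name, g in market.groupby('product')})
print(result)
```

For the sample data this gives 4.5 for milk and 8.0 for bread, matching the numpy.repeat approach. When the total count is even, the two middle values are averaged, which is the same convention numpy.median uses.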