Median with groupby and frequency


Suppose I have this DataFrame called 'market':

import pandas as pd

market = pd.DataFrame({'product': ['milk', 'milk', 'milk', 'bread', 'bread'],
                       'frequency': [4, 2, 6, 3, 5],
                       'price_each': [3, 4, 5, 10, 8]})
market

This gives:

product frequency price_each
milk    4         3
milk    2         4
milk    6         5
bread   3         10
bread   5         8

How can I calculate the median price per product with groupby, so that each price is counted as many times as its frequency?

What I tried (it gives the wrong result, because it ignores frequency):

market.groupby('product')['price_each'].median()

The expected result is:

product   median of price_each (weighted by frequency)
milk      4.5
bread     8
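To see where 4.5 and 8 come from, treat each frequency as a count of identical sales and take the ordinary median of the expanded list. A minimal check with the standard library, using plain lists typed in from the table above (not pandas):

```python
from statistics import median

# Expand each (price, frequency) pair into `frequency` copies of `price`.
milk = [3] * 4 + [4] * 2 + [5] * 6    # 12 values
bread = [10] * 3 + [8] * 5            # 8 values

print(sorted(milk))   # [3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5]
print(median(milk))   # 4.5: mean of the 6th and 7th sorted values (4 and 5)
print(median(bread))  # 8.0: the 4th and 5th sorted values are both 8

# Ignoring frequency reproduces the unweighted groupby result instead:
print(median([3, 4, 5]), median([10, 8]))  # 4 9.0
```

So the plain groupby median returns 4 for milk and 9 for bread, while the frequency-weighted median is 4.5 and 8.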

Solution

  • Using numpy.repeat and numpy.median:

    import numpy as np
    new_df = market.groupby('product')[['price_each', 'frequency']].apply(
        lambda g: np.median(np.repeat(g['price_each'], g['frequency'])))
    print(new_df)
    

    Output:

    product
    bread    8.0
    milk     4.5
    dtype: float64
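Two alternatives that are not part of the original answer, sketched under the assumption of a recent pandas (1.x or 2.x). The first expands the rows once with Index.repeat and then uses the built-in groupby median, avoiding a Python-level apply. The second computes the weighted median directly from cumulative frequencies, so the repeated rows are never built, which may matter when frequencies are large. The helper name weighted_median is hypothetical.

```python
import numpy as np
import pandas as pd

market = pd.DataFrame({'product': ['milk', 'milk', 'milk', 'bread', 'bread'],
                       'frequency': [4, 2, 6, 3, 5],
                       'price_each': [3, 4, 5, 10, 8]})

# Alternative 1: repeat each row `frequency` times, then take the
# ordinary groupby median of the expanded data.
expanded = market.loc[market.index.repeat(market['frequency'])]
median_expanded = expanded.groupby('product')['price_each'].median()


# Alternative 2 (hypothetical helper): weighted median from cumulative
# frequencies, without materialising the repeated rows. Like np.median,
# it averages the two middle values when the total count is even.
def weighted_median(group: pd.DataFrame) -> float:
    g = group.sort_values('price_each')
    values = g['price_each'].to_numpy()
    cum = g['frequency'].to_numpy().cumsum()
    n = cum[-1]  # total number of (virtual) observations in the group
    # The value at 0-based position k of the expanded, sorted data is the
    # first value whose cumulative count is greater than k.
    lo = values[np.searchsorted(cum, (n - 1) // 2, side='right')]
    hi = values[np.searchsorted(cum, n // 2, side='right')]
    return (lo + hi) / 2


median_weighted = (market.groupby('product')[['price_each', 'frequency']]
                   .apply(weighted_median))

print(median_expanded)
print(median_weighted)
```

Both return 8.0 for bread and 4.5 for milk, matching the expected result. The first is the shortest to write; the second trades a little code for memory, since its cost depends on the number of rows rather than on the sum of the frequencies.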