python, pandas, replace, difference

How can I replace a null value in a group?


I created this DataFrame and calculated the price gap I was looking for, but some flats have the same price, so I get a price difference of 0. How can I replace the 0 with the difference from the next lower price in the same group?

for example:

neighborhood: a, bed: 1, bath: 1, price: 5

neighborhood: a, bed: 1, bath: 1, price: 5

neighborhood: a, bed: 1, bath: 1, price: 3

neighborhood: a, bed: 1, bath: 1, price: 2

I get price differences of 0, 2, 1, NaN, but I'm looking for 2, 2, 1, NaN (in short, I don't want to compare two flats with the same price).

Thanks in advance and good day.

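import pandas as pd

data = [
    [1,'a',1,1,5],[2,'a',1,1,5],[3,'a',1,1,4],[4,'a',1,1,2],[5,'b',1,2,6],[6,'b',1,2,6],[7,'b',1,2,3]
]
df = pd.DataFrame(data, columns=['id','neighborhoodname','beds','baths','price'])

# Sort by price (descending), then take the gap to the next row within each group.
# The grouping column is 'neighborhoodname'; the frame has no 'city' column.
df['difference_price'] = ( df.dropna()
                             .sort_values('price',ascending=False)
                             .groupby(['neighborhoodname','beds','baths'])['price'].diff(-1) )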

Solution

  • I think you can first remove duplicates across all the columns used for the groupby plus price, compute the diff as a new column on the filtered data, and finally merge it back into the original with a left join:

    df1 = (df.dropna()
             .sort_values('price',ascending=False)
             .drop_duplicates(['neighborhoodname','beds','baths', 'price']))
    
    df1['difference_price']  = df1.groupby(['neighborhoodname','beds','baths'])['price'].diff(-1)
    
    df = df.merge(df1[['neighborhoodname','beds','baths','price', 'difference_price']], how='left')
    print (df)
       id neighborhoodname  beds  baths  price  difference_price
    0   1                a     1      1      5               1.0
    1   2                a     1      1      5               1.0
    2   3                a     1      1      4               2.0
    3   4                a     1      1      2               NaN
    4   5                b     1      2      6               3.0
    5   6                b     1      2      6               3.0
    6   7                b     1      2      3               NaN
    

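    Alternatively, you can use a lambda function that replaces the 0 differences with NaN and back-fills them within each group. Filling per group avoids wrong output for one-row groups, where a global back-fill would pull values in from another group: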

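    import numpy as np

    # group_keys=False keeps the original row index, so the result aligns with df on pandas 2.x
    df['difference_price'] = (df.sort_values('price',ascending=False)
                                .groupby(['neighborhoodname','beds','baths'], group_keys=False)['price']
                                .apply(lambda x: x.diff(-1).replace(0, np.nan).bfill()))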
    
    print (df)     
       id neighborhoodname  beds  baths  price  difference_price
    0   1                a     1      1      5               1.0
    1   2                a     1      1      5               1.0
    2   3                a     1      1      4               2.0
    3   4                a     1      1      2               NaN
    4   5                b     1      2      6               3.0
    5   6                b     1      2      6               3.0
    6   7                b     1      2      3               NaN
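
As a rough check against the small example in the question (one group with prices 5, 5, 3, 2), the back-fill idea can be wrapped in a helper. This is only a sketch: `add_price_gap` and the `toy` frame are hypothetical names introduced here, not part of the original answer, and it uses `transform` rather than `apply` so that the result keeps the original row index.

```python
import numpy as np
import pandas as pd


def add_price_gap(df, keys, price_col="price"):
    """Hypothetical helper: gap to the next strictly lower price in each group."""
    # Sort so that, within each group, the next row holds the next lower (or equal) price.
    ordered = df.sort_values(price_col, ascending=False)
    # diff(-1) gives price minus the following price; ties give 0, which is
    # turned into NaN and back-filled from the next non-tied row of the same group.
    gap = ordered.groupby(keys)[price_col].transform(
        lambda s: s.diff(-1).replace(0, np.nan).bfill()
    )
    # assign aligns on the index, so the original row order of df is kept.
    return df.assign(difference_price=gap)


keys = ["neighborhoodname", "beds", "baths"]

# Toy data from the question text: prices 5, 5, 3, 2 in a single group.
toy = pd.DataFrame({
    "neighborhoodname": ["a", "a", "a", "a"],
    "beds": [1, 1, 1, 1],
    "baths": [1, 1, 1, 1],
    "price": [5, 5, 3, 2],
})
toy_result = add_price_gap(toy, keys)
print(toy_result["difference_price"].tolist())  # [2.0, 2.0, 1.0, nan]

# Sample data from the question's code.
data = [
    [1, "a", 1, 1, 5], [2, "a", 1, 1, 5], [3, "a", 1, 1, 4], [4, "a", 1, 1, 2],
    [5, "b", 1, 2, 6], [6, "b", 1, 2, 6], [7, "b", 1, 2, 3],
]
sample = pd.DataFrame(data, columns=["id", "neighborhoodname", "beds", "baths", "price"])
sample_result = add_price_gap(sample, keys)
```

The lowest price in each group keeps NaN because there is nothing lower to compare it with, which matches the outputs shown above.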