I have the following pandas dataframe:
d= {'Time': [0,1,2,0,1,2,2,3,4], 'Price': ['Auction', 'Auction','800','900','By Negotiation','700','250','250','Make Offer'],'Item': ['Picasso', 'Picasso', 'Picasso', 'DaVinci', 'DaVinci', 'DaVinci', 'Dali', 'Dali', 'Dali']}
df = pd.DataFrame(data=d)
I would like to create a fourth column 'Listing-history' which would specify the following:
I want to group by Item, and then apply the above logic.
Finding whether a listing is 'first seen' is pretty straight forward using something like the following:
df['Price_coerced_to_numeric'] = pd.to_numeric(df['Price'], errors='coerce')
df['Price_diff'] = df.groupby(['Item'])['Price_coerced_to_numeric'].diff(1)
I suspect there is a way of using pandas apply and transform but I haven't been able to work it out. Any tips much appreciated.
You could use groupby.shift
and numpy.select
:
# replace numbers by "Price"
price = df['Price'].mask(pd.to_numeric(df['Price'], errors='coerce')
.notna(), 'Price')
# get previous price
prev_price = price.groupby(df['Item']).shift()
# identify first row per Item
m1 = ~df['Item'].duplicated()
# identify change in price
m2 = price.ne(prev_price)
# combine conditions
df['Listing-history'] = np.select([m1, m2],
['first seen', prev_price+'->'+price],
'ongoing listing')
Output:
Time Price Item Listing-history
0 0 Auction Picasso first seen
1 1 Auction Picasso ongoing listing
2 2 800 Picasso Auction->Price
3 0 900 DaVinci first seen
4 1 By Negotiation DaVinci Price->By Negotiation
5 2 700 DaVinci By Negotiation->Price
6 2 250 Dali first seen
7 3 250 Dali ongoing listing
8 4 Make Offer Dali Price->Make Offer
If you really want to use groupby.transform
you code refactor the code a bit:
def history(col):
price = col.mask(pd.to_numeric(col, errors='coerce').notna(), 'Price')
prev_price = price.shift()
return ((prev_price+'->'+price)
.where(price.ne(prev_price), 'ongoing listing')
.fillna('first seen')
)
df['Listing-history'] = df.groupby('Item')['Price'].transform(history)
variant if you can have NaNs in the original column:
def history(col):
price = col.mask(pd.to_numeric(col, errors='coerce').notna(), 'Price')
prev_price = price.shift()
out = (prev_price+'->'+price).where(price.ne(prev_price), 'ongoing listing')
out.iat[0] = 'first seen'
return out
df['Listing-history'] = df.groupby('Item')['Price'].transform(history)