I've got a DataFrame representing a user-item matrix as a list, with the columns:
user_id item_id rating timestamp
As I want to make a time-aware recommender, I want to add a column holding an ordered list (ordered by timestamp) of the items the user previously rated with a 1 (for example), so that I get a DataFrame like:
user_id item_id rating timestamp prev_items_rated_by_usr_with_1
I haven't been able to do this in parallel, and I need it to run that way because the dataset is huge.
This works, but I'm not sure whether it returns the items ordered by timestamp, and it takes an incredibly long time to execute:
df['new'] = df.apply(lambda row:list(df.loc[df.user_id==row['user_id']].loc[df.timestamp<row['timestamp']].loc[df.rating==1].item_id.unique()), axis = 1)
We can get the previous row's data using the pandas shift method. We need to import both pandas and numpy:
import pandas as pd
import numpy as np
We can set the index to user_id and timestamp and sort on it, so that the rows are ordered by user and then by timestamp:
df = df.set_index(['user_id', 'timestamp'], drop=False).sort_index()
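To illustrate what this sort does, here is a small sketch on made-up data (the toy values and the name toy are assumptions for illustration, not from the question). The rows come out grouped by user and in chronological order within each user:

```python
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "user_id":   [2, 1, 1, 2],
    "item_id":   [10, 11, 12, 13],
    "rating":    [1, 1, 0, 1],
    "timestamp": [250, 300, 100, 150],
})

# Same call as above: keep the columns (drop=False) and sort by the new MultiIndex.
toy = toy.set_index(["user_id", "timestamp"], drop=False).sort_index()

# Collect (user_id, timestamp) pairs in their sorted row order.
sorted_pairs = list(zip(toy["user_id"], toy["timestamp"]))
print(sorted_pairs)
```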
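Then we can calculate the new column by checking whether the previous row belongs to the same user and whether the previous rating is 1. Each comparison needs its own parentheses, because & binds more tightly than ==:
df['prev_items_rated_by_usr_with_1'] = np.where((df['user_id'] == df['user_id'].shift()) &
(df['rating'].shift() == 1), True, False)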
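Note that the flag above only tells you whether the immediately preceding row of the same user had a rating of 1; it is not the ordered list of items asked for in the question. If you need the actual list, one possible sketch (not benchmarked, and assuming timestamps are distinct within each user) is a single pass over the sorted rows, which avoids rescanning the whole DataFrame for every row. The toy data and the names history and result are invented for illustration:

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "user_id":   [1, 1, 1, 2, 2, 1],
    "item_id":   [10, 11, 12, 10, 13, 13],
    "rating":    [1, 0, 1, 1, 1, 1],
    "timestamp": [100, 200, 300, 150, 250, 400],
})

# Sort by user, then chronologically; a fresh RangeIndex keeps the assignment simple.
df = df.sort_values(["user_id", "timestamp"], kind="stable").reset_index(drop=True)

history = {}   # user_id -> items rated 1 so far, in timestamp order
result = []    # one list per row, in the sorted row order

for user, item, rating in zip(df["user_id"], df["item_id"], df["rating"]):
    seen = history.setdefault(user, [])
    result.append(list(seen))            # snapshot taken before the current row counts
    if rating == 1 and item not in seen:
        seen.append(item)                # keep each item once, at its first occurrence

df["prev_items_rated_by_usr_with_1"] = pd.Series(result, index=df.index, dtype=object)
print(df)
```

If you already ran the set_index call with drop=False, run df.reset_index(drop=True) before sort_values, because user_id would otherwise be both an index level and a column, which sort_values treats as ambiguous.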