python pandas dataframe recommendation-engine

Compute a cumulative function on a User-Item Matrix list based on a time window

I've got a DataFrame representing a User-Item Matrix as a list with the columns:

user_id  item_id  rating  timestamp

As I want to build a time-aware recommender, I want to add a column holding, for each row, the list of items that the user previously rated with a 1 (for example), ordered by timestamp, so that I get a DataFrame like:

user_id  item_id  rating  timestamp  prev_items_rated_by_usr_with_1

I can't work out how to do this in parallel, which I need because the dataset is huge.

This works, but I'm not sure whether it returns the items ordered by timestamp, and it takes incredibly long to execute:

df['new'] = df.apply(lambda row:list(df.loc[df.user_id==row['user_id']].loc[df.timestamp<row['timestamp']].loc[df.rating==1].item_id.unique()), axis = 1)

Solution

We can get the previous row's data using the pandas shift method. We need to import both pandas and numpy:

import pandas as pd
import numpy as np
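
As a quick illustration of what shift does, here is a minimal sketch on a made-up Series (the values are hypothetical, not taken from the question's data). Each position receives the value from the row before it, and the first position has no predecessor, so it becomes NaN:

```python
import pandas as pd

# Toy ratings column; the values are made up for illustration.
ratings = pd.Series([1, 0, 1, 1], name="rating")

# shift() moves every value down one position, so row i sees the value of row i-1.
# The first row has no previous row and is filled with NaN.
previous = ratings.shift()

print(previous)
```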

We can set the index and sort by it, so that the rows are ordered by user and then by timestamp:

df = df.set_index(['user_id', 'timestamp'], drop=False).sort_index()
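
To see the effect of this step, here is a small sketch on hypothetical data (the frame `toy` and its values are assumptions for illustration only). The rows start out of order and end up grouped by user, with each user's rows in chronological order:

```python
import pandas as pd

# Hypothetical toy data, deliberately out of order.
toy = pd.DataFrame({
    "user_id":   [2, 1, 1, 2],
    "item_id":   [10, 11, 12, 13],
    "rating":    [1, 1, 0, 1],
    "timestamp": [5, 3, 1, 2],
})

# drop=False keeps user_id and timestamp available as regular columns too.
toy_sorted = toy.set_index(["user_id", "timestamp"], drop=False).sort_index()

# Rows are now ordered by user_id first, then by timestamp within each user.
print(toy_sorted[["item_id", "rating"]])
```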

Then we can compute the new column by checking whether the previous row belongs to the same user and has a rating of 1:

# Each comparison needs its own parentheses, because & binds more tightly than ==.
df['prev_items_rated_by_usr_with_1'] = np.where((df['user_id'] == df['user_id'].shift()) &
                                                (df['rating'].shift() == 1), True, False)
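
Note that the shift-based column is a boolean flag about the immediately preceding row only. If the full ordered list of earlier items rated 1 is needed, as described in the question, one possible sketch is a single pass over the sorted rows that keeps a running list per user. The helper name `add_previous_liked`, its `liked` parameter, and the toy data are all hypothetical, and the sketch assumes that each user's timestamps are distinct:

```python
from collections import defaultdict

import pandas as pd


def add_previous_liked(df, liked=1):
    """Return a copy sorted by user and time, with a column listing the
    items each user rated `liked` on earlier rows (hypothetical helper)."""
    ordered = df.sort_values(["user_id", "timestamp"], kind="stable")
    seen = defaultdict(list)  # user_id -> items rated `liked` so far, in time order
    history = []
    rows = zip(ordered["user_id"].tolist(),
               ordered["item_id"].tolist(),
               ordered["rating"].tolist())
    for user, item, rating in rows:
        history.append(list(seen[user]))  # snapshot taken before the current row
        if rating == liked and item not in seen[user]:
            seen[user].append(item)  # keep only the first occurrence of each item
    ordered["prev_items_rated_by_usr_with_1"] = pd.Series(history, index=ordered.index)
    return ordered


# Hypothetical toy data, deliberately out of order.
toy = pd.DataFrame({
    "user_id":   [2, 1, 1, 2, 1],
    "item_id":   [10, 11, 12, 13, 14],
    "rating":    [1, 0, 1, 1, 1],
    "timestamp": [5, 3, 1, 2, 4],
})

result = add_previous_liked(toy)
print(result)
```

Because each row is visited once, this avoids rescanning the whole DataFrame for every row, which is what makes the apply-based version in the question so slow. Users are independent of each other, so the input could in principle be split by user_id and processed in separate workers, although that is not shown here.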