python pandas dataframe recommendation-engine

Build a cumulative function over a user-item matrix list based on a time window


I have a DataFrame that stores a user-item matrix in list (long) form, with the columns:

user_id  item_id  rating  timestamp

I want to build a time-aware recommender, so I'd like to add a column that holds, for each row, the items the same user previously rated with a 1 (for example), ordered by timestamp. The resulting DataFrame would look like:

user_id  item_id  rating  timestamp  prev_items_rated_by_usr_with_1

I can't work out how to do this in parallel, which I need because the dataset is huge.

This works, but I'm not sure whether it returns the items ordered by timestamp, and it takes an incredibly long time to run:

df['new'] = df.apply(lambda row:list(df.loc[df.user_id==row['user_id']].loc[df.timestamp<row['timestamp']].loc[df.rating==1].item_id.unique()), axis = 1)

Solution

  • We can read the previous row's data with the pandas shift method. We need to import both pandas and numpy:

    import pandas as pd
    import numpy as np
    

    We can set the index to user_id and timestamp and sort it, so that the rows are ordered by user and then by timestamp:

    df = df.set_index(['user_id', 'timestamp'], drop=False).sort_index()
    

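    Then we can compute the new column by checking whether the previous row belongs to the same user and has a rating of 1:

    # Each comparison needs its own parentheses because & binds tighter than ==.
    # The result is True when the previous row is the same user with a rating of 1.
    df['prev_items_rated_by_usr_with_1'] = np.where((df['user_id'] == df['user_id'].shift()) &
                                                    (df['rating'].shift() == 1), True, False)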
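
The flag above only says whether the immediately preceding row is the same user with a rating of 1. The question asks for the ordered list of earlier items, so here is a minimal sketch of one way to build it in a single pass over the sorted rows. The function name `add_prev_items` and its parameters are hypothetical, not part of pandas. It assumes timestamps are unique within each user: rows with tied timestamps are handled in sorted order, which is not quite the same as the strict `<` comparison in the question.

```python
import pandas as pd


def add_prev_items(df, rating_value=1, col="prev_items_rated_by_usr_with_1"):
    """Return a sorted copy of df with a column listing, for each row, the items
    the same user rated `rating_value` earlier, in timestamp order."""
    # Chronological order within each user.
    out = df.sort_values(["user_id", "timestamp"]).copy()

    histories = []
    current_user = object()  # sentinel that never equals a real user_id
    ordered, seen = [], set()
    for user, item, rating in zip(out["user_id"], out["item_id"], out["rating"]):
        if user != current_user:
            # New user: start a fresh history.
            current_user, ordered, seen = user, [], set()
        # Snapshot the history before the current row is taken into account.
        histories.append(list(ordered))
        if rating == rating_value and item not in seen:
            ordered.append(item)
            seen.add(item)

    # An object Series keeps each list as a single cell value.
    out[col] = pd.Series(histories, index=out.index, dtype=object)
    return out
```

The returned frame is sorted by user and timestamp but keeps the original index labels, so `add_prev_items(df).sort_index()` restores the original row order. Each user's history is independent of the others, so the frame could probably be split by `user_id` and the chunks processed in separate workers. Storing a full list in every row can still use a lot of memory for users with long histories.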