
How to load Pandas dataframe into Surprise dataset?


I am building a recommender system based on users' ratings for 11 different items.

I started with a dictionary (user_dict) of user ratings:

{'U1': [3, 4, 2, 5, 0, 4, 1, 3, 0, 0, 4], 
 'U2': [2, 3, 1, 0, 3, 0, 2, 0, 0, 3, 0], 
 'U3': [0, 4, 0, 5, 0, 4, 0, 3, 0, 2, 4], 
 'U4': [0, 0, 2, 1, 4, 3, 2, 0, 0, 2, 0], 
 'U5': [0, 0, 0, 5, 0, 4, 0, 3, 0, 0, 4], 
 'U6': [2, 3, 4, 0, 3, 0, 3, 0, 3, 4, 0], 
 'U7': [0, 4, 3, 5, 0, 5, 0, 0, 0, 0, 4], 
 'U8': [4, 3, 0, 3, 4, 2, 2, 0, 2, 3, 2], 
 'U9': [0, 2, 0, 3, 1, 0, 1, 0, 0, 2, 0], 
 'U10': [0, 3, 0, 4, 3, 3, 0, 3, 0, 4, 4],  
 'U11': [2, 2, 1, 2, 1, 0, 2, 0, 1, 0, 2], 
 'U12': [0, 4, 4, 5, 0, 0, 0, 3, 0, 4, 5], 
 'U13': [3, 3, 0, 2, 2, 3, 2, 0, 2, 0, 3], 
 'U14': [0, 3, 4, 5, 0, 5, 0, 0, 0, 4, 0], 
 'U15': [2, 0, 0, 3, 0, 2, 2, 3, 0, 0, 3], 
 'U16': [4, 4, 0, 4, 3, 4, 0, 3, 0, 3, 0], 
 'U17': [0, 2, 0, 3, 1, 0, 2, 0, 1, 0, 3], 
 'U18': [2, 3, 1, 0, 3, 2, 3, 2, 0, 2, 0], 
 'U19': [0, 5, 0, 4, 0, 3, 0, 4, 0, 0, 5], 
 'U20': [0, 0, 3, 0, 3, 0, 4, 0, 2, 0, 0], 
 'U21': [3, 0, 2, 4, 2, 3, 0, 4, 2, 3, 3], 
 'U22': [4, 4, 0, 5, 3, 5, 0, 4, 0, 3, 0], 
 'U23': [3, 0, 0, 0, 3, 0, 2, 0, 0, 4, 0], 
 'U24': [4, 0, 3, 0, 3, 0, 3, 0, 0, 2, 2], 
 'U25': [0, 5, 0, 3, 3, 4, 0, 3, 3, 4, 4]}

I then loaded the dictionary into a Pandas dataframe by using this code:

import pandas as pd

df = pd.DataFrame(user_dict)
userRatings_df = df.T
print(userRatings_df)
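As a side note, pandas can build the same users-by-items table directly, without the transpose, using DataFrame.from_dict with orient='index'. This is a small sketch that uses only the first three users of user_dict to keep it short:

```python
import pandas as pd

# First three users of user_dict (excerpt, to keep the example short).
user_dict = {
    'U1': [3, 4, 2, 5, 0, 4, 1, 3, 0, 0, 4],
    'U2': [2, 3, 1, 0, 3, 0, 2, 0, 0, 3, 0],
    'U3': [0, 4, 0, 5, 0, 4, 0, 3, 0, 2, 4],
}

# orient='index' turns each dictionary key into a row label,
# so users become rows and item positions 0..10 become columns.
userRatings_df = pd.DataFrame.from_dict(user_dict, orient='index')

print(userRatings_df.shape)  # (3, 11)
print(userRatings_df.equals(pd.DataFrame(user_dict).T))  # True
```

Either way, the result is still a wide table with one column per item, which is not the layout Surprise expects.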

This prints the data like so:

     0  1  2  3  4  5  6  7  8  9  10
U1   3  4  2  5  0  4  1  3  0  0   4
U2   2  3  1  0  3  0  2  0  0  3   0
U3   0  4  0  5  0  4  0  3  0  2   4
U4   0  0  2  1  4  3  2  0  0  2   0
U5   0  0  0  5  0  4  0  3  0  0   4
U6   2  3  4  0  3  0  3  0  3  4   0
U7   0  4  3  5  0  5  0  0  0  0   4
U8   4  3  0  3  4  2  2  0  2  3   2
U9   0  2  0  3  1  0  1  0  0  2   0
U10  0  3  0  4  3  3  0  3  0  4   4
U11  2  2  1  2  1  0  2  0  1  0   2
U12  0  4  4  5  0  0  0  3  0  4   5
U13  3  3  0  2  2  3  2  0  2  0   3
U14  0  3  4  5  0  5  0  0  0  4   0
U15  2  0  0  3  0  2  2  3  0  0   3
U16  4  4  0  4  3  4  0  3  0  3   0
U17  0  2  0  3  1  0  2  0  1  0   3
U18  2  3  1  0  3  2  3  2  0  2   0
U19  0  5  0  4  0  3  0  4  0  0   5
U20  0  0  3  0  3  0  4  0  2  0   0
U21  3  0  2  4  2  3  0  4  2  3   3
U22  4  4  0  5  3  5  0  4  0  3   0
U23  3  0  0  0  3  0  2  0  0  4   0
U24  4  0  3  0  3  0  3  0  0  2   2
U25  0  5  0  3  3  4  0  3  3  4   4

When I attempt to load it into a Surprise dataset, I run this code:

from surprise import Dataset, Reader

reader = Reader(rating_scale=(1, 5))

userRatings_data = Dataset.load_from_df(userRatings_df[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], reader)

I get this error:

ValueError: too many values to unpack (expected 3)

Can anyone help me to fix this error?
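The error message itself points at the cause. As far as I can tell from Surprise's source, load_from_df unpacks every row of the dataframe into exactly three values (user, item, rating). The pure-Python sketch below reproduces the same kind of ValueError with a hypothetical row holding U1's ten ratings from columns 1 through 10, which is what each row of the selected dataframe contains:

```python
# Hypothetical row: U1's ratings in columns 1 through 10 (ten values).
row = (4, 2, 5, 0, 4, 1, 3, 0, 0, 4)

try:
    # Surprise expects each row to unpack as (user, item, rating).
    uid, iid, rating = row
except ValueError as err:
    message = str(err)
else:
    message = "unpacked fine"

print(message)
```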


Solution

  • The problem comes from the way you are converting your dictionary into a pandas dataframe. For Dataset.load_from_df to process a pandas dataframe, the dataframe must have exactly three columns, in this order: the user ID, the item ID, and the rating. Your dataframe instead has one column per item, so each row holds ten values rather than three. This is how I would build a dataframe that works with "Dataset":

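    import pandas as pd

    frames = []
    for key, ratings in user_dict.items():
        # One temporary frame per user: the rating list becomes the 'Rating'
        # column and each rating's position becomes the raw item ID.
        df = pd.DataFrame({'Rating': ratings})
        df['Item'] = df.index
        df['User'] = key
        frames.append(df[['User', 'Item', 'Rating']])

    # Stack the per-user frames (User, Item, Rating order) and renumber rows.
    DF = pd.concat(frames, axis=0).reset_index(drop=True)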
    

    In other words, I take every key from the dictionary (essentially a user ID) and build a temporary dataframe from it with three columns: the user ID, the raw item ID (the position of each rating in the list), and the rating itself. These temporary dataframes are then stacked on top of each other to form the final dataframe. Hopefully this helps.
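A vectorized alternative to the loop is to unpivot the wide table with DataFrame.melt. The sketch below assumes that a 0 in user_dict means "not rated" (the question does not say so, but Reader(rating_scale=(1, 5)) suggests it), so those cells are dropped. It uses only the first three users of user_dict:

```python
import pandas as pd

# First three users of user_dict (excerpt, to keep the example short).
user_dict = {
    'U1': [3, 4, 2, 5, 0, 4, 1, 3, 0, 0, 4],
    'U2': [2, 3, 1, 0, 3, 0, 2, 0, 0, 3, 0],
    'U3': [0, 4, 0, 5, 0, 4, 0, 3, 0, 2, 4],
}

# Items as rows (index 0..10), users as columns: the frame before the .T.
items_by_user = pd.DataFrame(user_dict)

# melt() unpivots to one (User, Item, Rating) row per cell.
long_df = (
    items_by_user.rename_axis('Item')
    .reset_index()
    .melt(id_vars='Item', var_name='User', value_name='Rating')
    [['User', 'Item', 'Rating']]  # column order Surprise expects
)

# Assumption: 0 means "not rated", so those cells are dropped.
long_df = long_df[long_df['Rating'] > 0].reset_index(drop=True)

print(len(long_df))  # 20
print(long_df.head(3))
```

The resulting long_df can then be passed as Dataset.load_from_df(long_df, Reader(rating_scale=(1, 5))). If the zeros are genuine ratings rather than missing values, keep them and use rating_scale=(0, 5) instead.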