Tags: python, python-3.x, pandas, recommendation-engine, collaborative-filtering

How to apply a function to a pandas dataframe column


I have a pandas dataframe with the columns user_id, title (the song listened to by the user) and listen_count (the number of times that user has listened to that song).


Goal to achieve:

I'm new to Python and pandas and I'm trying to build a recommender system. I want to transform this implicit feedback (listen_count) into explicit feedback following formulas (8) and (9) of this paper.

  • To do this I want to create a function that computes the listening frequency of each song for each user in the dataframe, using this formula: $\mathrm{freq}(i,j) = \frac{\mathrm{count}(i,j)}{\sum_{k} \mathrm{count}(i,k)}$, where count(i,j) is the number of times user i has played song j (the listen_count value in my dataframe) and the denominator is the total number of plays made by user i across all the songs he has listened to (the total listen_count for each user).
  • I also want to create a function that implements formula (9) of the above-mentioned paper, but I think that will be simpler once someone explains how to solve the previous problem.
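To make the frequency formula concrete, here is a small worked example with made-up numbers (not from the question's data): a hypothetical user u has played song A 3 times and song B once, and nothing else. Writing freq(u, j) for the listening frequency of song j by user u:

```latex
\mathrm{freq}(u, A) = \frac{3}{3 + 1} = 0.75, \qquad
\mathrm{freq}(u, B) = \frac{1}{3 + 1} = 0.25, \qquad
\sum_{j} \mathrm{freq}(u, j) = 1
```

Because each user's counts are divided by that same user's total, the frequencies of any one user always sum to 1.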

Solution

  • You should be able to solve this problem by using DataFrame.groupby(). Assuming that your dataframe is called df, you can try the following (it's hard for me to check that it produces the right result without the data).

    # get the total listen count for each user_id; transform('sum') broadcasts
    # the per-user total back onto every row of that user
    df['total_listen_count_per_user'] = df.groupby('user_id')['listen_count'].transform('sum')
    # get the song frequency by dividing each row's listen_count (plays of this song
    # by this user) by that user's total (assumes one row per user_id/title pair)
    df['song_frequency'] = df['listen_count'] / df['total_listen_count_per_user']
    

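    Here is the reference for DataFrame.groupby and GroupBy.transform (the transform used above is the GroupBy one, which returns a result aligned with the original rows).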
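If you want formula (8) wrapped in a reusable function, as the question asks, a minimal sketch could look like the following. The function name add_listening_frequency and the toy data are hypothetical, for illustration only. It first sums duplicate (user_id, title) rows, in case the same pair appears more than once, and then divides each pair's count by the user's total, so each user's frequencies sum to 1.

```python
import pandas as pd


def add_listening_frequency(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: freq(i, j) = count(i, j) / sum_k count(i, k)."""
    # Collapse duplicate (user, song) rows into one row with the summed count
    counts = df.groupby(['user_id', 'title'], as_index=False)['listen_count'].sum()
    # Total plays per user, broadcast back onto every row of that user
    user_total = counts.groupby('user_id')['listen_count'].transform('sum')
    # Share of the user's plays that went to this song
    counts['song_frequency'] = counts['listen_count'] / user_total
    return counts


# Made-up example data, not taken from the question
toy = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2', 'u2', 'u2'],
    'title': ['Song A', 'Song B', 'Song A', 'Song C', 'Song D'],
    'listen_count': [3, 1, 2, 2, 4],
})
result = add_listening_frequency(toy)
print(result)
```

Here u1's frequencies come out as 0.75 and 0.25, and u2's as 0.25, 0.25 and 0.5. Returning a new, aggregated frame rather than mutating df keeps the original data untouched; merge the result back onto df on user_id and title if you need the column there.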