
pandas - create sparse matrix for collaborative filtering


I have a pandas dataframe like this:

user_id  music_id  rating
A        a         5
B        a         3

I would like to create a sparse matrix from it, with each music_id as a column and each user_id as a row, like this:

->

     a    b
A    5
B    3

What is the best way to do this with pandas or numpy?


Solution

  • Suppose you have the dataset described in your question:

    import pandas as pd

    d = {'user_id': ['A', 'B'], 'music_id': ['a', 'a'], 'rating': [5, 3]}
    df = pd.DataFrame(d)
    

    Then you can do:

    df.set_index(['user_id','music_id']).unstack(level=-1).rating
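    With only the two rows from the question there are no gaps in the result, so the sketch below adds one hypothetical rating (user A, item 'b') to show what the unstack approach does with a user/item pair that has no rating. It selects the rating column before unstacking, which gives plain music_id columns instead of a (rating, music_id) column MultiIndex.

```python
import pandas as pd

# The question's two rows plus one hypothetical extra rating (user A, item 'b')
df = pd.DataFrame({
    'user_id':  ['A', 'B', 'A'],
    'music_id': ['a', 'a', 'b'],
    'rating':   [5, 3, 4],
})

# Rows = users, columns = music items; pairs with no rating become NaN
matrix = df.set_index(['user_id', 'music_id'])['rating'].unstack(level=-1)
print(matrix)
```

    User B's entry for item b is NaN, and the ratings turn into floats because NaN is a float value. If you need zeros instead, append .fillna(0). Either way this is still a dense DataFrame, not a sparse matrix.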
    

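    or, equivalently as long as each (user_id, music_id) pair appears only once (pivot_table averages duplicate pairs by default, whereas unstack raises a ValueError on them):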

    pd.pivot_table(df,values='rating',index='user_id',columns=['music_id'])
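    Both commands above return a dense DataFrame. If you need an actual sparse matrix (for example, to feed a collaborative-filtering library), one option is SciPy's csr_matrix, which is an extra dependency beyond pandas/numpy. The sketch below assumes SciPy is installed and uses pandas Categorical codes to turn the user and music labels into 0-based row and column positions, so only the observed ratings are stored.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Same two rows as in the question
df = pd.DataFrame({'user_id': ['A', 'B'],
                   'music_id': ['a', 'a'],
                   'rating': [5, 3]})

# Map each user / music label to a 0-based integer position
users = pd.Categorical(df['user_id'])
items = pd.Categorical(df['music_id'])

# Build the matrix from (value, (row, col)) triplets; only observed ratings are stored
ratings = csr_matrix(
    (df['rating'].to_numpy(), (users.codes, items.codes)),
    shape=(len(users.categories), len(items.categories)),
)

print(ratings.toarray())        # dense view, for inspection only
print(list(users.categories))   # row labels, in row order
print(list(items.categories))   # column labels, in column order
```

    Two differences from the DataFrame versions are worth keeping in mind: unrated pairs are implicit zeros rather than NaN, so a rating of 0 cannot be distinguished from "not rated", and duplicate (user, item) pairs are summed when the matrix is built rather than averaged.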