Search code examples
pythonpandasrecommendation-engine

How to remove entries when converting dictionary to Pandas?


I have a dictionary of user ratings stored in a user_dict dictionary which looks like this:

{'U1': [3, 4, 2, 5, 0, 4, 1, 3, 0, 0, 4], 
'U2': [2, 3, 1, 0, 3, 0, 2, 0, 0, 3, 0], 
'U3': [0, 4, 0, 5, 0, 4, 0, 3, 0, 2, 4], 
'U4': [0, 0, 2, 1, 4, 3, 2, 0, 0, 2, 0], 
'U5': [0, 0, 0, 5, 0, 4, 0, 3, 0, 0, 4], 
'U6': [2, 3, 4, 0, 3, 0, 3, 0, 3, 4, 0], 
'U7': [0, 4, 3, 5, 0, 5, 0, 0, 0, 0, 4], 
'U8': [4, 3, 0, 3, 4, 2, 2, 0, 2, 3, 2], 
'U9': [0, 2, 0, 3, 1, 0, 1, 0, 0, 2, 0], 
'U10': [0, 3, 0, 4, 3, 3, 0, 3, 0, 4, 4],  
'U11': [2, 2, 1, 2, 1, 0, 2, 0, 1, 0, 2], 
'U12': [0, 4, 4, 5, 0, 0, 0, 3, 0, 4, 5], 
'U13': [3, 3, 0, 2, 2, 3, 2, 0, 2, 0, 3], 
'U14': [0, 3, 4, 5, 0, 5, 0, 0, 0, 4, 0], 
'U15': [2, 0, 0, 3, 0, 2, 2, 3, 0, 0, 3], 
'U16': [4, 4, 0, 4, 3, 4, 0, 3, 0, 3, 0], 
'U17': [0, 2, 0, 3, 1, 0, 2, 0, 1, 0, 3], 
'U18': [2, 3, 1, 0, 3, 2, 3, 2, 0, 2, 0], 
'U19': [0, 5, 0, 4, 0, 3, 0, 4, 0, 0, 5], 
'U20': [0, 0, 3, 0, 3, 0, 4, 0, 2, 0, 0], 
'U21': [3, 0, 2, 4, 2, 3, 0, 4, 2, 3, 3], 
'U22': [4, 4, 0, 5, 3, 5, 0, 4, 0, 3, 0], 
'U23': [3, 0, 0, 0, 3, 0, 2, 0, 0, 4, 0], 
'U24': [4, 0, 3, 0, 3, 0, 3, 0, 0, 2, 2], 
'U25': [0, 5, 0, 3, 3, 4, 0, 3, 3, 4, 4]

When I load this dictionary into a Pandas dataframe, I wanted the dataframe to have 3 columns: "User", "Agent", "Rating" so ran this code:

DF = pd.DataFrame()
for key in user_dict.keys():
  df = pd.DataFrame(columns=['User', 'Agent', 'Rating'])
  df['Rating'] = pd.Series(user_dict[key])
  df['Agent'] = pd.DataFrame(df.index)
  df['User'] = key

  DF = pd.concat([DF, df], axis = 0)

DF = DF.reset_index(drop=True)

However, I dont want to add any entries where the rating is 0 as this indicates the the user has not rated this "agent". How do I get the program to not add/or delete entries which have a rating of 0?


Solution

  • You can reshape by DataFrame.unstack with DataFrame construcot, then filtering out 0 by compare for not equal, set index names for new columns names and last use Series.reset_index:

    DF = (pd.DataFrame(user_dict)
            .unstack()
            .loc[lambda x: x!= 0]
            .rename_axis(('User','Agent'))
            .reset_index(name='Rating'))
    print (DF)
        User  Agent  Rating
    0     U1      0       3
    1     U1      1       4
    2     U1      2       2
    3     U1      3       5
    4     U1      5       4
    ..   ...    ...     ...
    155  U25      5       4
    156  U25      7       3
    157  U25      8       3
    158  U25      9       4
    159  U25     10       4
    
    [160 rows x 3 columns]
    

    Another idea is filter in last step by DataFrame.query:

    DF = (pd.DataFrame(user_dict)
            .unstack()
            .rename_axis(('User','Agent'))
            .reset_index(name='Rating')
            .query('Rating != 0'))
    print (DF)
        User  Agent  Rating
    0     U1      0       3
    1     U1      1       4
    2     U1      2       2
    3     U1      3       5
    5     U1      5       4
    ..   ...    ...     ...
    269  U25      5       4
    271  U25      7       3
    272  U25      8       3
    273  U25      9       4
    274  U25     10       4
    
    [160 rows x 3 columns]