I have a dictionary of user ratings stored in a user_dict dictionary which looks like this:
{'U1': [3, 4, 2, 5, 0, 4, 1, 3, 0, 0, 4],
'U2': [2, 3, 1, 0, 3, 0, 2, 0, 0, 3, 0],
'U3': [0, 4, 0, 5, 0, 4, 0, 3, 0, 2, 4],
'U4': [0, 0, 2, 1, 4, 3, 2, 0, 0, 2, 0],
'U5': [0, 0, 0, 5, 0, 4, 0, 3, 0, 0, 4],
'U6': [2, 3, 4, 0, 3, 0, 3, 0, 3, 4, 0],
'U7': [0, 4, 3, 5, 0, 5, 0, 0, 0, 0, 4],
'U8': [4, 3, 0, 3, 4, 2, 2, 0, 2, 3, 2],
'U9': [0, 2, 0, 3, 1, 0, 1, 0, 0, 2, 0],
'U10': [0, 3, 0, 4, 3, 3, 0, 3, 0, 4, 4],
'U11': [2, 2, 1, 2, 1, 0, 2, 0, 1, 0, 2],
'U12': [0, 4, 4, 5, 0, 0, 0, 3, 0, 4, 5],
'U13': [3, 3, 0, 2, 2, 3, 2, 0, 2, 0, 3],
'U14': [0, 3, 4, 5, 0, 5, 0, 0, 0, 4, 0],
'U15': [2, 0, 0, 3, 0, 2, 2, 3, 0, 0, 3],
'U16': [4, 4, 0, 4, 3, 4, 0, 3, 0, 3, 0],
'U17': [0, 2, 0, 3, 1, 0, 2, 0, 1, 0, 3],
'U18': [2, 3, 1, 0, 3, 2, 3, 2, 0, 2, 0],
'U19': [0, 5, 0, 4, 0, 3, 0, 4, 0, 0, 5],
'U20': [0, 0, 3, 0, 3, 0, 4, 0, 2, 0, 0],
'U21': [3, 0, 2, 4, 2, 3, 0, 4, 2, 3, 3],
'U22': [4, 4, 0, 5, 3, 5, 0, 4, 0, 3, 0],
'U23': [3, 0, 0, 0, 3, 0, 2, 0, 0, 4, 0],
'U24': [4, 0, 3, 0, 3, 0, 3, 0, 0, 2, 2],
'U25': [0, 5, 0, 3, 3, 4, 0, 3, 3, 4, 4]
When I load this dictionary into a Pandas dataframe, I wanted the dataframe to have 3 columns: "User", "Agent", "Rating" so ran this code:
DF = pd.DataFrame()
for key in user_dict.keys():
df = pd.DataFrame(columns=['User', 'Agent', 'Rating'])
df['Rating'] = pd.Series(user_dict[key])
df['Agent'] = pd.DataFrame(df.index)
df['User'] = key
DF = pd.concat([DF, df], axis = 0)
DF = DF.reset_index(drop=True)
However, I dont want to add any entries where the rating is 0 as this indicates the the user has not rated this "agent". How do I get the program to not add/or delete entries which have a rating of 0?
You can reshape by DataFrame.unstack
with DataFrame
construcot, then filtering out 0
by compare for not equal, set index names for new columns names and last use Series.reset_index
:
DF = (pd.DataFrame(user_dict)
.unstack()
.loc[lambda x: x!= 0]
.rename_axis(('User','Agent'))
.reset_index(name='Rating'))
print (DF)
User Agent Rating
0 U1 0 3
1 U1 1 4
2 U1 2 2
3 U1 3 5
4 U1 5 4
.. ... ... ...
155 U25 5 4
156 U25 7 3
157 U25 8 3
158 U25 9 4
159 U25 10 4
[160 rows x 3 columns]
Another idea is filter in last step by DataFrame.query
:
DF = (pd.DataFrame(user_dict)
.unstack()
.rename_axis(('User','Agent'))
.reset_index(name='Rating')
.query('Rating != 0'))
print (DF)
User Agent Rating
0 U1 0 3
1 U1 1 4
2 U1 2 2
3 U1 3 5
5 U1 5 4
.. ... ... ...
269 U25 5 4
271 U25 7 3
272 U25 8 3
273 U25 9 4
274 U25 10 4
[160 rows x 3 columns]