I have a data frame representing the customers ratings of restaurants. star_rating
is rating of the customer in this data frame.
nb_fave_rating
in the same data frame that represents the total number of favorable reviews for a restaurant. I consider a "favorable" opinion if its number of stars is > = 3
.data = {'rating_id': ['1', '2','3','4','5','6','7','8','9'],
'user_id': ['56', '13','56','99','99','13','12','88','45'],
'restaurant_id': ['xxx', 'xxx','yyy','yyy','xxx','zzz','zzz','eee','eee'],
'star_rating': ['2.3', '3.7','1.2','5.0','1.0','3.2','1.0','2.2','0.2'],
'rating_year': ['2012','2012','2020','2001','2020','2015','2000','2003','2004'],
'first_year': ['2012', '2012','2001','2001','2012','2000','2000','2001','2001'],
'last_year': ['2020', '2020','2020','2020','2020','2015','2015','2020','2020'],
}
df = pd.DataFrame (data, columns = ['rating_id','user_id','restaurant_id','star_rating','rating_year','first_year','last_year'])
df['star_rating'] = df['star_rating'].astype(float)
positive_reviews = df[df.star_rating >= 3.0 ].groupby('restaurant_id')
positive_reviews.head()
From here, i dont know to to count the number of positive reviews for a restaurant and add it to the new column of my initial dataframe df
.
The output expected would be something like this.
data = {'rating_id': ['1', '2','3','4','5','6','7','8','9'],
'user_id': ['56', '13','56','99','99','13','12','88','45'],
'restaurant_id': ['xxx', 'xxx','yyy','yyy','xxx','zzz','zzz','eee','eee'],
'star_rating': ['2.3', '3.7','1.2','5.0','1.0','3.2','1.0','2.2','0.2'],
'rating_year': ['2012','2012','2020','2001','2020','2015','2000','2003','2004'],
'first_year': ['2012', '2012','2001','2001','2012','2000','2000','2001','2001'],
'last_year': ['2020', '2020','2020','2020','2020','2015','2015','2020','2020'],
'nb_fave_rating': ['1', '1','1','1','1','1','1','0','0'],
}
So i tried this and had a bunch of NaNs
df['nb_fave_rating']=df[df.star_rating >= 3.0 ].groupby('restaurant_id').agg({'star_rating': 'count'})
df.head()
Here is a potential solution with groupby
and map
:
#filtering the data with >=3 ratings
filtered_data = df[df['star_rating'] >= 3]
#creating a dict containing the counts of the all the favorable reviews
d = filtered_data.groupby('restaurant_id')['star_rating'].count().to_dict()
#mapping the dictionary to the restaurant_id to generate 'nb_fave_rating'
df['nb_fave_rating'] = df['restaurant_id'].map(d)
#taking care of `NaN` values
df.fillna(0,inplace=True)
#making the column integer (just to match the requirements)
df['nb_fave_rating'] = df['nb_fave_rating'].astype(int)
print(df)
Output:
rating_id user_id restaurant_id star_rating rating_year first_year last_year nb_fave_rating
0 1 56 xxx 2.3 2012 2012 2020 1
1 2 13 xxx 3.7 2012 2012 2020 1
2 3 56 yyy 1.2 2020 2001 2020 1
3 4 99 yyy 5.0 2001 2001 2020 1
4 5 99 xxx 1.0 2020 2012 2020 1
5 6 13 zzz 3.2 2015 2000 2015 1
6 7 12 zzz 1.0 2000 2000 2015 1
7 8 88 eee 2.2 2003 2001 2020 0
8 9 45 eee 0.2 2004 2001 2020 0