I have data similar to the one below where shows are given a review score.
I have also hot-encoded it with values of either 0 or 1 (true or false) based on their listed genres
show | review score | Action & Adventure | Anime Features | Anime Series | British TV Shows | Children & Family Movies |
---|---|---|---|---|---|---|
a | 8 | 1 | 0 | 0 | 0 | 0 |
b | 10 | 0 | 1 | 0 | 0 | 0 |
c | 9 | 0 | 1 | 0 | 0 | 0 |
d | 6 | 0 | 0 | 1 | 0 | 0 |
e | 9 | 0 | 0 | 0 | 1 | 0 |
f | 7 | 0 | 0 | 0 | 0 | 1 |
g | 8 | 0 | 0 | 0 | 0 | 1 |
h | 8 | 1 | 0 | 0 | 0 | 0 |
I am trying to group by and sort them based on their values so that it looks something like the below,
With genre coming first, review score in descending order and also count of the number of shows with that review score
genre | review score | count |
---|---|---|
Action & Adventure | 8 | 2 |
Anime Features | 10 | 1 |
9 | 1 | |
Anime Series | 6 | 1 |
British TV Shows | 9 | 1 |
Children & Family Movies | 8 | 1 |
7 | 1 |
I have tried groupby but due to the amount of columns in play, it seems unfeasible.
You could use melt with a groupby.
df1 = df.melt(id_vars=['show', 'review score']).groupby(['variable', 'review_score'])['value'].sum()
df1 is:
variable review score value
Action & Adventure 6 0
Action & Adventure 7 0
Action & Adventure 8 2
Action & Adventure 9 0
Action & Adventure 10 0
Anime Features 6 0
Anime Features 7 0
Anime Features 8 0
Anime Features 9 1
Anime Features 10 1
Anime Series 6 1
Anime Series 7 0
Anime Series 8 0
Anime Series 9 0
Anime Series 10 0
British TV Shows 6 0
British TV Shows 7 0
British TV Shows 8 0
British TV Shows 9 1
British TV Shows 10 0
Children & Family Movies 6 0
Children & Family Movies 7 1
Children & Family Movies 8 1
Children & Family Movies 9 0
Children & Family Movies 10 0
Then filter df1 for counts > 0
df1 = df1[df1 > 0]
df1 is:
variable review score value
Action & Adventure 8 2
Anime Features 9 1
Anime Features 10 1
Anime Series 6 1
British TV Shows 9 1
Children & Family Movies 7 1
Children & Family Movies 8 1