How to count the number of occurrences for a histogram using dataframes
import pandas as pd
d = {'color': ["blue", "green", "yellow", "red, blue", "green, yellow", "yellow, red, blue"]}
df = pd.DataFrame(data=d)
How do you go from
color |
---|
blue |
green |
yellow |
red, blue |
green, yellow |
yellow, red, blue |
to
color | occurance |
---|---|
blue | 3 |
green | 2 |
yellow | 3 |
red | 2 |
Let's try `split` by the regex `,\s*` (a comma followed by zero or more whitespace characters), then `explode` into rows and `value_counts` to get the count of values:
s = (
df['color'].str.split(r',\s*')
.explode()
.value_counts()
.rename_axis('color')
.reset_index(name='occurance')
)
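To see what each step of that chain contributes, here is a minimal self-contained sketch that runs the same pipeline one step at a time. The intermediate names (`parts`, `exploded`, `counts`) are only illustrative and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"color": ["blue", "green", "yellow", "red, blue",
                             "green, yellow", "yellow, red, blue"]})

# Step 1: split each cell on a comma plus optional whitespace,
# so "red, blue" becomes the list ["red", "blue"].
parts = df["color"].str.split(r",\s*")

# Step 2: give every list element its own row
# (the original index label is repeated for each element).
exploded = parts.explode()

# Step 3: count how many rows carry each color.
counts = exploded.value_counts()

print(parts.tolist())
print(counts.to_dict())
```

The `rename_axis` and `reset_index(name=...)` calls in the answer only turn this counted Series into a two-column dataframe; they do not change the counts.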
Alternatively, `split` with `expand=True` and then `stack`:
s = (
df['color'].str.split(r',\s*', expand=True)
.stack()
.value_counts()
.rename_axis('color')
.reset_index(name='occurance')
)
Output of `s`:
color occurance
0 blue 3
1 yellow 3
2 green 2
3 red 2
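As a quick sanity check, the two approaches can be compared directly. This is a sketch under the assumption that only the counts matter and not the row order of tied values; the names `via_explode` and `via_stack` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"color": ["blue", "green", "yellow", "red, blue",
                             "green, yellow", "yellow, red, blue"]})

# Approach 1: split into lists, then one row per list element.
via_explode = df["color"].str.split(r",\s*").explode().value_counts()

# Approach 2: split into separate columns, then stack the columns into rows.
# Missing cells from shorter rows are ignored by value_counts (dropna=True).
via_stack = df["color"].str.split(r",\s*", expand=True).stack().value_counts()

# Compare as plain dicts so that ordering and index names do not matter.
same_counts = via_explode.to_dict() == via_stack.to_dict()
print(same_counts)
```

For the histogram itself, either counted Series could then be drawn as a bar chart, for example with `via_explode.plot.bar()`, assuming matplotlib is installed.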