pandas, dataframe, histogram

How to count the occurrences of a value


How to count the number of occurrences of each value for a histogram using DataFrames

    import pandas as pd

    d = {'color': ["blue", "green", "yellow", "red, blue", "green, yellow", "yellow, red, blue"]}
    df = pd.DataFrame(data=d)

How do you go from

    color
    blue
    green
    yellow
    red, blue
    green, yellow
    yellow, red, blue

to

    color   occurance
    blue    3
    green   2
    yellow  3
    red     2

Solution

  • Let's try splitting on the regex ,\s* (a comma followed by zero or more whitespace characters), then explode the resulting lists into rows, and use value_counts to count how often each value appears:

    s = (
        df['color'].str.split(r',\s*')
            .explode()
            .value_counts()
            .rename_axis('color')
            .reset_index(name='occurance')
    )
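A self-contained sketch that rebuilds the question's df and keeps each step of the chain above in its own variable, so the intermediate results can be inspected. The variable names lists, exploded and counts are illustrative only and not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'color': ["blue", "green", "yellow", "red, blue",
                             "green, yellow", "yellow, red, blue"]})

# Step 1: split each string on a comma plus optional whitespace -> one list per row
lists = df['color'].str.split(r',\s*')

# Step 2: explode gives each list element its own row, repeating the original index
exploded = lists.explode()

# Step 3: count how often each distinct value occurs (most frequent first)
counts = exploded.value_counts()

print(lists.tolist())
print(exploded.index.tolist())
print(counts.to_dict())
```

Because explode repeats the original row label for every item, exploded.index comes out as [0, 1, 2, 3, 3, 4, 4, 5, 5, 5]: rows 3 and 4 hold two colors each and row 5 holds three.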
    

    Alternatively, split with expand=True (one column per item), then stack the columns into a single Series before counting:

    s = (
        df['color'].str.split(r',\s*', expand=True)
            .stack()
            .value_counts()
            .rename_axis('color')
            .reset_index(name='occurance')
    )
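A similar sketch for the expand=True variant, showing the intermediate wide frame (the names wide, stacked and counts are hypothetical). Rows with fewer items than the longest row are padded with missing values, which is why the frame still has to be stacked and counted:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'color': ["blue", "green", "yellow", "red, blue",
                             "green, yellow", "yellow, red, blue"]})

# expand=True returns a DataFrame with one column per item position;
# rows with fewer items are padded with missing values
wide = df['color'].str.split(r',\s*', expand=True)

# stack() moves the columns into a single Series indexed by (row, position)
stacked = wide.stack()

# value_counts() ignores missing values by default, so the padding is not counted
counts = stacked.value_counts()

print(wide.shape)
print(counts.to_dict())
```

Here wide has shape (6, 3) because the longest row ("yellow, red, blue") has three items. On recent pandas 2.x releases, calling stack() without future_stack=True may print a FutureWarning; since value_counts drops missing values either way, the counts are the same under both stack implementations.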
    

    Either approach gives s:

        color  occurance
    0    blue          3
    1  yellow          3
    2   green          2
    3     red          2
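Another option, not part of the original answer, is Series.str.get_dummies, which builds one 0/1 indicator column per distinct value; summing the columns gives the counts. This sketch assumes two things: items are separated by exactly ", " (get_dummies takes a fixed separator string rather than the ,\s* regex), and a value appears at most once per row, because each row contributes at most 1 to a column.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'color': ["blue", "green", "yellow", "red, blue",
                             "green, yellow", "yellow, red, blue"]})

# One indicator column per distinct color (columns come out in alphabetical order);
# assumes every separator is exactly ", "
dummies = df['color'].str.get_dummies(sep=', ')

# Summing each column counts the rows that contain that color
s = (
    dummies.sum()
        .rename_axis('color')
        .reset_index(name='occurance')
)

print(s)
```

The rows of s follow the alphabetical column order (blue, green, red, yellow), so append .sort_values('occurance', ascending=False) if the most frequent values should come first.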