
Pandas plot histogram of value counts per group


I have a dataset:

game_id year
100     2020
100     2020
100     2020
100     2020
227     2022
227     2022
228     2023
228     2023
228     2023
...
300     2023
300     2023
301     2023
301     2023
301     2023

I'd like to generate one histogram per year showing the distribution of the number of rows per unique game_id (i.e. df['game_id'].value_counts()), using pandas 2.0.3.

I can do this manually, e.g. with years = df.groupby('year') and then handling each year separately with years.get_group(2023).value_counts().hist(), but I feel there should be a simple one-liner that passes the data to hist() in the right shape to get a small-multiples plot.
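For reference, here is a minimal sketch of the manual approach described above, drawing each year's histogram into its own subplot. The sample DataFrame is hypothetical (modeled on the table above, without the elided rows), and the Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: headless run)
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data modeled on the question's table
df = pd.DataFrame({
    "game_id": [100] * 4 + [227] * 2 + [228] * 3 + [300] * 2 + [301] * 3,
    "year":    [2020] * 4 + [2022] * 2 + [2023] * 3 + [2023] * 2 + [2023] * 3,
})

years = df.groupby("year")
# One subplot per year; squeeze=False keeps axes as a 2-D array
fig, axes = plt.subplots(1, years.ngroups, figsize=(4 * years.ngroups, 3), squeeze=False)

counts_per_year = {}
for ax, (year, group) in zip(axes[0], years):
    # Number of rows for each game_id within this year
    counts = group["game_id"].value_counts()
    counts_per_year[year] = counts
    counts.hist(ax=ax)
    ax.set_title(str(year))
    ax.set_xlabel("rows per game_id")

fig.tight_layout()
```

This works, but it is the loop the question is trying to avoid.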


Solution

  • Assuming you want a histogram of the per-game_id counts, with all years overlaid on a single set of axes:

    # rows = game_id, columns = year, cells = row counts (0 where a game_id is absent from a year)
    pd.crosstab(df['game_id'], df['year']).plot.hist(alpha=0.5)
    


    For separate subplots, one per year, you can use seaborn.displot:

    import seaborn as sns
    
    # df.value_counts() counts each (game_id, year) pair; col='year' makes one panel per year
    sns.displot(data=df.value_counts().reset_index(name='count'),
                x='count', col='year', kind='hist')
    

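If you would rather stay in pandas, a possible sketch is DataFrame.hist with its `by` parameter, which draws one subplot per group. It reuses the same df.value_counts().reset_index(name='count') table as the seaborn version. Note that this table only contains (game_id, year) pairs that actually occur, whereas pd.crosstab fills missing pairs with 0, so the overlaid crosstab histogram also counts games absent from a given year. The sample data and the `layout` choice below are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: headless run)
import pandas as pd

# Hypothetical sample data modeled on the question's table
df = pd.DataFrame({
    "game_id": [100] * 4 + [227] * 2 + [228] * 3 + [300] * 2 + [301] * 3,
    "year":    [2020] * 4 + [2022] * 2 + [2023] * 3 + [2023] * 2 + [2023] * 3,
})

# One row per observed (game_id, year) pair, with how many rows it has
counts = df.value_counts().reset_index(name="count")

# `by="year"` gives one histogram subplot per year, laid out in a single row
axes = counts.hist(column="count", by="year",
                   layout=(1, counts["year"].nunique()), sharex=True)
```

Each subplot is titled with its year value.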