How to make a combined histogram of two grouped columns?


My data
I have the data attached above, and I'm trying to overlay the Home and Away histograms for each team individually. I'm new to Python, by the way.

So far I have made the two plots below, which look exactly the way I want, but I want to combine them into a single plot per team:

df_EPL['Away_score'].hist(by=df_EPL['AwayTeam'],figsize = (8,8),color = '#96ddff');

and

df_EPL['Home_score'].hist(by=df_EPL['HomeTeam'],figsize = (8,8),color = '#82c065');

Solution

  • Fake DataFrame creation

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns
    
    
    np.random.seed(42)  # fixed seed so the fake scores are reproducible
    teams = ['Arsenal', 'Chelsea', 'Liverpool', 'Manchester City', 'Manchester Utd']
    
    # Each team hosts every other team once
    df = pd.DataFrame({'HomeTeam': np.repeat(teams, len(teams) - 1)})
    df['AwayTeam'] = [away_team for home_team in teams for away_team in teams if away_team != home_team]
    # Random scores from 0 to 4 (the upper bound of randint is exclusive)
    df['Home_score'] = np.random.randint(0, 5, len(df))
    df['Away_score'] = np.random.randint(0, 5, len(df))
    
               HomeTeam         AwayTeam  Home_score  Away_score
    0           Arsenal          Chelsea           3           1
    1           Arsenal        Liverpool           4           4
    2           Arsenal  Manchester City           2           3
    3           Arsenal   Manchester Utd           4           0
    4           Chelsea          Arsenal           4           0
    5           Chelsea        Liverpool           1           2
    6           Chelsea  Manchester City           2           2
    7           Chelsea   Manchester Utd           2           1
    8         Liverpool          Arsenal           2           3
    9         Liverpool          Chelsea           4           3
    10        Liverpool  Manchester City           3           2
    11        Liverpool   Manchester Utd           2           3
    12  Manchester City          Arsenal           4           3
    13  Manchester City          Chelsea           1           0
    14  Manchester City        Liverpool           3           2
    15  Manchester City   Manchester Utd           1           4
    16   Manchester Utd          Arsenal           3           2
    17   Manchester Utd          Chelsea           4           4
    18   Manchester Utd        Liverpool           0           0
    19   Manchester Utd  Manchester City           3           1
    

  • DataFrame re-shape

    To make the plot you want, you need to reshape your dataframe from wide to long format, with one row per team and score. You can use pandas.melt for this:

    df = pd.melt(frame = df,
                 id_vars = ['HomeTeam', 'AwayTeam'],
                 var_name = 'H/A',
                 value_name = 'Score')
    
    # Label Home rows with the home team and Away rows with the away team
    df['Team'] = np.where(df['H/A'] == 'Home_score', df['HomeTeam'], df['AwayTeam'])
    df = df[['Team', 'H/A', 'Score']].replace({'Home_score': 'Home', 'Away_score': 'Away'})
    
                   Team   H/A  Score
    0           Arsenal  Home      3
    1           Arsenal  Home      4
    2           Arsenal  Home      2
    3           Arsenal  Home      4
    4           Chelsea  Home      4
    5           Chelsea  Home      1
    6           Chelsea  Home      2
    7           Chelsea  Home      2
    8         Liverpool  Home      2
    9         Liverpool  Home      4
    10        Liverpool  Home      3
    11        Liverpool  Home      2
    12  Manchester City  Home      4
    13  Manchester City  Home      1
    14  Manchester City  Home      3
    15  Manchester City  Home      1
    16   Manchester Utd  Home      3
    17   Manchester Utd  Home      4
    18   Manchester Utd  Home      0
    19   Manchester Utd  Home      3
    20          Chelsea  Away      1
    21        Liverpool  Away      4
    22  Manchester City  Away      3
    23   Manchester Utd  Away      0
    24          Arsenal  Away      0
    25        Liverpool  Away      2
    26  Manchester City  Away      2
    27   Manchester Utd  Away      1
    28          Arsenal  Away      3
    29          Chelsea  Away      3
    30  Manchester City  Away      2
    31   Manchester Utd  Away      3
    32          Arsenal  Away      3
    33          Chelsea  Away      0
    34        Liverpool  Away      2
    35   Manchester Utd  Away      4
    36          Arsenal  Away      2
    37          Chelsea  Away      4
    38        Liverpool  Away      0
    39  Manchester City  Away      1
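
As an alternative sketch of the same reshape, assuming the goal is to label Home rows by HomeTeam and Away rows by AwayTeam, you could stack two renamed slices with pandas.concat. The two-row `wide` frame and the names `home`, `away` and `long_df` are hypothetical and only illustrate the idea:

```python
import pandas as pd

# Small hypothetical wide-format frame with the same column names as above
wide = pd.DataFrame({
    'HomeTeam': ['Arsenal', 'Chelsea'],
    'AwayTeam': ['Chelsea', 'Arsenal'],
    'Home_score': [3, 4],
    'Away_score': [1, 0],
})

# Home rows: each home team paired with its own score
home = wide[['HomeTeam', 'Home_score']].rename(
    columns={'HomeTeam': 'Team', 'Home_score': 'Score'})
home['H/A'] = 'Home'

# Away rows: each away team paired with its own score
away = wide[['AwayTeam', 'Away_score']].rename(
    columns={'AwayTeam': 'Team', 'Away_score': 'Score'})
away['H/A'] = 'Away'

# Stack both slices into one long-format frame
long_df = pd.concat([home, away], ignore_index=True)[['Team', 'H/A', 'Score']]
print(long_df)
```

Each score stays attached to the team that scored it, so the result can be passed to the plotting step unchanged.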
    

  • Plot

    Now the dataframe is ready to be plotted. You can use seaborn.FacetGrid to create a grid of subplots, one for each team. Each subplot contains two seaborn.histplot histograms, one for Home scores and one for Away scores:

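    # One subplot per team; 'H/A' sets the color of the Home and Away histograms
    g = sns.FacetGrid(df, col = 'Team', hue = 'H/A')
    # Unit-width bins centered on each integer score
    g.map(sns.histplot, 'Score', bins = np.arange(df['Score'].min() - 0.5, df['Score'].max() + 1.5, 1))
    g.add_legend()
    # One tick per integer score
    g.set(xticks = np.arange(df['Score'].min(), df['Score'].max() + 1, 1))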
    
    plt.show()
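
For comparison, the same overlay can be sketched with matplotlib alone. This assumes a long-format frame with Team, H/A and Score columns; the small `long_df` frame, the `alpha = 0.5` transparency and the figure size are illustrative choices, not part of the original answer:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Small hypothetical long-format frame (Team, H/A, Score)
long_df = pd.DataFrame({
    'Team': ['Arsenal'] * 4 + ['Chelsea'] * 4,
    'H/A': ['Home', 'Home', 'Away', 'Away'] * 2,
    'Score': [3, 4, 1, 0, 4, 1, 2, 2],
})

# Bin edges at half-integers, so each bar is centered on an integer score
bins = np.arange(long_df['Score'].min() - 0.5, long_df['Score'].max() + 1.5, 1)
teams = long_df['Team'].unique()

# One subplot per team, sharing the y axis for easy comparison
fig, axes = plt.subplots(1, len(teams), figsize=(4 * len(teams), 3),
                         sharey=True, squeeze=False)
for ax, team in zip(axes[0], teams):
    for side in ['Home', 'Away']:
        mask = (long_df['Team'] == team) & (long_df['H/A'] == side)
        # Semi-transparent bars let the two histograms overlap visibly
        ax.hist(long_df.loc[mask, 'Score'], bins=bins, alpha=0.5, label=side)
    ax.set_title(team)
    ax.set_xlabel('Score')
axes[0, 0].legend()
```

Call plt.show() afterwards to display the figure. The FacetGrid version above handles the grid layout and legend for you, while this loop gives direct control over each axis.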
    
